From speechsc-bounces@ietf.org Sun Jul 02 15:37:42 2006
Message-ID: <08b301c69e0e$ed374170$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <speechsc@ietf.org>
Subject: [speechsc] Review of recogniser resource 
Date: Sun, 2 Jul 2006 20:37:16 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1716346198=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1716346198==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_08B0_01C69E17.4E9CE7A0"

This is a multi-part message in MIME format.

------=_NextPart_000_08B0_01C69E17.4E9CE7A0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I reviewed the recogniser resource (section 9) and have a few issues to
report, listed below in approximately decreasing order of severity.

Dave

~~~

In section 9.10 (STOP), make the language consistent with the speechsynth
resource with regard to queuing (i.e. when Cancel-If-Queue is false). In
other words:
    a. clarify that an active-request-id-list can be specified in the STOP
       request
    b. clarify that after a STOP, the next PENDING RECOGNIZE gets processed
    c. clarify that if no active-request-id-list is specified in STOP, then
       all requests are terminated
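Here is a minimal Python sketch of the queuing behaviour that points a to c describe. It assumes that the list in STOP may name pending requests as well as the in-progress one. RecognizerQueue and its methods are hypothetical. Only the IN-PROGRESS/PENDING states come from the draft.

```python
from collections import deque
from typing import Deque, List, Optional


class RecognizerQueue:
    """Toy model of the proposed STOP semantics for queued RECOGNIZE
    requests (Cancel-If-Queue false). Names are hypothetical."""

    def __init__(self) -> None:
        self.in_progress: Optional[int] = None
        self.pending: Deque[int] = deque()

    def recognize(self, request_id: int) -> str:
        """Start the request if the resource is idle, otherwise queue it."""
        if self.in_progress is None:
            self.in_progress = request_id
            return "IN-PROGRESS"
        self.pending.append(request_id)
        return "PENDING"

    def stop(self,
             active_request_id_list: Optional[List[int]] = None
             ) -> List[int]:
        """Terminate requests and return the ids actually stopped."""
        if active_request_id_list is None:
            # (c) no list: terminate the active request and all pending ones
            targets = set(self.pending)
            if self.in_progress is not None:
                targets.add(self.in_progress)
        else:
            # (a) only the requests named in the list are terminated
            targets = set(active_request_id_list)

        stopped: List[int] = []
        if self.in_progress in targets:
            stopped.append(self.in_progress)
            self.in_progress = None
        remaining: Deque[int] = deque()
        for rid in self.pending:
            if rid in targets:
                stopped.append(rid)
            else:
                remaining.append(rid)
        self.pending = remaining

        # (b) if the active request was stopped, the next PENDING one starts
        if self.in_progress is None and self.pending:
            self.in_progress = self.pending.popleft()
        return stopped
```

Stopping only the in-progress request lets the next PENDING RECOGNIZE start. A bare STOP empties the queue. That is the distinction points b and c ask the text to spell out.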

~~~

What happens if you call END-PHRASE-ENROLLMENT but
<num-repetitions-still-needed> > 0? Is the badly trained phrase still
committed?

~~~

Clarify that MODIFY-PHRASE and DELETE-PHRASE can occur outside of an
enrollment session (i.e. when no START-PHRASE-ENROLLMENT is active).

What happens if you attempt to modify or delete a phrase you're
enrolling? This suggests only allowing MODIFY-PHRASE / DELETE-PHRASE
outside of an enrollment session.
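Here is a hypothetical Python sketch of the restriction suggested above, as a server-side guard. Returning 402 ("method not valid in this state") is an assumption about which MRCPv2 status code fits. The draft does not specify it.

```python
from typing import Optional

OK = 200
# Assumption: 402 ("method not valid in this state") is the natural status
# for a phrase-management request made while an enrollment is active.
METHOD_NOT_VALID_IN_STATE = 402


class EnrollmentGuard:
    """Hypothetical guard: MODIFY-PHRASE / DELETE-PHRASE are accepted only
    when no phrase enrollment is in progress."""

    def __init__(self) -> None:
        self.enrolling: Optional[str] = None  # phrase currently being enrolled

    def start_phrase_enrollment(self, phrase_id: str) -> int:
        self.enrolling = phrase_id
        return OK

    def end_phrase_enrollment(self) -> int:
        self.enrolling = None
        return OK

    def modify_or_delete_phrase(self, phrase_id: str) -> int:
        # Refusing every MODIFY-PHRASE / DELETE-PHRASE during enrollment
        # also covers the awkward case of touching the phrase being enrolled.
        if self.enrolling is not None:
            return METHOD_NOT_VALID_IN_STATE
        return OK
```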

~~~

Section 9.4.25 (Ver-Buffer-Utterance) says "The client MUST NOT send this
header unless a verification resource is instantiated for the session".
What happens if it does?

~~~

In 9.4.11, completion causes 004, 005, 009 and 012 can presumably occur in
a 407 response to DEFINE-GRAMMAR, but this is not specified in the table.

~~~

Why is Cancel-If-Queue required on every RECOGNIZE (note that none of the
examples show it!)? Why not allow it in SET-PARAMS/GET-PARAMS and default
it to false (i.e. the same behavior as speechsynth)?
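For illustration, here is a Python sketch of how a server might resolve Cancel-If-Queue if it became a session parameter with a default of false. It applies the MRCPv2 rule that values set with SET-PARAMS do not override headers given explicitly on a request. DEFAULTS and effective_value are hypothetical names. Header-name case folding is omitted.

```python
from typing import Mapping, Optional

# Proposed session default (assumption: "false", as the review suggests,
# matching speechsynth queuing behaviour).
DEFAULTS = {"Cancel-If-Queue": "false"}


def effective_value(header: str,
                    request_headers: Mapping[str, str],
                    session_params: Mapping[str, str]) -> Optional[str]:
    """Value in force for one request: an explicit request header wins,
    then anything set earlier with SET-PARAMS, then the default."""
    if header in request_headers:
        return request_headers[header]
    if header in session_params:
        return session_params[header]
    return DEFAULTS.get(header)
```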

~~~

GET-PARAMS/SET-PARAMS should be allowed (i.e. specified) for
Hotword-Max-Duration, Hotword-Min-Duration and Early-No-Match.

~~~

Clarify which methods/responses/events Save-Waveform appears in.
Presumably RECOGNIZE, GET-PARAMS and SET-PARAMS?

~~~

Specify where Personal-Grammar-URI appears: presumably
START-PHRASE-ENROLLMENT, MODIFY-PHRASE and DELETE-PHRASE?

~~~

Specify where Save-Best-Waveform appears: presumably only
START-PHRASE-ENROLLMENT?
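The three placement questions above could be summarised as a table that an implementation checks requests against. Here is a hypothetical Python sketch. Its entries are the reviewer's presumptions, not normative text from the draft.

```python
from typing import Dict

# Presumed placements from the review above; NOT normative.
PRESUMED_PLACEMENT = {
    "Save-Waveform": {"RECOGNIZE", "GET-PARAMS", "SET-PARAMS"},
    "Personal-Grammar-URI": {"START-PHRASE-ENROLLMENT", "MODIFY-PHRASE",
                             "DELETE-PHRASE"},
    "Save-Best-Waveform": {"START-PHRASE-ENROLLMENT"},
}


def misplaced_headers(method: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Headers the table says may not appear in `method`, with their values
    exactly as received, e.g. for echoing back in an error response.
    Headers absent from the table are not checked."""
    return {name: value for name, value in headers.items()
            if name in PRESUMED_PLACEMENT
            and method not in PRESUMED_PLACEMENT[name]}
```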

~~~

Clarify: do Speech-Complete-Timeout, Speech-Incomplete-Timeout,
DTMF-Interdigit-Timeout and DTMF-Term-Timeout also apply to hotword mode?
I presume they do, subject to the usual hotword caveat that a no-match of
any kind is never returned.

~~~

Clarify that the confidence score reported in NLSML recognition results
is 1.0 for a match with INTERPRET.

~~~

If you delete all phrases from a personal grammar, is the grammar at
Personal-Grammar-URI itself deleted?

~~~

What is the specified Completion-Cause for a RECOGNIZE with
Enroll-Utterance set to true?
    -> The definition of 000 success needs to be expanded.

~~~

Which headers can be specified in GET-RESULT? For example,
Confidence-Level says it only appears in RECOGNIZE, SET-PARAMS, etc.

~~~

Should Ver-Buffer-Utterance be specified as allowed only in RECOGNIZE for
this resource? Specify a default value of false.

~~~

What grammar format must be supported by Confusable-Phrases-URI? Can
this header be set with SET-PARAMS? Does the protocol allow you to
enroll confusable phrases and just warn via NLSML?

~~~


------=_NextPart_000_08B0_01C69E17.4E9CE7A0--


--===============1716346198==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1716346198==--


From speechsc-bounces@ietf.org Sun Jul 02 15:59:40 2006
Message-ID: <09d501c69e11$febc8c40$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Burger, Eric" <eburger@cantata.com>,
	"Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
References: <330A23D8336C0346B5C1A5BB1966664703389116@ATLANTIS.Brooktrout.com>
Subject: Re: [speechsc] The NLSML schema and namespaces
Date: Sun, 2 Jul 2006 20:59:14 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

"Existing" NLSML (i.e. the public NLSML document) has no defined schema. I 
believe this thread is about fixing MRCPv2 NLSML. I agree with all the
suggestions below and think they're easily applied...

Dave


----- Original Message ----- 
From: "Burger, Eric" <eburger@cantata.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>; "IETF SPEECHSC (E-mail)" 
<speechsc@ietf.org>
Sent: Thursday, June 29, 2006 9:15 PM
Subject: RE: [speechsc] The NLSML schema and namespaces


What does "existing NLSML" do?  I'm not interested in fixing NLSML.  At
the rate we're going, we'll be looking at EMMA v6.0 :-(

-----Original Message-----
From: Andrew Wahbe [mailto:Andrew.Wahbe@genesyslab.com]
Sent: Tuesday, June 27, 2006 3:46 PM
To: IETF SPEECHSC (E-mail)
Subject: [speechsc] The NLSML schema and namespaces

I would like to raise a few issues with both the NLSML schema and its
use of namespaces.

First, SRGS and SISR allow you to define a grammar so that multiple
token sequences map to one string literal result. For example, "yes",
"ya", "sure", "yes please", and "ok" could all result in the string
literal result "yes". Thus, if you said "sure", the string literal
interpretation result would be "yes".

Unfortunately there doesn't seem to be a way to specify string literals
in NLSML. You would think that the example above could be expressed as
follows:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <interpretation confidence="0.9">
    <instance>yes</instance>
    <input mode="speech">sure</input>
  </interpretation>
</result>

However, this isn't allowed by the NLSML schema in the current MRCPv2
draft. It could be allowed by changing the <instance> type to allow
"mixed" content (see the definition of <input>). We would also need to
change the schema to allow <instance> to have no child elements.
Applying these changes, we get the following element definition:

<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

Of course, this allows a mix of text and elements (e.g. <instance> yes
<no/> maybe </instance>), which is probably not desirable. XML Schema has
no way to forbid this, but the format we define could do so in the text
of the spec. The alternative would be to do what EMMA does with the
<emma:literal/> element. Either way would be fine with me.
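Here is a minimal sketch using Python's standard xml.etree.ElementTree. It shows how a consumer could tell a string-literal <instance> from a structured one under the relaxed definition. It also enforces in code the "no mixed text and elements" restriction that the spec prose would have to state. instance_value is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

NLSML_NS = "http://www.ietf.org/xml/ns/mrcpv2"


def instance_value(instance: ET.Element) -> Union[str, List[ET.Element]]:
    """Return the string literal of a text-only <instance>, or its child
    elements if it is structured. Mixed text and elements is rejected."""
    children = list(instance)
    # Text can sit before the first child or after any child (its tail).
    text_parts = [instance.text or ""] + [c.tail or "" for c in children]
    has_text = any(part.strip() for part in text_parts)
    if children and has_text:
        raise ValueError("mixed text and element content in <instance>")
    if children:
        return children
    return (instance.text or "").strip()


# The "sure" -> "yes" example from above, without the XML declaration.
doc = ET.fromstring(
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">'
    '<interpretation confidence="0.9">'
    '<instance>yes</instance><input mode="speech">sure</input>'
    '</interpretation></result>')
inst = doc.find(f"{{{NLSML_NS}}}interpretation/{{{NLSML_NS}}}instance")
print(instance_value(inst))  # yes
```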

The second issue is with the <xs:any/> portion of the instance element
definition. As currently defined, a schema validator will try to
validate its contents even if no schema for them is available. We should
probably relax this by adding a processContents attribute of "lax", which
makes the validator process the contents only if a schema is available.

Also, this currently allows any elements, including those from the NLSML
namespace, within an <instance/> element. I'm guessing that we actually
want to restrict it to elements from other namespaces. E.g. you shouldn't
be able to do this:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result>
       <interpretation>
         <instance/>
       </interpretation>
     </result>
   </instance>
 </interpretation>
</result>

However, this is ok:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result xmlns="http://example.com/myNamespace">
       <interpretation>
         <instance/>
       </interpretation>
     </result>
   </instance>
 </interpretation>
</result>

The final element definition for <instance/> would then be:
<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any namespace="##other" processContents="lax"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>
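Here is a rough Python equivalent of what the ##other wildcard enforces, useful for checking examples without a schema validator. It inspects only the direct children of each NLSML <instance>, because those are what the wildcard constrains. Under XML Schema 1.0, ##other also excludes elements in no namespace. wildcard_violations is a hypothetical helper and does not replace real schema validation.

```python
import xml.etree.ElementTree as ET
from typing import List

NLSML_NS = "http://www.ietf.org/xml/ns/mrcpv2"


def wildcard_violations(root: ET.Element) -> List[str]:
    """Tags of direct <instance> children that namespace="##other" would
    reject: elements in the NLSML namespace, or in no namespace at all."""
    violations = []
    for instance in root.iter(f"{{{NLSML_NS}}}instance"):
        for child in instance:
            in_nlsml = child.tag.startswith(f"{{{NLSML_NS}}}")
            unqualified = not child.tag.startswith("{")
            if in_nlsml or unqualified:
                violations.append(child.tag)
    return violations
```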

Of course, this also raises the issue that none of the examples in the
spec declare namespaces at all. It would probably be a good idea to do
this properly, so examples such as this:
<?xml version="1.0"?>
<result grammar="session:request1@form-level.store">
 <interpretation>
  <instance name="Person">
   <Person>
    <Name> Andre Roy </Name>
   </Person>
  </instance>
  <input>   may I speak to Andre Roy </input>
 </interpretation>
</result>

Become this:

<?xml version="1.0"?>
<nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="session:request1@form-level.store">
    <nl:interpretation>
        <nl:instance>
            <Person>
                <Name> Andre Roy </Name>
            </Person>
        </nl:instance>
        <nl:input>   may I speak to Andre Roy </nl:input>
    </nl:interpretation>
</nl:result>
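With the namespaces declared, consumers have to address elements by namespace URI rather than by bare name. Here is a minimal sketch with Python's xml.etree.ElementTree that reads the example above. The prefixes in NS are arbitrary local choices. Only the URIs have to match the document, so "nlsml" here finds elements written as "nl:" there. read_result is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Prefix -> URI map for lookups only; prefixes need not match the document.
NS = {
    "nlsml": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

EXAMPLE = """<?xml version="1.0"?>
<nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="session:request1@form-level.store">
    <nl:interpretation>
        <nl:instance>
            <Person>
                <Name> Andre Roy </Name>
            </Person>
        </nl:instance>
        <nl:input>   may I speak to Andre Roy </nl:input>
    </nl:interpretation>
</nl:result>"""


def read_result(text: str) -> Tuple[Optional[str], str, str]:
    """Return (grammar, person name, spoken input) from the example."""
    root = ET.fromstring(text)
    name = root.find(
        "nlsml:interpretation/nlsml:instance/ex:Person/ex:Name", NS)
    spoken = root.find("nlsml:interpretation/nlsml:input", NS)
    # The unprefixed grammar attribute is in no namespace.
    return root.get("grammar"), name.text.strip(), spoken.text.strip()


print(read_result(EXAMPLE))
```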

Finally, it is not clear what the namespace of the NLSML format is
supposed to be. The schema says this:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/schema/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

I have a feeling that "http://www.ietf.org/xml/schema/mrcpv2" is
supposed to be the location of the schema and
"http://www.ietf.org/xml/ns/mrcpv2" is supposed to be the namespace for
NLSML. If that is the case, then the schema should be written this way:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/ns/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

The schema location is not referenced in the schema content at all.
Either way, the default namespace and the targetNamespace should match;
otherwise, the references to the "confidenceinfo" simpleType in the
definitions of the confidence attributes do not resolve properly.
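Here is a small Python check for the mismatch described above. It assumes only the declarations on the root <xs:schema> element matter. It uses the "start-ns" events of ElementTree.iterparse to see the default namespace declaration. The two fragments below are the root elements quoted above, with the rest of the schema omitted. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional


def default_ns_matches_target(schema_text: str) -> bool:
    """True if the root <xs:schema> declares a default namespace equal to
    its targetNamespace, so that unprefixed references such as
    type="confidenceinfo" resolve to types defined in this schema."""
    default_ns: Optional[str] = None
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            if prefix == "":
                default_ns = uri
        else:
            # The first "start" event is the root element; stop there.
            target_ns = item.get("targetNamespace")
            return default_ns is not None and default_ns == target_ns
    return False


DRAFT = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
         'targetNamespace="http://www.ietf.org/xml/schema/mrcpv2" '
         'xmlns="http://www.ietf.org/xml/ns/mrcpv2"/>')
FIXED = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
         'targetNamespace="http://www.ietf.org/xml/ns/mrcpv2" '
         'xmlns="http://www.ietf.org/xml/ns/mrcpv2"/>')
print(default_ns_matches_target(DRAFT), default_ns_matches_target(FIXED))
```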
Thanks,

Andrew Wahbe



From speechsc-bounces@ietf.org Sun Jul 02 16:11:36 2006
Message-ID: <0a4c01c69e13$a8a54b10$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <arvinds@huawei.com>,
	"IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
References: <002c01c69a7c$1b954f30$9665460a@china.huawei.com>
Subject: Re: [Speechsc] Reg. Unsupported/Bad parameters in requests....
Date: Sun, 2 Jul 2006 21:11:09 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0015453947=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0015453947==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0A49_01C69E1C.0A1EC760"

This is a multi-part message in MIME format.

------=_NextPart_000_0A49_01C69E1C.0A1EC760
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I believe the intent is that 403 Unsupported Header and 404 Illegal Value
for Header are returned for any request method, along with the bad
header(s). My suggestion is to clarify section 5.4 so that, for 403 and
404, the response "MUST include the bad or unsupported headers and their
values exactly as they were sent from the client". For 406 (mandatory
header missing), the missing mandatory header should be returned with no
value.
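Here is a Python sketch of the response selection suggested above, applying one rule to every request method. The function name and signature are hypothetical. Checking bad or unsupported headers before missing mandatory ones is an assumption that the draft does not state.

```python
from typing import Dict, Iterable, Tuple


def error_response(received: Dict[str, str],
                   unsupported: Iterable[str],
                   illegal: Iterable[str],
                   missing_mandatory: Iterable[str]
                   ) -> Tuple[int, Dict[str, str]]:
    """Choose a status code and the headers to echo back for any request
    method. Sketch only, not normative text."""
    bad_values = [h for h in illegal if h in received]
    unknown = [h for h in unsupported if h in received]
    if bad_values or unknown:
        # 404 takes precedence when both kinds occur (quoted SET-PARAMS text)
        code = 404 if bad_values else 403
        # Echo each offending header exactly as the client sent it
        return code, {h: received[h] for h in unknown + bad_values}
    missing = list(missing_mandatory)
    if missing:
        # 406: name each missing mandatory header with no value
        return 406, {h: "" for h in missing}
    return 200, {}
```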

Dave
  ----- Original Message -----
  From: Arvind Saraswat
  To: IETF SPEECHSC (E-mail)
  Sent: Wednesday, June 28, 2006 7:28 AM
  Subject: [Speechsc] Reg. Unsupported/Bad parameters in requests....


  Hi,

  Section 6.1.1 of the MRCPv2 specification says the following about
  SET-PARAMS:

     The "SET-PARAMS" method, from the client to the server, tells the
     MRCPv2 resource to define parameters for the session, such as voice
     characteristics and prosody on synthesizers, recognition timers on
     recognizers, etc.  If the server accepts and sets all parameters it
     MUST return a Response-Status of 200.  If it chooses to ignore some
     optional headers that can be safely ignored without affecting
     operation of the server it MUST return 201.

     If some of the headers being set are unsupported for the resource or
     have illegal values, the server MUST reject the request with a 403
     Unsupported Header or 404 Illegal Value for Header, as appropriate.
     If the request had both bad and unsupported parameters 404 MUST be
     returned.  Such a response MUST include the bad or unsupported
     headers and their values exactly as they were sent from the client.
     Session parameters modified using "SET-PARAMS" do not override
     parameters explicitly specified on individual requests or requests
     that are in-PROGRESS.

  It is clear from the above that if a SET-PARAMS request tries to set an
  unsupported or bad parameter, the server will respond with 403 or 404,
  as the case may be, and will echo the header and value exactly as sent
  in the original request.

  However, what happens if the parameter is set using SPEAK, RECOGNIZE,
  DEFINE-GRAMMAR or any other similar method? Will the response to these
  requests be similar to the one for SET-PARAMS? The options are:

  a) just send the error code (response line);
  b) include all the parameters that could not be set/defined by the
     request;
  c) something else...

  regards
  Arvind



------=_NextPart_000_0A49_01C69E1C.0A1EC760
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns=3D"http://www.w3.org/TR/REC-html40" xmlns:v =3D=20
"urn:schemas-microsoft-com:vml" xmlns:o =3D=20
"urn:schemas-microsoft-com:office:office" xmlns:w =3D=20
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.2900.2873" name=3DGENERATOR><!--[if !mso]>
<![endif]-->
<STYLE>
<!--
@list l0:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l0:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l0:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l0:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;}
@list l0:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;}
@list l1
	{mso-list-id:191647984;
	mso-list-template-ids:345692754;}
@list l1:level1
	{mso-level-number-format:alpha-upper;
	mso-level-text:\9644\5F55%1;
	mso-level-tab-stop:64.15pt;
	mso-level-number-position:left;
	margin-left:64.15pt;
	text-indent:-21.6pt;}
@list l1:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:71.35pt;
	mso-level-number-position:left;
	margin-left:71.35pt;
	text-indent:-28.8pt;}
@list l1:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:78.55pt;
	mso-level-number-position:left;
	margin-left:78.55pt;
	text-indent:-36.0pt;}
@list l1:level4
	{mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:89.35pt;
	text-indent:-34.0pt;}
@list l1:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:89.35pt;
	text-indent:-34.0pt;}
@list l1:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:89.35pt;
	text-indent:-34.0pt;}
@list l1:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:89.35pt;
	text-indent:-34.0pt;}
@list l1:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:114.55pt;
	mso-level-number-position:left;
	margin-left:114.55pt;
	text-indent:-72.0pt;}
@list l1:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:121.75pt;
	mso-level-number-position:left;
	margin-left:121.75pt;
	text-indent:-79.2pt;}
@list l2
	{mso-list-id:541409008;
	mso-list-template-ids:-249166292;}
@list l2:level1
	{mso-level-number-format:alpha-upper;
	mso-level-text:\9644\5F55%1;
	mso-level-tab-stop:21.6pt;
	mso-level-number-position:left;
	margin-left:21.6pt;
	text-indent:-21.6pt;}
@list l2:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:28.8pt;
	mso-level-number-position:left;
	margin-left:28.8pt;
	text-indent:-28.8pt;}
@list l2:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:36.0pt;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:-36.0pt;}
@list l2:level4
	{mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l2:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l2:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l2:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l2:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;}
@list l2:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;}
@list l3
	{mso-list-id:731736200;
	mso-list-template-ids:67698717;}
@list l3:level1
	{mso-level-text:%1;
	mso-level-tab-stop:21.25pt;
	mso-level-number-position:left;
	margin-left:21.25pt;
	text-indent:-21.25pt;}
@list l3:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:57.25pt;
	mso-level-number-position:left;
	margin-left:49.6pt;
	text-indent:-1.0cm;}
@list l3:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:96.55pt;
	mso-level-number-position:left;
	margin-left:70.9pt;
	text-indent:-1.0cm;}
@list l3:level4
	{mso-level-text:"%1\.%2\.%3\.%4";
	mso-level-tab-stop:153.8pt;
	mso-level-number-position:left;
	margin-left:99.2pt;
	text-indent:-35.4pt;}
@list l3:level5
	{mso-level-text:"%1\.%2\.%3\.%4\.%5";
	mso-level-tab-stop:193.05pt;
	mso-level-number-position:left;
	margin-left:127.55pt;
	text-indent:-42.5pt;}
@list l3:level6
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6";
	mso-level-tab-stop:232.3pt;
	mso-level-number-position:left;
	margin-left:163.0pt;
	text-indent:-2.0cm;}
@list l3:level7
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7";
	mso-level-tab-stop:271.55pt;
	mso-level-number-position:left;
	margin-left:191.35pt;
	text-indent:-63.8pt;}
@list l3:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:310.8pt;
	mso-level-number-position:left;
	margin-left:219.7pt;
	text-indent:-70.9pt;}
@list l3:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:350.1pt;
	mso-level-number-position:left;
	margin-left:255.1pt;
	text-indent:-85.0pt;}
@list l4
	{mso-list-id:818422186;
	mso-list-template-ids:1344984950;}
@list l4:level1
	{mso-level-text:%1;
	mso-level-tab-stop:21.6pt;
	mso-level-number-position:left;
	margin-left:21.6pt;
	text-indent:-21.6pt;}
@list l4:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:28.8pt;
	mso-level-number-position:left;
	margin-left:28.8pt;
	text-indent:-28.8pt;}
@list l4:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:36.0pt;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:-36.0pt;}
@list l4:level4
	{mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l4:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l4:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l4:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l4:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;}
@list l4:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;}
@list l5
	{mso-list-id:838886720;
	mso-list-template-ids:-819953982;}
@list l5:level1
	{mso-level-text:%1;
	mso-level-tab-stop:21.6pt;
	mso-level-number-position:left;
	margin-left:21.6pt;
	text-indent:-21.6pt;
	mso-ansi-font-size:18.0pt;
	mso-bidi-font-size:18.0pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:28.8pt;
	mso-level-number-position:left;
	margin-left:28.8pt;
	text-indent:-28.8pt;
	mso-ansi-font-size:15.0pt;
	mso-bidi-font-size:15.0pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:36.0pt;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:-36.0pt;
	mso-ansi-font-size:12.0pt;
	mso-bidi-font-size:12.0pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level4
	{mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;
	mso-ansi-font-size:9.0pt;
	mso-bidi-font-size:9.0pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l5:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;
	mso-ansi-font-size:9.0pt;
	mso-bidi-font-size:9.0pt;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l6
	{mso-list-id:942373150;
	mso-list-template-ids:67698717;}
@list l6:level1
	{mso-level-text:%1;
	mso-level-tab-stop:21.25pt;
	mso-level-number-position:left;
	margin-left:21.25pt;
	text-indent:-21.25pt;}
@list l6:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:57.25pt;
	mso-level-number-position:left;
	margin-left:49.6pt;
	text-indent:-1.0cm;}
@list l6:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:96.55pt;
	mso-level-number-position:left;
	margin-left:70.9pt;
	text-indent:-1.0cm;}
@list l6:level4
	{mso-level-text:"%1\.%2\.%3\.%4";
	mso-level-tab-stop:135.8pt;
	mso-level-number-position:left;
	margin-left:99.2pt;
	text-indent:-35.4pt;}
@list l6:level5
	{mso-level-text:"%1\.%2\.%3\.%4\.%5";
	mso-level-tab-stop:175.05pt;
	mso-level-number-position:left;
	margin-left:127.55pt;
	text-indent:-42.5pt;}
@list l6:level6
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6";
	mso-level-tab-stop:214.3pt;
	mso-level-number-position:left;
	margin-left:163.0pt;
	text-indent:-2.0cm;}
@list l6:level7
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7";
	mso-level-tab-stop:253.55pt;
	mso-level-number-position:left;
	margin-left:191.35pt;
	text-indent:-63.8pt;}
@list l6:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:292.8pt;
	mso-level-number-position:left;
	margin-left:219.7pt;
	text-indent:-70.9pt;}
@list l6:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:332.1pt;
	mso-level-number-position:left;
	margin-left:255.1pt;
	text-indent:-85.0pt;}
@list l7
	{mso-list-id:1123964682;
	mso-list-template-ids:548200814;}
@list l7:level1
	{mso-level-suffix:none;
	mso-level-text:"%1  ";
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:0cm;
	mso-ansi-font-size:18.0pt;
	mso-bidi-font-size:18.0pt;
	font-family:Arial;
	mso-fareast-font-family:\9ED1\4F53;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level2
	{mso-level-suffix:none;
	mso-level-text:"%1\.%2  ";
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:0cm;
	mso-ansi-font-size:15.0pt;
	mso-bidi-font-size:15.0pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level3
	{mso-level-suffix:none;
	mso-level-text:"%1\.%2\.%3  ";
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:0cm;
	mso-ansi-font-size:12.0pt;
	mso-bidi-font-size:12.0pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level4
	{mso-level-suffix:none;
	mso-level-text:"%1\.%2\.%3\.%4  ";
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:0cm;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level5
	{mso-level-tab-stop:92.7pt;
	mso-level-number-position:left;
	margin-left:92.7pt;
	text-indent:-15.6pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level6
	{mso-level-text:"%6\)";
	mso-level-tab-stop:92.7pt;
	mso-level-number-position:left;
	margin-left:92.7pt;
	text-indent:-15.6pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level7
	{mso-level-number-format:alpha-lower;
	mso-level-tab-stop:92.7pt;
	mso-level-number-position:left;
	margin-left:92.7pt;
	text-indent:-15.6pt;
	mso-ansi-font-size:10.5pt;
	mso-bidi-font-size:10.5pt;
	font-family:Arial;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level8
	{mso-level-reset-level:level1;
	mso-level-style-link:Figure;
	mso-level-suffix:space;
	mso-level-text:Figure%8;
	mso-level-tab-stop:none;
	mso-level-number-position:center;
	margin-left:36.0pt;
	text-indent:0cm;
	mso-ansi-font-size:9.0pt;
	mso-bidi-font-size:9.0pt;
	font-family:Arial;
	mso-fareast-font-family:\9ED1\4F53;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l7:level9
	{mso-level-reset-level:level1;
	mso-level-style-link:Table;
	mso-level-suffix:space;
	mso-level-text:Table%9;
	mso-level-tab-stop:none;
	mso-level-number-position:center;
	margin-left:36.0pt;
	text-indent:0cm;
	mso-ansi-font-size:9.0pt;
	mso-bidi-font-size:9.0pt;
	font-family:Arial;
	mso-fareast-font-family:\9ED1\4F53;
	mso-ansi-font-weight:normal;
	mso-ansi-font-style:normal;}
@list l8
	{mso-list-id:1380013528;
	mso-list-template-ids:-1435872280;}
@list l8:level1
	{mso-level-number-format:none;
	mso-level-text:"\9644\5F55A ";
	mso-level-tab-stop:21.25pt;
	mso-level-number-position:left;
	margin-left:21.25pt;
	text-indent:-21.25pt;}
@list l8:level2
	{mso-level-text:"A\.%2";
	mso-level-tab-stop:49.6pt;
	mso-level-number-position:left;
	margin-left:49.6pt;
	text-indent:-1.0cm;}
@list l8:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:70.9pt;
	text-indent:-1.0cm;}
@list l8:level4
	{mso-level-text:"%1\.%2\.%3\.%4";
	mso-level-tab-stop:99.2pt;
	mso-level-number-position:left;
	margin-left:99.2pt;
	text-indent:-35.4pt;}
@list l8:level5
	{mso-level-text:"%1\.%2\.%3\.%4\.%5";
	mso-level-tab-stop:127.55pt;
	mso-level-number-position:left;
	margin-left:127.55pt;
	text-indent:-42.5pt;}
@list l8:level6
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6";
	mso-level-tab-stop:163.0pt;
	mso-level-number-position:left;
	margin-left:163.0pt;
	text-indent:-2.0cm;}
@list l8:level7
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7";
	mso-level-tab-stop:191.35pt;
	mso-level-number-position:left;
	margin-left:191.35pt;
	text-indent:-63.8pt;}
@list l8:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:219.7pt;
	mso-level-number-position:left;
	margin-left:219.7pt;
	text-indent:-70.9pt;}
@list l8:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:255.1pt;
	mso-level-number-position:left;
	margin-left:255.1pt;
	text-indent:-85.0pt;}
@list l9
	{mso-list-id:1425803385;
	mso-list-template-ids:67698717;}
@list l9:level1
	{mso-level-text:%1;
	mso-level-tab-stop:21.25pt;
	mso-level-number-position:left;
	margin-left:21.25pt;
	text-indent:-21.25pt;}
@list l9:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:57.25pt;
	mso-level-number-position:left;
	margin-left:49.6pt;
	text-indent:-1.0cm;}
@list l9:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:96.55pt;
	mso-level-number-position:left;
	margin-left:70.9pt;
	text-indent:-1.0cm;}
@list l9:level4
	{mso-level-text:"%1\.%2\.%3\.%4";
	mso-level-tab-stop:153.8pt;
	mso-level-number-position:left;
	margin-left:99.2pt;
	text-indent:-35.4pt;}
@list l9:level5
	{mso-level-text:"%1\.%2\.%3\.%4\.%5";
	mso-level-tab-stop:193.05pt;
	mso-level-number-position:left;
	margin-left:127.55pt;
	text-indent:-42.5pt;}
@list l9:level6
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6";
	mso-level-tab-stop:232.3pt;
	mso-level-number-position:left;
	margin-left:163.0pt;
	text-indent:-2.0cm;}
@list l9:level7
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7";
	mso-level-tab-stop:271.55pt;
	mso-level-number-position:left;
	margin-left:191.35pt;
	text-indent:-63.8pt;}
@list l9:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:310.8pt;
	mso-level-number-position:left;
	margin-left:219.7pt;
	text-indent:-70.9pt;}
@list l9:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:350.1pt;
	mso-level-number-position:left;
	margin-left:255.1pt;
	text-indent:-85.0pt;}
@list l10
	{mso-list-id:1666475049;
	mso-list-template-ids:-28945502;}
@list l10:level1
	{mso-level-style-link:"Heading 1";
	mso-level-text:%1;
	mso-level-tab-stop:21.6pt;
	mso-level-number-position:left;
	margin-left:21.6pt;
	text-indent:-21.6pt;}
@list l10:level2
	{mso-level-style-link:"Heading 2";
	mso-level-text:"%1\.%2";
	mso-level-tab-stop:28.8pt;
	mso-level-number-position:left;
	margin-left:28.8pt;
	text-indent:-28.8pt;}
@list l10:level3
	{mso-level-style-link:"Heading 3";
	mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:36.0pt;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:-36.0pt;}
@list l10:level4
	{mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l10:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l10:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l10:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l10:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;}
@list l10:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;}
@list l11
	{mso-list-id:1916042858;
	mso-list-template-ids:-648263936;}
@list l11:level1
	{mso-level-number-format:alpha-upper;
	mso-level-text:\9644\5F55%1;
	mso-level-tab-stop:21.6pt;
	mso-level-number-position:left;
	margin-left:21.6pt;
	text-indent:-21.6pt;}
@list l11:level2
	{mso-level-text:"%1\.%2";
	mso-level-tab-stop:28.8pt;
	mso-level-number-position:left;
	margin-left:28.8pt;
	text-indent:-28.8pt;}
@list l11:level3
	{mso-level-text:"%1\.%2\.%3";
	mso-level-tab-stop:36.0pt;
	mso-level-number-position:left;
	margin-left:36.0pt;
	text-indent:-36.0pt;}
@list l11:level4
	{mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l11:level5
	{mso-level-text:%5\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l11:level6
	{mso-level-number-format:alpha-lower;
	mso-level-text:%6\FF09;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l11:level7
	{mso-level-number-format:roman-lower;
	mso-level-text:%7;
	mso-level-tab-stop:1.0cm;
	mso-level-number-position:left;
	margin-left:46.8pt;
	text-indent:-34.0pt;}
@list l11:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:72.0pt;
	mso-level-number-position:left;
	margin-left:72.0pt;
	text-indent:-72.0pt;}
@list l11:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:79.2pt;
	mso-level-number-position:left;
	margin-left:79.2pt;
	text-indent:-79.2pt;}
@list l12
	{mso-list-id:2114861838;
	mso-list-template-ids:-433129230;}
@list l12:level1
	{mso-level-number-format:none;
	mso-level-text:"\9644\5F55 A ";
	mso-level-tab-stop:21.25pt;
	mso-level-number-position:left;
	margin-left:21.25pt;
	text-indent:-21.25pt;}
@list l12:level2
	{mso-level-text:"A\.%2";
	mso-level-tab-stop:49.6pt;
	mso-level-number-position:left;
	margin-left:49.6pt;
	text-indent:-1.0cm;}
@list l12:level3
	{mso-level-text:"%1A\.%2\.%3";
	mso-level-tab-stop:70.9pt;
	mso-level-number-position:left;
	margin-left:70.9pt;
	text-indent:-1.0cm;}
@list l12:level4
	{mso-level-text:"%1\.%2\.%3\.%4";
	mso-level-tab-stop:99.2pt;
	mso-level-number-position:left;
	margin-left:99.2pt;
	text-indent:-35.4pt;}
@list l12:level5
	{mso-level-text:"%1\.%2\.%3\.%4\.%5";
	mso-level-tab-stop:127.55pt;
	mso-level-number-position:left;
	margin-left:127.55pt;
	text-indent:-42.5pt;}
@list l12:level6
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6";
	mso-level-tab-stop:163.0pt;
	mso-level-number-position:left;
	margin-left:163.0pt;
	text-indent:-2.0cm;}
@list l12:level7
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7";
	mso-level-tab-stop:191.35pt;
	mso-level-number-position:left;
	margin-left:191.35pt;
	text-indent:-63.8pt;}
@list l12:level8
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8";
	mso-level-tab-stop:219.7pt;
	mso-level-number-position:left;
	margin-left:219.7pt;
	text-indent:-70.9pt;}
@list l12:level9
	{mso-level-text:"%1\.%2\.%3\.%4\.%5\.%6\.%7\.%8\.%9";
	mso-level-tab-stop:255.1pt;
	mso-level-number-position:left;
	margin-left:255.1pt;
	text-indent:-85.0pt;}
ol
	{margin-bottom:0cm;}
ul
	{margin-bottom:0cm;}
-->
</STYLE>
</HEAD>
<BODY lang=3DZH-CN style=3D"TEXT-JUSTIFY-TRIM: punctuation" =
vLink=3Dpurple link=3Dblue=20
bgColor=3D#ffffff>
<DIV>
<DIV><FONT face=3DArial size=3D2>I believe the intent is that 403 =
Unsupported Header=20
and 404 Illegal Value for Header are&nbsp;returned for any request =
method, along with the=20
bad header(s). My suggestion is to clarify section 5.4 so that, for 403 =
and=20
404, the response "MUST include the bad or unsupported headers and their =
values=20
exactly as they were sent from the client." For 406, the missing mandatory =
header should=20
be returned with no value.</FONT></DIV>
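<DIV><FONT face=3DArial size=3D2>&nbsp;</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>For illustration, a minimal Python =
sketch of the proposed rule applied to any request method. The header =
names, which of them count as mandatory, the validity check and the choice =
to check for 406 before 403/404 are all assumptions, not spec =
text:</FONT></DIV>
<PRE>
```python
# Hypothetical sketch, not from the MRCPv2 draft: the proposed
# error-response rule, applied to the headers of any request method.
# Header sets, the validity check and the 406-first ordering are
# assumptions for illustration only.

SUPPORTED = {"no-input-timeout", "speech-language"}
MANDATORY = {"speech-language"}


def is_legal(name, value):
    # Stand-in check: the timeout must be an integer, others non-empty.
    if name == "no-input-timeout":
        return value.isdigit()
    return value != ""


def error_response(request_headers):
    """Return (status, echoed headers), or (200, {}) if all is well."""
    # 406: echo each missing mandatory header with an empty value.
    missing = sorted(h for h in MANDATORY if h not in request_headers)
    if missing:
        return 406, {h: "" for h in missing}

    unsupported = {}
    illegal = {}
    for name, value in request_headers.items():
        if name not in SUPPORTED:
            unsupported[name] = value
        elif not is_legal(name, value):
            illegal[name] = value

    # 404 wins when both kinds of problem occur; every offending
    # header is echoed exactly as the client sent it.
    if illegal:
        return 404, {**illegal, **unsupported}
    if unsupported:
        return 403, unsupported
    return 200, {}
```
</PRE>
<DIV><FONT face=3DArial size=3D2>The 404-over-403 precedence follows =
the SET-PARAMS text quoted below.</FONT></DIV>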
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV></DIV>
<BLOCKQUOTE=20
style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  <DIV style=3D"FONT: 10pt arial">----- Original Message ----- </DIV>
  <DIV=20
  style=3D"BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: =
black"><B>From:</B>=20
  <A title=3Darvinds@huawei.com =
href=3D"mailto:arvinds@huawei.com">Arvind=20
  Saraswat</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>To:</B> <A =
title=3Dspeechsc@ietf.org=20
  href=3D"mailto:speechsc@ietf.org">IETF SPEECHSC (E-mail)</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Sent:</B> Wednesday, June 28, 2006 =
7:28=20
  AM</DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Subject:</B> [Speechsc] Reg. =
Unsupported/Bad=20
  parameters in requests....</DIV>
  <DIV><BR></DIV>
  <DIV class=3DSection1 style=3D"LAYOUT-GRID:  15.6pt none">
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">Hi,<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">&nbsp;&nbsp;&nbsp;=20
  The MRCPv2 specification says the following about SET-PARAMS=20
  (section 6.1.1):<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm; TEXT-INDENT: =
15pt"><FONT=20
  face=3DCourier color=3Dgreen size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">The=20
  "SET-PARAMS" method, from the client to the server, tells=20
  the<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  MRCPv2 resource to define parameters for the session, such as=20
  voice<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  characteristics and prosody on synthesizers, recognition timers=20
  on<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  recognizers, etc.&nbsp; If the server accepts and sets all parameters=20
  it<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  MUST return a Response-Status of 200.&nbsp; If it chooses to ignore=20
  some<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  optional headers that can be safely ignored without=20
  affecting<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  operation of the server it MUST return =
201.<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  If some of the headers being set are unsupported for the resource=20
  or<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  have illegal values, the server MUST reject the request with a=20
  403<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  Unsupported Header or 404 Illegal Value for Header, as=20
  appropriate.<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  If the request had both bad and unsupported parameters 404 MUST=20
  be<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  returned.&nbsp; </SPAN></FONT><FONT face=3DCourier color=3Dred =
size=3D2><SPAN=20
  lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: red; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">Such=20
  a response MUST include the bad or =
unsupported<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dred=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: red; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">&nbsp;&nbsp;=20
  headers and their values exactly as they were sent from the=20
  client.<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  Session parameters modified using "SET-PARAMS" do not=20
  override<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  parameters explicitly specified on individual requests or=20
  requests<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dgreen=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: green; LINE-HEIGHT: 150%; =
FONT-FAMILY: Courier">&nbsp;&nbsp;=20
  that are in-PROGRESS.<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">It=20
  is clear from above that if the SET-PARAMS method tries to set an =
unsupported or=20
  bad parameter, the server will respond with 403 or 404 as the case =
may be and=20
  will send the header and value exactly as sent in the initial request=20
  method.<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">However,=20
  what will happen if the parameter is set using SPEAK or RECOGNIZE or=20
  DEFINE-GRAMMAR or any other similar method? Will the response to these =
messages be similar to SET-PARAMS? The following options are=20
  possible:<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">a)=20
  Just send the error code (response line)<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">b)=20
  Include all the parameters that could not be set/defined by the=20
  request,<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">c)=20
  any other...<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">regards<BR>Arvind<o:p></o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier"><o:p>&nbsp;</o:p></SPAN></FONT></P>
  <P class=3DMsoNormal style=3D"MARGIN-LEFT: 0cm"><FONT face=3DCourier =
color=3Dblue=20
  size=3D2><SPAN lang=3DEN-US=20
  style=3D"FONT-SIZE: 10pt; COLOR: blue; LINE-HEIGHT: 150%; FONT-FAMILY: =
Courier">-Arvind<o:p></o:p></SPAN></FONT></P></DIV>
  <P>
  <HR>

  <P></P>_______________________________________________<BR>Speechsc =
mailing=20
  =
list<BR>Speechsc@ietf.org<BR>https://www1.ietf.org/mailman/listinfo/speec=
hsc<BR></BLOCKQUOTE></BODY></HTML>

------=_NextPart_000_0A49_01C69E1C.0A1EC760--


--===============0015453947==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0015453947==--


From speechsc-bounces@ietf.org Sun Jul 02 17:54:17 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1Fx9tQ-0002fc-DE; Sun, 02 Jul 2006 17:54:16 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1Fx9tN-0002JN-GL
	for speechsc@ietf.org; Sun, 02 Jul 2006 17:54:13 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1Fx9kF-0005zX-Qm
	for speechsc@ietf.org; Sun, 02 Jul 2006 17:44:49 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 204792140F6; Sun,  2 Jul 2006 21:44:47 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.1 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_50_60,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 616A8214046; Sun,  2 Jul 2006 21:44:42 +0000 (GMT)
Message-ID: <0ad701c69e20$b22acf90$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
References: <911B89A9FD71E649AA624FF24790D76F450BF5@GIMLI.us.int.genesyslab.com>
Subject: Re: [speechsc] unsupported language errors do not indicate
	theunsupported language
Date: Sun, 2 Jul 2006 22:44:28 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: a7d2e37451f7f22841e3b6f40c67db0f
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0631956344=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0631956344==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0AD4_01C69E29.139816E0"

This is a multi-part message in MIME format.

------=_NextPart_000_0AD4_01C69E29.139816E0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I suppose Completion-Reason could be used for a non-machine processable =
result. I'm in favour of an Unsupported-Language header field being =
added. Obviously applies to both the speech synthesiser and speech =
recogniser resources.=20

Dave
  ----- Original Message -----=20
  From: Andrew Wahbe=20
  To: IETF SPEECHSC (E-mail)=20
  Sent: Thursday, June 29, 2006 3:21 PM
  Subject: [speechsc] unsupported language errors do not indicate =
the unsupported language


  Most of the resource types provide an "unsupported-language" =
completion-cause code but there doesn't seem to be any way to indicate =
what language(s) were not supported.

  Options here could be re-using the speech-language header for this, =
adding a new "unsupported-language" header or defining a content type =
(as discussed for failed URIs) that describes what went wrong with =
the request.


-------------------------------------------------------------------------=
-----


  _______________________________________________
  Speechsc mailing list
  Speechsc@ietf.org
  https://www1.ietf.org/mailman/listinfo/speechsc

------=_NextPart_000_0AD4_01C69E29.139816E0
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.2900.2873" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>I suppose Completion-Reason could be =
used for a=20
non-machine processable result. I'm in favour of an Unsupported-Language =
header=20
field being added. Obviously applies to both&nbsp;the speech synthesiser =
and=20
speech recogniser resources. </FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV>
<BLOCKQUOTE=20
style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  <DIV style=3D"FONT: 10pt arial">----- Original Message ----- </DIV>
  <DIV=20
  style=3D"BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: =
black"><B>From:</B>=20
  <A title=3DAndrew.Wahbe@genesyslab.com=20
  href=3D"mailto:Andrew.Wahbe@genesyslab.com">Andrew Wahbe</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>To:</B> <A =
title=3Dspeechsc@ietf.org=20
  href=3D"mailto:speechsc@ietf.org">IETF SPEECHSC (E-mail)</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Sent:</B> Thursday, June 29, 2006 =
3:21=20
  PM</DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Subject:</B> [speechsc] unsupported =
language=20
  errors do not indicate the unsupported language</DIV>
  <DIV><BR></DIV>
  <DIV><FONT face=3DArial size=3D2><SPAN class=3D526242621-28062006>Most =
of the=20
  resource types provide an "unsupported-language" completion-cause code =
but=20
  there doesn't seem to be any way to indicate what language(s) were not =

  supported.</SPAN></FONT></DIV>
  <DIV><FONT face=3DArial size=3D2><SPAN=20
  class=3D526242621-28062006></SPAN></FONT>&nbsp;</DIV>
  <DIV><FONT face=3DArial size=3D2><SPAN =
class=3D526242621-28062006>Options here could=20
  be re-using the speech-language header for this, adding a new=20
  "unsupported-language" header or defining a content type (as discussed =
for=20
  failed URIs) that describes what went wrong with the=20
  request.</SPAN></FONT></DIV>
  <P>
  <HR>

  <P></P>_______________________________________________<BR>Speechsc =
mailing=20
  =
list<BR>Speechsc@ietf.org<BR>https://www1.ietf.org/mailman/listinfo/speec=
hsc<BR></BLOCKQUOTE></BODY></HTML>

------=_NextPart_000_0AD4_01C69E29.139816E0--


--===============0631956344==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0631956344==--


From speechsc-bounces@ietf.org Sun Jul 02 18:33:26 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FxAVJ-0005jZ-HL; Sun, 02 Jul 2006 18:33:25 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1Fx9tT-0002Tb-52
	for speechsc@ietf.org; Sun, 02 Jul 2006 17:54:19 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1Fx9gd-0004xK-Fb
	for speechsc@ietf.org; Sun, 02 Jul 2006 17:41:06 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 31C0F214046; Sun,  2 Jul 2006 21:41:01 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.2 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 50CAF214046; Sun,  2 Jul 2006 21:40:55 +0000 (GMT)
Message-ID: <0abb01c69e20$2ad015a0$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
References: <911B89A9FD71E649AA624FF24790D76F2E96F7@GIMLI.us.int.genesyslab.com>
Subject: Re: [speechsc] Hotword Recognition and Timers 
Date: Sun, 2 Jul 2006 22:40:41 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 4bb0e9e1ca9d18125bc841b2d8d77e24
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

Andrew's proposals/clarifications make sense to me.

One interesting result, however, is that Andrew's definition for 
Recognition-Timeout coincides with Hotword-Max-Duration except that the 
former terminates the recognition when it fires. I don't think this is 
necessarily a problem.

It seems (if I understand this thread properly) that the VoiceXML world 
needs a maxspeechtimeout to terminate hotword but the MRCP protocol also 
might need a safety net to prevent a RECOGNIZE going IN-PROGRESS forever. 
For normal recognition, the Recognition-Timeout gets you both the safety net 
and the maxspeechtimeout. Since the MRCP client can STOP a recognition at 
any point, this safety net is not crucial. In short, I'm fine with Andrew's 
suggested changes.

Dave

----- Original Message ----- 
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Sent: Monday, June 19, 2006 8:30 PM
Subject: RE: [speechsc] Hotword Recognition and Timers


The thing is that nowhere in your explanation do you mention the
prompt and its completion (i.e. the START-INPUT-TIMERS message). The
main use case and reason for hotword recognition/recognition-based
barge-in is to prevent accidental barge-in on audio content such as a
voicemail, tts email, etc. The scenario you describe below requires that
the client knows how long the content is when the RECOGNIZE is started;
this is definitely not an assumption you can make. The client won't know
how long it will take to TTS a chunk of text or how long the set of
audio files (prompts) are or even if they end at all (it could be a
continuous stream).

My proposal is that hotword recognition should "work" in a similar
manner to normal recognition from the client's perspective:

* RECOGNIZE is sent with the start-input-timers header set to "false".
The recognition-mode is set to "hotword". Prompt playback starts at this
point as well.
* START-INPUT-TIMERS is sent when the prompt completes. The
no-input-timer starts at this point.

The above two points are identical to the normal case except that the
recognition-mode is "hotword". My proposal is that the general meaning
of the recognition and no-input timers are also the same as the normal
case. Namely:

* The no-input timer is the max amount of time after the prompt
completes that we are willing to wait for input. This is equivalent to
the "timeout" property in VoiceXML. It is usually on the order of a few
seconds.
* The recognition timer is the max amount of time that we will run
recognition on a single "utterance". This is basically a safety net
protecting against noise (say the user left the phone off the hook next
to the radio) keeping the recognizer occupied for an unreasonable amount
of time. This only applies when speech is detected since the
no-input-timer will take effect (once the prompt is done) to terminate
the recognition. This is equivalent to the "maxspeechtimeout" property
in VoiceXML. This is usually quite a bit longer than the no-input
timeout, say 10 to 30 seconds.

Note that the definitions of timeout and maxspeechtimeout properties in
VoiceXML apply to both normal and hotword recognition, which is part of
the rationale for keeping the high-level meaning the same for both modes
in MRCP. At the end of the day, the developer has to answer two
questions regardless of what mode they are using:
* How long after the end of a prompt do I want to wait for
input? (no-input timeout)
* How much continuous noise am I willing to process before aborting a
recognition? (recognition timeout)

What makes things a little complicated is that in hotword recognition:
1) the detection of speech does not mean that "input" was detected -- we
don't have "input" until we have a match;
2) we can go from a state of processing speech/sound back to a state
where there is silence and we are waiting for speech.

The behaviors that were specified in the original email were an attempt
to keep the same high-level meanings for the timers while taking into
account the two points above. These special behaviors for hotword mode
were:
a) the no-input timer is not cancelled until there is a recognition
result.
b) the recognition timer is reset and turned off when an utterance that
doesn't match anything "ends" as determined by the incomplete timeout
firing. The recognition timer is re-enabled when subsequent speech is
detected.

Another behavior that the VoiceXML Forum MRCP Liaison Committee has
discussed recently is as follows:
c) if the no-input timer fires while speech is being processed, then the
recognition will not be aborted until the recognizer makes a decision on
that segment of speech (eg. complete timeout, incomplete timeout,
recognition timeout, or early no-match). A no-match on the utterance at
this point would cause "no-input-timeout" to be returned for the
recognition.

This last behavior would prevent the no-input timeout from cutting off
recognition in the middle of an utterance, which might happen if we
followed (a) above.
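The proposed hotword timer rules (a) to (c) above can be made concrete as a small event-driven model. This is a minimal Python sketch under stated assumptions, not MRCPv2 text. The event names, the function `hotword_outcome` and its millisecond arguments are hypothetical. The model treats each event as instantaneous and covers only the no-input and recognition timers discussed here.

```python
from typing import List, Optional, Tuple

# Hypothetical event kinds used only in this sketch (not MRCP message names):
#   "start-input-timers": client sends START-INPUT-TIMERS (prompt finished)
#   "speech-start":       recognizer starts processing speech/sound
#   "no-match":           utterance ended without a match (incomplete timeout)
#   "match":              utterance matched a hotword grammar
Event = Tuple[int, str]              # (ms since RECOGNIZE was sent, kind)
Outcome = Tuple[str, Optional[int]]  # (completion cause, ms) or ("in-progress", None)


def hotword_outcome(events: List[Event], no_input_ms: int,
                    recognition_ms: int) -> Outcome:
    """Apply the proposed hotword timer rules (a), (b) and (c) to an event trace."""
    no_input_at: Optional[int] = None     # armed by START-INPUT-TIMERS
    recognition_at: Optional[int] = None  # armed only while speech is processed
    in_speech = False
    no_input_pending = False              # rule (c): fired during an utterance

    def fire_timers(now: float) -> Optional[Outcome]:
        """Fire armed timers due at or before `now`, earliest first."""
        nonlocal no_input_at, no_input_pending
        while True:
            due = [(d, name) for d, name in ((recognition_at, "recognition"),
                                              (no_input_at, "no-input"))
                   if d is not None and d <= now]
            if not due:
                return None
            deadline, name = min(due)
            if name == "recognition":
                return ("recognition-timeout", deadline)
            no_input_at = None
            if not in_speech:
                return ("no-input-timeout", deadline)
            no_input_pending = True  # rule (c): wait for a decision on this utterance

    for t, kind in sorted(events, key=lambda e: e[0]):
        result = fire_timers(t)
        if result is not None:
            return result
        if kind == "start-input-timers":
            no_input_at = t + no_input_ms
        elif kind == "speech-start":
            in_speech = True
            recognition_at = t + recognition_ms  # rule (b): re-enabled on new speech
        elif kind == "no-match":
            in_speech = False
            recognition_at = None                # rule (b): reset and turned off
            if no_input_pending:
                return ("no-input-timeout", t)   # rule (c)
        elif kind == "match":
            return ("success", t)                # rule (a): only a match cancels no-input
    result = fire_timers(float("inf"))
    return result if result is not None else ("in-progress", None)
```

Take the 20 second maxspeech and 30 second prompt case from the original thread. `hotword_outcome([(2000, "speech-start"), (3000, "no-match"), (30000, "start-input-timers")], 5000, 20000)` returns `("no-input-timeout", 35000)`. Under these rules the prompt is not cut off by a recognition timeout at 22 seconds.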

To address your use cases below:

1. If you say nothing, the no-input timer will eventually fire (at the
specified number of milliseconds after the prompt is completed) and end
the recognition.

2. If you say something unintelligible, the no-input timer is not
stopped as that does not correspond to a recognition result in hotword.
Note that the no-input timer may not even be enabled if the prompt is
still playing. At the end of the unintelligible speech, the recognition
timer is stopped and turned off. When you later say something
intelligible, the recognition timer is turned back on while you are
speaking. Assuming your speech was short, the recognition timer is
turned back off when you are done speaking. Since you have now generated a
match, the no-input timer is also cancelled (if the prompt had finished)
and the result is returned.

Thanks,

Andrew Wahbe

-----Original Message-----
From: Saravanan Shanmugham (sarvi) [mailto:sarvi@cisco.com]
Sent: June 16, 2006 2:46 PM
To: Dan Burnett; IETF SPEECHSC (E-mail)
Subject: RE: [speechsc] Hotword Recognition and Timers


I can see that both No-Input-Timeout and Recognition-Timeout values will
be useful for Hotword recognition.
But saying that Recognition-Timer is started after speech is detected
bothers me.
Also, what typical values do you expect for these timers based on your
proposed definitions?

Hotword recognition is very often used to issue commands.
So let's take the following scenario and look at possible cases.

When the system is reading out a long email, you should be able to issue
commands like "speed up" or "slow down" or "repeat" etc.

1. But then I might never say any command at all. So defining
Recognition-Timer as starting after speech is detected makes no sense in
this case. No-Input-Timer, if defined to be applicable to Hotword
recognition might make sense in this case.

2. Then I might say something unintelligible in the middle, which should
technically be ignored. And then a little later I might actually speak a
command, "speed up". Here when I said something unintelligible, the
No-Input-Timer would be stopped. If we went with the definition
proposed, the Recognition-Timer would be started here.

If you assume the No-Input-Timer would be sufficiently large and the
Recognition-Timer relatively small, this means that once we say
something not matching a hotword (which is technically expected to be
ignored), the RECOGNIZE would complete due to Recognition-Timeout.

If we assume the No-Input-Timer to be short and the Recognition-Timer to be
long, then we are requiring that the user MUST say something
intelligible or unintelligible reasonably quickly, or the RECOGNIZE would
terminate due to No-Input-Timeout.

If we assume both the No-Input-Timer and the Recognition-Timer to be
large, then depending on whether I say something unintelligible or not,
the overall timeout could be pretty large, up to a maximum of
No-Input-Timer + Recognition-Timer.

The way I would expect this to work is that the No-Input-Timer and
Recognition-Timer are both started at the beginning of a hotword RECOGNIZE
and both have reasonably large values, with the No-Input-Timer most
likely equal to or smaller than the Recognition-Timer.

Now, if I say nothing at all and the No-Input-Timer expires, the
RECOGNIZE completes with no-input-timeout. The moment I say something,
unintelligible or intelligible, the No-Input-Timer is stopped.
The Recognition-Timer continues on. If the current speech or a future
command matches a hotword grammar, the RECOGNIZE command completes
with success.
If nothing matches and the Recognition-Timer expires, the RECOGNIZE
completes with recognition-timeout.

This way, for hotword, the Recognition-Timer is the max recognition time for
the RECOGNIZE, while the No-Input-Timer would only be equal or smaller.
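The behaviour described in this message can be modelled as a minimal Python sketch. In that model both timers are armed when RECOGNIZE starts, any sound stops the No-Input-Timer, and the Recognition-Timer is never reset. The function `sarvi_outcome` and its event names are hypothetical illustrations, not protocol elements.

```python
from typing import List, Optional, Tuple

# Hypothetical event kinds used only in this sketch:
#   "speech-start": any sound is detected, intelligible or not
#   "no-match":     an utterance ended without matching (ignored here)
#   "match":        an utterance matched a hotword grammar
Event = Tuple[int, str]  # (ms since RECOGNIZE was sent, kind)


def sarvi_outcome(events: List[Event], no_input_ms: int,
                  recognition_ms: int) -> Tuple[str, int]:
    """Both timers start with RECOGNIZE; the recognition timer caps the whole request."""
    no_input_at: Optional[int] = no_input_ms  # stopped by the first sound
    recognition_at = recognition_ms           # never reset or stopped
    for t, kind in sorted(events, key=lambda e: e[0]):
        # A timer that expires before this event ends the RECOGNIZE first.
        if no_input_at is not None and no_input_at <= min(t, recognition_at):
            return ("no-input-timeout", no_input_at)
        if recognition_at <= t:
            return ("recognition-timeout", recognition_at)
        if kind == "speech-start":
            no_input_at = None
        elif kind == "match":
            return ("success", t)
        # "no-match" changes nothing: the recognition timer keeps running.
    if no_input_at is not None and no_input_at <= recognition_at:
        return ("no-input-timeout", no_input_at)
    return ("recognition-timeout", recognition_at)
```

With a 20 second Recognition-Timer, `sarvi_outcome([(2000, "speech-start"), (3000, "no-match")], 10000, 20000)` returns `("recognition-timeout", 20000)`. The request ends at 20 seconds whatever the prompt length. This is the point the reply above raises about the client not knowing the content length in advance.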

Thx,
Sarvi

     -----Original Message-----
     From: Dan Burnett [mailto:dan_burnett2000@yahoo.com]
     Sent: Thursday, June 08, 2006 5:06 AM
     To: IETF SPEECHSC (E-mail)
     Subject: Re: [speechsc] Hotword Recognition and Timers

     This email is a result of discussions by the MRCP subgroup
     of the VoiceXML Forum, in which I participated, so I
     already agree with the proposals given here.

     However, I would like to hear comments from others before
     applying these changes to the spec draft, preferably from
     those who did not participate in the VoiceXML Forum discussions.

     This has been added to the issue tracker
     (http://www.softarmor.com/roundup/speechsc) as issue 88.

     -- dan


     --- Andrew Wahbe <awahbe@voicegenie.com> wrote:

     > The description of how timers (no-input and recognition) are used
     > during hotword recognition is inconsistent. In section 9.4.7, it is
     > stated that "For a hotword recognition mode, this timer is started
     > when the user begins speaking. Note that for Hotword mode recognition
     > the START-OF-INPUT event is not generated." However, section 9.9
     > states that for the hotword case: "The Recognition-Timer gets started
     > at the beginning of RECOGNIZE."
     >
     > It seems that section 9.9 is incorrect (or at least is inconsistent
     > with VoiceXML).
     >
     > Section 9.9 omits any mention of the no-input timer for the hotword
     > mode recognition case; however, none of the sections that deal with
     > the no-input timer make a distinction between the hotword and
     > non-hotword cases. VoiceXML also does not make this distinction. It
     > would seem that section 9.9 should be changed to indicate that
     > no-input timers are started in the hotword case and that
     > no-input-timeout is a valid completion cause for a hotword
     > recognition.
     >
     > A related question worth considering is whether the recognition
     > timer is reset at any point, for example, on the detection of
     > silence. Consider the case when maxspeech has a value of say 20
     > seconds (a typical/reasonable value) and hotword barge-in is being
     > used on a prompt that is 30 seconds long. This would mean that a user
     > that spoke briefly 2 seconds into the prompt (and was silent for the
     > remainder of the prompt) would experience a maxspeech timeout at
     > about 22 seconds into the prompt. They would not hear the whole
     > prompt, which seems inappropriate. The reason for maxspeech timeout
     > is to catch continuous noise and keep it from occupying a recognizer;
     > but what should happen in periods of silence in the hotword case?
     >
     > Similarly, when is the no-input timer canceled in the hotword case?
     > Is it when speech (not necessarily matching) is detected? Or is it
     > only upon a match?
     >
     > The correct behavior in my opinion is that the no-input timer is
     > canceled only on a match, and that the recognition timer should be
     > reset if silence (determined by complete timeout and incomplete
     > timeout) is detected. If we are just processing intermittent noise,
     > the no-input timer will eventually expire. Continuous noise is
     > handled by the recognition timer. Of course there are other
     > possibilities as well; this is just one option that I think fits
     > with VoiceXML.
     > > begin:vcard
     > fn:Andrew Wahbe
     > n:Wahbe;Andrew
     > org:VoiceGenie Technologies INC.
     > adr:8th Floor;;1120 Finch Avenue W.;Toronto;ON;M3J 3H7;Canada
     > email;internet:awahbe@voicegenie.com
     > title:Senior Architect
     > tel;work:(416) 736-0905 ext. 258
     > tel;fax:(416) 736-1551
     > x-mozilla-html:TRUE
     > url:http://www.voicegenie.com
     > version:2.1
     > end:vcard
     >
     > > _______________________________________________
     > Speechsc mailing list
     > Speechsc@ietf.org
     > https://www1.ietf.org/mailman/listinfo/speechsc
     >



_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Sun Jul 02 19:50:26 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FxBhp-0007vJ-Jn; Sun, 02 Jul 2006 19:50:25 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FxBho-0007vD-L9
	for speechsc@ietf.org; Sun, 02 Jul 2006 19:50:24 -0400
Received: from ns1.jerrycarter.org ([66.92.77.144] helo=jerrycarter.org)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FxBhm-0005be-1F
	for speechsc@ietf.org; Sun, 02 Jul 2006 19:50:24 -0400
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by jerrycarter.org (Postfix) with ESMTP
	id D97C3CEB41C; Sun,  2 Jul 2006 19:50:20 -0400 (EDT)
In-Reply-To: <330A23D8336C0346B5C1A5BB1966664703389116@ATLANTIS.Brooktrout.com>
References: <330A23D8336C0346B5C1A5BB1966664703389116@ATLANTIS.Brooktrout.com>
Mime-Version: 1.0 (Apple Message framework v624)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-Id: <82fb8e782571ba018c0ea6c3487587f9@jerrycarter.org>
Content-Transfer-Encoding: 7bit
From: Jerry Carter <jerry@jerrycarter.org>
Subject: Re: [speechsc] The NLSML schema and namespaces
Date: Sun, 2 Jul 2006 19:50:19 -0400
To: "Burger, Eric" <eburger@cantata.com>
X-Mailer: Apple Mail (2.624)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: a92270ba83d7ead10c5001bb42ec3221
Cc: "IETF SPEECHSC \(E-mail\)" <speechsc@ietf.org>,
	Andrew Wahbe <Andrew.Wahbe@genesyslab.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

Eric:

If you feel that way, there is an easy solution: drop NLSML from the 
specification.  It could easily be published as a separate document.

So long as MRCP includes the definition of a result format, the definition 
should be able to express at least the minimum test cases required by 
VoiceXML.  The ability to support string literals is one of these.  
There are other difficult issues (such as supporting ECMAScript 
boolean results for the deprecated VoiceXML builtin grammars) that 
Andrew does not mention.  EMMA is a much better format for 
returning recognition results, having considerably more expressive 
power to handle both the easy and the complex cases.

-=- Jerry


On Jun 29, 2006, at 4:15 PM, Burger, Eric wrote:

> What does "existing NLSML" do?  I'm not interested in fixing NLSML.  At
> the rate we're going, we'll be looking at EMMA v6.0 :-(
>
> -----Original Message-----
> From: Andrew Wahbe [mailto:Andrew.Wahbe@genesyslab.com]
> Sent: Tuesday, June 27, 2006 3:46 PM
> To: IETF SPEECHSC (E-mail)
> Subject: [speechsc] The NLSML schema and namespaces
>
> I would like to raise a few issues with both the NLSML schema and its
> use of namespaces.
>
> First, SRGS and SISR allow you to define a grammar so that multiple
> token sequences map to one string literal result. For example, "yes",
> "ya", "sure", "yes please", and "ok" could all result in the string
> literal result "yes". Thus, if you said "sure", the string literal
> interpretation result would be "yes".
>
> Unfortunately there doesn't seem to be a way to specify string literals
> in NLSML. You would think that the example above could be expressed as
> follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
>   <interpretation confidence="0.9">
>     <instance>yes</instance>
>     <input mode="speech">sure</input>
>   </interpretation>
> </result>
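Suppose the <instance> element is relaxed to allow text content, as proposed here. A consumer could then tell a string literal from a structured result by checking whether <instance> has child elements. Below is a minimal sketch with Python's standard `xml.etree.ElementTree`. The function name `read_interpretation` is hypothetical. The namespace URI is simply the one used in the example, and the thread itself questions whether it is correct.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Namespace taken from the example above (assumption: the thread notes the
# correct NLSML namespace is still an open question).
NLSML_NS = {"nl": "http://www.ietf.org/xml/ns/mrcpv2"}


def read_interpretation(doc: str) -> Tuple[Union[str, List[ET.Element]], str]:
    """Return (instance, input) for the first <interpretation> in an NLSML result.

    The instance is a plain string when <instance> contains only text (a string
    literal) and the list of its child elements otherwise.
    """
    root = ET.fromstring(doc)
    interp = root.find("nl:interpretation", NLSML_NS)
    instance = interp.find("nl:instance", NLSML_NS)
    spoken = interp.find("nl:input", NLSML_NS)
    children = list(instance)
    value = children if children else (instance.text or "").strip()
    return value, (spoken.text or "").strip()
```

Applied to the example above (without the XML declaration), this returns `("yes", "sure")`.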
>
> However this isn't allowed by the NLSML schema in the current MRCPv2
> draft. This could be allowed by changing the <instance> type to allow
> "mixed" contents (see the definition of <input>). Also, we would need to
> change the schema to allow <instance> to have no child elements.
> Applying these changes we get the following element definition:
>
> <xs:element name="instance" minOccurs="0">
>  <xs:complexType mixed="true">
>   <xs:sequence minOccurs="0">
>    <xs:any/>
>   </xs:sequence>
>  </xs:complexType>
> </xs:element>
>
> Of course this allows for a mix of text and elements (eg.
> <instance> yes <no/> maybe </instance>) which is probably not desirable.
> XML schema has no way to restrict this but the format we define could
> specify it this way (in the text of the spec). The alternative would be
> to do what EMMA does with the <emma:literal/> element. Either way would
> be fine with me.
>
> The second issue is with the <xs:any/> portion of the instance element
> definition. As currently defined, a schema validator will try to
> validate its contents even if a schema is not available. We should
> probably relax this by adding a processContents attribute of "lax".
> This will cause the validator to only process the contents if a schema
> is available.
>
> Also, this currently allows any elements, including those from the
> NLSML namespace, to be within an <instance/> element. I'm guessing that
> we actually want to restrict it to elements from other namespaces.
> E.g. you shouldn't be able to do this:
>
> <result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
>  <interpretation>
>    <instance>
>      <result>
>        <interpretation>
>          <instance/>
>        </interpretation>
>      </result>
>    </instance>
>  </interpretation>
> </result>
>
> However, this is ok:
>
> <result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
>  <interpretation>
>    <instance>
>      <result xmlns="http://example.com/myNamespace">
>        <interpretation>
>          <instance/>
>        </interpretation>
>      </result>
>    </instance>
>  </interpretation>
> </result>
>
> The final element definition for <instance/> would then be:
> <xs:element name="instance" minOccurs="0">
>  <xs:complexType mixed="true">
>   <xs:sequence minOccurs="0">
>    <xs:any namespace="##other" processContents="lax"/>
>   </xs:sequence>
>  </xs:complexType>
> </xs:element>
>
> Of course, this also raises the issue that none of the examples in the
> spec declare namespaces at all. It would probably be a good idea
> to do this properly.
> So examples such as this:
> <?xml version="1.0"?>
> <result grammar="session:request1@form-level.store">
>  <interpretation>
>   <instance name="Person">
>    <Person>
>     <Name> Andre Roy </Name>
>    </Person>
>   </instance>
>   <input>   may I speak to Andre Roy </input>
>  </interpretation>
> </result>
>
> Become this:
>
> <?xml version="1.0"?>
> <nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
>            xmlns="http://www.example.com/example"
>            grammar="session:request1@form-level.store">
>     <nl:interpretation>
>         <nl:instance>
>             <Person>
>                 <Name> Andre Roy </Name>
>             </Person>
>         </nl:instance>
>         <nl:input>   may I speak to Andre Roy </nl:input>
>     </nl:interpretation>
> </nl:result>
>
> Finally, it is not clear what the namespace of the NLSML format is
> supposed to be. The schema says this:
>    <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                targetNamespace="http://www.ietf.org/xml/schema/mrcpv2"
>                xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...
>
> I have a feeling that "http://www.ietf.org/xml/schema/mrcpv2" is
> supposed to be the location of the schema and
> "http://www.ietf.org/xml/ns/mrcpv2" is supposed to be the namespace for
> NLSML. If that is the case, then the schema should be written this way:
>    <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                targetNamespace="http://www.ietf.org/xml/ns/mrcpv2"
>                xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...
>
> The schema location is not referenced in the schema content at all.
> Either way, the default namespace and the targetNamespace should match;
> otherwise referencing the "confidenceinfo" simpleType in the definitions
> of the confidence attributes does not work properly.
> Thanks,
>
> Andrew Wahbe
>
> _______________________________________________
> Speechsc mailing list
> Speechsc@ietf.org
> https://www1.ietf.org/mailman/listinfo/speechsc
>


_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Sun Jul 02 21:04:09 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FxCrB-0006G3-Cp; Sun, 02 Jul 2006 21:04:09 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FxCr9-0006Fy-GE
	for speechsc@ietf.org; Sun, 02 Jul 2006 21:04:07 -0400
Received: from e36.co.us.ibm.com ([32.97.110.154])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FxCr8-0003ko-0K
	for speechsc@ietf.org; Sun, 02 Jul 2006 21:04:07 -0400
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
	[9.17.195.106])
	by e36.co.us.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k63143VO024661
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for <speechsc@ietf.org>; Sun, 2 Jul 2006 21:04:03 -0400
Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170])
	by d03relay04.boulder.ibm.com (8.13.6/NCO/VER7.0) with ESMTP id
	k6314NGo084914
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <speechsc@ietf.org>; Sun, 2 Jul 2006 19:04:23 -0600
Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1])
	by d03av04.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id
	k631429U008035
	for <speechsc@ietf.org>; Sun, 2 Jul 2006 19:04:03 -0600
Received: from d03nm119.boulder.ibm.com (d03nm119.boulder.ibm.com
	[9.17.195.145])
	by d03av04.boulder.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k63142v2008029; Sun, 2 Jul 2006 19:04:02 -0600
In-Reply-To: <09d501c69e11$febc8c40$6600000a@db01.voxpilot.com>
To: "Dave Burke" <david.burke@voxpilot.com>
Subject: Re: [speechsc] The NLSML schema and namespaces
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 7.0 HF144 February 01, 2006
Message-ID: <OF32EE30D8.7CAE4EF0-ON872571A0.0005E310-852571A0.0005DCA1@us.ibm.com>
From: Brett Gavagni <gavagni@us.ibm.com>
Date: Sun, 2 Jul 2006 21:08:38 -0400
X-MIMETrack: Serialize by Router on D03NM119/03/M/IBM(Release 7.0.1HF123 |
	April 14, 2006) at 07/02/2006 19:08:39,
	Serialize complete at 07/02/2006 19:08:39
Content-Type: text/plain; charset="US-ASCII"
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 7268a2980febc47a9fa732aba2b737ba
Cc: "IETF SPEECHSC \(E-mail\)" <speechsc@ietf.org>,
	Andrew Wahbe <Andrew.Wahbe@genesyslab.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

I agree, the current definitions of the NLSML schemas referenced by MRCPv2
are deficient.

Thanks,

Brett Gavagni 
WebSphere Voice Server Development 
http://www-306.ibm.com/software/pervasive/voice_server/
gavagni@us.ibm.com


"Dave Burke" <david.burke@voxpilot.com> 
07/02/2006 03:59 PM

To
"Burger, Eric" <eburger@cantata.com>, "Andrew Wahbe" 
<Andrew.Wahbe@genesyslab.com>, "IETF SPEECHSC (E-mail)" 
<speechsc@ietf.org>
cc

Subject
Re: [speechsc] The NLSML schema and namespaces


"Existing" NLSML (i.e. the public NLSML document) has no defined schema. I 
believe this thread is about fixing MRCPv2 NLSML. I agree with all
the suggestions below and think they're easily applied...

Dave


----- Original Message ----- 
From: "Burger, Eric" <eburger@cantata.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>; "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Sent: Thursday, June 29, 2006 9:15 PM
Subject: RE: [speechsc] The NLSML schema and namespaces


What does "existing NLSML" do?  I'm not interested in fixing NLSML.  At
the rate we're going, we'll be looking at EMMA v6.0 :-(

-----Original Message-----
From: Andrew Wahbe [mailto:Andrew.Wahbe@genesyslab.com]
Sent: Tuesday, June 27, 2006 3:46 PM
To: IETF SPEECHSC (E-mail)
Subject: [speechsc] The NLSML schema and namespaces

I would like to raise a few issues with both the NLSML schema and its
use of namespaces.

First, SRGS and SISR allow you to define a grammar so that multiple
token sequences map to one string literal result. For example, "yes",
"ya", "sure", "yes please", and "ok" could all result in the string
literal result "yes". Thus, if you said "sure", the string literal
interpretation result would be "yes".

Unfortunately there doesn't seem to be a way to specify string literals
in NLSML. You would think that the example above could be expressed as
follows:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <interpretation confidence="0.9">
    <instance>yes</instance>
    <input mode="speech">sure</input>
  </interpretation>
</result>

However this isn't allowed by the NLSML schema in the current MRCPv2
draft. This could be allowed by changing the <instance> type to allow
"mixed" contents (see the definition of <input>). Also, we would need to
change the schema to allow <instance> to have no child elements.
Applying these changes we get the following element definition:

<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

Of course this allows for a mix of text and elements (e.g. <instance> yes
<no/> maybe </instance>), which is probably not desirable. XML Schema has
no way to restrict this, but the format we define could specify it this
way (in the text of the spec). The alternative would be to do what EMMA
does with the <emma:literal/> element. Either way would be fine with me.
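Because XML Schema 1.0 cannot express "character data or an element, but not both", a rule stated only in the spec text would have to be enforced by the consumer. The Python sketch below (standard library only) shows one way to do that. It assumes the namespace http://www.ietf.org/xml/ns/mrcpv2 and the mixed-content <instance> definition proposed above; the helper name instance_value is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed NLSML namespace, taken from the examples in this thread.
NLSML_NS = "http://www.ietf.org/xml/ns/mrcpv2"


def instance_value(instance):
    """Return the string literal or the single child element of <instance>.

    Hypothetical helper: raises ValueError if non-whitespace text is mixed
    with child elements, a rule XML Schema 1.0 cannot enforce by itself.
    """
    children = list(instance)
    # Character data directly inside <instance>: its leading text plus the
    # tail text that follows each child element.
    literal = ((instance.text or "")
               + "".join(child.tail or "" for child in children)).strip()
    if children and literal:
        raise ValueError("<instance> mixes text and elements")
    if len(children) > 1:
        # The proposed <xs:sequence> holds a single <xs:any/> (maxOccurs 1).
        raise ValueError("<instance> holds more than one element")
    return children[0] if children else literal


doc = ET.fromstring(
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">'
    '<interpretation confidence="0.9">'
    '<instance>yes</instance>'
    '<input mode="speech">sure</input>'
    '</interpretation></result>'
)
inst = doc.find(f"{{{NLSML_NS}}}interpretation/{{{NLSML_NS}}}instance")
print(instance_value(inst))  # yes
```

Given <instance>yes <no/> maybe</instance>, the helper raises ValueError, which is the behaviour a prose-only rule would require.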

The second issue is with the <xs:any/> portion of the instance element
definition. As currently defined, a schema validator will try to
validate its contents even if a schema is not available. We should
probably relax this by adding a processContents attribute of "lax". This
will cause the validator to only process the contents if a schema is
available.

Also, this currently allows any element, including those from the NLSML
namespace, to appear within an <instance/> element. I'm guessing that we
actually want to allow elements from other namespaces, and only from other
namespaces. E.g. you shouldn't be able to do this:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result>
       <interpretation>
         <instance/>
       </interpretation>
     </result>
   </instance>
 </interpretation>
</result>

However, this is ok:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result xmlns="http://example.com/myNamespace">
       <interpretation>
         <instance/>
       </interpretation>
     </result>
   </instance>
 </interpretation>
</result>

The final element definition for <instance/> would then be:
<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any namespace="##other" processContents="lax"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>
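To make the effect of namespace="##other" concrete, the check a validator performs on <instance> children can be emulated in a few lines. This is a hedged Python sketch using only xml.etree.ElementTree; the helper names are hypothetical, and it assumes the NLSML namespace http://www.ietf.org/xml/ns/mrcpv2. In XML Schema 1.0, ##other also excludes elements in no namespace, so an unqualified child such as <Person> is reported as well.

```python
import xml.etree.ElementTree as ET

# Assumed NLSML namespace, taken from the examples in this thread.
NLSML_NS = "http://www.ietf.org/xml/ns/mrcpv2"


def namespace_of(tag):
    """Namespace URI of an ElementTree tag ('{uri}local'), or None if unqualified."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def disallowed_instance_children(doc):
    """Return tags of NLSML <instance> children that ##other would reject.

    Hypothetical helper. As with ##other in XML Schema 1.0, children in the
    NLSML namespace and children in no namespace are both reported.
    """
    bad = []
    for instance in doc.iter(f"{{{NLSML_NS}}}instance"):
        for child in instance:
            if namespace_of(child.tag) in (None, NLSML_NS):
                bad.append(child.tag)
    return bad


nested = ET.fromstring(
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"><interpretation>'
    '<instance><result><interpretation><instance/></interpretation></result>'
    '</instance></interpretation></result>'
)
foreign = ET.fromstring(
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"><interpretation>'
    '<instance><result xmlns="http://example.com/myNamespace"><interpretation>'
    '<instance/></interpretation></result></instance></interpretation></result>'
)
print(disallowed_instance_children(nested))   # ['{http://www.ietf.org/xml/ns/mrcpv2}result']
print(disallowed_instance_children(foreign))  # []
```

The two documents are the "not allowed" and "ok" examples from the text; only the first produces a complaint.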

Of course, this also raises the issue that none of the examples in the
spec declare namespaces at all. It would probably be a good idea
to do this properly.
So examples such as this:
<?xml version="1.0"?>
<result grammar="session:request1@form-level.store">
 <interpretation>
  <instance name="Person">
   <Person>
    <Name> Andre Roy </Name>
   </Person>
  </instance>
  <input>   may I speak to Andre Roy </input>
 </interpretation>
</result>

Become this:

<?xml version="1.0"?>
<nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="session:request1@form-level.store">
    <nl:interpretation>
        <nl:instance>
            <Person>
                <Name> Andre Roy </Name>
            </Person>
        </nl:instance>
        <nl:input>   may I speak to Andre Roy </nl:input>
    </nl:interpretation>
</nl:result>

Finally, it is not clear what the namespace of the NLSML format is
supposed to be. The schema says this:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/schema/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

I have a feeling that "http://www.ietf.org/xml/schema/mrcpv2" is
supposed to be the location of the schema and
"http://www.ietf.org/xml/ns/mrcpv2" is supposed to be the namespace for
NLSML. If that is the case, then the schema should be written this way:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/ns/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

The schema location is not referenced in the schema content at all.
Either way, the default namespace and the targetNamespace should match;
otherwise referencing the "confidenceinfo" simpleType in the definitions
of the confidence attributes does not work properly.
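The mismatch breaks the reference because an unprefixed QName such as type="confidenceinfo" is resolved against the in-scope default namespace, while the type itself is defined in the targetNamespace. The Python sketch below (standard library only) makes that resolution explicit for a simplified, hypothetical schema fragment. The restriction base xs:decimal and the helper name are illustrative and not taken from the draft, and the sketch assumes all namespace declarations sit on the root <xs:schema> element.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def mismatched_type_refs(schema_text):
    """Return (qname, resolved_uri, target_ns) for each type="..." reference
    to a locally defined type that does not resolve into targetNamespace.

    Hypothetical helper; assumes every namespace is declared on the root.
    """
    prefixes = {}  # prefix -> URI; "" is the default namespace
    root = None
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes.setdefault(prefix, uri)
        elif root is None:
            root = item
    target = root.get("targetNamespace")
    defined = {t.get("name") for t in root.iter()
               if t.tag in (f"{{{XS}}}simpleType", f"{{{XS}}}complexType")}
    problems = []
    for el in root.iter():
        qname = el.get("type")
        if qname is None:
            continue
        prefix, _, local = qname.rpartition(":")
        resolved = prefixes.get(prefix)  # unprefixed -> default namespace
        if local in defined and resolved != target:
            problems.append((qname, resolved, target))
    return problems


TEMPLATE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="{target}"
           xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <xs:simpleType name="confidenceinfo">
    <xs:restriction base="xs:decimal"/>
  </xs:simpleType>
  <xs:attribute name="confidence" type="confidenceinfo"/>
</xs:schema>"""

# Current draft: confidenceinfo resolves to .../ns/mrcpv2, not .../schema/mrcpv2.
print(mismatched_type_refs(TEMPLATE.format(target="http://www.ietf.org/xml/schema/mrcpv2")))
# Proposed fix: targetNamespace and default namespace agree.
print(mismatched_type_refs(TEMPLATE.format(target="http://www.ietf.org/xml/ns/mrcpv2")))  # []
```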
Thanks,

Andrew Wahbe

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Mon Jul 03 16:28:47 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FxV2D-00058l-QZ; Mon, 03 Jul 2006 16:28:45 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FxV2D-00058f-6o
	for speechsc@ietf.org; Mon, 03 Jul 2006 16:28:45 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FxV2B-0005T2-Kf
	for speechsc@ietf.org; Mon, 03 Jul 2006 16:28:45 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 2850D2140FA; Mon,  3 Jul 2006 20:28:36 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.1 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP id 6A5C2214046
	for <speechsc@ietf.org>; Mon,  3 Jul 2006 20:28:23 +0000 (GMT)
Message-ID: <0d2901c69edf$39091530$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <speechsc@ietf.org>
Subject: [speechsc] Comments from a review of section 11
Date: Mon, 3 Jul 2006 21:28:11 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 72dbfff5c6b8ad2b1b727c13be042129
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1495338902=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1495338902==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0D26_01C69EE7.95E562B0"

This is a multi-part message in MIME format.

------=_NextPart_000_0D26_01C69EE7.95E562B0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hello:

Some issues from a review of section 11 in roughly decreasing order of
severity.

Dave

~~~

What does Speech-Complete-Timeout mean for the speakverify resource?
What grammar was a complete match triggered on?

~~~

State machine broken - see previous comments to list

~~~

  From section 11.6:

  "Before a verification/identification session is started, only
   VERIFY-ROLLBACK and generic "SET-PARAMS" and "GET-PARAMS" operations
   may be performed on the verification resource.  The server SHOULD
   return 402 (Method not valid in this state) for all other operations,
   such as VERIFY, or QUERY-VOICEPRINT."

   Why not restrict QUERY-VOICEPRINT, DELETE-VOICEPRINT outside of session?
   Avoids issues of deleting a voiceprint in use.

~~~

What's the queuing behaviour for VERIFY / VERIFY-FROM-BUFFER (presumably
none)?

~~~

If Save-Waveform is true and STOP is issued with Abort-Verification
false, shouldn't Waveform-URI be returned?

~~~

Why not have Save-Waveform work with VERIFY in a training session?

~~~

 "a CLEAR-BUFFER request fails if the verification buffer is in"
 (section 11.7)
    -> Specify exactly how it does fail

~~~

    From section 11.18 GET-INTERMEDIATE-RESULT:

   "The verification resource collects the accumulated verification results
   and returns the information in the method response."

   -> Clarify what that means. What does <incremental> mean in this case?
   -> Clarify that this doesn't work for training results, only for
      verification/identification results.

~~~

Clarify START-OF-INPUT only generated for live (VERIFY) requests.

~~~

Which request methods can Num-Min(Max)-Verification-Phrases be specified
in?
    -> Presume same as Min-Verification-Score?

~~~

State the default value for Num-Max-Verification-Phrases.

~~~

Why can't Save-Waveform be used with SET-/GET-PARAMS?

~~~

New-Audio-Channel - confirm the header field can be specified only on
VERIFY.

~~~

Add an example on training a voiceprint for completeness.

~~~

  
------=_NextPart_000_0D26_01C69EE7.95E562B0--


--===============1495338902==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1495338902==--


From speechsc-bounces@ietf.org Thu Jul 06 06:01:15 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyQfa-0004Ux-TS; Thu, 06 Jul 2006 06:01:14 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyQfZ-0004Us-U4
	for speechsc@ietf.org; Thu, 06 Jul 2006 06:01:13 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyQfY-0002Uf-DL
	for speechsc@ietf.org; Thu, 06 Jul 2006 06:01:13 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 08AB7214103; Thu,  6 Jul 2006 10:01:08 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.1 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_40_50,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (LAubervilliers-151-13-89-63.w217-128.abo.wanadoo.fr
	[217.128.136.63])
	by mail.voxpilot.com (Postfix) with ESMTP id 97D9F2140FB
	for <speechsc@ietf.org>; Thu,  6 Jul 2006 10:01:04 +0000 (GMT)
Message-ID: <13f201c6a0eb$746a7390$6600000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <speechsc@ietf.org>
Subject: [speechsc] Accurate determination of START-OF-INPUT time
Date: Thu, 6 Jul 2006 12:00:54 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: f4c2cf0bccc868e4cc88dace71fb3f44
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0176279524=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0176279524==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_13EF_01C6A0F3.D5A6BFD0"

This is a multi-part message in MIME format.

------=_NextPart_000_13EF_01C6A0F3.D5A6BFD0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

It is useful to know the START-OF-INPUT time in many speech applications
(e.g. to be able to resume from where one left off or to offset from the
point of barge-in, e.g. "skip forward two seconds"). While this can be
*estimated* from the time of receipt of the event, its accuracy is
subject to network transit times. We've got an NTP timestamp for
SPEECH-MARKER and I believe we should also have one for START-OF-INPUT.
Note that implementing VoiceXML's marktime requires subtracting the last
SPEECH-MARKER time from the START-OF-INPUT time.

We could simply add a ;timestamp attribute to Input-Type a la
Speech-Marker.

Dave

------=_NextPart_000_13EF_01C6A0F3.D5A6BFD0--


--===============0176279524==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0176279524==--


From speechsc-bounces@ietf.org Thu Jul 06 08:59:48 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyTSN-0006yC-2F; Thu, 06 Jul 2006 08:59:47 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyTSL-0006wh-4D
	for speechsc@ietf.org; Thu, 06 Jul 2006 08:59:45 -0400
Received: from e34.co.us.ibm.com ([32.97.110.152])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyTSJ-0001qS-Os
	for speechsc@ietf.org; Thu, 06 Jul 2006 08:59:45 -0400
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
	[9.17.195.106])
	by e34.co.us.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k66CxhmG021215
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for <speechsc@ietf.org>; Thu, 6 Jul 2006 08:59:43 -0400
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
	by d03relay04.boulder.ibm.com (8.13.6/NCO/VER7.0) with ESMTP id
	k66D05S0115780
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <speechsc@ietf.org>; Thu, 6 Jul 2006 07:00:05 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
	by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id
	k66Cxg4P026859
	for <speechsc@ietf.org>; Thu, 6 Jul 2006 06:59:42 -0600
Received: from d03nm119.boulder.ibm.com (d03nm119.boulder.ibm.com
	[9.17.195.145])
	by d03av01.boulder.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k66CxgH5026841; Thu, 6 Jul 2006 06:59:42 -0600
In-Reply-To: <13f201c6a0eb$746a7390$6600000a@db01.voxpilot.com>
To: "Dave Burke" <david.burke@voxpilot.com>
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 7.0 HF144 February 01, 2006
Message-ID: <OF4517B39E.C11114B9-ON872571A3.00479C3C-852571A3.004762EF@us.ibm.com>
From: Brett Gavagni <gavagni@us.ibm.com>
Date: Thu, 6 Jul 2006 09:04:24 -0400
X-MIMETrack: Serialize by Router on D03NM119/03/M/IBM(Release 7.0.1HF123 |
	April 14, 2006) at 07/06/2006 07:04:24,
	Serialize complete at 07/06/2006 07:04:24
Content-Type: text/plain; charset="US-ASCII"
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 769a46790fb42fbb0b0cc700c82f7081
Cc: speechsc@ietf.org
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

Good suggestion Dave.

This sounds like a reasonable request to facilitate more precise timing
measurements.

Thanks,

Brett Gavagni 
WebSphere Voice Server Development 
http://www-306.ibm.com/software/pervasive/voice_server/ 
gavagni@us.ibm.com


"Dave Burke" <david.burke@voxpilot.com> 
07/06/2006 07:00 AM

To
<speechsc@ietf.org>
cc

Subject
[speechsc] Accurate determination of START-OF-INPUT time


It is useful to know the START-OF-INPUT time in many speech applications
(e.g. to be able to resume from where one left off or to offset from the
point of barge-in, e.g. "skip forward two seconds"). While this can be
*estimated* from the time of receipt of the event, its accuracy is
subject to network transit times.  We've got an NTP timestamp for
SPEECH-MARKER and I believe we should also have one for START-OF-INPUT.
Note that implementing VoiceXML's marktime requires subtracting the
last SPEECH-MARKER time from the START-OF-INPUT time.
 
We could simply add a ;timestamp attribute to Input-Type a la 
Speech-Marker. 
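A minimal Python sketch of the marktime arithmetic described above. It assumes both values arrive as 64-bit NTP timestamps from the server's clock and that marktime is expressed in milliseconds; the function name and the format of the proposed ;timestamp value are illustrative assumptions, not draft text.

```python
# 64-bit NTP timestamps are 32.32 fixed point: whole seconds in the
# upper 32 bits, fractions of a second in the lower 32 bits.
NTP_ONE_SECOND = 1 << 32


def marktime_ms(start_of_input_ntp: int, last_marker_ntp: int) -> int:
    """Milliseconds from the last SPEECH-MARKER to START-OF-INPUT.

    Both values are assumed to be 64-bit NTP timestamps taken from the
    server's clock (a hypothetical ;timestamp on Input-Type, and the
    existing Speech-Marker timestamp).
    """
    delta = start_of_input_ntp - last_marker_ntp
    if delta < 0:
        raise ValueError("START-OF-INPUT precedes the last SPEECH-MARKER")
    # Scale from units of 2**-32 s to milliseconds, rounding down.
    return delta * 1000 // NTP_ONE_SECOND


# Example: a marker, then barge-in 2.5 s later.
marker = 3_360_000_000 << 32
print(marktime_ms(marker + (5 << 31), marker))  # 2500
```

Because both timestamps come from the server clock, network transit time drops out of the result.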
 
Dave

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc




From speechsc-bounces@ietf.org Thu Jul 06 10:19:19 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyUhJ-0007bk-Ph; Thu, 06 Jul 2006 10:19:17 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyUhJ-0007a8-2w
	for speechsc@ietf.org; Thu, 06 Jul 2006 10:19:17 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyUhG-0006yn-Na
	for speechsc@ietf.org; Thu, 06 Jul 2006 10:19:17 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Thu, 6 Jul 2006 07:19:13 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time
Date: Thu, 6 Jul 2006 07:19:13 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F451837@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Accurate determination of START-OF-INPUT time
Thread-Index: Acag4zvfzhqtLYLHQACmmMBVBTGDTQAIPLUA
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	<speechsc@ietf.org>
X-OriginalArrivalTime: 06 Jul 2006 14:19:13.0245 (UTC)
	FILETIME=[27DFA8D0:01C6A107]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 386e0819b1192672467565a524848168
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============2100967681=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============2100967681==
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6A107.2854AD61"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6A107.2854AD61
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

I agree that this is useful. I can also think of many instances where an
accurate determination of the "end of input" time would be useful as
well. We should consider whether the emma:start and emma:end attributes
are sufficient, though. Or do you think there is a reason to have this
information (well, the start time) before recognition completes?
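For comparison with emma:start and emma:end, here is a rough Python
sketch that converts a 64-bit NTP timestamp into milliseconds since
1 January 1970, which is how the EMMA drafts express those attributes.
It assumes the server reports NTP-format timestamps; the function name
is hypothetical.

```python
def ntp_to_emma_ms(ntp):
    # Upper 32 bits: whole seconds since 1 January 1900 (NTP epoch).
    # Lower 32 bits: fraction of a second in units of 2**-32 s.
    # 2208988800 s separates the NTP epoch from 1 January 1970.
    return (((ntp >> 32) - 2208988800) * 1000
            + ((ntp & 0xFFFFFFFF) * 1000 >> 32))


# Example: 1.5 s after 1 January 1970, expressed in NTP format.
print(ntp_to_emma_ms(((2208988800 + 1) << 32) | (1 << 31)))  # 1500
```

Subtracting a converted START-OF-INPUT value from a converted end of
input value gives the input duration in milliseconds.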

________________________________

From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 6, 2006 7:01 AM
To: speechsc@ietf.org
Subject: [speechsc] Accurate determination of START-OF-INPUT time


It is useful to know the START-OF-INPUT time in many speech applications
(e.g. to be able to resume from where one left off or to offset from the
point of barge-in, e.g. "skip forward two seconds"). While this can be
*estimated* from the time of receipt of the event, its accuracy is
subject to network transit times.  We've got an NTP timestamp for
SPEECH-MARKER and I believe we should also have one for START-OF-INPUT.
Note that implementing VoiceXML's marktime requires subtracting the
last SPEECH-MARKER time from the START-OF-INPUT time.

We could simply add a ;timestamp attribute to Input-Type a la
Speech-Marker.

Dave

------_=_NextPart_001_01C6A107.2854AD61--


--===============2100967681==--


From speechsc-bounces@ietf.org Thu Jul 06 11:06:22 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyVQq-0000u8-TH; Thu, 06 Jul 2006 11:06:20 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyVQo-0000u3-Va
	for speechsc@ietf.org; Thu, 06 Jul 2006 11:06:18 -0400
Received: from szxga03-in.huawei.com ([61.144.161.55])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyVQk-0000KB-UZ
	for speechsc@ietf.org; Thu, 06 Jul 2006 11:06:18 -0400
Received: from huawei.com (szxga03-in [172.24.2.9])
	by szxga03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J1Z005CPMCS0F@szxga03-in.huawei.com> for
	speechsc@ietf.org; Thu, 06 Jul 2006 23:14:53 +0800 (CST)
Received: from huawei.com ([172.24.1.18])
	by szxga03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J1Z003M3MCSDC@szxga03-in.huawei.com> for
	speechsc@ietf.org; Thu, 06 Jul 2006 23:14:52 +0800 (CST)
Received: from SANTHANA ([10.18.5.51])
	by szxml03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTPA id <0J1Z0013MM0F8O@szxml03-in.huawei.com> for
	speechsc@ietf.org; Thu, 06 Jul 2006 23:07:28 +0800 (CST)
Date: Thu, 06 Jul 2006 20:35:19 +0530
From: santhanakrishnan <santhana@huawei.com>
Subject: [speechsc] Some headers are not explained in the draft
To: speechsc@ietf.org
Message-id: <002301c6a10d$9927b180$3305120a@china.huawei.com>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Mailer: Microsoft Office Outlook 11
Thread-index: AcahDZhmcH+Mv1ebSo6QWwhJu6VGaA==
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 7da5a831c477fb6ef97f379a05fb683c
Cc: Sarvi@cisco.com, 'David R Oran' <oran@cisco.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: santhana@huawei.com
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0614152348=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0614152348==
Content-type: multipart/alternative;
	boundary="Boundary_(ID_XjG+u+Ss6TuypcJG3nRXWw)"

This is a multi-part message in MIME format.

--Boundary_(ID_XjG+u+Ss6TuypcJG3nRXWw)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT

Hi,

Some of the headers used in the examples are not explained anywhere in
the draft.

Below are some of the headers from the examples in
"draft-ietf-speechsc-mrcpv2-10" that are not explained:

 
Voiceprint-Mode

Voice-gender

Voice-variant

Voice-age

Prosody-volume

 
Please clarify these.

 
Regards,

Santhanakrishnan


--Boundary_(ID_XjG+u+Ss6TuypcJG3nRXWw)--


--===============0614152348==--


From speechsc-bounces@ietf.org Thu Jul 06 14:00:33 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyY9Q-0004q1-33; Thu, 06 Jul 2006 14:00:32 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyY9O-0004pv-H2
	for speechsc@ietf.org; Thu, 06 Jul 2006 14:00:30 -0400
Received: from mail02.corp.tellme.com ([209.157.157.101])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyY9M-0000ZE-18
	for speechsc@ietf.org; Thu, 06 Jul 2006 14:00:30 -0400
Received: from mail02.corp.tellme.com (localhost [127.0.0.1])
	by localhost.corp.tellme.com (Postfix) with ESMTP
	id 005C93510; Thu,  6 Jul 2006 11:00:26 -0700 (PDT)
Received: from [172.21.50.91] (corby-t40.sea.tellme.com [172.21.50.91])
	by mail02.corp.tellme.com (Postfix) with ESMTP
	id BFD02350F; Thu,  6 Jul 2006 11:00:26 -0700 (PDT)
Message-ID: <44AD4F6F.5010408@tellme.com>
Date: Thu, 06 Jul 2006 10:59:11 -0700
From: Corby Anderson <corby@tellme.com>
User-Agent: Thunderbird 1.5.0.4 (Windows/20060516)
MIME-Version: 1.0
To: Andrew Wahbe <Andrew.Wahbe@genesyslab.com>
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time
References: <911B89A9FD71E649AA624FF24790D76F451837@GIMLI.us.int.genesyslab.com>
In-Reply-To: <911B89A9FD71E649AA624FF24790D76F451837@GIMLI.us.int.genesyslab.com>
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 8f374d0786b25a451ef87d82c076f593
Cc: speechsc@ietf.org, Dave Burke <david.burke@voxpilot.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0860084074=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.
--===============0860084074==
Content-Type: multipart/alternative;
	boundary="------------070001040103070000070705"

This is a multi-part message in MIME format.
--------------070001040103070000070705
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Yes, yes, very useful!

Two things:

1) Consider a client and server that have poor time synchronization.
The client wouldn't be able to make much sense of the server's absolute
timestamp.  Would it be possible to also provide an absolute timestamp
of the start of audio (from the server's perspective) so that the client
could determine with certainty how far into the audio sample the start
of speech occurred?

2) What about audio samples provided by input-waveform-uri?  In our
case, we're much more interested in the sample offsets at which start
and end of speech occurred than in the absolute times at which they
occurred.
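Point 1 amounts to subtracting two server-side timestamps. A minimal
Python sketch, assuming the server reported both the start of audio and
the start of speech as 64-bit NTP timestamps (neither header exists yet)
and a fixed sample rate; the function name and the 8 kHz default are
assumptions for illustration.

```python
def sample_offset(start_of_speech_ntp: int, audio_start_ntp: int,
                  sample_rate_hz: int = 8000) -> int:
    """Index of the first speech sample within the captured audio.

    Both timestamps are 64-bit NTP values read from the *server's*
    clock, so any client/server clock offset cancels out.
    The 8 kHz default is an assumption (typical telephony audio).
    """
    delta = start_of_speech_ntp - audio_start_ntp
    if delta < 0:
        raise ValueError("speech cannot start before the audio")
    # delta is in units of 2**-32 s; convert to samples, rounding down.
    return delta * sample_rate_hz >> 32


# Example: audio starts at t, speech 0.75 s later.
t = 100 << 32
print(sample_offset(t + (3 << 30), t))  # 6000
```

Because both values come from the same clock, the result does not
depend on client/server synchronization, and it also gives the offset
into an input-waveform-uri recording, provided that recording starts at
the reported start-of-audio time.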

Corby Anderson
Tellme Networks, Inc.


Andrew Wahbe wrote:
> I agree that this is useful. I can also think of many instances where
> an accurate determination of the "end of input" time would be useful
> as well. We should consider whether the emma:start and emma:end
> attributes are sufficient, though. Or do you think there is a reason
> to have this information (well, the start time) before recognition
> completes?
>
> ------------------------------------------------------------------------
> *From:* Dave Burke [mailto:david.burke@voxpilot.com]
> *Sent:* July 6, 2006 7:01 AM
> *To:* speechsc@ietf.org
> *Subject:* [speechsc] Accurate determination of START-OF-INPUT time
>
> It is useful to know the START-OF-INPUT time in many speech
> applications (e.g. to be able to resume from where one left off or to
> offset from the point of barge-in, e.g. "skip forward two seconds").
> While this can be *estimated* from the time of receipt of the event,
> its accuracy is subject to network transit times.  We've got an NTP
> timestamp for SPEECH-MARKER and I believe we should also have one for
> START-OF-INPUT. Note that implementing VoiceXML's marktime requires
> subtracting the last SPEECH-MARKER time from the START-OF-INPUT time.
>  
> We could simply add a ;timestamp attribute to Input-Type a la 
> Speech-Marker.
>  
> Dave
> ------------------------------------------------------------------------
>
>   

--------------070001040103070000070705--


--===============0860084074==--


From speechsc-bounces@ietf.org Fri Jul 07 10:45:04 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyrZm-00080o-Vl; Fri, 07 Jul 2006 10:45:03 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyrZl-0007qv-DE
	for speechsc@ietf.org; Fri, 07 Jul 2006 10:45:01 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyrZi-0008O7-Km
	for speechsc@ietf.org; Fri, 07 Jul 2006 10:45:01 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Fri, 7 Jul 2006 07:44:56 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time
Date: Fri, 7 Jul 2006 07:44:55 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F519E85@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Accurate determination of START-OF-INPUT time
Thread-Index: AcahJhO4aH8Xg3fAQWCuukiZSAXUYwAGLz9w
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Corby Anderson" <corby@tellme.com>
X-OriginalArrivalTime: 07 Jul 2006 14:44:56.0894 (UTC)
	FILETIME=[EA5F8DE0:01C6A1D3]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 9cc83ac38bbbabacbf00f656311dd8d8
Cc: speechsc@ietf.org, Dave Burke <david.burke@voxpilot.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============2093650231=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============2093650231==
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6A1D3.EA88A70B"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6A1D3.EA88A70B
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Right. Actually, I'd imagine that by suggesting an NTP timestamp, Dave
meant the timestamp of the point in the RTP stream at which start of
speech was detected. Since the audio at input-waveform-uri begins at the
start of the RECOGNIZE, I can see why absolute times don't work for what
you are proposing unless you also have the timestamp of when recognition
started.

How about the following:
A "Timestamp" header is added whose value is a timestamp.
The header may appear on a response or an event. On a response, it
indicates the time in the stream at which the associated request was
applied (e.g. when the RECOGNIZE started). On an event, it indicates
the time at which the event occurred (e.g. when start of speech
occurred).
=20
This is a more general solution to what is being done with the
speech-marker (see:
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html).
If we really want to know when these things happen relative to the RTP
stream in general, then I think a separate timestamp header makes more
sense than putting the timestamp information in different headers for
each resource or message type.
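If the Timestamp value turns out to be a raw 32-bit RTP timestamp (the
format is still an open question), the position of an event relative to
the request follows from the response and event values. A rough Python
sketch; the function names are hypothetical.

```python
def rtp_elapsed(request_ts, event_ts):
    # RTP timestamps are unsigned 32-bit counters that wrap around,
    # so take the difference modulo 2**32. This is only valid if less
    # than one full wrap separates the two timestamps.
    return (event_ts - request_ts) % (1 << 32)


def seconds_elapsed(request_ts, event_ts, clock_rate):
    # clock_rate is the RTP clock rate in Hz, e.g. 8000 for PCMU.
    return rtp_elapsed(request_ts, event_ts) / clock_rate


# Example: RECOGNIZE applied at RTP time 0xFFFFF000, start of speech
# reported at 0x00000F40, i.e. after the counter wrapped.
print(seconds_elapsed(0xFFFFF000, 0x00000F40, 8000))  # 1.0
```

For codecs whose RTP clock rate equals the sample rate, such as PCMU,
rtp_elapsed is already a sample offset into the input-waveform-uri
recording, assuming that recording starts when the RECOGNIZE is applied.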
=20
Now there is the question of what format to use for the timestamp. It
seems that what we are really interested in is the "point in time" in
the stream, as opposed to the clock time. In the discussion of the
speech-marker timestamp, it was pointed out that the RTP time can be
derived from the NTP time using an RTCP sender report (which includes
the RTP and NTP times at which the report was sent). However, it is
common practice to send RTP in bursts, or faster than real time, between
voice platforms and voice resources. For example, this can be done in
synthesis to fill up telephony card buffers, and in recognition if the
voice platform buffers audio while a speech session is established.
Doesn't this destroy the relationship between NTP time and the RTP
timestamp? I think it makes more sense to use the RTP timestamp here.
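The RTCP-based derivation mentioned above can be sketched in Python as
follows; the function name is hypothetical. It assumes audio is sent in
real time at a fixed RTP clock rate, which is exactly the assumption
that bursty transmission breaks.

```python
def ntp_to_rtp(ntp, sr_ntp, sr_rtp, clock_rate):
    # sr_ntp and sr_rtp come from the same RTCP sender report
    # (RFC 3550, section 6.4.1) and denote the same instant.
    # ntp and sr_ntp are 64-bit 32.32 fixed-point NTP timestamps;
    # clock_rate is the RTP clock rate in Hz.
    # The linear mapping holds only if media is sent in real time.
    return (sr_rtp + ((ntp - sr_ntp) * clock_rate >> 32)) % (1 << 32)


# Example: 0.5 s after a sender report stamped RTP 160000,
# at an 8000 Hz clock. Prints 164000.
print(ntp_to_rtp((1000 << 32) + (1 << 31), 1000 << 32, 160000, 8000))
```

With faster-than-real-time sending, RTP timestamps advance faster than
the NTP clock, so values from this mapping drift away from the true
stream position; carrying the RTP timestamp directly avoids the
conversion.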
=20
=20
Andrew Wahbe

________________________________

From: Corby Anderson [mailto:corby@tellme.com]
Sent: July 6, 2006 1:59 PM
To: Andrew Wahbe
Cc: Dave Burke; speechsc@ietf.org
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time


Yes, yes, very useful!

Two things:

1) Consider a client and server that have poor time synchronization.
The client wouldn't be able to make much sense of the server's absolute
timestamp.  Would it be possible to also provide an absolute timestamp
of the start of audio (from the server's perspective) so that the client
could determine with certainty how far into the audio sample the start
of speech occurred?

2) What about audio samples provided by input-waveform-uri?  In our
case, we're much more interested in the sample offsets at which start
and end of speech occurred than in the absolute times at which they
occurred.

Corby Anderson
Tellme Networks, Inc.


Andrew Wahbe wrote:

	I agree that this is useful. I can also think of many instances
where an accurate determination of the "end of input" time would be
useful as well. We should consider whether the emma:start and emma:end
attributes are sufficient, though. Or do you think there is a reason to
have this information (well, the start time) before recognition
completes?

________________________________

	From: Dave Burke [mailto:david.burke@voxpilot.com]
	Sent: July 6, 2006 7:01 AM
	To: speechsc@ietf.org
	Subject: [speechsc] Accurate determination of START-OF-INPUT time


	It is useful to know the START-OF-INPUT time in many speech
applications (e.g. to be able to resume from where one left off or to
offset from the point of barge-in, e.g. "skip forward two seconds").
While this can be *estimated* from the time of receipt of the event, its
accuracy is subject to network transit times.  We've got an NTP
timestamp for SPEECH-MARKER and I believe we should also have one for
START-OF-INPUT. Note that implementing VoiceXML's marktime requires
subtracting the last SPEECH-MARKER time from the START-OF-INPUT time.

	We could simply add a ;timestamp attribute to Input-Type a la
Speech-Marker.

	Dave
=09
________________________________




------_=_NextPart_001_01C6A1D3.EA88A70B
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2900.2769" name=3DGENERATOR></HEAD>
<BODY text=3D#000000 bgColor=3D#ffffff>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>Right -- actually, I'd imagine by suggesting an =
NTP=20
timestamp, Dave was implying that this is the timestamp in the RTP =
stream where=20
start of speech was detected. As the input-waveform-uri begins at the =
start of=20
the RECOGNIZE, I can see how absolute times don't work for what you are=20
proposing unless you also have the timestamp of when recognition=20
started.</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>How about the following:</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>A "Timestamp" header is added that has a =
timestamp as its=20
value.</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>The header may appear on a response or an =
event. On a=20
response, it indicates the time in the stream at which the associated =
request=20
was applied (e.g. when did the RECOGNIZE start). On an event, it =
indicates the=20
time at which the event occurred (e.g. when did start of speech=20
occur).</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>This is a more general solution to what is =
being done with=20
the speech-marker (see: </FONT><FONT face=3DArial size=3D2><A=20
href=3D"http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.h=
tml">http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html=
</A><FONT=20
color=3D#0000ff>) If </FONT></FONT></SPAN><SPAN =
class=3D888385720-06072006><FONT=20
face=3DArial color=3D#0000ff size=3D2>we really want to know when these =
things are=20
happening in relation to the RTP stream in general then I think a =
separate=20
timestamp header makes sense rather than sticking the timestamp =
information=20
in&nbsp;different headers for each resource or message =
type.</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>Now there is the question of what format to use =
for the=20
timestamp. It seems what we are really interested in is the "point of =
time" in=20
the stream as opposed to the clock time. In the discussion related to =
the speech=20
marker timestamp it was pointed out that the RTP time can be derived =
from the=20
NTP time using an RTCP sender report (which includes the RTP and NTP =
time when=20
the report was sent). However, it is common practice for "bursty" or=20
faster-than-real-time transmission of RTP between voice platforms and =
voice=20
resources. For example, this can be done in synthesis to allow telephony =
card=20
buffers to be filled up, and in recognition if the voice platform =
buffers audio=20
while a speech session is established. Doesn't this destroy the =
relationship=20
between NTP time and the RTP timestamp?&nbsp;I think it makes more sense =
to use=20
the RTP timestamp here.</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>Andrew Wahbe</FONT></SPAN><BR></DIV>
<DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr align=3Dleft>
<HR tabIndex=3D-1>
<FONT face=3DTahoma size=3D2><B>From:</B> Corby Anderson =
[mailto:corby@tellme.com]=20
<BR><B>Sent:</B> July 6, 2006 1:59 PM<BR><B>To:</B> Andrew =
Wahbe<BR><B>Cc:</B>=20
Dave Burke; speechsc@ietf.org<BR><B>Subject:</B> Re: [speechsc] Accurate =

determination of START-OF-INPUT time<BR></FONT><BR></DIV>
<DIV></DIV>Yes, yes, very useful!<BR><BR>Two things:<BR><BR>1) Consider =
a client=20
and server that have poor time synchronization.&nbsp; The client =
wouldn't be=20
able to make much sense of the server's absolute timestamp.&nbsp; Would =
it be=20
possible to also provide an absolute timestamp of the start of audio =
(from the=20
server's perspective) so that the client could determine with certainty =
how far=20
into the audio sample the start of speech occurred?<BR><BR>2) What =
about audio=20
samples provided by input-waveform-uri?&nbsp; In our case, we're much =
more=20
interested in the sample offset where start and end of speech occurred; =
not so=20
much interested in the absolute time when they occurred.<BR><BR>Corby=20
Anderson<BR>Tellme Networks, Inc.<BR><BR><BR>Andrew Wahbe wrote:=20
<BLOCKQUOTE=20
cite=3Dmid911B89A9FD71E649AA624FF24790D76F451837@GIMLI.us.int.genesyslab.=
com=20
type=3D"cite">
  <META content=3D"MSHTML 6.00.2900.2769" name=3DGENERATOR>
  <STYLE></STYLE>

  <DIV dir=3Dltr align=3Dleft><SPAN class=3D222565713-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>I agree that this is useful. I can also think =
of many=20
  instances where an accurate determination of the "end of input" time =
would be=20
  useful as well. We should consider if the emma:start and emma:end =
attributes=20
  are sufficient though. Or do you think there is a reason to have this=20
  information (well the start time) before the recognition=20
  completes?</FONT></SPAN></DIV><BR>
  <DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr align=3Dleft>
  <HR tabIndex=3D-1>
  <FONT face=3DTahoma size=3D2><B>From:</B> Dave Burke [<A=20
  class=3Dmoz-txt-link-freetext=20
  =
href=3D"mailto:david.burke@voxpilot.com">mailto:david.burke@voxpilot.com<=
/A>]=20
  <BR><B>Sent:</B> July 6, 2006 7:01 AM<BR><B>To:</B> <A=20
  class=3Dmoz-txt-link-abbreviated=20
  =
href=3D"mailto:speechsc@ietf.org">speechsc@ietf.org</A><BR><B>Subject:</B=
>=20
  [speechsc] Accurate determination of START-OF-INPUT =
time<BR></FONT><BR></DIV>
  <DIV><FONT face=3DArial size=3D2>It is useful to know the =
START-OF-INPUT time in=20
  many speech applications (e.g. to be able to resume from where one =
left off or=20
  to offset from the point of barge-in, e.g.&nbsp;"skip forward two =
seconds").=20
  While this can be *estimated* from the time of receipt of the event, =
its=20
  accuracy is subject to network transit times.&nbsp; </FONT><FONT =
face=3DArial=20
  size=3D2>We've got an NTP timestamp for SPEECH-MARKER and I believe we =
should=20
  also have one for START-OF-INPUT. </FONT><FONT face=3DArial =
size=3D2>Note that=20
  implementing VoiceXML's marktime requires subtracting the last=20
  SPEECH-MARKER time from the&nbsp;START-OF-INPUT time.</FONT></DIV>
  <DIV>&nbsp;</DIV>
  <DIV><FONT face=3DArial size=3D2>We could simply add a ;timestamp =
attribute to=20
  Input-Type a la Speech-Marker.</FONT>=20
  <DIV>&nbsp;</DIV></DIV>
  <DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV><PRE wrap=3D""><HR =
width=3D"90%" SIZE=3D4>
_______________________________________________
Speechsc mailing list
<A class=3Dmoz-txt-link-abbreviated =
href=3D"mailto:Speechsc@ietf.org">Speechsc@ietf.org</A>
<A class=3Dmoz-txt-link-freetext =
href=3D"https://www1.ietf.org/mailman/listinfo/speechsc">https://www1.iet=
f.org/mailman/listinfo/speechsc</A>
  </PRE></BLOCKQUOTE></BODY></HTML>

------_=_NextPart_001_01C6A1D3.EA88A70B--


--===============2093650231==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============2093650231==--


From speechsc-bounces@ietf.org Fri Jul 07 11:11:04 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1Fyryw-0002Z9-W0; Fri, 07 Jul 2006 11:11:02 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1Fyryv-0002Wi-29
	for speechsc@ietf.org; Fri, 07 Jul 2006 11:11:01 -0400
Received: from e33.co.us.ibm.com ([32.97.110.151])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyrpL-0002iR-Be
	for speechsc@ietf.org; Fri, 07 Jul 2006 11:01:07 -0400
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com
	[9.17.195.11])
	by e33.co.us.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k67F151Z008371
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for <speechsc@ietf.org>; Fri, 7 Jul 2006 11:01:06 -0400
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
	by westrelay02.boulder.ibm.com (8.13.6/NCO/VER7.0) with ESMTP id
	k67F0KgX276204
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <speechsc@ietf.org>; Fri, 7 Jul 2006 09:00:20 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
	by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id
	k67F13aJ007308
	for <speechsc@ietf.org>; Fri, 7 Jul 2006 09:01:04 -0600
Received: from d03nm119.boulder.ibm.com (d03nm119.boulder.ibm.com
	[9.17.195.145])
	by d03av01.boulder.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id
	k67F13JB007285; Fri, 7 Jul 2006 09:01:03 -0600
In-Reply-To: <911B89A9FD71E649AA624FF24790D76F519E85@GIMLI.us.int.genesyslab.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 7.0 HF144 February 01, 2006
Message-ID: <OF4C3A3C4E.149015A2-ON872571A4.00521917-852571A4.00527F16@us.ibm.com>
From: Brett Gavagni <gavagni@us.ibm.com>
Date: Fri, 7 Jul 2006 11:05:45 -0400
X-MIMETrack: Serialize by Router on D03NM119/03/M/IBM(Release 7.0.1HF123 |
	April 14, 2006) at 07/07/2006 09:05:46,
	Serialize complete at 07/07/2006 09:05:46
Content-Type: text/plain; charset="US-ASCII"
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 3d7f2f6612d734db849efa86ea692407
Cc: speechsc@ietf.org, Dave Burke <david.burke@voxpilot.com>
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

I agree: a client would probably get the most use from a "Timestamp" 
value that represents the point in time with respect to the client's RTP 
stream fed to the server for the recognition request. 

I would also suggest that, if this is added to a future version of the 
MRCPv2 draft, the text clearly references the definition in the RTP 
specification.

Without that, some readers of the MRCPv2 draft may interpret the value as 
wall-clock time rather than as the RTP timestamp defined in RFC 3550.

Excerpt from RFC 3550:
   timestamp: 32 bits
      The timestamp reflects the sampling instant of the first octet in
      the RTP data packet.  The sampling instant MUST be derived from a
      clock that increments monotonically and linearly in time to allow
      synchronization and jitter calculations (see Section 6.4.1).  The
      resolution of the clock MUST be sufficient for the desired
      synchronization accuracy and for measuring packet arrival jitter
      (one tick per video frame is typically not sufficient).  The clock
      frequency is dependent on the format of data carried as payload
      and is specified statically in the profile or payload format
      specification that defines the format, or MAY be specified
      dynamically for payload formats defined through non-RTP means.  If
      RTP packets are generated periodically, the nominal sampling
      instant as determined from the sampling clock is to be used, not a
      reading of the system clock.  As an example, for fixed-rate audio
      the timestamp clock would likely increment by one for each
      sampling period.  If an audio application reads blocks covering
      160 sampling periods from the input device, the timestamp would be
      increased by 160 for each such block, regardless of whether the
      block is transmitted in a packet or dropped as silent.
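As a minimal sketch of what the quoted definition implies, assuming 8 kHz 
audio (e.g. G.711) packetised in 160-sample (20 ms) blocks: the timestamp 
advances by 160 per block whether or not the block is sent, and 
differences must be taken modulo 2**32 because the field is 32 bits wide. 
The function names are illustrative only, not from any spec.

```python
RTP_TS_MODULUS = 1 << 32  # the RTP timestamp field is 32 bits wide

def next_timestamp(ts: int, samples_per_block: int = 160) -> int:
    """Advance the timestamp by one block of samples, wrapping at 2**32."""
    return (ts + samples_per_block) % RTP_TS_MODULUS

def ts_delta(later: int, earlier: int) -> int:
    """Sampling periods from `earlier` to `later`, assuming the true gap is < 2**32."""
    return (later - earlier) % RTP_TS_MODULUS

def ticks_to_seconds(ticks: int, clock_rate: int = 8000) -> float:
    """Convert a timestamp difference to seconds for a given RTP clock rate."""
    return ticks / clock_rate

# Three 20 ms blocks starting just below the wrap point.
ts0 = RTP_TS_MODULUS - 200
ts = ts0
for _ in range(3):
    ts = next_timestamp(ts)
print(ts)                                   # 280
print(ts_delta(ts, ts0))                    # 480
print(ticks_to_seconds(ts_delta(ts, ts0)))  # 0.06
```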

Thanks,

Brett Gavagni 
WebSphere Voice Server Development 
http://www-306.ibm.com/software/pervasive/voice_server/
(561) 862-2097 T/L (975) 
gavagni@us.ibm.com


"Andrew Wahbe" <Andrew.Wahbe@genesyslab.com> 
07/07/2006 10:44 AM

To: "Corby Anderson" <corby@tellme.com>
cc: speechsc@ietf.org, Dave Burke <david.burke@voxpilot.com>
Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time


Right -- actually, I'd imagine by suggesting an NTP timestamp, Dave was 
implying that this is the timestamp in the RTP stream where start of 
speech was detected. As the input-waveform-uri begins at the start of the 
RECOGNIZE, I can see how absolute times don't work for what you are 
proposing unless you also have the timestamp of when recognition started.
 
How about the following:
A "Timestamp" header is added that has a timestamp as its value.
The header may appear on a response or an event. On a response, it 
indicates the time in the stream at which the associated request was 
applied (e.g. when the RECOGNIZE started). On an event, it indicates the 
time at which the event occurred (e.g. when start of speech occurred).
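If the proposed "Timestamp" header carried an RTP timestamp (the format is 
still open at this point in the thread), a client could work out how far 
into the recognition the start of speech occurred by subtracting the value 
on the RECOGNIZE response from the value on the START-OF-INPUT event. A 
minimal sketch, assuming an 8 kHz RTP clock and a bare decimal header 
value; the header syntax, function names, and sample values are all 
hypothetical.

```python
RTP_TS_MODULUS = 1 << 32

def parse_timestamp_header(value: str) -> int:
    """Parse a hypothetical 'Timestamp' header value holding a decimal RTP timestamp."""
    ts = int(value.strip())
    if not 0 <= ts < RTP_TS_MODULUS:
        raise ValueError("RTP timestamp out of 32-bit range")
    return ts

def offset_seconds(event_ts: int, request_ts: int, clock_rate: int = 8000) -> float:
    """Seconds from the request being applied to the event, allowing for 2**32 wrap."""
    return ((event_ts - request_ts) % RTP_TS_MODULUS) / clock_rate

# Hypothetical values on the RECOGNIZE response and the START-OF-INPUT event.
recognize_ts = parse_timestamp_header("1000000")
start_of_input_ts = parse_timestamp_header("1012000")
print(offset_seconds(start_of_input_ts, recognize_ts))  # 1.5
```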
 
This is a more general solution to what is being done with the 
speech-marker (see: 
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html). If 
we really want to know when these things happen relative to the RTP 
stream in general, then I think a separate timestamp header makes sense, 
rather than putting the timestamp information in different headers for 
each resource or message type.
 
Now there is the question of what format to use for the timestamp. It 
seems what we are really interested in is the "point of time" in the 
stream as opposed to the clock time. In the discussion related to the 
speech marker timestamp it was pointed out that the RTP time can be 
derived from the NTP time using an RTCP sender report (which includes the 
RTP and NTP time when the report was sent). However, it is common practice 
for "bursty" or faster-than-real-time transmission of RTP between voice 
platforms and voice resources. For example, this can be done in synthesis 
to allow telephony card buffers to be filled up, and in recognition if the 
voice platform buffers audio while a speech session is established. 
Doesn't this destroy the relationship between NTP time and the RTP 
timestamp? I think it makes more sense to use the RTP timestamp here.
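The RTCP sender-report mapping mentioned here can be sketched as follows: 
an SR pairs a 64-bit NTP timestamp (32-bit seconds since 1900 plus a 
32-bit fraction) with the RTP timestamp of the same instant, and any other 
RTP timestamp is extrapolated from that pair using the nominal clock rate. 
This assumes the RTP clock advances at its nominal rate relative to the 
NTP clock, which is exactly the assumption being questioned for bursty 
transmission. Function names and sample values are illustrative.

```python
RTP_TS_MODULUS = 1 << 32

def ntp64_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds since 1900."""
    seconds = ntp64 >> 32
    fraction = ntp64 & 0xFFFFFFFF
    return seconds + fraction / (1 << 32)

def rtp_to_ntp_seconds(rtp_ts: int, sr_rtp_ts: int, sr_ntp64: int,
                       clock_rate: int = 8000) -> float:
    """Extrapolate the NTP time of rtp_ts from a single sender report.

    Uses a signed difference so timestamps slightly before the SR also
    work; assumes |rtp_ts - sr_rtp_ts| < 2**31.
    """
    delta = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return ntp64_to_seconds(sr_ntp64) + delta / clock_rate

# Hypothetical SR: NTP seconds 3361000000 with fraction 0.5, RTP timestamp 16000.
sr_ntp64 = (3361000000 << 32) | (1 << 31)
print(rtp_to_ntp_seconds(24000, 16000, sr_ntp64))  # 3361000001.5
print(rtp_to_ntp_seconds(12000, 16000, sr_ntp64))  # 3361000000.0
```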
 
 
Andrew Wahbe
From: Corby Anderson [mailto:corby@tellme.com] 
Sent: July 6, 2006 1:59 PM
To: Andrew Wahbe
Cc: Dave Burke; speechsc@ietf.org
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time

Yes, yes, very useful!

Two things:

1) Consider a client and server that have poor time synchronization.  The 
client wouldn't be able to make much sense of the server's absolute 
timestamp.  Would it be possible to also provide an absolute timestamp of 
the start of audio (from the server's perspective) so that the client 
could determine with certainty how far into the audio sample the start of 
speech occurred?

2) What about audio samples provided by input-waveform-uri?  In our case, 
we're much more interested in the sample offset where start and end of 
speech occurred; not so much interested in the absolute time when they 
occurred.
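For audio supplied through input-waveform-uri, where the recording begins 
at the start of the RECOGNIZE, a time measured from that start converts 
directly to a sample offset and, for headerless linear PCM, to a byte 
offset. A minimal sketch, assuming 8 kHz mono audio at 2 bytes per sample 
(16-bit linear PCM); other encodings (e.g. 1-byte G.711) or a container 
header would change the byte arithmetic. Names and values are hypothetical.

```python
def sample_offset(seconds_from_start: float, sample_rate: int = 8000) -> int:
    """Index of the sample at the given time, counted from the first sample."""
    if seconds_from_start < 0:
        raise ValueError("offset precedes start of audio")
    return round(seconds_from_start * sample_rate)

def byte_offset(sample_index: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Byte position of a sample frame in headerless interleaved PCM data."""
    return sample_index * bytes_per_sample * channels

# Start of speech 1.25 s and end of speech 3.5 s into the waveform (hypothetical).
sos = sample_offset(1.25)
eos = sample_offset(3.5)
print(sos, eos)                            # 10000 28000
print(byte_offset(sos), byte_offset(eos))  # 20000 56000
```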

Corby Anderson
Tellme Networks, Inc.


Andrew Wahbe wrote: 
I agree that this is useful. I can also think of many instances where an 
accurate determination of the "end of input" time would be useful as well. 
We should consider if the emma:start and emma:end attributes are 
sufficient though. Or do you think there is a reason to have this 
information (well the start time) before the recognition completes?

From: Dave Burke [mailto:david.burke@voxpilot.com] 
Sent: July 6, 2006 7:01 AM
To: speechsc@ietf.org
Subject: [speechsc] Accurate determination of START-OF-INPUT time

It is useful to know the START-OF-INPUT time in many speech applications 
(e.g. to be able to resume from where one left off or to offset from the 
point of barge-in, e.g. "skip forward two seconds"). While this can be 
*estimated* from the time of receipt of the event, its accuracy is 
subject to network transit times.  We've got an NTP timestamp for 
SPEECH-MARKER and I believe we should also have one for START-OF-INPUT. 
Note that implementing VoiceXML's marktime requires subtracting the 
last SPEECH-MARKER time from the START-OF-INPUT time.
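The marktime computation described here reduces to a subtraction once 
both events carry comparable timestamps. A minimal sketch, assuming both 
times are already in seconds on the same clock (for example, both NTP 
seconds) and that marktime is reported in milliseconds; the function name 
is hypothetical.

```python
from typing import Optional

def marktime_ms(start_of_input_s: float,
                last_speech_marker_s: Optional[float]) -> Optional[int]:
    """Milliseconds from the last SPEECH-MARKER to START-OF-INPUT.

    Returns None when no marker was reached. Both arguments must be
    on the same clock.
    """
    if last_speech_marker_s is None:
        return None
    elapsed = start_of_input_s - last_speech_marker_s
    if elapsed < 0:
        raise ValueError("START-OF-INPUT precedes the last SPEECH-MARKER")
    return round(elapsed * 1000)

print(marktime_ms(3361000012.75, 3361000010.5))  # 2250
print(marktime_ms(3361000012.75, None))          # None
```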
 
We could simply add a ;timestamp attribute to Input-Type a la 
Speech-Marker. 
 
Dave



_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Fri Jul 07 11:43:41 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FysUW-0002gT-QY; Fri, 07 Jul 2006 11:43:40 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FysUV-0002gO-Ju
	for speechsc@ietf.org; Fri, 07 Jul 2006 11:43:39 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FysUT-000737-B4
	for speechsc@ietf.org; Fri, 07 Jul 2006 11:43:39 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 12DF921410F; Fri,  7 Jul 2006 15:43:31 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.0 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_40_50,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 135AF21410F; Fri,  7 Jul 2006 15:43:21 +0000 (GMT)
Message-ID: <01bc01c6a1dc$0d10ce00$308ee9d5@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"Corby Anderson" <corby@tellme.com>
References: <911B89A9FD71E649AA624FF24790D76F519E85@GIMLI.us.int.genesyslab.com>
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time
Date: Fri, 7 Jul 2006 16:43:08 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 75ac735ede4d089f7192d230671d536e
Cc: speechsc@ietf.org
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============2060736228=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============2060736228==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_01B9_01C6A1E4.6D92CAF0"

This is a multi-part message in MIME format.

------=_NextPart_000_01B9_01C6A1E4.6D92CAF0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I'm in support of a separate Timestamp header.=20

On the subject of faster-than-real-time RTP: for stored data, the NTP =
value in the RTCP SR indicates the presentation time (when the sample =
indicated by the RTP timestamp is current), so if you speed up =
transmission, both clocks are accelerated together.=20

Dave
  ----- Original Message -----=20
  From: Andrew Wahbe=20
  To: Corby Anderson=20
  Cc: Dave Burke ; speechsc@ietf.org=20
  Sent: Friday, July 07, 2006 3:44 PM
  Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time


  Right -- actually, I'd imagine by suggesting an NTP timestamp, Dave =
was implying that this is the timestamp in the RTP stream where start of =
speech was detected. As the input-waveform-uri begins at the start of =
the RECOGNIZE, I can see how absolute times don't work for what you are =
proposing unless you also have the timestamp of when recognition =
started.

  How about the following:
  A "Timestamp" header is added that has a timestamp as its value.
  The header may appear on a response or an event. On a response, it =
indicates the time in the stream at which the associated request was =
applied (e.g. when the RECOGNIZE started). On an event, it indicates =
the time at which the event occurred (e.g. when start of speech =
occurred).

  This is a more general solution to what is being done with the =
speech-marker (see: =
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html). If =
we really want to know when these things are happening in relation to =
the RTP stream in general then I think a separate timestamp header makes =
sense rather than sticking the timestamp information in different =
headers for each resource or message type.

  Now there is the question of what format to use for the timestamp. It =
seems what we are really interested in is the "point of time" in the =
stream as opposed to the clock time. In the discussion related to the =
speech marker timestamp it was pointed out that the RTP time can be =
derived from the NTP time using an RTCP sender report (which includes =
the RTP and NTP time when the report was sent). However, it is common =
practice for "bursty" or faster-than-real-time transmission of RTP =
between voice platforms and voice resources. For example, this can be =
done in synthesis to allow telephony card buffers to be filled up, and =
in recognition if the voice platform buffers audio while a speech =
session is established. Doesn't this destroy the relationship between =
NTP time and the RTP timestamp? I think it makes more sense to use the =
RTP timestamp here.


  Andrew Wahbe


-------------------------------------------------------------------------=
-----
  From: Corby Anderson [mailto:corby@tellme.com]=20
  Sent: July 6, 2006 1:59 PM
  To: Andrew Wahbe
  Cc: Dave Burke; speechsc@ietf.org
  Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time


  Yes, yes, very useful!

  Two things:

  1) Consider a client and server that have poor time synchronization.  =
The client wouldn't be able to make much sense of the server's absolute =
timestamp.  Would it be possible to also provide an absolute timestamp =
of the start of audio (from the server's perspective) so that the client =
could determine with certainty how far into the audio sample the start =
of speech occurred?

  2) What about audio samples provided by input-waveform-uri?  In our =
case, we're much more interested in the sample offset where start and =
end of speech occurred; not so much interested in the absolute time when =
they occurred.

  Corby Anderson
  Tellme Networks, Inc.


  Andrew Wahbe wrote:=20
    I agree that this is useful. I can also think of many instances =
where an accurate determination of the "end of input" time would be =
useful as well. We should consider if the emma:start and emma:end =
attributes are sufficient though. Or do you think there is a reason to =
have this information (well the start time) before the recognition =
completes?


-------------------------------------------------------------------------=
---
    From: Dave Burke [mailto:david.burke@voxpilot.com]=20
    Sent: July 6, 2006 7:01 AM
    To: speechsc@ietf.org
    Subject: [speechsc] Accurate determination of START-OF-INPUT time


    It is useful to know the START-OF-INPUT time in many speech =
applications (e.g. to be able to resume from where one left off or to =
offset from the point of barge-in, e.g. "skip forward two seconds"). =
While this can be *estimated* from the time of receipt of the event, its =
accuracy is subject to network transit times.  We've got an NTP =
timestamp for SPEECH-MARKER and I believe we should also have one for =
START-OF-INPUT. Note that implementing VoiceXML's marktime =
requires subtracting the last SPEECH-MARKER time from the =
START-OF-INPUT time.

    We could simply add a ;timestamp attribute to Input-Type a la =
Speech-Marker.=20

    Dave
  
------=_NextPart_000_01B9_01C6A1E4.6D92CAF0
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.2900.2873" name=3DGENERATOR></HEAD>
<BODY text=3D#000000 bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>I'm in support of a separate Timestamp =
header.=20
</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>On the subject of fast-than-real-time =
RTP. For=20
stored data, the NTP&nbsp;value in the RTCP SR&nbsp;indicates the =
presentation=20
time (when the sample indicated by the RTP timestamp is current)&nbsp; =
and so if=20
you speed up transmission rates, you accelerate the two clocks. =
</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV>
<BLOCKQUOTE dir=3Dltr=20
style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  <DIV style=3D"FONT: 10pt arial">----- Original Message ----- </DIV>
  <DIV=20
  style=3D"BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: =
black"><B>From:</B>=20
  <A title=3DAndrew.Wahbe@genesyslab.com=20
  href=3D"mailto:Andrew.Wahbe@genesyslab.com">Andrew Wahbe</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>To:</B> <A title=3Dcorby@tellme.com =

  href=3D"mailto:corby@tellme.com">Corby Anderson</A> </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Cc:</B> <A =
title=3Ddavid.burke@voxpilot.com=20
  href=3D"mailto:david.burke@voxpilot.com">Dave Burke</A> ; <A=20
  title=3Dspeechsc@ietf.org =
href=3D"mailto:speechsc@ietf.org">speechsc@ietf.org</A>=20
  </DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Sent:</B> Friday, July 07, 2006 =
3:44 PM</DIV>
  <DIV style=3D"FONT: 10pt arial"><B>Subject:</B> RE: [speechsc] =
Accurate=20
  determination of START-OF-INPUT time</DIV>
  <DIV><BR></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>Right -- actually, I'd imagine by suggesting =
an NTP=20
  timestamp, Dave was implying that this is the timestamp in the RTP =
stream=20
  where start of speech was detected. As the input-waveform-uri begins =
at the=20
  start of the RECOGNIZE, I can see how absolute times don't work for =
what you=20
  are proposing unless you also have the timestamp of when recognition=20
  started.</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>How about the following:</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>A "Timestamp" header is added that has a =
timestamp as its=20
  value.</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>The header may appear on a response or an =
event. On a=20
  response, it indicates the time in the stream at which the associated =
request=20
  was applied (e.g. when did the RECOGNIZE start). On an event, it =
indicates the=20
  time at which the event occured (e.g. when did start of speech=20
  occur).</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>This is a more general solution to what is =
being done=20
  with the speech-marker (see: </FONT><FONT face=3DArial size=3D2><A=20
  =
href=3D"http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.h=
tml">http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html=
</A><FONT=20
  color=3D#0000ff>) If </FONT></FONT></SPAN><SPAN =
class=3D888385720-06072006><FONT=20
  face=3DArial color=3D#0000ff size=3D2>we really want to know when =
these things are=20
  happening in relation to the RTP stream in general then I think a =
separate=20
  timestamp header makes sense rather than sticking the timestamp =
information=20
  in&nbsp;different headers for each resource or message=20
  type.</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>Now there is the question of what format to =
use for the=20
  timestamp. It seems what we are really interested in is the "point of =
time" in=20
  the stream as opposed to the clock time. In the discussion related to =
the=20
  speech marker timestamp it was pointed out that the RTP time can be =
derived=20
  from the NTP time using an RTCP sender report (which includes the RTP =
and NTP=20
  time when the report was sent). However, it is common practice for =
"bursty" or=20
  faster-than-real-time transmission of RTP between voice platforms and =
voice=20
  resources. For example, this can be done in synthesis to allow =
telephony card=20
  buffers to be filled up, and in recognition if the voice platform =
buffers=20
  audio while a speech session is established. Doesn't this destroy the=20
  relationship between NTP time and the RTP timestamp?&nbsp;I think it =
makes=20
  more sense to use the RTP timestamp here.</FONT></SPAN></DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
  <DIV dir=3Dltr align=3Dleft><SPAN class=3D888385720-06072006><FONT =
face=3DArial=20
  color=3D#0000ff size=3D2>Andrew Wahbe</FONT></SPAN><BR></DIV>
  <DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr align=3Dleft>
  <HR tabIndex=3D-1>
  <FONT face=3DTahoma size=3D2><B>From:</B> Corby Anderson =
[mailto:corby@tellme.com]=20
  <BR><B>Sent:</B> July 6, 2006 1:59 PM<BR><B>To:</B> Andrew =
Wahbe<BR><B>Cc:</B>=20
  Dave Burke; speechsc@ietf.org<BR><B>Subject:</B> Re: [speechsc] =
Accurate=20
  determination of START-OF-INPUT time<BR></FONT><BR></DIV>
  <DIV></DIV>Yes, yes, very useful!<BR><BR>Two things:<BR><BR>1) =
Consider a=20
  client and server that have poor time synchronization.&nbsp; The =
client=20
  wouldn't be able to make much sense of the server's absolute =
timestamp.&nbsp;=20
  Would it be possible to also provide an absolute timestamp of the =
start of=20
  audio (from the server's perspective) so that the client could =
determine with=20
  certainty how far in to the audio sample the start of speech=20
  occurred?<BR><BR>2) What about audio samples provided by=20
  input-waveform-uri?&nbsp; In our case, we're much more interested in =
the=20
  sample offset where start and end of speech occured; not so much =
interested in=20
  the absolute time when they occurred.<BR><BR>Corby Anderson<BR>Tellme=20
  Networks, Inc.<BR><BR><BR>Andrew Wahbe wrote:=20
  <BLOCKQUOTE=20
  =
cite=3Dmid911B89A9FD71E649AA624FF24790D76F451837@GIMLI.us.int.genesyslab.=
com=20
  type=3D"cite">
    <META content=3D"MSHTML 6.00.2900.2769" name=3DGENERATOR>
    <STYLE></STYLE>

    <DIV dir=3Dltr align=3Dleft><SPAN class=3D222565713-06072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2>I agree that this is useful. I can also =
think of many=20
    instances where an accurate determination of the "end of input" time =
would=20
    be useful as well. We should consider if the emma:start and emma:end =

    attributes are sufficient though. Or do you think there is a reason =
to have=20
    this information (well the start time) before the recognition=20
    completes?</FONT></SPAN></DIV><BR>
    <DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr =
align=3Dleft>
    <HR tabIndex=3D-1>
    <FONT face=3DTahoma size=3D2><B>From:</B> Dave Burke [<A=20
    class=3Dmoz-txt-link-freetext=20
    =
href=3D"mailto:david.burke@voxpilot.com">mailto:david.burke@voxpilot.com<=
/A>]=20
    <BR><B>Sent:</B> July 6, 2006 7:01 AM<BR><B>To:</B> <A=20
    class=3Dmoz-txt-link-abbreviated=20
    =
href=3D"mailto:speechsc@ietf.org">speechsc@ietf.org</A><BR><B>Subject:</B=
>=20
    [speechsc] Accurate determination of START-OF-INPUT=20
time<BR></FONT><BR></DIV>
    <DIV><FONT face=3DArial size=3D2>It is useful to know the =
START-OF-INPUT time in=20
    many speech applications (e.g. to be able to resume from where one =
left off=20
    or to offset from the point of barge-in, e.g.&nbsp;"skip forward two =

    seconds"). While this can be *estimated* from the time of receipt of =
the=20
    event, its accuracy is subject to network transit times.&nbsp;=20
    </FONT><FONT face=3DArial size=3D2>We've got an NTP timestamp for =
SPEECH-MARKER=20
    and I believe we should also have one for START-OF-INPUT. =
</FONT><FONT=20
    face=3DArial size=3D2>Note that implementing VoiceXML's =
marktime requires=20
    subtracting the last SPEECH-MARKER time from =
the&nbsp;START-OF-INPUT=20
    time.</FONT></DIV>
    <DIV>&nbsp;</DIV>
    <DIV><FONT face=3DArial size=3D2>We could simply add a ;timestamp =
attribute to=20
    Input-Type a la Speech-Marker.</FONT>=20
    <DIV>&nbsp;</DIV></DIV>
    <DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV><PRE wrap=3D""><HR =
width=3D"90%" SIZE=3D4>
_______________________________________________
Speechsc mailing list
<A class=3Dmoz-txt-link-abbreviated =
href=3D"mailto:Speechsc@ietf.org">Speechsc@ietf.org</A>
<A class=3Dmoz-txt-link-freetext =
href=3D"https://www1.ietf.org/mailman/listinfo/speechsc">https://www1.iet=
f.org/mailman/listinfo/speechsc</A>
  </PRE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>

------=_NextPart_000_01B9_01C6A1E4.6D92CAF0--


--===============2060736228==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============2060736228==--


From speechsc-bounces@ietf.org Fri Jul 07 13:37:19 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FyuGU-0008Bx-Vy; Fri, 07 Jul 2006 13:37:18 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FyuGT-0008Bc-Hw
	for speechsc@ietf.org; Fri, 07 Jul 2006 13:37:17 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FyuGS-0001Rh-NP
	for speechsc@ietf.org; Fri, 07 Jul 2006 13:37:17 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Fri, 7 Jul 2006 10:37:15 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time
Date: Fri, 7 Jul 2006 10:37:14 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F519F2B@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Accurate determination of START-OF-INPUT time
Thread-Index: Acah3B/FlTmwpn+cTfOadlOMII4blwAC7Blw
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	"Corby Anderson" <corby@tellme.com>
X-OriginalArrivalTime: 07 Jul 2006 17:37:15.0177 (UTC)
	FILETIME=[FC785990:01C6A1EB]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: f5c1164b9029aa0dd842007e530e24ad
Cc: speechsc@ietf.org
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1655623479=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1655623479==
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6A1EB.FCE16853"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6A1EB.FCE16853
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

While this makes sense, is it called out in RFC 3550? It seems to be
written from the perspective that the content is real-time, not stored
or sent at accelerated rates. It stresses that the most common
wallclock should be preferred over session elapsed time. However, since
the stated intent of the NTP timestamp is synchronization, I think the
behavior you suggest for sped-up transmission makes sense. If we are
going to depend on this behavior, though, I think we should call it
out in the MRCPv2 spec (unless this is called out somewhere already and
I missed it).

Also, what if you have not received an RTCP SR before you receive a
timestamp? (Dave, you already raised this in the previous speech marker
discussion:
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01821.html)
Isn't the relevance of this issue increased if we start using NTP
timestamps more broadly? The MRCP recognizer resource can't even compute
the NTP timestamp if it hasn't received the SR from the client yet.

Andrew

________________________________

From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 7, 2006 11:43 AM
To: Andrew Wahbe; Corby Anderson
Cc: speechsc@ietf.org
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time


I'm in support of a separate Timestamp header.

On the subject of faster-than-real-time RTP: for stored data, the NTP
value in the RTCP SR indicates the presentation time (when the sample
indicated by the RTP timestamp is current), and so if you speed up
transmission rates, you accelerate the two clocks together.

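A small model may make the disagreement concrete. This is a sketch under stated assumptions, not anything from RFC 3550 or MRCPv2: 8 kHz audio, a hypothetical 2x burst rate, float seconds in place of 64-bit NTP values, and the server stamping events the same way the sender stamps SRs. If the SR carries the presentation time of the sample (Dave's reading), mapping an NTP value back to RTP at the nominal clock rate stays exact. If it carries the wallclock send time, the mapping drifts (Andrew's concern).

```python
# Sketch: does an RTCP SR still map NTP to RTP when stored audio is sent
# faster than real time? Times are float seconds (not 64-bit NTP) for brevity.
CLOCK_RATE = 8000  # RTP units per second, assuming 8 kHz audio
SPEEDUP = 2.0      # hypothetical burst factor for stored audio


def sr_ntp(rtp_elapsed: int, wallclock_start: float, presentation: bool) -> float:
    """NTP value stamped for the sample rtp_elapsed units into the stream.

    presentation=True:  the presentation time of that sample (Dave's reading).
    presentation=False: the wallclock instant it is actually sent in a burst.
    """
    media_seconds = rtp_elapsed / CLOCK_RATE
    if presentation:
        return wallclock_start + media_seconds
    return wallclock_start + media_seconds / SPEEDUP


def ntp_to_rtp(ntp: float, sr_ntp_value: float, sr_rtp: int) -> int:
    """Map an NTP time to RTP units from one SR pair at the nominal clock rate."""
    return sr_rtp + round((ntp - sr_ntp_value) * CLOCK_RATE)


def mapping_error(presentation: bool) -> int:
    """RTP-unit error for an event 3 s into the media, using an SR sent at 1 s."""
    start = 1000.0
    sr_rtp, event_rtp = 1 * CLOCK_RATE, 3 * CLOCK_RATE
    sr_value = sr_ntp(sr_rtp, start, presentation)
    event_ntp = sr_ntp(event_rtp, start, presentation)  # event stamped the same way
    return ntp_to_rtp(event_ntp, sr_value, sr_rtp) - event_rtp


print(mapping_error(True))   # presentation-time SR: 0
print(mapping_error(False))  # send-time SR under a 2x burst: -8000
```

The -8000 in the second case is one full second of 8 kHz audio. That is the kind of error the thread worries about if NTP values track send time rather than presentation time.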
Dave

	----- Original Message -----
	From: Andrew Wahbe <mailto:Andrew.Wahbe@genesyslab.com>
	To: Corby Anderson <mailto:corby@tellme.com>
	Cc: Dave Burke <mailto:david.burke@voxpilot.com>; speechsc@ietf.org
	Sent: Friday, July 07, 2006 3:44 PM
	Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time

	Right -- actually, I'd imagine by suggesting an NTP timestamp,
	Dave was implying that this is the timestamp in the RTP stream where
	start of speech was detected. As the input-waveform-uri begins at the
	start of the RECOGNIZE, I can see how absolute times don't work for
	what you are proposing unless you also have the timestamp of when
	recognition started.

	How about the following:
	A "Timestamp" header is added that has a timestamp as its value.
	The header may appear on a response or an event. On a response,
	it indicates the time in the stream at which the associated request
	was applied (e.g. when the RECOGNIZE started). On an event, it
	indicates the time at which the event occurred (e.g. when start of
	speech occurred).

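Here is a minimal sketch of how a client might use the proposed header to get a sample offset into the recorded waveform. The value format (decimal seconds on the server clock) and the example messages are assumptions, since the thread has not settled on a format. The sketch also assumes the waveform begins at the instant reported on the RECOGNIZE response. Channel-Identifier and the request id are illustrative placeholders.

```python
# Sketch of the proposed "Timestamp" header. The value format (decimal seconds
# on the server clock) and the example messages are assumptions for illustration.
from fractions import Fraction


def header_value(message: str, name: str) -> str:
    """Return the value of the first header called `name` (case-insensitive)."""
    for line in message.split("\r\n")[1:]:  # skip the start line
        if not line:
            break  # a blank line ends the header section
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    raise KeyError(name)


def sample_offset(response: str, event: str, sample_rate: int) -> int:
    """Samples from the request's start (response) to the event."""
    start = Fraction(header_value(response, "Timestamp"))
    when = Fraction(header_value(event, "Timestamp"))
    return round((when - start) * sample_rate)


# Illustrative messages; start lines are abbreviated, not exact MRCPv2 syntax.
RESPONSE = ("MRCP/2.0 ... 543257 200 IN-PROGRESS\r\n"
            "Channel-Identifier: 32AECB23433801@speechrecog\r\n"
            "Timestamp: 1000.000\r\n\r\n")
EVENT = ("MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS\r\n"
         "Channel-Identifier: 32AECB23433801@speechrecog\r\n"
         "Timestamp: 1001.250\r\n\r\n")

print(sample_offset(RESPONSE, EVENT, 8000))  # 10000 samples at 8 kHz
```

Using Fraction avoids float rounding when turning decimal timestamps into sample counts.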
	This is a more general solution to what is being done with the
	speech-marker (see:
	http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html).
	If we really want to know when these things are happening in relation
	to the RTP stream in general, then I think a separate timestamp header
	makes more sense than sticking the timestamp information in different
	headers for each resource or message type.

	Now there is the question of what format to use for the
	timestamp. It seems what we are really interested in is the "point in
	time" in the stream as opposed to the clock time. In the discussion
	related to the speech marker timestamp, it was pointed out that the RTP
	time can be derived from the NTP time using an RTCP sender report (which
	includes the RTP and NTP time when the report was sent). However,
	"bursty" or faster-than-real-time transmission of RTP between voice
	platforms and voice resources is common practice. For example, this can
	be done in synthesis to allow telephony card buffers to be filled up,
	and in recognition if the voice platform buffers audio while a speech
	session is established. Doesn't this destroy the relationship between
	NTP time and the RTP timestamp? I think it makes more sense to use the
	RTP timestamp here.

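The SR-based derivation above can be sketched as follows. It assumes real-time pacing and a fixed RTP clock rate, and uses the 64-bit NTP layout of RFC 3550 (seconds since 1900 in the high 32 bits, a binary fraction in the low 32). It returns None when no SR has arrived yet, which is the gap raised earlier in the thread. SenderReport and ntp64_to_rtp are hypothetical names.

```python
# Sketch: map an NTP timestamp to an RTP timestamp via the last RTCP SR.
from dataclasses import dataclass
from typing import Optional

NTP_FRACTION = 1 << 32  # low-32-bit units per second
RTP_MODULUS = 1 << 32   # RTP timestamps wrap at 2**32


@dataclass
class SenderReport:
    ntp64: int  # NTP timestamp field of the SR
    rtp: int    # RTP timestamp field of the same SR


def ntp64_to_rtp(ntp64: int, sr: Optional[SenderReport],
                 clock_rate: int) -> Optional[int]:
    """RTP timestamp corresponding to ntp64, or None if no SR is known yet."""
    if sr is None:
        return None  # without an SR the NTP/RTP relationship is unknown
    scaled = (ntp64 - sr.ntp64) * clock_rate  # RTP units, scaled by 2**32
    # Round to the nearest RTP unit; floor division also handles negative deltas.
    delta_rtp = (scaled + NTP_FRACTION // 2) // NTP_FRACTION
    return (sr.rtp + delta_rtp) % RTP_MODULUS


sr = SenderReport(ntp64=3_000_000_000 << 32, rtp=RTP_MODULUS - 4000)
print(ntp64_to_rtp(sr.ntp64 + (1 << 31), sr, 8000))  # 0.5 s later, wrapped: 0
print(ntp64_to_rtp(sr.ntp64, None, 8000))            # None
```

Note that this mapping is only as good as the pacing assumption. Under bursty transmission it is correct only if the sender's SR NTP value tracks presentation time.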
	Andrew Wahbe

________________________________

	From: Corby Anderson [mailto:corby@tellme.com]
	Sent: July 6, 2006 1:59 PM
	To: Andrew Wahbe
	Cc: Dave Burke; speechsc@ietf.org
	Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time


	Yes, yes, very useful!

	Two things:

	1) Consider a client and server that have poor time
	synchronization. The client wouldn't be able to make much sense of the
	server's absolute timestamp. Would it be possible to also provide an
	absolute timestamp of the start of audio (from the server's perspective)
	so that the client could determine with certainty how far into the
	audio sample the start of speech occurred?

	2) What about audio samples provided by input-waveform-uri? In
	our case, we're much more interested in the sample offset where start
	and end of speech occurred; not so much interested in the absolute time
	when they occurred.

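A small illustration of point 1, assuming both timestamps come from the server's clock and that clock differs from the client's by an unknown constant skew. All names and values are hypothetical. Subtracting two server timestamps cancels the skew. Mixing a server timestamp with the client's own notion of "audio start" does not.

```python
# Sketch: a server-side "start of audio" timestamp removes client/server clock
# skew from the start-of-speech offset. All names and values are hypothetical.

def offset_from_server_stamps(audio_start: float, start_of_speech: float) -> float:
    """Seconds from start of audio to start of speech, both on the server clock."""
    return start_of_speech - audio_start


def offset_sample(offset_seconds: float, sample_rate: int) -> int:
    """Convert a time offset into a sample index at the given rate."""
    return round(offset_seconds * sample_rate)


SKEW = 37.5                      # unknown server-minus-client difference (s)
client_audio_start = 100.0       # when the client believes audio started
server_audio_start = 100.0 + SKEW
server_start_of_speech = 102.5 + SKEW

naive = server_start_of_speech - client_audio_start  # mixes clocks: 40.0
robust = offset_from_server_stamps(server_audio_start, server_start_of_speech)
print(naive, robust, offset_sample(robust, 8000))    # 40.0 2.5 20000
```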
	Corby Anderson
	Tellme Networks, Inc.


	Andrew Wahbe wrote:

		I agree that this is useful. I can also think of many
instances where an accurate determination of the "end of input" time
would be useful as well. We should consider if the emma:start and
emma:end attributes are sufficient though. Or do you think there is a
reason to have this information (well the start time) before the
recognition completes?

________________________________

		From: Dave Burke [mailto:david.burke@voxpilot.com]
		Sent: July 6, 2006 7:01 AM
		To: speechsc@ietf.org
		Subject: [speechsc] Accurate determination of START-OF-INPUT time


		It is useful to know the START-OF-INPUT time in many
		speech applications (e.g. to be able to resume from where one left off
		or to offset from the point of barge-in, e.g. "skip forward two
		seconds"). While this can be *estimated* from the time of receipt of the
		event, its accuracy is subject to network transit times. We've got an
		NTP timestamp for SPEECH-MARKER and I believe we should also have one
		for START-OF-INPUT. Note that implementing VoiceXML's marktime
		requires subtracting the last SPEECH-MARKER time from the
		START-OF-INPUT time.

		We could simply add a ;timestamp attribute to Input-Type,
		a la Speech-Marker.

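The marktime subtraction described above is simple arithmetic on 64-bit NTP values. Here is a sketch, assuming the server supplies NTP timestamps for both SPEECH-MARKER and START-OF-INPUT and that marktime is wanted in milliseconds, as in VoiceXML 2.1. ntp64 and marktime_ms are hypothetical helper names.

```python
# Sketch: VoiceXML-style marktime from two 64-bit NTP timestamps.
NTP_FRACTION = 1 << 32  # the low 32 bits count units of 2**-32 s


def ntp64(seconds: int, fraction: int) -> int:
    """Pack NTP seconds (since 1900) and a 32-bit fraction into one value."""
    return (seconds << 32) | fraction


def marktime_ms(start_of_input: int, last_speech_marker: int) -> int:
    """Milliseconds from the last mark to start of input, rounded to nearest."""
    delta = start_of_input - last_speech_marker
    return (delta * 1000 + NTP_FRACTION // 2) // NTP_FRACTION


marker = ntp64(3_000_000_000, 0)
soi = ntp64(3_000_000_001, 1 << 31)  # 1.5 s after the marker
print(marktime_ms(soi, marker))      # 1500
```

Because both values come from the same server clock, network transit time between server and client does not affect the result.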
		Dave


------_=_NextPart_001_01C6A1EB.FCE16853--


--===============1655623479==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1655623479==--


From speechsc-bounces@ietf.org Sat Jul 08 07:01:42 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FzAZ6-0005Sy-RO; Sat, 08 Jul 2006 07:01:36 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FzAZ6-0005St-9s
	for speechsc@ietf.org; Sat, 08 Jul 2006 07:01:36 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FzAZ4-0005Z8-Gd
	for speechsc@ietf.org; Sat, 08 Jul 2006 07:01:36 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 0F85C2140F4; Sat,  8 Jul 2006 11:01:32 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-3.9 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_40_50,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 255602140F1; Sat,  8 Jul 2006 11:01:25 +0000 (GMT)
Message-ID: <027801c6a27d$d24d2280$308ee9d5@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"Corby Anderson" <corby@tellme.com>
References: <911B89A9FD71E649AA624FF24790D76F519F2B@GIMLI.us.int.genesyslab.com>
Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time
Date: Sat, 8 Jul 2006 12:01:10 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 4da339c42fe5be09fa120bb0fcc4a575
Cc: speechsc@ietf.org
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0103022443=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0103022443==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0275_01C6A286.33AFE280"

This is a multi-part message in MIME format.

------=_NextPart_000_0275_01C6A286.33AFE280
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

RFC 3550 is certainly written from the perspective of real-time sampled
content, so you need to "read between the lines" a little to infer
faster-than-real-time behaviour. I'm not sure we should really attempt
to specify this kind of behaviour, however; it seems like a potential
can of worms. I would feel more comfortable if we left it as an
implementation detail...

The transmission time of SR packets can be problematic from both an
immediacy and a reliability point of view. A practical solution is to
send an SR packet immediately on initiation of a new media stream,
though there is no guarantee that that packet doesn't get lost. RTSP
addresses this through the rtptime parameter in the RTP-Info header,
for example...
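
For reference, RTSP (RFC 2326) carries rtptime in the RTP-Info response header, where it gives the RTP timestamp corresponding to the time value in the Range header. This gives the receiver an anchor without waiting for an SR. Below is a sketch of parsing it and measuring media time from it. The URL and values are made up. The naive comma and semicolon split assumes URLs contain neither character. The helper names are hypothetical.

```python
# Sketch: using the RTSP RTP-Info "rtptime" parameter as an SR-independent
# anchor. Header syntax follows RFC 2326; the URL and values are made up.
RTP_MODULUS = 1 << 32


def parse_rtp_info(value: str) -> list[dict[str, str]]:
    """Split an RTP-Info value into one parameter dict per stream."""
    streams = []
    for stream in value.split(","):
        params = {}
        for part in stream.split(";"):
            key, _, val = part.strip().partition("=")  # split at first "="
            params[key.lower()] = val
        streams.append(params)
    return streams


def seconds_after_anchor(rtp_ts: int, rtptime: int, clock_rate: int) -> float:
    """Media time of rtp_ts after the anchor, assuming it is not earlier than
    the anchor and less than one 32-bit wrap ahead of it."""
    return ((rtp_ts - rtptime) % RTP_MODULUS) / clock_rate


info = parse_rtp_info(
    "url=rtsp://example.com/audio/streamid=0;seq=45102;rtptime=4294963296")
anchor = int(info[0]["rtptime"])
print(seconds_after_anchor(4000, anchor, 8000))  # 1.0, across the 32-bit wrap
```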

However, if we use timestamps relatively (e.g. we report the timestamp
when recognition or playback started, i.e. on IN-PROGRESS, the instant
of barge-in, and the occurrence of a speech marker), then we don't need
to worry about correlating with the RTP timestamp. Indeed, we can't
assume the MRCP client has access to the media source/sink (and hence
RTP timestamps) in any meaningful way...
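
A sketch of the relative approach applied to the "skip forward two seconds" case from the start of the thread. It assumes the playback-start timestamp (e.g. on the IN-PROGRESS response) and the barge-in timestamp come from the same server clock. Their absolute values and format are hypothetical, and resume_position is a hypothetical helper. No RTP timestamp or SR is involved.

```python
# Sketch: relative use of server timestamps to resume a prompt after barge-in.
# Both inputs are assumed to come from the same server clock (values hypothetical).

def resume_position(playback_start: float, barge_in: float,
                    skip: float = 0.0, prompt_length: float = float("inf")) -> float:
    """Seconds into the prompt at which to resume playback."""
    if barge_in < playback_start:
        raise ValueError("barge-in timestamp precedes playback start")
    position = (barge_in - playback_start) + skip
    return min(position, prompt_length)  # never resume past the end


print(resume_position(500.0, 507.25, skip=2.0))                     # 9.25
print(resume_position(500.0, 507.25, skip=2.0, prompt_length=8.0))  # 8.0
```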

Dave

  ----- Original Message -----=20
  From: Andrew Wahbe=20
  To: Dave Burke ; Corby Anderson=20
  Cc: speechsc@ietf.org=20
  Sent: Friday, July 07, 2006 6:37 PM
  Subject: RE: [speechsc] Accurate determination of START-OF-INPUT time


  While this makes sense, is it called out in rfc 3550? It seems to be =
written from the perspective that the content is real-time, and not =
stored or being sent at accelerated rates. It stresses that the most =
common wallclock should be preferred over session elapsed time. However, =
since the stated intent of the NTP timestamp is for synchronization I =
think the behavior you suggest for sped-up transmission makes sense. If =
we are going to depend on this behavior though, I think we should call =
it out in the MRCP v2 spec (unless this is called out somewhere already =
and I missed it).

  Also, what if you have not received an RTCP SR before you receive a =
timestamp? (Dave, you raised this already in the previous speech marker =
discussion. =
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01821.html) =
Isn't the relevance of this issue is increased if we start using the NTP =
timestamps more broadly. The MRCP recognizer resource can't even compute =
the NTP timestamp if it hasn't got the SR from the client yet.

  Andrew


-------------------------------------------------------------------------=
-----
  From: Dave Burke [mailto:david.burke@voxpilot.com]=20
  Sent: July 7, 2006 11:43 AM
  To: Andrew Wahbe; Corby Anderson
  Cc: speechsc@ietf.org
  Subject: Re: [speechsc] Accurate determination of START-OF-INPUT time


  I'm in support of a separate Timestamp header.=20

  On the subject of fast-than-real-time RTP. For stored data, the NTP =
value in the RTCP SR indicates the presentation time (when the sample =
indicated by the RTP timestamp is current)  and so if you speed up =
transmission rates, you accelerate the two clocks.=20

  Dave
    ----- Original Message -----=20
    From: Andrew Wahbe=20
    To: Corby Anderson=20
    Cc: Dave Burke ; speechsc@ietf.org=20
    Sent: Friday, July 07, 2006 3:44 PM
    Subject: RE: [speechsc] Accurate determination of START-OF-INPUT =
time


    Right -- actually, I'd imagine by suggesting an NTP timestamp, Dave =
was implying that this is the timestamp in the RTP stream where start of =
speech was detected. As the input-waveform-uri begins at the start of =
the RECOGNIZE, I can see how absolute times don't work for what you are =
proposing unless you also have the timestamp of when recognition =
started.

    How about the following:
    A "Timestamp" header is added that has a timestamp as its value.
    The header may appear on a response or an event. On a response, it =
indicates the time in the stream at which the associated request was =
applied (e.g. when did the RECOGNIZE start). On an event, it indicates =
the time at which the event occured (e.g. when did start of speech =
occur).

    This is a more general solution to what is being done with the =
speech-marker (see: =
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01813.html) If =
we really want to know when these things are happening in relation to =
the RTP stream in general then I think a separate timestamp header makes =
sense rather than sticking the timestamp information in different =
headers for each resource or message type.

    Now there is the question of what format to use for the timestamp. =
It seems what we are really interested in is the "point of time" in the =
stream as opposed to the clock time. In the discussion related to the =
speech marker timestamp it was pointed out that the RTP time can be =
derived from the NTP time using an RTCP sender report (which includes =
the RTP and NTP time when the report was sent). However, it is common =
practice for "bursty" or faster-than-real-time transmission of RTP =
between voice platforms and voice resources. For example, this can be =
done in synthesis to allow telephony card buffers to be filled up, and =
in recognition if the voice platform buffers audio while a speech =
session is established. Doesn't this destroy the relationship between =
NTP time and the RTP timestamp? I think it makes more sense to use the =
RTP timestamp here.


    Andrew Wahbe


-------------------------------------------------------------------------=
---
    From: Corby Anderson [mailto:corby@tellme.com]=20
    Sent: July 6, 2006 1:59 PM
    To: Andrew Wahbe
    Cc: Dave Burke; speechsc@ietf.org
    Subject: Re: [speechsc] Accurate determination of START-OF-INPUT =
time


    Yes, yes, very useful!

    Two things:

    1) Consider a client and server that have poor time synchronization.
The client wouldn't be able to make much sense of the server's absolute
timestamp.  Would it be possible to also provide an absolute timestamp
of the start of audio (from the server's perspective) so that the client
could determine with certainty how far into the audio sample the start
of speech occurred?

    2) What about audio samples provided by input-waveform-uri?  In our
case, we're much more interested in the sample offsets where start and
end of speech occurred, and not so much in the absolute times at which
they occurred.
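Corby's first point can be illustrated with a minimal sketch. It assumes the server reports both the start of audio and the start of speech as 64-bit NTP-format timestamps taken from its own clock. The function name and the 8000 Hz default are assumptions. Because both values come from the same clock, any client/server clock offset cancels in the subtraction. The difference is in units of 2^-32 seconds, so scaling by the sample rate and shifting right by 32 bits yields a sample offset.

```python
def ntp_delta_to_samples(audio_start_ntp: int, event_ntp: int, sample_rate: int = 8000) -> int:
    """Whole samples between two 64-bit NTP timestamps taken from the same (server) clock.

    The NTP difference is in units of 2**-32 seconds, so multiplying by the
    sample rate and shifting right by 32 bits gives samples (truncated).
    """
    delta = event_ntp - audio_start_ntp
    if delta < 0:
        raise ValueError("event precedes start of audio")
    return (delta * sample_rate) >> 32


if __name__ == "__main__":
    start = 3_000_000_000 << 32  # hypothetical start-of-audio timestamp
    speech = start + (5 << 30)   # 1.25 s later
    print(ntp_delta_to_samples(start, speech))  # 10000
```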

    Corby Anderson
    Tellme Networks, Inc.


    Andrew Wahbe wrote:
      I agree that this is useful. I can also think of many instances
where an accurate determination of the "end of input" time would be
useful as well. We should consider whether the emma:start and emma:end
attributes are sufficient, though. Or do you think there is a reason to
have this information (well, the start time) before the recognition
completes?


--------------------------------------------------------------------------
      From: Dave Burke [mailto:david.burke@voxpilot.com]
      Sent: July 6, 2006 7:01 AM
      To: speechsc@ietf.org
      Subject: [speechsc] Accurate determination of START-OF-INPUT time


      It is useful to know the START-OF-INPUT time in many speech
applications (e.g. to be able to resume from where one left off, or to
offset from the point of barge-in, as in "skip forward two seconds").
While this can be *estimated* from the time the event is received,
its accuracy is subject to network transit times.  We've got an NTP
timestamp for SPEECH-MARKER and I believe we should also have one for
START-OF-INPUT. Note that implementing VoiceXML's marktime
requires subtracting the last SPEECH-MARKER time from the
START-OF-INPUT time.
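The marktime computation described above can be sketched as follows. It assumes the SPEECH-MARKER and START-OF-INPUT times have already been converted to milliseconds on a common timeline. The function name, the millisecond unit, and returning None when no marker was reached are illustrative choices, not anything the draft specifies.

```python
from typing import Optional, Sequence


def marktime_ms(marker_times_ms: Sequence[int], start_of_input_ms: int) -> Optional[int]:
    """Time from the last SPEECH-MARKER reached before barge-in to START-OF-INPUT.

    Returns None if no marker had been reached when input started.
    """
    reached = [t for t in marker_times_ms if t <= start_of_input_ms]
    if not reached:
        return None
    return start_of_input_ms - max(reached)


if __name__ == "__main__":
    # Hypothetical markers at 1.0 s, 4.0 s and 9.0 s; caller barges in at 5.5 s.
    print(marktime_ms([1000, 4000, 9000], 5500))  # 1500
```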

      We could simply add a ;timestamp attribute to Input-Type, as is done
for Speech-Marker.

      Dave
--------------------------------------------------------------------------
_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc
  

------=_NextPart_000_0275_01C6A286.33AFE280--



--===============0103022443==--


From speechsc-bounces@ietf.org Mon Jul 10 03:50:56 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FzqXf-0005gr-Jh; Mon, 10 Jul 2006 03:50:55 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FzqXf-0005gg-5j
	for speechsc@ietf.org; Mon, 10 Jul 2006 03:50:55 -0400
Received: from stsc1260-eth-s1-s1p1-vip.va.neustar.com ([156.154.16.129]
	helo=chiedprmail1.ietf.org)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1Fzpnu-00037A-Tz
	for speechsc@ietf.org; Mon, 10 Jul 2006 03:03:38 -0400
Received: from szxga01-in.huawei.com ([61.144.161.53])
	by chiedprmail1.ietf.org with esmtp (Exim 4.43) id 1FzpbF-0000W3-Qt
	for speechsc@ietf.org; Mon, 10 Jul 2006 02:50:39 -0400
Received: from huawei.com (szxga01-in [172.24.2.3])
	by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J26002SCDJVWN@szxga01-in.huawei.com> for
	speechsc@ietf.org; Mon, 10 Jul 2006 14:47:55 +0800 (CST)
Received: from huawei.com ([172.24.1.24])
	by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J2600GOODJUVJ@szxga01-in.huawei.com> for
	speechsc@ietf.org; Mon, 10 Jul 2006 14:47:55 +0800 (CST)
Received: from srilakshmi ([10.18.5.95])
	by szxml04-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTPA id <0J2600GTXDYW2Q@szxml04-in.huawei.com> for
	speechsc@ietf.org; Mon, 10 Jul 2006 14:56:57 +0800 (CST)
Date: Mon, 10 Jul 2006 12:15:47 +0530
From: Srilakshmi <ksrilakshmi@huawei.com>
In-reply-to: <0d2901c69edf$39091530$6600000a@db01.voxpilot.com>
To: 'Dave Burke' <david.burke@voxpilot.com>, speechsc@ietf.org
Message-id: <000c01c6a3ec$79f0c770$5f05120a@srilakshmi>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1807
X-Mailer: Microsoft Outlook, Build 10.0.6626
Importance: Normal
X-Priority: 3 (Normal)
X-MSMail-priority: Normal
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 88b11fc64c1bfdb4425294ef5374ca07
Cc: 
Subject: [Speechsc] Some issues in section 11
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0139712354=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0139712354==
Content-type: multipart/alternative;
	boundary="Boundary_(ID_9/jng9r+F7K5ZS/moLoA+A)"

This is a multi-part message in MIME format.

--Boundary_(ID_9/jng9r+F7K5ZS/moLoA+A)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT

Hi,

 
Some issues in section 11 of draft 10:

 
1. In section 11.6

In the example for START-SESSION, the Verification-Mode header is printed as
Voiceprint-Mode. This should be corrected.

 
2. In section 11.5 

In the first verification result example, the voiceprint elements are not
ordered according to their cumulative verification scores. However, the
statement in 11.5.1 says that they should be ordered according to the
cumulative verification match score, with the highest score first.

 
A snippet from draft 10:

 
11.5.1. Voiceprint

   This element in the verification results provides information on how the
speech data matched a single voiceprint.  The result data returned may have
more than one such entity in the case of Identification or
Multi-Verification.  Each "<voiceprint>" element and the XML data within the
element describe verification result information for how well the speech
data matched that particular voiceprint.  The list of voiceprint element
data are ordered according to their cumulative verification match scores,
with the highest score first.

 
3. In section 11.5

The needmoredata element is not explained in detail.
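Point 2 can be checked mechanically. The sketch below assumes each <voiceprint> element carries its cumulative score in a <cumulative><verification-score> child, and it ignores XML namespaces for brevity. Those element names, the sample ids and the helper names are assumptions for illustration, not text copied from draft 10. The sketch reports whether the voiceprints appear highest-score-first, as 11.5.1 requires, and produces that ordering.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical, namespace-free result; element names are assumptions.
EXAMPLE_RESULT = """<verification-result>
  <voiceprint id="johnsmith">
    <cumulative><verification-score> 0.40 </verification-score></cumulative>
  </voiceprint>
  <voiceprint id="marysmith">
    <cumulative><verification-score> 0.85 </verification-score></cumulative>
  </voiceprint>
</verification-result>"""


def cumulative_scores(result_xml: str) -> List[Tuple[Optional[str], float]]:
    """(voiceprint id, cumulative score) pairs in document order."""
    root = ET.fromstring(result_xml)
    pairs = []
    for vp in root.iter("voiceprint"):
        score = vp.findtext("cumulative/verification-score")
        if score is None:
            raise ValueError(f"voiceprint {vp.get('id')!r} has no cumulative score")
        pairs.append((vp.get("id"), float(score)))  # float() ignores surrounding spaces
    return pairs


def is_ordered(result_xml: str) -> bool:
    """True if voiceprints appear with the highest cumulative score first."""
    scores = [s for _, s in cumulative_scores(result_xml)]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def sort_by_score(result_xml: str) -> List[Tuple[Optional[str], float]]:
    """The order 11.5.1 asks for: descending cumulative score."""
    return sorted(cumulative_scores(result_xml), key=lambda p: p[1], reverse=True)
```

With this sample, `is_ordered(EXAMPLE_RESULT)` is False, which matches the mismatch described in point 2.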

 
Regards,

Srilakshmi K.

  

--Boundary_(ID_9/jng9r+F7K5ZS/moLoA+A)--



--===============0139712354==--


From speechsc-bounces@ietf.org Mon Jul 10 05:32:55 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1Fzs8M-0001JG-N5; Mon, 10 Jul 2006 05:32:54 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1Fzs8M-0001JB-9m
	for speechsc@ietf.org; Mon, 10 Jul 2006 05:32:54 -0400
Received: from maile.telecomitalia.it ([156.54.233.31])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1Fzs8H-00054A-Cg
	for speechsc@ietf.org; Mon, 10 Jul 2006 05:32:54 -0400
Received: from ptpxch007ba020.idc.cww.telecomitalia.it ([156.54.240.50]) by
	maile.telecomitalia.it with Microsoft SMTPSVC(6.0.3790.1830);
	Mon, 10 Jul 2006 11:32:44 +0200
Received: from PTPEVS106BA020.idc.cww.telecomitalia.it ([156.54.241.223]) by
	ptpxch007ba020.idc.cww.telecomitalia.it with Microsoft
	SMTPSVC(6.0.3790.1830); Mon, 10 Jul 2006 11:32:43 +0200
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2663
Content-Class: urn:content-classes:message
MIME-Version: 1.0
Importance: normal
Priority: normal
Subject: [speechsc] Cache-control issue
Date: Mon, 10 Jul 2006 11:30:01 +0200
Message-ID: <01C0B9926BC410459FE9AACE49B815027714C3@PTPEVS106BA020.idc.cww.telecomitalia.it>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Cache-control issue
thread-index: AcakA2sOnYjMLFaZQi+2QOZGUKrjxA==
From: "Bergallo Patrizio" <patrizio.bergallo@loquendo.com>
To: <speechsc@ietf.org>
X-OriginalArrivalTime: 10 Jul 2006 09:32:43.0853 (UTC)
	FILETIME=[CBD997D0:01C6A403]
X-Spam-Score: 0.5 (/)
X-Scan-Signature: a2c12dacc0736f14d6b540e805505a86
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1570097695=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1570097695==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_10B780_01C6A414.90044D20"
Content-Class: urn:content-classes:message

This is a multi-part message in MIME format.

------=_NextPart_000_10B780_01C6A414.90044D20
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi,

The Cache-Control general header description (section 6.2.13, pages 33-34)
states:

   If the server implements content caching, it MUST adhere to the cache
   correctness rules of HTTP 1.1 [6] when accessing and caching stored
   content.  In particular, the "expires" and "cache-control" headers of
   the cached URI or document MUST be honored and take precedence over
   the Cache-Control defaults set by this header.

However, the end of the same section states:

   If both the MRCPv2 cache-control directive and the cached entry on
   the server include "max-age" directives, then the lesser of the two
   values is used for determining the freshness of the cached entry for
   that request.

These two statements seem to contradict each other. Which behaviour is
correct?
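To make the second rule concrete, here is a minimal sketch of the freshness check it describes. When both the MRCPv2 Cache-Control directive and the cached entry carry max-age, the smaller value is the freshness lifetime, and the entry is fresh while that lifetime exceeds its current age (the HTTP/1.1 comparison). The function names are hypothetical. The sketch deliberately does not settle which of the two quoted statements should win. It also ignores Expires and heuristic lifetimes and, as an assumption, treats an entry with no max-age at all as stale.

```python
from typing import Optional


def effective_max_age(mrcp_max_age: Optional[int], entry_max_age: Optional[int]) -> Optional[int]:
    """Lesser of the two max-age values (seconds) when both are present."""
    candidates = [v for v in (mrcp_max_age, entry_max_age) if v is not None]
    return min(candidates) if candidates else None


def is_fresh(current_age: int, mrcp_max_age: Optional[int], entry_max_age: Optional[int]) -> bool:
    """Freshness per the 'lesser of the two values' rule."""
    lifetime = effective_max_age(mrcp_max_age, entry_max_age)
    if lifetime is None:
        return False  # assumption: no max-age anywhere means treat as stale
    return lifetime > current_age  # HTTP/1.1 test: freshness lifetime > current age
```

For example, an entry that is 120 seconds old is stale when the request says max-age=300 but the cached entry says max-age=60.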

Patrizio Bergallo, Loquendo.


Telecom Italia Group - Managed and coordinated by Telecom Italia S.p.A.

================================================
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received the
message in error, be informed that any use of the content hereof is
prohibited. Please return it immediately to the sender and delete the
message. Should you have any questions, please send an e-mail to
webmaster@telecomitalia.it. Thank you.
www.loquendo.com
================================================

------=_NextPart_000_10B780_01C6A414.90044D20--


--===============1570097695==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1570097695==--


From speechsc-bounces@ietf.org Mon Jul 10 06:57:54 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1FztSb-0005Md-49; Mon, 10 Jul 2006 06:57:53 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1FztSZ-0005MY-Gx
	for speechsc@ietf.org; Mon, 10 Jul 2006 06:57:51 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1FztSU-0007Q2-P7
	for speechsc@ietf.org; Mon, 10 Jul 2006 06:57:51 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 5A47D214107; Mon, 10 Jul 2006 10:57:45 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-3.9 required=5.5 tests=ALL_TRUSTED,BAYES_00,
	HTML_40_50,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (unknown [10.0.0.102])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 35BDE214107; Mon, 10 Jul 2006 10:57:39 +0000 (GMT)
Message-ID: <005e01c6a40f$a2888b00$6700000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Bergallo Patrizio" <patrizio.bergallo@loquendo.com>, <speechsc@ietf.org>
References: <01C0B9926BC410459FE9AACE49B815027714C3@PTPEVS106BA020.idc.cww.telecomitalia.it>
Subject: Re: [speechsc] Cache-control issue
Date: Mon, 10 Jul 2006 11:57:28 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: e8c5db863102a3ada84e0cd52a81a79e
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0826111418=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0826111418==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_005B_01C6A418.04223980"

This is a multi-part message in MIME format.

------=_NextPart_000_005B_01C6A418.04223980
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I agree - they contradict each other. There are several other issues in
the same section, which I reported previously:

http://www1.ietf.org/mail-archive/web/speechsc/current/msg01539.html

Dave
  ----- Original Message -----
  From: Bergallo Patrizio
  To: speechsc@ietf.org
  Sent: Monday, July 10, 2006 10:30 AM
  Subject: [speechsc] Cache-control issue


  Hi,

  In the Cache-Control general header description (section 6.2.13,
  pages 33-34), it is stated:

  If the server implements content caching, it MUST adhere to the cache
  correctness rules of HTTP 1.1 [6] when accessing and caching stored
  content. In particular, the "expires" and "cache-control" headers of
  the cached URI or document MUST be honored and take precedence over
  the Cache-Control defaults set by this header.

  However, the end of the same section states:

  If both the MRCPv2 cache-control directive and the cached entry on
  the server include "max-age" directives, then the lesser of the two
  values is used for determining the freshness of the cached entry for
  that request.

  It seems to me that these two statements contradict each other; which
  is the correct behaviour?

  Patrizio Bergallo, Loquendo.



------=_NextPart_000_005B_01C6A418.04223980--


--===============0826111418==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


--===============0826111418==--


From speechsc-bounces@ietf.org Tue Jul 11 01:57:17 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G0BFF-0004VB-5W; Tue, 11 Jul 2006 01:57:17 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G0BFE-0004V1-6a
	for speechsc@ietf.org; Tue, 11 Jul 2006 01:57:16 -0400
Received: from szxga01-in.huawei.com ([61.144.161.53])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G0BF8-00050j-Ka
	for speechsc@ietf.org; Tue, 11 Jul 2006 01:57:15 -0400
Received: from huawei.com (szxga01-in [172.24.2.3])
	by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J280032W5KQM6@szxga01-in.huawei.com> for
	speechsc@ietf.org; Tue, 11 Jul 2006 13:50:50 +0800 (CST)
Received: from huawei.com ([172.24.1.24])
	by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J28004YV5KPC0@szxga01-in.huawei.com> for
	speechsc@ietf.org; Tue, 11 Jul 2006 13:50:50 +0800 (CST)
Received: from srilakshmi ([10.18.5.95])
	by szxml04-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTPA id <0J28001DI5ZSXO@szxml04-in.huawei.com> for
	speechsc@ietf.org; Tue, 11 Jul 2006 13:59:54 +0800 (CST)
Date: Tue, 11 Jul 2006 11:18:41 +0530
From: Srilakshmi <ksrilakshmi@huawei.com>
In-reply-to: <0d2901c69edf$39091530$6600000a@db01.voxpilot.com>
To: speechsc@ietf.org
Message-id: <003401c6a4ad$aa813610$5f05120a@srilakshmi>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1807
X-Mailer: Microsoft Outlook, Build 10.0.6626
Importance: Normal
X-Priority: 3 (Normal)
X-MSMail-priority: Normal
X-Spam-Score: 0.0 (/)
X-Scan-Signature: d4673ea769cc9931269061744af205ce
Cc: "'Saravanan Shanmugham \(sarvi\)'" <sarvi@cisco.com>,
	'Dave Burke' <david.burke@voxpilot.com>
Subject: [Speechsc] Some queries in section 11
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0292475728=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0292475728==
Content-type: multipart/alternative;
	boundary="Boundary_(ID_901xdV1Imsw2JYLZE37cqg)"

This is a multi-part message in MIME format.

--Boundary_(ID_901xdV1Imsw2JYLZE37cqg)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT

1. How does END-SESSION behave if a request is in progress on the
verification server?

Should END-SESSION be graceful or abrupt, or does that depend on the
Abort-Model that we choose?

In the example call flow below, a VERIFY request is in progress at the
server when the client sends END-SESSION. Will the client still receive
a VERIFICATION-COMPLETE event?

  MRCP client           MRCP server
   |--START-SESSION------>|
   |                      |
   |<--200 COMPLETE-------|
   |                      |
   |----VERIFY----------->|
   |                      |
   |<--200 IN-PROGRESS----|
   |                      |
   |<--START-OF-INPUT-----|
   |                      |
   |----END-SESSION------>|
   |                      |

 
A snippet from draft 10 on END-SESSION:

11.7.  END-SESSION

   The END-SESSION method terminates an ongoing verification session and
   releases the verification voiceprint resources.  The session may
   terminate in one of three ways:

   a.  abort - the voiceprint adaptation or creation may be aborted so
       that the voiceprint remains unchanged (or is not created).
   b.  commit - when terminating a voiceprint training session, the new
       voiceprint is committed to the repository.
   c.  adapt - an existing voiceprint is modified using a successful
       verification.

   The header "Abort-Model" MAY be included in the END-SESSION to
   control whether or not to abort any pending changes to the
   voiceprint.  The default behavior is to commit (not abort) any
   pending changes to the designated voiceprint.

 
This explains what happens to the voiceprint on END-SESSION, but does not
address the questions above.
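For reference, the three outcomes listed in 11.7 can be sketched as a simple mapping. This is one possible reading, not the draft's definition. It does not model the open question of a VERIFY still in progress, and it ignores any effect of an Adapt-Model header. The parameter values "train"/"verify" are assumed to follow the Verification-Mode header. The NONE outcome, used when there is nothing to commit or adapt, is an assumption that is not in the draft.

```python
from enum import Enum


class Outcome(Enum):
    ABORT = "abort"    # voiceprint unchanged (or not created)
    COMMIT = "commit"  # new voiceprint from training committed to the repository
    ADAPT = "adapt"    # existing voiceprint modified using a successful verification
    NONE = "none"      # assumption: nothing pending, so nothing to commit or adapt


def end_session_outcome(verification_mode: str,
                        abort_model: bool = False,
                        verification_succeeded: bool = False) -> Outcome:
    """Hypothetical mapping of END-SESSION to the outcomes listed in 11.7.

    verification_mode is assumed to be "train" or "verify"; abort_model
    mirrors the Abort-Model header, whose default per the draft is to
    commit rather than abort pending changes (hence False here).
    """
    if abort_model:
        return Outcome.ABORT
    if verification_mode == "train":
        return Outcome.COMMIT
    if verification_mode == "verify" and verification_succeeded:
        return Outcome.ADAPT
    return Outcome.NONE


print(end_session_outcome("train").value)                               # commit
print(end_session_outcome("verify", verification_succeeded=True).value)  # adapt
print(end_session_outcome("verify", abort_model=True).value)            # abort
```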

 
2. How does VERIFY-ROLLBACK work outside a session?

Is the most recently adapted voiceprint rolled back to its previous
state, or is the voiceprint from the last session rolled back?

 
A snippet from section 11.6 of draft 10:

   Before a verification/identification session is started, only
   VERIFY-ROLLBACK and generic "SET-PARAMS" and "GET-PARAMS" operations
   may be performed on the verification resource.  The server SHOULD
   return 402 (Method not valid in this state) for all other operations,
   such as VERIFY, or QUERY-VOICEPRINT.
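The pre-session rule quoted above amounts to a simple state check. The sketch below assumes that rule only. It does not say which voiceprint VERIFY-ROLLBACK affects, which is the open question. The function name is hypothetical, and 200 is just a placeholder for "not rejected by this rule".

```python
# Methods the quoted 11.6 text allows before a verification/identification
# session has been started.
ALLOWED_BEFORE_SESSION = frozenset({"VERIFY-ROLLBACK", "SET-PARAMS", "GET-PARAMS"})

METHOD_NOT_VALID_IN_STATE = 402
OK = 200  # placeholder: the real response line depends on the method


def pre_session_status(method: str, session_started: bool) -> int:
    """Return 402 for methods the draft says SHOULD be rejected before a
    session starts; only that one rule is modeled here."""
    if not session_started and method not in ALLOWED_BEFORE_SESSION:
        return METHOD_NOT_VALID_IN_STATE
    return OK


for m in ("VERIFY-ROLLBACK", "VERIFY", "QUERY-VOICEPRINT"):
    print(m, pre_session_status(m, session_started=False))
# VERIFY-ROLLBACK 200
# VERIFY 402
# QUERY-VOICEPRINT 402
```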

 
3. How should the server behave if it receives a message containing a
supported header that is unwanted in that context?

For example, consider Adapt-Model set to false in a START-SESSION request
in train mode. Should we:

a. honor the header and not adapt the voiceprint,
b. ignore the header in favor of train mode, and train and adapt the
   voiceprint, or
c. reject the request as invalid?

 
4. Suggestion: the SI/SV (speaker identification/verification) resource
could have a Speech-Incomplete-Timeout header, similar to the
Speech-Complete-Timeout header.

 
5. A snippet from 11.4.16 (Completion-Cause) of draft 10:

   | 006        | out-of-sequence          | Verification operation    |
   |            |                          | failed due to             |
   |            |                          | out-of-sequence method    |
   |            |                          | invocations. For example  |
   |            |                          | calling VERIFY before     |
   |            |                          | QUERY-VOICEPRINT.         |

 
Why would calling VERIFY before QUERY-VOICEPRINT be an out-of-sequence
request?

 
Regards, 

Srilakshmi K.


--Boundary_(ID_901xdV1Imsw2JYLZE37cqg)
Content-type: text/html; charset=us-ascii
Content-transfer-encoding: 7BIT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>

<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">


<meta name=Generator content="Microsoft Word 10 (filtered)">

<style>
<!--
 /* Font Definitions */
 @font-face
	{font-family:PMingLiU;
	panose-1:2 1 6 1 0 1 1 1 1 1;}
@font-face
	{font-family:Tahoma;
	panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
	{font-family:"\@PMingLiU";
	panose-1:0 0 0 0 0 0 0 0 0 0;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:12.0pt;
	font-family:"Times New Roman";}
a:link, span.MsoHyperlink
	{color:blue;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{color:purple;
	text-decoration:underline;}
p
	{margin-right:0in;
	margin-left:0in;
	font-size:12.0pt;
	font-family:"Times New Roman";}
span.EmailStyle18
	{font-family:Arial;
	color:navy;}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
	{page:Section1;}
-->
</style>

</head>

<body bgcolor=white lang=EN-US link=blue vlink=purple>

<div class=Section1>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>1. </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>How does the END-SESSION behave if a request is in progress at the
verification system? </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Should the END-SESSION be graceful or ungraceful or is it dependent on
the abort model that we choose?</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Here in this example call flow a VERIFY request is in progress at the
server. An END-SESSION method is called in this situation.</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Now will we get VERIFICATION-COMPLETE or not?</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New"'>&nbsp; MRCP
client&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MRCP
server&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New"'>&nbsp;&nbsp;
|--START-SESSION------&gt;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp; |&lt;--200
COMPLETE-------|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp; |----VERIFY
----------&gt;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp; |&lt;--200 IN
PROGRESS----|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&lt;--START-OF-INPUT-----|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|----END-SESSION------&gt;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></font></p>

<p class=MsoNormal><font size=2 face=Tahoma><span style='font-size:10.0pt;
font-family:Tahoma'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>A snippet from draft 10 on END-SESSION:</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><b><font
size=2 color="#000032" face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New";color:#000032;font-weight:bold'>11.7.&nbsp; END-SESSION</span></font></b></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; The END-SESSION method terminates an ongoing
verification session and</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; releases the verification voiceprint resources.&nbsp;
The session may</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; terminate in one of three ways:</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; a.&nbsp; abort - the voiceprint adaptation or
creation may be aborted so</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; that the voiceprint remains
unchanged (or is not created).</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; b.&nbsp; commit - when terminating a voiceprint
training session, the new</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; voiceprint is committed to
the repository.</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; c.&nbsp; adapt - an existing voiceprint is modified
using a successful</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; verification.</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; The header &quot;Abort-Model&quot; MAY be included in
the END-SESSION to</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; control whether or not to abort any pending changes
to the</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; voiceprint.&nbsp; The default behavior is to commit
(not abort) any</span></font></p>

<p class=MsoNormal style='margin-left:.5in;text-autospace:none'><font size=2
color=black face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; pending changes to the designated voiceprint.</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>This explains the changes to the voiceprint on an END-SESSION request call
but doesn&#8217;t address the above issues.</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>2. How will the VERIFY-ROLLBACK work outside a session?</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Is it the last voiceprint that is adapted that will be rolled back to
its previous state or </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>is it the last session&#8217;s voiceprint that is rolled back.</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>A snippet from section 11.6 of draft 10:</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; <b><span style='font-weight:bold'>Before a
verification/identification session is started, only VERIFY-</span></b></span></font></p>

<p class=MsoNormal style='text-autospace:none'><b><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black;font-weight:bold'>&nbsp;&nbsp; ROLLBACK and generic
&quot;SET-PARAMS&quot; and &quot;GET-PARAMS&quot; operations may be</span></font></b></p>

<p class=MsoNormal style='text-autospace:none'><b><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black;font-weight:bold'>&nbsp;&nbsp; performed on the verification
resource.</span></font></b><font size=2 color=black face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New";color:black'>&nbsp; The
server SHOULD return 402</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; (Method not valid in this state) for all other
operations, such as</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; VERIFY, or QUERY-VOICEPRINT.</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>3. What should the behavior of the system be in case if it receives messages
with unwanted but supported headers? </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>For example, Adapt-Model set to false in a START-SESSION method in train
mode. </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Should we take the header into account and not adapt the voiceprint, </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>should we ignore it, giving precedence to train mode, and train
and adapt the voiceprint, or</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>should we ignore the request as invalid?</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>4. We could have a Speech-Incomplete-Timeout header in the SI/SV system,
similar to the Speech-Complete-Timeout header. </span></font></p>

<p class=MsoNormal><b><font size=2 color=black face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New";color:black;font-weight:bold'>&nbsp;</span></font></b></p>

<p class=MsoNormal style='text-autospace:none'><b><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black;font-weight:bold'>5. </span></font></b><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>A snippet from <b><span style='font-weight:bold'>11.4.16</span></b>
</span></font><b><font size=2 color="#000032" face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New";color:#000032;font-weight:
bold'>Completion-Cause </span></font></b><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>of draft 10:</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; | 006&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
out-of-sequence&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
Verification operation&nbsp;&nbsp;&nbsp; |</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|
failed due
to&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
| out-of-sequence method&nbsp;&nbsp;&nbsp; |</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
| invocations. For example |</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|<b><span style='font-weight:bold'> calling VERIFY before</span></b>&nbsp;&nbsp;&nbsp;&nbsp;
|</span></font></p>

<p class=MsoNormal style='text-autospace:none'><font size=2 color=black
face="Courier New"><span style='font-size:10.0pt;font-family:"Courier New";
color:black'>&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
| <b><span style='font-weight:bold'>QUERY-VOICEPRINT.</span></b>&nbsp;&nbsp;&nbsp;
</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>How can this scenario of &#8220;calling VERIFY before QUERY-VOICEPRINT&#8221;
be an out-of-sequence request?</span></font></p>

<p class=MsoNormal><b><font size=2 color=black face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New";color:black;font-weight:bold'>&nbsp;</span></font></b></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Regards, </span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Srilakshmi K.</span></font></p>

</div>

</body>

</html>

--Boundary_(ID_901xdV1Imsw2JYLZE37cqg)--


--===============0292475728==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0292475728==--


From speechsc-bounces@ietf.org Thu Jul 13 03:55:47 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G0w2y-0002EY-M2; Thu, 13 Jul 2006 03:55:44 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G0w2x-0002ET-20
	for speechsc@ietf.org; Thu, 13 Jul 2006 03:55:43 -0400
Received: from szxga02-in.huawei.com ([61.144.161.54])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G0w2v-0005iU-AG
	for speechsc@ietf.org; Thu, 13 Jul 2006 03:55:43 -0400
Received: from huawei.com (szxga02-in [172.24.2.6])
	by szxga02-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J2C00BHK1F5YF@szxga02-in.huawei.com> for
	speechsc@ietf.org; Thu, 13 Jul 2006 16:11:29 +0800 (CST)
Received: from huawei.com ([172.24.1.18])
	by szxga02-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J2C007RZ1F5HA@szxga02-in.huawei.com> for
	speechsc@ietf.org; Thu, 13 Jul 2006 16:11:29 +0800 (CST)
Received: from a70208c ([10.70.101.150])
	by szxml03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTPA id <0J2C00L4L0R8VZ@szxml03-in.huawei.com> for
	speechsc@ietf.org; Thu, 13 Jul 2006 15:57:13 +0800 (CST)
Date: Thu, 13 Jul 2006 15:54:46 +0800
From: Arvind Saraswat <arvinds@huawei.com>
To: speechsc@ietf.org
Message-id: <000b01c6a651$9bee3800$9665460a@china.huawei.com>
Organization: Huawei Technologies Co.Ltd.
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Mailer: Microsoft Office Outlook 11
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Thread-index: AcamUZu5hDkg4VeNSnOkbRNV0DTEhg==
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 79899194edc4f33a41f49410777972f8
Cc: "'Saravanan Shanmugham \(sarvi\)'" <sarvi@cisco.com>
Subject: [Speechsc] Regarding NLSML Schema in MRCPv1
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: arvinds@huawei.com
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

Hi,
	In MRCPv2 (draft 10), section "16.1 NLSML Schema Definition" provides a
Schema for NLSML that can be used for recognition results.

	Is any such Schema defined for MRCPv1 (RFC 4463)? If so, could
anyone please send me the location where it is defined?

regards
Arvind




_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Thu Jul 13 10:13:46 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G11wn-000756-Lk; Thu, 13 Jul 2006 10:13:45 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G11wn-000751-0W
	for speechsc@ietf.org; Thu, 13 Jul 2006 10:13:45 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G11wk-0005is-8V
	for speechsc@ietf.org; Thu, 13 Jul 2006 10:13:44 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Thu, 13 Jul 2006 07:13:41 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Subject: RE: [speechsc] Hotword Recognition and Timers 
Date: Thu, 13 Jul 2006 07:13:39 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F51AA40@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Hotword Recognition and Timers 
Thread-Index: AcaeIDnJnQkie6JaRWKHd1+MoHEcWwIZRZQw
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	"IETF SPEECHSC \(E-mail\)" <speechsc@ietf.org>
X-OriginalArrivalTime: 13 Jul 2006 14:13:41.0207 (UTC)
	FILETIME=[8ADAEE70:01C6A686]
X-Spam-Score: 0.0 (/)
X-Scan-Signature: e654cfa5e44bd623be3eb2c720858b05
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

One thing: if everyone is comfortable with these changes, then I wonder
what the purpose of the 003 hotword-maxtime completion cause code is. It
seems that throwing a 015 "no-match-maxtime" would not only work but
also make the most sense as the rest of the hotword behavior (from the
client's perspective) is more or less identical to the normal case.

What is the rationale for having a 003 hotword-maxtime completion cause
code at this point? If there is none, I would like to suggest that it be
removed. We can "reserve" the numeric code for future use to avoid
renumbering everything else if that is a concern.
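A minimal Python sketch of how a server might pick the completion cause for a finished hotword RECOGNIZE, with and without the 003 hotword-maxtime code proposed for removal. The helper name and its boolean inputs are hypothetical. Codes 003, 008 and 015 are the ones named in this thread; the 000 and 002 numbers are assumed from the MRCPv2 Completion-Cause table and should be checked against draft 10.

```python
from enum import Enum


class Cause(Enum):
    # 003, 008 and 015 are named in this thread; 000 and 002 are assumed.
    SUCCESS = "000 success"
    NO_INPUT_TIMEOUT = "002 no-input-timeout"
    HOTWORD_MAXTIME = "003 hotword-maxtime"
    SUCCESS_MAXTIME = "008 success-maxtime"
    NO_MATCH_MAXTIME = "015 no-match-maxtime"


def hotword_completion_cause(matched: bool, recognition_timer_fired: bool,
                             no_input_timer_fired: bool,
                             keep_hotword_maxtime: bool = False) -> Cause:
    """Pick a completion cause for an ended hotword RECOGNIZE (hypothetical helper)."""
    if matched:
        # A match when the Recognition-Timer fires maps to success-maxtime.
        return Cause.SUCCESS_MAXTIME if recognition_timer_fired else Cause.SUCCESS
    if recognition_timer_fired:
        # Current draft: 003. Under the proposal: 015, exactly as in normal mode.
        return Cause.HOTWORD_MAXTIME if keep_hotword_maxtime else Cause.NO_MATCH_MAXTIME
    if no_input_timer_fired:
        return Cause.NO_INPUT_TIMEOUT
    raise ValueError("RECOGNIZE has not completed")


print(hotword_completion_cause(False, True, False).value)  # 015 no-match-maxtime
```

With `keep_hotword_maxtime=False` the hotword branch collapses onto the normal-mode codes, which is the point of the proposal; the success-maxtime branch matches the case Dave Burke asks about in his reply.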

Andrew

-----Original Message-----
From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 2, 2006 5:41 PM
To: Andrew Wahbe; IETF SPEECHSC (E-mail)
Subject: Re: [speechsc] Hotword Recognition and Timers

Andrew's proposals/clarifications make sense to me.

One interesting result, however, is that Andrew's definition for
Recognition-Timeout coincides with Hotword-Max-Duration except that the
former terminates the recognition when it fires. I don't think this is
necessarily a problem.

It seems (if I understand this thread properly) that the VoiceXML world
needs a maxspeechtimeout to terminate hotword but the MRCP protocol also
might need a safety net to prevent a RECOGNIZE going IN-PROGRESS
forever.
For normal recognition, the Recognition-Timeout gets you both the safety
net and the maxspeechtimeout. Since the MRCP client can STOP a
recognition at any point this safety net is not crucial. In short - I'm
fine with Andrew's suggested changes.

Dave

----- Original Message -----
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Sent: Monday, June 19, 2006 8:30 PM
Subject: RE: [speechsc] Hotword Recognition and Timers


The thing is that nowhere in your explanation do you mention the
prompt and its completion (i.e. the START-INPUT-TIMERS message). The
main use case and reason for hotword recognition/recognition-based
barge-in is to prevent accidental barge-in on audio content such as a
voicemail, TTS email, etc. The scenario you describe below requires that
the client knows how long the content is when the RECOGNIZE is started;
this is definitely not an assumption you can make. The client won't know
how long it will take to TTS a chunk of text or how long the set of
audio files (prompts) are or even if they end at all (it could be a
continuous stream).

My proposal is that hotword recognition should "work" in a similar
manner to normal recognition from the client's perspective:

* RECOGNIZE is sent with the start-input-timers header set to "false".
The recognition-mode is set to "hotword". Prompt playback starts at this
point as well.
* START-INPUT-TIMERS is sent when the prompt completes. The
no-input-timer starts at this point.
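As an illustration of the client side of these two steps, here is a sketch that formats the RECOGNIZE and START-INPUT-TIMERS requests as MRCPv2 text messages. It assumes the MRCPv2 request-line layout (version, message-length, method, request-id) with message-length counting every octet of the message; the Channel-Identifier value and request-ids are made up, and the grammar body a real hotword RECOGNIZE would carry is omitted.

```python
def mrcp_request(method: str, request_id: int, headers: dict[str, str]) -> str:
    """Format an MRCPv2-style request without a body (sketch).

    message-length includes the start line that contains it, so the value
    is found by fixed-point iteration (it stabilises after a few rounds).
    """
    rest = "".join(f"{name}: {value}\r\n" for name, value in headers.items()) + "\r\n"
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n"
        total = len((start_line + rest).encode("ascii"))
        if total == length:
            return start_line + rest
        length = total


CHANNEL = "32AECB23433801@speechrecog"  # hypothetical channel identifier

# Step 1: hotword RECOGNIZE with input timers held back while the prompt plays.
recognize = mrcp_request("RECOGNIZE", 32, {
    "Channel-Identifier": CHANNEL,
    "Recognition-Mode": "hotword",
    "Start-Input-Timers": "false",
})
# Step 2: sent once the prompt completes; this starts the no-input timer.
start_timers = mrcp_request("START-INPUT-TIMERS", 33, {"Channel-Identifier": CHANNEL})
print(recognize)
print(start_timers)
```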

The above two points are identical to the normal case except that the
recognition-mode is "hotword". My proposal is that the general meaning
of the recognition and no-input timers is also the same as the normal
case. Namely:

* The no-input timer is the max amount of time after the prompt
completes that we are willing to wait for input. This is equivalent to
the "timeout" property in VoiceXML. It is usually on the order of a few
seconds.
* The recognition timer is the max amount of time that we will run
recognition on a single "utterance". This is basically a safety net
protecting against noise (say the user left the phone off the hook next
to the radio) keeping the recognizer occupied for an unreasonable amount
of time. This only applies when speech is detected since the
no-input-timer will take effect (once the prompt is done) to terminate
the recognition. This is equivalent to the "maxspeechtimeout" property
in VoiceXML. This is usually quite a bit longer than the no-input
timeout, say 10 to 30 seconds.

Note that the definitions of timeout and maxspeechtimeout properties in
VoiceXML apply to both normal and hotword recognition, which is part of
the rationale for keeping the high-level meaning the same for both modes
in MRCP. At the end of the day, the developer has to answer two
questions regardless of what mode they are using:
* How long after the end of a prompt do I want to wait for
input? (no-input timeout)
* How much continuous noise am I willing to process before aborting a
recognition? (recognition timeout)
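The two questions map onto the VoiceXML properties named above. A small sketch of translating them into MRCP timer header values in milliseconds, assuming VoiceXML time designations of the form "5s" or "850ms", and assuming (as this proposal does) that timeout maps to No-Input-Timeout and maxspeechtimeout to Recognition-Timeout. The function names are hypothetical.

```python
import re

# VoiceXML time designation: a non-negative number followed by "s" or "ms".
_TIME_DESIGNATION = re.compile(r"(\d+(?:\.\d+)?|\.\d+)(ms|s)")

# Assumed property-to-header mapping, following the proposal in this message.
_VXML_TO_MRCP = {"timeout": "No-Input-Timeout", "maxspeechtimeout": "Recognition-Timeout"}


def time_designation_ms(value: str) -> int:
    """Convert a time designation such as "5s" or "850ms" to milliseconds."""
    m = _TIME_DESIGNATION.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a VoiceXML time designation: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    return round(number * 1000) if unit == "s" else round(number)


def mrcp_timer_headers(vxml_props: dict[str, str]) -> dict[str, str]:
    """Build MRCP timer headers from whichever VoiceXML properties are set."""
    return {header: str(time_designation_ms(vxml_props[prop]))
            for prop, header in _VXML_TO_MRCP.items() if prop in vxml_props}


print(mrcp_timer_headers({"timeout": "5s", "maxspeechtimeout": "20s"}))
# {'No-Input-Timeout': '5000', 'Recognition-Timeout': '20000'}
```

The same two headers would be used whether Recognition-Mode is normal or hotword, which is what keeping the high-level meaning the same amounts to on the wire.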

What makes things a little complicated is that in hotword recognition:
1) the detection of speech does not mean that "input" was detected -- we
don't have "input" until we have a match;
2) we can go from a state of processing speech/sound back to a state
where there is silence and we are waiting for speech.

The behaviors that were specified in the original email were an attempt
to keep the same high-level meanings for the timers while taking into
account the two points above. These special behaviors for hotword mode
were:
a) the no-input timer is not cancelled until there is a recognition
result.
b) the recognition timer is reset and turned off when an utterance that
doesn't match anything "ends" as determined by the incomplete timeout
firing. The recognition timer is re-enabled when subsequent speech is
detected.

Another behavior that the VoiceXML Forum MRCP Liaison Committee has
discussed recently is as follows:
c) if the no-input timer fires while speech is being processed, then the
recognition will not be aborted until the recognizer makes a decision on
that segment of speech (e.g. complete timeout, incomplete timeout,
recognition timeout, or early no-match). A no-match on the utterance at
this point would cause "no-input-timeout" to be returned for the
recognition.

This last behavior would prevent the no-input timeout from cutting off
recognition in the middle of an utterance, which might happen if we
followed (a) above.
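The timer rules (a) to (c) can be exercised with a small event-driven simulation. This is a sketch of the behavior as proposed here, not of any shipping recognizer: the event names, the simple utterance model and the outcome labels (no-match-maxtime for a recognition-timer expiry, per the later suggestion in this thread) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    time_ms: int
    kind: str              # "prompt-done", "speech-start" or "utterance-end"
    matched: bool = False  # only meaningful for "utterance-end"


def simulate_hotword(events: list[Event], no_input_ms: int,
                     recognition_ms: int) -> tuple[str, Optional[int]]:
    """Return (outcome, time_ms) for one hotword RECOGNIZE under rules (a)-(c)."""
    no_input_deadline: Optional[int] = None     # armed by START-INPUT-TIMERS
    recognition_deadline: Optional[int] = None  # armed only while speech is processed
    in_speech = False
    no_input_expired = False  # rule (c): expiry during speech is deferred

    for ev in sorted(events, key=lambda e: e.time_ms):
        # Let any timer that expires before this event take effect first.
        if recognition_deadline is not None and recognition_deadline <= ev.time_ms:
            return "no-match-maxtime", recognition_deadline
        if no_input_deadline is not None and no_input_deadline <= ev.time_ms:
            if not in_speech:
                return "no-input-timeout", no_input_deadline
            no_input_expired, no_input_deadline = True, None

        if ev.kind == "prompt-done":        # client sends START-INPUT-TIMERS
            no_input_deadline = ev.time_ms + no_input_ms
        elif ev.kind == "speech-start":     # recognition timer (re-)enabled
            in_speech = True
            recognition_deadline = ev.time_ms + recognition_ms
        elif ev.kind == "utterance-end":    # complete/incomplete timeout decided
            in_speech = False
            if ev.matched:                  # rule (a): only a match cancels no-input
                return "success", ev.time_ms
            recognition_deadline = None     # rule (b): reset and turn off
            if no_input_expired:            # rule (c): no-match after expiry
                return "no-input-timeout", ev.time_ms

    # No more events: whichever armed timer is still pending ends the request.
    if recognition_deadline is not None:
        return "no-match-maxtime", recognition_deadline
    if no_input_deadline is not None:
        return "no-input-timeout", no_input_deadline
    return "in-progress", None


# Use case 2: brief unintelligible speech 2 s into a 30 s prompt, then a hotword.
case2 = [Event(2000, "speech-start"), Event(3500, "utterance-end"),
         Event(30000, "prompt-done"), Event(31000, "speech-start"),
         Event(32000, "utterance-end", matched=True)]
print(simulate_hotword(case2, no_input_ms=5000, recognition_ms=20000))  # ('success', 32000)
```

Because rule (b) switches the recognition timer off after the first utterance, the 20 s timer never cuts the 30 s prompt short in this example.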

To address your use cases below:

1. If you say nothing, the no-input timer will eventually fire (at the
specified number of milliseconds after the prompt is completed) and end
the recognition.

2. If you say something unintelligible, the no-input timer is not
stopped as that does not correspond to a recognition result in hotword.
Note that the no-input timer may not even be enabled if the prompt is
still playing. At the end of the unintelligible speech, the recognition
timer is stopped and turned off. When you later say something
intelligible, the recognition timer is turned back on while you are
speaking. Assuming your speech was short, the recognition timer is
turned back off when you are done speaking.  Since you now generated a
match, the no-input timer is also cancelled (if the prompt had finished)
and the result is returned.

Thanks,

Andrew Wahbe

-----Original Message-----
From: Saravanan Shanmugham (sarvi) [mailto:sarvi@cisco.com]
Sent: June 16, 2006 2:46 PM
To: Dan Burnett; IETF SPEECHSC (E-mail)
Subject: RE: [speechsc] Hotword Recognition and Timers


I can see that both No-Input-Timeout and Recognition-Timeout values will
be useful for Hotword recognition.
But saying that the Recognition-Timer is started after speech is detected
bothers me.
Also, what typical values do you expect for these timers based on your
proposed definitions?

Hotword recognition is very often used to issue commands.
So let's take the following scenario and look at possible cases.

When the system is reading out a long email, you should be able to issue
commands like "speed up" or "slow down" or "repeat" etc.

1. But then I might never say any command at all. So defining the
Recognition-Timer as starting after speech is detected makes no sense in
this case. The No-Input-Timer, if defined to be applicable to Hotword
recognition, might make sense in this case.

2. Then I might say something unintelligible in the middle, which should
technically be ignored. And then a little later I might actually speak a
command, "speed up". Here, when I said something unintelligible, the
No-Input-Timer would be stopped. If we went with the definition
proposed, the Recognition-Timer would be started here.

If we assume the No-Input-Timer is sufficiently large and the
Recognition-Timer relatively small, then once we say something not
matching a hotword (which should technically be ignored), the RECOGNIZE
would complete due to Recognition-Timeout.

If we assume the No-Input-Timer to be short and the Recognition-Timer to be
long, then we are requiring that the user MUST say something
intelligible or unintelligible reasonably quickly, or the RECOGNIZE would
terminate due to No-Input-Timeout.

If we assume the No-Input-Timer to be large and the Recognition-Timer to be
large as well, then, depending on whether I say something unintelligible
or not, the overall timeout could be pretty large, up to a maximum of
No-Input-Timer + Recognition-Timer.

The way I would expect this to work is that the No-Input-Timer and the
Recognition-Timer are started at the beginning of a hotword RECOGNIZE and
both are reasonably large values, with the No-Input-Timer most likely
equal to or smaller than the Recognition-Timer.

Now, if I said nothing at all and the No-Input-Timer expired, the
RECOGNIZE would complete with no-input-timeout. The moment I say something,
unintelligible or intelligible, the No-Input-Timer is stopped and the
Recognition-Timer continues on. If the current speech or a future
command matches a hotword grammar, the RECOGNIZE completes
with success.
If nothing matches and the Recognition-Timer expires, the RECOGNIZE
completes with recognition-timeout.

This way, for hotword, the Recognition-Timer is the max recognition time for
the RECOGNIZE, while the No-Input-Timer would only be equal or smaller.
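For contrast, this alternative reduces to a closed-form rule. A sketch under the stated assumptions (both timers start with the hotword RECOGNIZE, any speech stops the No-Input-Timer, and No-Input-Timeout is no larger than Recognition-Timeout); the function and argument names are hypothetical, and times are milliseconds measured from RECOGNIZE.

```python
from typing import Optional


def sarvi_hotword_outcome(first_speech_ms: Optional[int], match_ms: Optional[int],
                          no_input_ms: int, recognition_ms: int) -> tuple[str, int]:
    """Outcome of one hotword RECOGNIZE under this model (hypothetical helper).

    first_speech_ms / match_ms are None when no speech / no hotword match occurs.
    """
    if no_input_ms > recognition_ms:
        raise ValueError("model assumes No-Input-Timeout <= Recognition-Timeout")
    # Any speech, intelligible or not, before expiry stops the No-Input-Timer.
    if first_speech_ms is None or first_speech_ms >= no_input_ms:
        return "no-input-timeout", no_input_ms
    # The Recognition-Timer keeps running and bounds the whole RECOGNIZE.
    if match_ms is not None and match_ms < recognition_ms:
        return "success", match_ms
    return "recognition-timeout", recognition_ms


print(sarvi_hotword_outcome(2000, None, 10000, 60000))  # ('recognition-timeout', 60000)
```

Under this model no request outlives Recognition-Timeout, but, as the reply above points out, the client must choose that value without knowing how long the prompt will play.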

Thx,
Sarvi

     -----Original Message-----
     From: Dan Burnett [mailto:dan_burnett2000@yahoo.com]
     Sent: Thursday, June 08, 2006 5:06 AM
     To: IETF SPEECHSC (E-mail)
     Subject: Re: [speechsc] Hotword Recognition and Timers

     This email is a result of discussions by the MRCP subgroup
     of the VoiceXML Forum, in which I participated, so I
     already agree with the proposals given here.

     However, I would like to hear comments from others before
     applying these changes to the spec draft, preferably from
     those who did not participate in the VoiceXML Forum discussions.

     This has been added to the issue tracker
     (http://www.softarmor.com/roundup/speechsc) as issue 88.

     -- dan


     --- Andrew Wahbe <awahbe@voicegenie.com> wrote:

     > The description of how timers (no-input and recognition) are used
     > during hotword recognition is inconsistent. In section 9.4.7, it is
     > stated that "For a hotword recognition mode, this timer is started
     > when the user begins speaking. Note that for Hotword mode recognition
     > the START-OF-INPUT event is not generated." However, section 9.9
     > states that for the hotword case: "The Recognition-Timer gets started
     > at the beginning of RECOGNIZE."
     >
     > It seems that section 9.9 is incorrect (or at least is inconsistent
     > with VoiceXML).
     >
     > Section 9.9 omits any mention of the no-input timer for the hotword
     > mode recognition case; however, none of the sections that deal with
     > the no-input timer make a distinction between the hotword and
     > non-hotword cases. VoiceXML also does not make this distinction. It
     > would seem that section 9.9 should be changed to indicate that
     > no-input timers are started in the hotword case and that
     > no-input-timeout is a valid completion cause for a hotword
     > recognition.
     >
     > A related question worth considering is whether the recognition timer
     > is reset at any point, for example, on the detection of silence.
     > Consider the case when maxspeech has a value of say 20 seconds (a
     > typical/reasonable value) and hotword barge-in is being used on a
     > prompt that is 30 seconds long. This would mean that a user who spoke
     > briefly 2 seconds into the prompt (and was silent for the remainder
     > of the prompt) would experience a maxspeech timeout at about 22
     > seconds into the prompt. They would not hear the whole prompt, which
     > seems inappropriate. The reason for the maxspeech timeout is to catch
     > continuous noise and keep it from occupying a recognizer; but what
     > should happen in periods of silence in the hotword case?
     >
     > Similarly, when is the no-input timer canceled in the hotword case?
     > Is it when speech (not necessarily matching) is detected? Or is it
     > only upon a match?
     >
     > The correct behavior in my opinion is that the no-input timer is
     > canceled only on a match, and that the recognition timer should be
     > reset if silence (determined by complete timeout and incomplete
     > timeout) is detected. If we are just processing intermittent noise,
     > the no-input timer will eventually expire. Continuous noise is
     > handled by the recognition timer. Of course there are other
     > possibilities as well; this is just one option that I think fits
     > with VoiceXML.


_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


From speechsc-bounces@ietf.org Thu Jul 13 11:56:24 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G13Y8-0005bQ-GK; Thu, 13 Jul 2006 11:56:24 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G13Y7-0005bL-0S
	for speechsc@ietf.org; Thu, 13 Jul 2006 11:56:23 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G13Y3-0003tZ-NU
	for speechsc@ietf.org; Thu, 13 Jul 2006 11:56:22 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id B662F214114; Thu, 13 Jul 2006 15:56:18 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.3 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (unknown [10.0.0.102])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 579AA2140F6; Thu, 13 Jul 2006 15:56:10 +0000 (GMT)
Message-ID: <01ce01c6a694$d55a3fb0$6700000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>,
	"IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
References: <911B89A9FD71E649AA624FF24790D76F51AA40@GIMLI.us.int.genesyslab.com>
Subject: Re: [speechsc] Hotword Recognition and Timers 
Date: Thu, 13 Jul 2006 16:55:58 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 4d9ae72af46718088458d214998cc683
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

That works for me. And just to clarify: this means that if the 
Recognition-Timer fired in hotword and there was a match, then 008 
success-maxtime would be returned - right?

Dave

----- Original Message ----- 
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>; "IETF SPEECHSC (E-mail)" 
<speechsc@ietf.org>
Sent: Thursday, July 13, 2006 3:13 PM
Subject: RE: [speechsc] Hotword Recognition and Timers


One thing: if everyone is comfortable with these changes, then I wonder
what the purpose of the 003 hotword-maxtime completion cause code is. It
seems that throwing a 015 "no-match-maxtime" would not only work but
also make the most sense as the rest of the hotword behavior (from the
client's perspective) is more or less identical to the normal case.

What is the rationale for having a 003 hotword-maxtime completion cause
code at this point? If there is none, I would like to suggest that it be
removed. We can "reserve" the numeric code for future use to avoid
renumbering everything else if that is a concern.

Andrew

-----Original Message-----
From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 2, 2006 5:41 PM
To: Andrew Wahbe; IETF SPEECHSC (E-mail)
Subject: Re: [speechsc] Hotword Recognition and Timers

Andrew's proposals/clarifications make sense to me.

One interesting result, however, is that Andrew's definition for
Recognition-Timeout coincides with Hotword-Max-Duration except that the
former terminates the recognition when it fires. I don't think this is
necessarily a problem.

It seems (if I understand this thread properly) that the VoiceXML world
needs a maxspeechtimeout to terminate hotword but the MRCP protocol also
might need a safety net to prevent a RECOGNIZE going IN-PROGRESS
forever.
For normal recognition, the Recognition-Timeout gets you both the safety
net and the maxspeechtimeout. Since the MRCP client can STOP a
recognition at any point this safety net is not crucial. In short - I'm
fine with Andrew's suggested changes.

Dave

----- Original Message ----- 
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Sent: Monday, June 19, 2006 8:30 PM
Subject: RE: [speechsc] Hotword Recognition and Timers


The thing is that nowhere in your explanation do you mention the
prompt and its completion (i.e. the START-INPUT-TIMERS message). The
main use case and reason for hotword recognition/recognition-based
barge-in is to prevent accidental barge-in on audio content such as a
voicemail, TTS email, etc. The scenario you describe below requires that
the client knows how long the content is when the RECOGNIZE is started;
this is definitely not an assumption you can make. The client won't know
how long it will take to TTS a chunk of text or how long the set of
audio files (prompts) are or even if they end at all (it could be a
continuous stream).

My proposal is that hotword recognition should "work" in a similar
manner to normal recognition from the client's perspective:

* RECOGNIZE is sent with the start-input-timers header set to "false".
The recognition-mode is set to "hotword". Prompt playback starts at this
point as well.
* START-INPUT-TIMERS is sent when the prompt completes. The
no-input-timer starts at this point.

The above two points are identical to the normal case except that the
recognition-mode is "hotword". My proposal is that the general meaning
of the recognition and no-input timers is also the same as the normal
case. Namely:

* The no-input timer is the max amount of time after the prompt
completes that we are willing to wait for input. This is equivalent to
the "timeout" property in VoiceXML. It is usually on the order of a few
seconds.
* The recognition timer is the max amount of time that we will run
recognition on a single "utterance". This is basically a safety net
protecting against noise (say the user left the phone off the hook next
to the radio) keeping the recognizer occupied for an unreasonable amount
of time. This only applies when speech is detected since the
no-input-timer will take effect (once the prompt is done) to terminate
the recognition. This is equivalent to the "maxspeechtimeout" property
in VoiceXML. This is usually quite a bit longer than the no-input
timeout, say 10 to 30 seconds.

Note that the definitions of the timeout and maxspeechtimeout properties
in VoiceXML apply to both normal and hotword recognition, which is part
of the rationale for keeping the high-level meaning the same for both
modes in MRCP. At the end of the day, the developer has to answer two
questions regardless of what mode they are using:
* How long after the end of a prompt do I want to wait for input?
(no-input timeout)
* How much continuous noise am I willing to process before aborting a
recognition? (recognition timeout)

What makes things a little complicated is that in hotword recognition:
1) the detection of speech does not mean that "input" was detected -- we
don't have "input" until we have a match;
2) we can go from a state of processing speech/sound back to a state
where there is silence and we are waiting for speech.

The behaviors specified in the original email were an attempt to keep
the same high-level meanings for the timers while taking into account
the two points above. These special behaviors for hotword mode were:
a) the no-input timer is not cancelled until there is a recognition
result.
b) the recognition timer is reset and turned off when an utterance that
doesn't match anything "ends" as determined by the incomplete timeout
firing. The recognition timer is re-enabled when subsequent speech is
detected.

Another behavior that the VoiceXML Forum MRCP Liaison Committee has
discussed recently is as follows:
c) if the no-input timer fires while speech is being processed, then the
recognition will not be aborted until the recognizer makes a decision on
that segment of speech (e.g. complete timeout, incomplete timeout,
recognition timeout, or early no-match). A no-match on the utterance at
this point would cause "no-input-timeout" to be returned for the
recognition.

This last behavior would prevent the no-input timeout from cutting off
recognition in the middle of an utterance, which might happen if we
followed (a) above.
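The hotword-mode rules (a) to (c) above can be sketched as a small event-driven model. This is a hedged illustration, not draft text: `HotwordTimers` and its methods are hypothetical names, times are milliseconds on a simulated clock, and reporting `no-match-maxtime` when the recognition timer fires is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HotwordTimers:
    """Hypothetical model of the proposed hotword timer rules (a)-(c).

    All times are milliseconds on a simulated clock. The names are
    illustrative only; they model recognizer-side events, not MRCP
    messages or any real recognizer API.
    """
    no_input_timeout: int           # like VoiceXML "timeout"
    recognition_timeout: int        # like VoiceXML "maxspeechtimeout"
    no_input_deadline: Optional[int] = None
    recognition_deadline: Optional[int] = None
    in_speech: bool = False
    no_input_expired: bool = False  # rule (c): expiry seen during speech
    result: Optional[str] = None    # completion cause once finished

    def start_input_timers(self, now: int) -> None:
        # START-INPUT-TIMERS arrives when the prompt completes.
        if self.result is None:
            self.no_input_deadline = now + self.no_input_timeout

    def speech_detected(self, now: int) -> None:
        self.tick(now)
        if self.result is None:
            # Speech is not "input" in hotword mode, so the no-input
            # timer keeps running (rule a); only the recognition timer
            # is (re-)enabled.
            self.in_speech = True
            self.recognition_deadline = now + self.recognition_timeout

    def utterance_ended(self, now: int, matched: bool) -> None:
        # Called when the complete/incomplete timeout decides the
        # utterance; `matched` says whether a hotword grammar matched.
        self.tick(now)
        if self.result is not None:
            return
        self.in_speech = False
        self.recognition_deadline = None  # rule (b): reset and turn off
        if matched:
            self.no_input_deadline = None  # rule (a): cancelled on a match
            self.result = "success"
        elif self.no_input_expired:
            self.result = "no-input-timeout"  # rule (c)

    def tick(self, now: int) -> None:
        # Fire any timer that has expired by `now`.
        if self.result is not None:
            return
        if (self.recognition_deadline is not None
                and now >= self.recognition_deadline):
            # Continuous noise kept the recognizer busy too long.
            self.result = "no-match-maxtime"
        elif (self.no_input_deadline is not None
                and now >= self.no_input_deadline):
            if self.in_speech:
                self.no_input_expired = True  # rule (c): defer the decision
            else:
                self.result = "no-input-timeout"
```

With a 5 s no-input timeout and a 20 s recognition timeout, unintelligible speech early in a 30 s prompt turns the recognition timer on and then off again, so a later hotword still completes with `success` rather than hitting a maxspeech timeout partway through the prompt.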

To address your use cases below:

1. If you say nothing, the no-input timer will eventually fire (at the
specified number of milliseconds after the prompt is completed) and end
the recognition.

2. If you say something unintelligible, the no-input timer is not
stopped as that does not correspond to a recognition result in hotword.
Note that the no-input timer may not even be enabled if the prompt is
still playing. At the end of the unintelligible speech, the recognition
timer is stopped and turned off. When you later say something
intelligible, the recognition timer is turned back on while you are
speaking. Assuming your speech was short, the recognition timer is
turned back off when you are done speaking.  Since you now generated a
match, the no-input timer is also cancelled (if the prompt had finished)
and the result is returned.

Thanks,

Andrew Wahbe

-----Original Message-----
From: Saravanan Shanmugham (sarvi) [mailto:sarvi@cisco.com]
Sent: June 16, 2006 2:46 PM
To: Dan Burnett; IETF SPEECHSC (E-mail)
Subject: RE: [speechsc] Hotword Recognition and Timers


I can see that both No-Input-Timeout and Recognition-Timeout values will
be useful for Hotword recognition.
But saying that the Recognition-Timer is started after speech is
detected bothers me.
Also, what typical values do you expect for these timers based on your
proposed definitions?

Hotword recognition is very often used to issue commands.
So let's take the following scenario and look at possible cases.

When the system is reading out a long email, you should be able to issue
commands like "speed up", "slow down" or "repeat".

1. But then I might never say any command at all. So defining the
Recognition-Timer as starting after speech is detected makes no sense in
this case. The No-Input-Timer, if defined to be applicable to Hotword
recognition, might make sense in this case.

2. Then I might say something unintelligible in the middle, which should
technically be ignored, and a little later actually speak a command,
"speed up". Here, when I said something unintelligible, the
No-Input-Timer would be stopped. If we went with the proposed
definition, the Recognition-Timer would be started here.

If we assume the No-Input-Timer is sufficiently large and the
Recognition-Timer relatively small, then once we say something not
matching a hotword (which should technically be ignored), the RECOGNIZE
would complete due to Recognition-Timeout.

If we assume the No-Input-Timer is short and the Recognition-Timer is
long, then we are requiring that the user MUST say something,
intelligible or unintelligible, reasonably quickly, or the RECOGNIZE
would terminate due to No-Input-Timeout.

If we assume both the No-Input-Timer and the Recognition-Timer are
large, then depending on whether I say something unintelligible or
not, the overall timeout could be pretty large, up to a maximum of
No-Input-Timer + Recognition-Timer.

The way I would expect this to work is that the No-Input-Timer and the
Recognition-Timer are both started at the beginning of a hotword
RECOGNIZE and both have reasonably large values, with the
No-Input-Timer most likely equal to or smaller than the
Recognition-Timer.

Now, if I said nothing at all and the No-Input-Timer expired, the
RECOGNIZE completes with no-input-timeout. The moment I say something,
unintelligible or intelligible, the No-Input-Timer is stopped, while the
Recognition-Timer continues on. If the current speech or a future
command matches a hotword grammar, the RECOGNIZE command completes
with success.
If nothing matches and the Recognition-Timer expires, the RECOGNIZE
completes with recognition-timeout.

This way, for hotword, the Recognition-Timer is the maximum recognition
time for the RECOGNIZE, while the No-Input-Timer would only be equal or
smaller.
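For comparison, the alternative described here (both timers start with RECOGNIZE, any speech stops the no-input timer, and the recognition timer caps the whole request) can be sketched as a pure function. This is illustrative only: the function name, the utterance tuple format, and the assumption that speech has already been segmented and classified as matching or not are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (start_ms, end_ms, matched) for each utterance,
# measured from the moment RECOGNIZE starts.
Utterance = Tuple[int, int, bool]


def sarvi_outcome(no_input_ms: int, recognition_ms: int,
                  utterances: List[Utterance]) -> Tuple[str, int]:
    """Return (completion cause, end time in ms) under this model.

    Both timers start with RECOGNIZE. Any speech, matching or not,
    stops the no-input timer; the recognition timer keeps running and
    bounds the whole request.
    """
    ordered = sorted(utterances)
    first_speech: Optional[int] = ordered[0][0] if ordered else None

    # Each candidate is (time, cause); the earliest one ends RECOGNIZE.
    candidates = [(recognition_ms, "recognition-timeout")]
    if first_speech is None or first_speech >= no_input_ms:
        candidates.append((no_input_ms, "no-input-timeout"))
    for _start, end, matched in ordered:
        if matched:
            candidates.append((end, "success"))
            break
    end_ms, cause = min(candidates)
    return cause, end_ms
```

In this model the request length is bounded by the recognition timer alone; the trade-off raised in Andrew's reply is that the client must choose that bound before it knows how long the prompt will play.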

Thx,
Sarvi

     -----Original Message-----
     From: Dan Burnett [mailto:dan_burnett2000@yahoo.com]
     Sent: Thursday, June 08, 2006 5:06 AM
     To: IETF SPEECHSC (E-mail)
     Subject: Re: [speechsc] Hotword Recognition and Timers

     This email is a result of discussions by the MRCP subgroup
     of the VoiceXML Forum, in which I participated, so I
     already agree with the proposals given here.

     However, I would like to hear comments from others before
     applying these changes to the spec draft, preferably from
     those who did not participate in the VoiceXML Forum discussions.

     This has been added to the issue tracker
     (http://www.softarmor.com/roundup/speechsc) as issue 88.

     -- dan


     --- Andrew Wahbe <awahbe@voicegenie.com> wrote:

     > The description of how timers (no-input and recognition) are used
     > during hotword recognition is inconsistent. In section 9.4.7, it is
     > stated that "For a hotword recognition mode, this timer is started
     > when the user begins speaking. Note that for Hotword mode
     > recognition the START-OF-INPUT event is not generated." However,
     > section 9.9 states that for the hotword case: "The
     > Recognition-Timer gets started at the beginning of RECOGNIZE."
     >
     > It seems that section 9.9 is incorrect (or at least is inconsistent
     > with VoiceXML).
     >
     > Section 9.9 omits any mention of the no-input timer for the hotword
     > mode recognition case; however, none of the sections that deal with
     > the no-input timer make a distinction between the hotword and
     > non-hotword cases. VoiceXML also does not make this distinction. It
     > would seem that section 9.9 should be changed to indicate that
     > no-input timers are started in the hotword case and that
     > no-input-timeout is a valid completion cause for a hotword
     > recognition.
     >
     > A related question worth considering is whether the recognition
     > timer is reset at any point, for example, on the detection of
     > silence. Consider the case when maxspeech has a value of, say, 20
     > seconds (a typical/reasonable value) and hotword barge-in is being
     > used on a prompt that is 30 seconds long. This would mean that a
     > user who spoke briefly 2 seconds into the prompt (and was silent for
     > the remainder of the prompt) would experience a maxspeech timeout at
     > about 22 seconds into the prompt. They would not hear the whole
     > prompt, which seems inappropriate. The reason for the maxspeech
     > timeout is to catch continuous noise and keep it from occupying a
     > recognizer; but what should happen in periods of silence in the
     > hotword case?
     >
     > Similarly, when is the no-input timer canceled in the hotword case?
     > Is it when speech (not necessarily matching) is detected? Or is it
     > only upon a match?
     >
     > The correct behavior in my opinion is that the no-input timer is
     > canceled only on a match, and that the recognition timer should be
     > reset if silence (determined by complete timeout and incomplete
     > timeout) is detected. If we are just processing intermittent noise,
     > the no-input timer will eventually expire. Continuous noise is
     > handled by the recognition timer. Of course, there are other
     > possibilities as well; this is just one option that I think fits
     > with VoiceXML.
     >
     > > _______________________________________________
     > Speechsc mailing list
     > Speechsc@ietf.org
     > https://www1.ietf.org/mailman/listinfo/speechsc
     >




From speechsc-bounces@ietf.org Thu Jul 13 12:34:44 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G149D-0002fs-Uv; Thu, 13 Jul 2006 12:34:43 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G149D-0002fk-AO
	for speechsc@ietf.org; Thu, 13 Jul 2006 12:34:43 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G149A-0005gi-G8
	for speechsc@ietf.org; Thu, 13 Jul 2006 12:34:43 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Thu, 13 Jul 2006 09:34:39 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Subject: RE: [speechsc] Hotword Recognition and Timers 
Date: Thu, 13 Jul 2006 09:34:32 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F51AB05@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Hotword Recognition and Timers 
Thread-Index: AcamlOPVef8vwiTbSQGExl5yWtkdMAABU+Vw
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	"IETF SPEECHSC \(E-mail\)" <speechsc@ietf.org>
X-OriginalArrivalTime: 13 Jul 2006 16:34:39.0350 (UTC)
	FILETIME=[3C4D2160:01C6A69A]
X-Spam-Score: 0.0 (/)
X-Scan-Signature: e06437eb72f6703f11713d345be8298a
Cc: 
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org

yup

-----Original Message-----
From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 13, 2006 11:56 AM
To: Andrew Wahbe; IETF SPEECHSC (E-mail)
Subject: Re: [speechsc] Hotword Recognition and Timers

That works for me. And just to clarify: this means that if the
Recognition-Timer fired in hotword and there was a match, then 008
success-maxtime would be returned - right?

Dave

----- Original Message -----
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>; "IETF SPEECHSC (E-mail)"
<speechsc@ietf.org>
Sent: Thursday, July 13, 2006 3:13 PM
Subject: RE: [speechsc] Hotword Recognition and Timers


One thing: if everyone is comfortable with these changes, then I wonder
what the purpose of the 003 hotword-maxtime completion cause code is. It
seems that throwing a 015 "no-match-maxtime" would not only work but
also make the most sense, as the rest of the hotword behavior (from the
client's perspective) is more or less identical to the normal case.

What is the rationale for having a 003 hotword-maxtime completion cause
code at this point? If there is none, I would like to suggest that it be
removed. We can "reserve" the numeric code for future use to avoid
renumbering everything else if that is a concern.

Andrew
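The outcome agreed in this exchange can be summarized as a tiny lookup. This is a hedged sketch: the function and constant names are hypothetical, and the numeric codes are simply the ones quoted in this thread (008 and 015, with 003 kept only as a reserved number), which may differ in later drafts.

```python
# Completion causes as quoted in this thread; numbers may change in
# later drafts.
SUCCESS_MAXTIME = (8, "success-maxtime")
NO_MATCH_MAXTIME = (15, "no-match-maxtime")
RESERVED_CODES = {3}  # formerly hotword-maxtime, reserved to avoid renumbering


def cause_when_recognition_timer_fires(matched: bool) -> str:
    """Completion cause reported when the Recognition-Timer fires.

    Under the proposal the recognition mode is deliberately not an
    input: hotword and normal recognition report the same causes.
    """
    code, name = SUCCESS_MAXTIME if matched else NO_MATCH_MAXTIME
    return f"{code:03d} {name}"
```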

-----Original Message-----
From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 2, 2006 5:41 PM
To: Andrew Wahbe; IETF SPEECHSC (E-mail)
Subject: Re: [speechsc] Hotword Recognition and Timers

Andrew's proposals/clarifications make sense to me.

One interesting result, however, is that Andrew's definition for
Recognition-Timeout coincides with Hotword-Max-Duration except that the
former terminates the recognition when it fires. I don't think this is
necessarily a problem.



From speechsc-bounces@ietf.org Thu Jul 20 10:41:39 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G3Zic-0003ov-IR; Thu, 20 Jul 2006 10:41:38 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G3Zhs-0003QS-9r
	for speechsc@ietf.org; Thu, 20 Jul 2006 10:40:52 -0400
Received: from stsc1260-eth-s1-s1p1-vip.va.neustar.com ([156.154.16.129]
	helo=chiedprmail1.ietf.org)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G3ZWt-0003tn-LT
	for speechsc@ietf.org; Thu, 20 Jul 2006 10:29:31 -0400
Received: from mailf.telecomitalia.it ([156.54.233.32])
	by chiedprmail1.ietf.org with esmtp (Exim 4.43) id 1G3ZIe-0003g6-UL
	for speechsc@ietf.org; Thu, 20 Jul 2006 10:14:51 -0400
Received: from ptpxch007ba020.idc.cww.telecomitalia.it ([156.54.240.50]) by
	mailf.telecomitalia.it with Microsoft SMTPSVC(6.0.3790.1830);
	Thu, 20 Jul 2006 16:14:43 +0200
Received: from PTPEVS106BA020.idc.cww.telecomitalia.it ([156.54.241.223]) by
	ptpxch007ba020.idc.cww.telecomitalia.it with Microsoft
	SMTPSVC(6.0.3790.1830); Thu, 20 Jul 2006 16:14:42 +0200
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2663
Content-Class: urn:content-classes:message
MIME-Version: 1.0
Importance: normal
Priority: normal
Subject: [speechsc] Voice enrollment: confusable-phrases inconsistency
Date: Thu, 20 Jul 2006 16:11:46 +0200
Message-ID: <01C0B9926BC410459FE9AACE49B815027723D5@PTPEVS106BA020.idc.cww.telecomitalia.it>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [speechsc] Voice enrollment: confusable-phrases inconsistency
thread-index: AcasBm+KKch4ntntQH27+qQRD0iDHg==
From: "Bergallo Patrizio" <patrizio.bergallo@loquendo.com>
To: <speechsc@ietf.org>
X-OriginalArrivalTime: 20 Jul 2006 14:14:42.0958 (UTC)
	FILETIME=[D88DA2E0:01C6AC06]
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 140baa79ca42e6b0e2b4504291346186
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0080865426=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0080865426==
Content-Transfer-Encoding: 7bit
Content-Class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6AC06.D8BED6BB"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6AC06.D8BED6BB
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi,

Section 9.7.7 (p. 99, draft 10) describes the <confusable-phrases>
element, which is part of the Enrollment Results.
The spec states:
   The <confusable-phrases> element contains a list of phrases from a
   command grammar that are confusable with the phrase being added to
   the personal grammar.  This element may be absent if there are no
   confusable phrases.

Moreover, the Schema Definition (section 16.2, p. 194) states:
        ...
        <element name=3D"confusable-phrases">
           <oneOrMore>
             <element name=3D"item">
               <text/>
             </element>
           </oneOrMore>
         </element>
        ...

But the only example of it (Section 9.14, p. 113) reads:
  ...
     <confusable-phrases>
         <item>
            <phrase> call </phrase>
            <confusion-level> 10 </confusion-level>
         </item>
    </confusable-phrases>
...

There seems to be an inconsistency between the example and the rest of
the spec.
I think we should either fix the example, for instance:
  ...
        <confusable-phrases>
            <item> call </item>
        </confusable-phrases>
...
losing the confusion-level concept, or align the spec text and the
schema by adding phrase and confusion-level elements (or attributes).

Bye,
Patrizio Bergallo, Loquendo.



------_=_NextPart_001_01C6AC06.D8BED6BB--


--===============0080865426==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0080865426==--


From speechsc-bounces@ietf.org Fri Jul 21 09:25:21 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G3v0K-0006Hh-KQ; Fri, 21 Jul 2006 09:25:20 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G3v0H-0006C2-6E
	for speechsc@ietf.org; Fri, 21 Jul 2006 09:25:17 -0400
Received: from szxga03-in.huawei.com ([61.144.161.55])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G3umZ-0007PD-V2
	for speechsc@ietf.org; Fri, 21 Jul 2006 09:11:11 -0400
Received: from huawei.com (szxga03-in [172.24.2.9])
	by szxga03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J2R00K8N91BR8@szxga03-in.huawei.com> for
	speechsc@ietf.org; Fri, 21 Jul 2006 21:19:59 +0800 (CST)
Received: from huawei.com ([172.24.1.24])
	by szxga03-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTP id <0J2R0037191AIM@szxga03-in.huawei.com> for
	speechsc@ietf.org; Fri, 21 Jul 2006 21:19:58 +0800 (CST)
Received: from htiplSASHIDHAR ([10.18.5.55])
	by szxml04-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 1.25
	(built Mar
	3 2004)) with ESMTPA id <0J2R0031R93Y96@szxml04-in.huawei.com> for
	speechsc@ietf.org; Fri, 21 Jul 2006 21:21:35 +0800 (CST)
Date: Fri, 21 Jul 2006 18:40:09 +0530
From: jakki sasidhar <jakkis@huawei.com>
To: speechsc@ietf.org
Message-id: <006101c6acc6$febb7d00$3705120a@china.huawei.com>
Organization: htipl
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Mailer: Microsoft Office Outlook 11
Thread-index: Acasxv4NRdkPpLRCStiUSiJJdKGDgA==
X-Spam-Score: 0.1 (/)
X-Scan-Signature: c0bedb65cce30976f0bf60a0a39edea4
Cc: sarvi@cisco.com, david.burke@voxpilot.com
Subject: [Speechsc] Queuing in ASR
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: jakkis@huawei.com
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1223492474=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1223492474==
Content-type: multipart/alternative;
	boundary="Boundary_(ID_8jcFgNuhrkarC30oBexUjw)"

This is a multi-part message in MIME format.

--Boundary_(ID_8jcFgNuhrkarC30oBexUjw)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT

Hi,
    MRCP draft-ietf-speechsc-mrcpv2-10 is not clear about the queuing of
RECOGNIZE requests at the ASR resource. As per section 9.9, if the server
receives a RECOGNIZE request while the ASR resource is active with another
request, a 200 response with request-state PENDING is sent to the client.
But once the ASR resource becomes free and starts serving the queued
request, it needs to indicate this to the client. In the case of TTS, this
is done with a SPEECH-MARKER event that carries a NULL marker and the
request-state IN-PROGRESS. For ASR, however, the spec does not define any
such mechanism.
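A minimal sketch of the queuing behaviour described above, assuming a recognizer that serves one RECOGNIZE at a time in arrival order. RecognizerQueue and its methods are hypothetical names, and notify_in_progress only marks where a PENDING -> IN-PROGRESS notification would have to go; draft 10 defines no such event for the recognizer, which is the gap being reported.

```python
from collections import deque


class RecognizerQueue:
    """Hypothetical model of RECOGNIZE queuing in one recognizer resource."""

    def __init__(self):
        self.active = None      # request-id currently being recognized
        self.pending = deque()  # request-ids answered with PENDING, FIFO
        self.events = []        # (request-id, request-state) sent to the client

    def recognize(self, request_id):
        """Handle RECOGNIZE and return the request-state of the 200 response."""
        if self.active is None:
            self.active = request_id
            return "IN-PROGRESS"
        self.pending.append(request_id)
        return "PENDING"

    def complete(self):
        """Finish the active request, then start the next queued one, if any."""
        if self.active is None:
            return
        self.events.append((self.active, "COMPLETE"))  # RECOGNITION-COMPLETE
        self.active = None
        if self.pending:
            self.active = self.pending.popleft()
            self.notify_in_progress(self.active)

    def notify_in_progress(self, request_id):
        # Placeholder for the missing signal: the synthesizer can use a
        # SPEECH-MARKER event here, but draft 10 defines no recognizer equivalent.
        self.events.append((request_id, "IN-PROGRESS"))


r = RecognizerQueue()
print(r.recognize(1))  # IN-PROGRESS
print(r.recognize(2))  # PENDING
r.complete()
print(r.events)        # [(1, 'COMPLETE'), (2, 'IN-PROGRESS')]
```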
    Also, if multiple requests are pending, a STOP request defined for TTS
can carry a list of request IDs specifying which requests should be
stopped. There is no such provision for STOP of RECOGNIZE requests.
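For comparison, a STOP that names specific requests might behave as in this sketch. It assumes that a STOP without a list ends every outstanding request, and that stopping the active request lets the next queued one start; stop_requests and its arguments are hypothetical and are not MRCPv2 header syntax.

```python
def stop_requests(active, pending, ids=None):
    """Hypothetical STOP handling for a recognizer that queues requests.

    active  -- request-id in progress, or None
    pending -- queued request-ids in arrival order
    ids     -- request-ids named in the STOP; None means stop everything
    Returns (stopped, new_active, new_pending).
    """
    candidates = ([active] if active is not None else []) + list(pending)
    if ids is None:
        stopped = candidates
    else:
        wanted = set(ids)
        stopped = [r for r in candidates if r in wanted]
    new_pending = [r for r in pending if r not in stopped]
    new_active = None if active in stopped else active
    # If the active request was stopped, the next queued request starts.
    if new_active is None and new_pending:
        new_active = new_pending.pop(0)
    return stopped, new_active, new_pending


print(stop_requests(1, [2, 3], ids=[2]))  # ([2], 1, [3])
print(stop_requests(1, [2, 3], ids=[1]))  # ([1], 2, [3])
print(stop_requests(1, [2, 3]))           # ([1, 2, 3], None, [])
```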
    Any comments on this will be appreciated.
 
Thanks & Regards,
Sasidhar

--Boundary_(ID_8jcFgNuhrkarC30oBexUjw)--


--===============1223492474==--


From speechsc-bounces@ietf.org Fri Jul 21 13:51:31 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G3z9u-0001Zf-Rv; Fri, 21 Jul 2006 13:51:30 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G3z9s-0001Xg-MD
	for speechsc@ietf.org; Fri, 21 Jul 2006 13:51:28 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G3z3h-000436-CX
	for speechsc@ietf.org; Fri, 21 Jul 2006 13:45:07 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id EB8E7214105; Fri, 21 Jul 2006 17:45:03 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.1 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_50_60,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id 6D174214105; Fri, 21 Jul 2006 17:44:55 +0000 (GMT)
Message-ID: <010101c6aced$5a0b1460$6700000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <jakkis@huawei.com>, <speechsc@ietf.org>
References: <006101c6acc6$febb7d00$3705120a@china.huawei.com>
Subject: Re: [Speechsc] Queuing in ASR
Date: Fri, 21 Jul 2006 18:44:43 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 827a2a57ca7ab0837847220f447e8d56
Cc: sarvi@cisco.com
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0880876201=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0880876201==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_00FE_01C6ACF5.BB892490"

This is a multi-part message in MIME format.

------=_NextPart_000_00FE_01C6ACF5.BB892490
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Good comments. I had also spotted the problem of notifying the
PENDING -> IN-PROGRESS transition for the speechrecog resource. My
personal preference is to simply remove the queuing of RECOGNIZE
requests, as I don't see any worthwhile value in this feature. It will
also be quicker than fixing it, though that's just a nice side-effect.
Any vehement disagreements?

Dave

----- Original Message -----
  From: jakki sasidhar
  To: speechsc@ietf.org
  Cc: sarvi@cisco.com ; david.burke@voxpilot.com
  Sent: Friday, July 21, 2006 2:10 PM
  Subject: [Speechsc] Queuing in ASR


  Hi,
      MRCP draft-ietf-speechsc-mrcpv2-10 is not clear about the queuing of
  RECOGNIZE requests at the ASR resource. As per section 9.9, if the server
  receives a RECOGNIZE request while the ASR resource is active with another
  request, a 200 response with request-state PENDING is sent to the client.
  But once the ASR resource becomes free and starts serving the queued
  request, it needs to indicate this to the client. In the case of TTS, this
  is done with a SPEECH-MARKER event that carries a NULL marker and the
  request-state IN-PROGRESS. For ASR, however, the spec does not define any
  such mechanism.
      Also, if multiple requests are pending, a STOP request defined for TTS
  can carry a list of request IDs specifying which requests should be
  stopped. There is no such provision for STOP of RECOGNIZE requests.
      Any comments on this will be appreciated.

  Thanks & Regards,
  Sasidhar



------=_NextPart_000_00FE_01C6ACF5.BB892490--


--===============0880876201==--


From speechsc-bounces@ietf.org Fri Jul 21 14:31:13 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G3zmL-0007yW-6M; Fri, 21 Jul 2006 14:31:13 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G3zmK-0007yP-CX
	for speechsc@ietf.org; Fri, 21 Jul 2006 14:31:12 -0400
Received: from rat01037.dc-ratingen.de ([195.233.129.142])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G3zmH-0002wI-JK
	for speechsc@ietf.org; Fri, 21 Jul 2006 14:31:12 -0400
Received: from rat01047.dc-ratingen.de (rat01047_e0 [195.233.128.119])
	by rat01037.dc-ratingen.de (Switch-3.1.4/Switch-3.1.0) with ESMTP id
	k6LIUrvV016382; Fri, 21 Jul 2006 20:30:53 +0200 (MEST)
Received: from gpsmxr04.gps.internal.vodafone.com ([195.232.231.115])
	by rat01047.dc-ratingen.de (Switch-3.1.4/Switch-3.1.0) with ESMTP id
	k6LIUqQE025583; Fri, 21 Jul 2006 20:30:53 +0200 (MEST)
Received: from gpsmx05.gps.internal.vodafone.com ([145.230.32.22]) by
	gpsmxr04.gps.internal.vodafone.com with Microsoft
	SMTPSVC(6.0.3790.1830); Fri, 21 Jul 2006 20:30:52 +0200
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: [Speechsc] Queuing in ASR
Date: Fri, 21 Jul 2006 20:30:46 +0200
Message-ID: <21FBFFD8B2486242AB9A09E20A7E9C44F66322@gpsmx05.gps.internal.vodafone.com>
In-Reply-To: <010101c6aced$5a0b1460$6700000a@db01.voxpilot.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [Speechsc] Queuing in ASR
Thread-Index: Acas7nMB18gfCMNkTsOoPWiQJG/w4QAATwXQ
From: "Reifenrath, Klaus, VF-Group" <Klaus.Reifenrath@vodafone.com>
To: "Dave Burke" <david.burke@voxpilot.com>, <jakkis@huawei.com>,
	<speechsc@ietf.org>
X-OriginalArrivalTime: 21 Jul 2006 18:30:52.0620 (UTC)
	FILETIME=[CBFFB0C0:01C6ACF3]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: f0b5a4216bfa030ed8a6f68d1833f8ae
Cc: sarvi@cisco.com
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1676703811=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1676703811==
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6ACF3.CCF834F5"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6ACF3.CCF834F5
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

The queuing for speech recognition resources was introduced in the
context of hotword recognition. Very often you want to start a normal
recognition after a hotword recognition without losing audio blocks
between the two recognitions. For example, a hotword recognition is used
to detect the wake-up word ("Computer") and a normal recognition to
recognize the following request ("Please check my inbox!"). If we cannot
queue recognition requests, the recognition resource will miss meaningful
audio blocks, because the client can start the next recognition only
after it has received the recognition result.
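A back-of-the-envelope sketch of the gap described above, assuming audio arrives in fixed 20 ms frames and that, without queuing, the follow-on RECOGNIZE takes effect roughly one client round trip after the hotword result. frames_lost and the round-trip figures are illustrative assumptions, not measurements.

```python
def frames_lost(round_trip_ms, frame_ms=20, queued=False):
    """Frames that reach no recognizer between the hotword result and the
    start of the follow-on recognition (hypothetical model).

    With a queued RECOGNIZE the server starts the next recognition on the
    frame right after the hotword match, so nothing is lost. Without
    queuing, the client must first receive the result and then send
    RECOGNIZE, so roughly one round trip of audio goes unprocessed.
    """
    if queued:
        return 0
    # Ceiling division: a partially covered frame is still missed.
    return -(-round_trip_ms // frame_ms)


for rtt in (40, 100, 250):
    print(rtt, frames_lost(rtt), frames_lost(rtt, queued=True))
# prints: 40 2 0 / 100 5 0 / 250 13 0 (one line each)
```

Even a short round trip drops the first frames of the follow-on utterance, which is where the request usually begins.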

The START-OF-SPEECH event indicates that the next recognition has
started.
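On the client side, this suggestion amounts to treating START-OF-SPEECH for a PENDING request as the PENDING -> IN-PROGRESS signal. A minimal sketch under that assumption; ClientRequestTracker and its methods are hypothetical names.

```python
class ClientRequestTracker:
    """Hypothetical client-side record of recognizer request states."""

    def __init__(self):
        self.state = {}  # request-id -> last known request-state

    def on_response(self, request_id, request_state):
        # request-state from the 200 response: "PENDING" or "IN-PROGRESS"
        self.state[request_id] = request_state

    def on_event(self, request_id, event_name, request_state):
        """Apply an event's request-state. Return True when the event is taken
        as the PENDING -> IN-PROGRESS signal (assumed: START-OF-SPEECH)."""
        started = (self.state.get(request_id) == "PENDING"
                   and event_name == "START-OF-SPEECH")
        self.state[request_id] = request_state
        return started


t = ClientRequestTracker()
t.on_response(1, "IN-PROGRESS")
t.on_response(2, "PENDING")  # queued behind request 1
t.on_event(1, "RECOGNITION-COMPLETE", "COMPLETE")
print(t.on_event(2, "START-OF-SPEECH", "IN-PROGRESS"))  # True
print(t.state)  # {1: 'COMPLETE', 2: 'IN-PROGRESS'}
```

In this model the transition is observed only when speech is detected, so a queued request that ends without any speech would be seen going from PENDING directly to COMPLETE.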

Klaus


________________________________

	From: Dave Burke [mailto:david.burke@voxpilot.com]
	Sent: Friday, 21 July 2006 19:45
	To: jakkis@huawei.com; speechsc@ietf.org
	Cc: sarvi@cisco.com
	Subject: Re: [Speechsc] Queuing in ASR

	Good comments. I had also spotted the problem of notifying the
	PENDING -> IN-PROGRESS transition for the speechrecog resource. My
	personal preference is to simply remove the queuing of RECOGNIZE
	requests, as I don't see any worthwhile value in this feature. It will
	also be quicker than fixing it, though that's just a nice side-effect.
	Any vehement disagreements?

	Dave

	----- Original Message -----

		From: jakki sasidhar <mailto:jakkis@huawei.com>
		To: speechsc@ietf.org
		Cc: sarvi@cisco.com ; david.burke@voxpilot.com
		Sent: Friday, July 21, 2006 2:10 PM
		Subject: [Speechsc] Queuing in ASR

		Hi,
		    MRCP draft-ietf-speechsc-mrcpv2-10 is not clear about the queuing of
		RECOGNIZE requests at the ASR resource. As per section 9.9, if the server
		receives a RECOGNIZE request while the ASR resource is active with another
		request, a 200 response with request-state PENDING is sent to the client.
		But once the ASR resource becomes free and starts serving the queued
		request, it needs to indicate this to the client. In the case of TTS, this
		is done with a SPEECH-MARKER event that carries a NULL marker and the
		request-state IN-PROGRESS. For ASR, however, the spec does not define any
		such mechanism.
		    Also, if multiple requests are pending, a STOP request defined for TTS
		can carry a list of request IDs specifying which requests should be
		stopped. There is no such provision for STOP of RECOGNIZE requests.
		    Any comments on this will be appreciated.

		Thanks & Regards,
		Sasidhar



------_=_NextPart_001_01C6ACF3.CCF834F5
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2900.2912" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>The queuing for speech recognition resources =
was introduced=20
in the context of hotword recognition. Very often you want to start a =
normal=20
recognition&nbsp;after a hotword recognition without losing audio =
blocks=20
between the two recognitions. E.g. a hotword recognition is used to =
detect the=20
wake-up&nbsp; ("Computer") and a normal recognition to recognize the =
following=20
request ("Please check my inbox!"). If we cannot queue recognition=20
requests,&nbsp;the recognition resource will miss meaningful audio =
blocks,=20
because the client can start the next recognition only&nbsp;after it =
received=20
the recognition result. </FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>The START-OF-SPEECH&nbsp;event indicates that =
the next=20
recognition has started.&nbsp;</FONT></SPAN></DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
color=3D#0000ff size=3D2>Klaus</FONT></SPAN></DIV>
</BODY></HTML>

------_=_NextPart_001_01C6ACF3.CCF834F5--


--===============1676703811==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1676703811==--


From speechsc-bounces@ietf.org Fri Jul 21 16:56:48 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G423B-0007RF-MG; Fri, 21 Jul 2006 16:56:45 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G423A-0007H4-Hl
	for speechsc@ietf.org; Fri, 21 Jul 2006 16:56:44 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G41sm-0002uB-H9
	for speechsc@ietf.org; Fri, 21 Jul 2006 16:46:01 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id E54B8214104; Fri, 21 Jul 2006 20:45:58 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.1 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00,
	HTML_50_60,HTML_MESSAGE autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (dsl-34-34.dsl.netsource.ie [213.79.34.34])
	by mail.voxpilot.com (Postfix) with ESMTP
	id B4085214104; Fri, 21 Jul 2006 20:45:52 +0000 (GMT)
Message-ID: <013601c6ad06$a1bfd2f0$6700000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Reifenrath, Klaus, VF-Group" <Klaus.Reifenrath@vodafone.com>,
	<jakkis@huawei.com>, <speechsc@ietf.org>
References: <21FBFFD8B2486242AB9A09E20A7E9C44F66322@gpsmx05.gps.internal.vodafone.com>
Subject: Re: [Speechsc] Queuing in ASR
Date: Fri, 21 Jul 2006 21:45:41 +0100
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 1c0c3d540ad9f95212b1c2a9a2cc2595
Cc: sarvi@cisco.com
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1127099333=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1127099333==
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0133_01C6AD0F.02FDCCE0"

This is a multi-part message in MIME format.

------=_NextPart_000_0133_01C6AD0F.02FDCCE0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Thanks for the clarification.

Still a little puzzled as to the feature's merit. Couldn't you have
"Computer please check my inbox" as a hotword phrase and just have a
single recognition, especially as you can't make the second
recognition's grammar conditional on the first's result? I'm not sure
START-OF-SPEECH is so elegant here; e.g. you're not guaranteed it will
be generated if the user just says "Computer". Also, START-OF-SPEECH
doesn't correspond accurately to the PENDING to IN-PROGRESS transition;
it only tracks it to within a maximum error of No-Input-Timeout.

Dave
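
A minimal Python sketch of the timing point above, assuming the
No-Input-Timeout timer starts exactly when a queued RECOGNIZE moves from
PENDING to IN-PROGRESS and that START-OF-SPEECH can only be raised while
that timer runs. The function name `transition_window` and the
millisecond units are illustrative assumptions, not part of the MRCPv2
draft. Under these assumptions, START-OF-SPEECH only places the
transition inside a window as wide as No-Input-Timeout:

```python
from typing import Tuple


def transition_window(start_of_speech_ms: int, no_input_timeout_ms: int) -> Tuple[int, int]:
    """Bound the time of the PENDING -> IN-PROGRESS transition (hypothetical helper).

    Assumes the no-input timer starts exactly at the transition, so speech
    must be detected no later than no_input_timeout_ms after it. Given the
    arrival time of START-OF-SPEECH, return the (earliest, latest) possible
    transition times in milliseconds.
    """
    if no_input_timeout_ms < 0:
        raise ValueError("no_input_timeout_ms must be non-negative")
    # Speech may be detected anywhere from 0 to no_input_timeout_ms after
    # the transition, so the transition lies in [t_s - timeout, t_s].
    return start_of_speech_ms - no_input_timeout_ms, start_of_speech_ms


# Speech detected 12 s into the session with a 5 s No-Input-Timeout:
print(transition_window(12_000, 5_000))  # (7000, 12000)
```

If the user stops after "Computer", no START-OF-SPEECH arrives at all
under the same assumptions, so the client would presumably learn about
the second request only when it completes on the no-input timer.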
  ----- Original Message -----
  From: Reifenrath, Klaus, VF-Group
  To: Dave Burke ; jakkis@huawei.com ; speechsc@ietf.org
  Cc: sarvi@cisco.com
  Sent: Friday, July 21, 2006 7:30 PM
  Subject: RE: [Speechsc] Queuing in ASR


  The queuing for speech recognition resources was introduced in the
  context of hotword recognition. Very often you want to start a normal
  recognition after a hotword recognition without losing audio blocks
  between the two recognitions. E.g. a hotword recognition is used to
  detect the wake-up word ("Computer") and a normal recognition to
  recognize the following request ("Please check my inbox!"). If we
  cannot queue recognition requests, the recognition resource will miss
  meaningful audio blocks, because the client can start the next
  recognition only after it has received the recognition result.

  The START-OF-SPEECH event indicates that the next recognition has
  started.

  Klaus
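
A minimal Python sketch of the audio-loss argument above. It treats the
audio stream as numbered blocks and assumes two fixed delays, counted in
blocks: how long until the client receives the hotword result, and how
long until a newly sent RECOGNIZE takes effect. The function
`lost_blocks` and both delay parameters are hypothetical
simplifications, not MRCPv2 concepts:

```python
from typing import List


def lost_blocks(hotword_end: int, result_latency: int,
                request_latency: int, queued: bool) -> List[int]:
    """Return indices of audio blocks that neither recognition sees.

    hotword_end: index of the last block consumed by the hotword recognition
    result_latency: blocks that pass before the client gets the hotword result
    request_latency: blocks that pass before a newly sent RECOGNIZE takes effect
    queued: True if the follow-up RECOGNIZE was already queued (PENDING)
    """
    first_next = hotword_end + 1
    if queued:
        # The server starts the queued request on the very next block.
        return []
    # Without queuing, the follow-up request can only start after the
    # client has seen the result and its new RECOGNIZE has arrived.
    start = first_next + result_latency + request_latency
    return list(range(first_next, start))


print(lost_blocks(100, 3, 2, queued=False))  # [101, 102, 103, 104, 105]
print(lost_blocks(100, 3, 2, queued=True))   # []
```

In this model any lost block may carry the start of "Please check my
inbox!", which is the gap that queuing is meant to close.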


----------------------------------------------------------------------------
    From: Dave Burke [mailto:david.burke@voxpilot.com]
    Sent: Friday, 21 July 2006 19:45
    To: jakkis@huawei.com; speechsc@ietf.org
    Cc: sarvi@cisco.com
    Subject: Re: [Speechsc] Queuing in ASR


    Good comments. I had also spotted the problem of notifying the
    PENDING -> IN-PROGRESS transition for the speechrecog resource. My
    personal preference is to simply remove the queuing of RECOGNIZE
    requests, as I don't see any worthwhile value in this feature. It will
    also be quicker than fixing it, though that's just a nice side-effect.
    Any vehement disagreements?

    Dave

    ----- Original Message -----
      From: jakki sasidhar
      To: speechsc@ietf.org
      Cc: sarvi@cisco.com ; david.burke@voxpilot.com
      Sent: Friday, July 21, 2006 2:10 PM
      Subject: [Speechsc] Queuing in ASR


      Hi,
          MRCP draft-ietf-speechsc-mrcpv2-10 is not clear about the
      queuing of RECOGNIZE requests at the ASR resource. As per section
      9.9, if the server receives a RECOGNIZE request while the ASR
      resource is currently active with another request, a 200 response
      with PENDING state will be dispatched to the client. But once the
      ASR becomes free and starts serving the queued request, it needs to
      indicate this to the client. In the case of TTS, this is done using
      the SPEECH-MARKER event, which carries a NULL marker and the state
      IN-PROGRESS. But for ASR, the spec does not define any such
      mechanism.
          Also, if there are multiple requests pending at the client side,
      a STOP request defined for TTS can carry a list of request ids to
      specify which requests should be stopped. There is no such
      provision for the STOP of RECOGNIZE requests.
          Any comments on this will be appreciated.

      Thanks & Regards,
      Sasidhar
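
A toy Python model of the two gaps described above, assuming a
recognizer that queues RECOGNIZE requests as section 9.9 of draft -10 is
summarized here. The `events` list records notifications the server
would owe the client: the IN-PROGRESS entry for a dequeued request is
the notification the draft leaves undefined for ASR, and the optional id
list on `stop` mimics the TTS STOP behaviour that has no recognizer
equivalent. The class `RecognizerQueue` and its methods are
hypothetical:

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RecognizerQueue:
    """Toy recognizer resource that queues RECOGNIZE requests (hypothetical)."""

    def __init__(self) -> None:
        self.active: Optional[int] = None
        self.pending: Deque[int] = deque()
        # (request_id, state) notifications the server would owe the client.
        self.events: List[Tuple[int, str]] = []

    def recognize(self, request_id: int) -> str:
        """Handle RECOGNIZE; return the request-state of the 200 response."""
        if self.active is None:
            self.active = request_id
            return "IN-PROGRESS"
        self.pending.append(request_id)
        return "PENDING"

    def _start_next(self) -> None:
        self.active = self.pending.popleft() if self.pending else None
        if self.active is not None:
            # The PENDING -> IN-PROGRESS notification draft -10 lacks for ASR.
            self.events.append((self.active, "IN-PROGRESS"))

    def complete_active(self) -> None:
        """The active recognition finishes; start the next queued one."""
        if self.active is not None:
            self.events.append((self.active, "COMPLETE"))
        self._start_next()

    def stop(self, request_ids: Optional[List[int]] = None) -> List[int]:
        """STOP everything, or only the listed ids; return the ids stopped."""
        if request_ids is None:
            stopped = ([self.active] if self.active is not None else []) + list(self.pending)
            self.active = None
            self.pending.clear()
            return stopped
        wanted = set(request_ids)
        stopped = [rid for rid in self.pending if rid in wanted]
        self.pending = deque(rid for rid in self.pending if rid not in wanted)
        if self.active is not None and self.active in wanted:
            stopped.insert(0, self.active)
            self._start_next()
        return stopped


q = RecognizerQueue()
print(q.recognize(1), q.recognize(2), q.recognize(3))  # IN-PROGRESS PENDING PENDING
q.complete_active()
print(q.events)     # [(1, 'COMPLETE'), (2, 'IN-PROGRESS')]
print(q.stop([3]))  # [3]
```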



------=_NextPart_000_0133_01C6AD0F.02FDCCE0--


--===============1127099333==--


From speechsc-bounces@ietf.org Mon Jul 24 06:09:00 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G4xMx-0001zW-GR; Mon, 24 Jul 2006 06:08:59 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G4xMw-0001yq-0I
	for speechsc@ietf.org; Mon, 24 Jul 2006 06:08:58 -0400
Received: from mailf.telecomitalia.it ([156.54.233.32])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G4xMs-0006Gn-Cu
	for speechsc@ietf.org; Mon, 24 Jul 2006 06:08:57 -0400
Received: from ptpxch008ba020.idc.cww.telecomitalia.it ([156.54.240.51]) by
	mailf.telecomitalia.it with Microsoft SMTPSVC(6.0.3790.1830);
	Mon, 24 Jul 2006 12:08:48 +0200
Received: from PTPEVS106BA020.idc.cww.telecomitalia.it ([156.54.241.223]) by
	ptpxch008ba020.idc.cww.telecomitalia.it with Microsoft
	SMTPSVC(6.0.3790.1830); Mon, 24 Jul 2006 12:08:48 +0200
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2663
Content-Class: urn:content-classes:message
MIME-Version: 1.0
Importance: normal
Priority: normal
Subject: RE: [Speechsc] Queuing in ASR
Date: Mon, 24 Jul 2006 12:05:47 +0200
Message-ID: <01C0B9926BC410459FE9AACE49B815027DB61F@PTPEVS106BA020.idc.cww.telecomitalia.it>
In-Reply-To: <013601c6ad06$a1bfd2f0$6700000a@db01.voxpilot.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [Speechsc] Queuing in ASR
thread-index: AcatCDicoVrwl8IgRBmsquEdf76jrgB/jkPw
From: "Bergallo Patrizio" <patrizio.bergallo@loquendo.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	"Reifenrath, Klaus, VF-Group" <Klaus.Reifenrath@vodafone.com>,
	<jakkis@huawei.com>, <speechsc@ietf.org>
X-OriginalArrivalTime: 24 Jul 2006 10:08:48.0310 (UTC)
	FILETIME=[27C00560:01C6AF09]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: dadeebe491e67c033a493fd3c7d6792b
Cc: sarvi@cisco.com
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0448460399=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============0448460399==
Content-Transfer-Encoding: 7bit
Content-Class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6AF09.27649057"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6AF09.27649057
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi,

regarding this topic, Issue 20
(https://www.softarmor.com/roundup/speechsc/issue20) was raised 13
months ago, but unfortunately it is still in unread status. Anyway, we
proposed a new RECOGNITION-STARTED event, and a similar one for the
verification resource (VERIFICATION-STARTED), which has the same
problem.
Apart from whether we introduce a new event or re-use an existing one,
we think this kind of signal is still useful.
We are OK with extending the meaning of START-OF-INPUT for this purpose,
i.e. using a new or blank value for the Input-Type header, similar to
the way the synthesizer resource uses the SPEECH-MARKER event with the
Speech-Marker header.

Regards,
Patrizio Bergallo & Vittorio Manzone, Loquendo.
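
A minimal Python sketch of how a client might interpret both options
above, assuming that a present but blank Input-Type on START-OF-INPUT
means "the queued request has just started" and that speech and dtmf
remain the ordinary Input-Type values. RECOGNITION-STARTED and
VERIFICATION-STARTED are the proposed event names, not standardized
ones; `classify_event` and its return labels are hypothetical:

```python
from typing import Dict


def classify_event(event_name: str, headers: Dict[str, str]) -> str:
    """Say what a recognizer/verifier event tells the client (hypothetical).

    Covers both options from this thread: a dedicated *-STARTED event, or
    START-OF-INPUT carrying a blank Input-Type meaning the queued request
    has moved from PENDING to IN-PROGRESS.
    """
    # Header names are matched case-insensitively, as for HTTP-style headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    if event_name in ("RECOGNITION-STARTED", "VERIFICATION-STARTED"):
        return "request-started"
    if event_name == "START-OF-INPUT":
        if "input-type" not in lowered:
            return "unknown"
        input_type = lowered["input-type"].strip().lower()
        if input_type == "":
            return "request-started"
        if input_type in ("speech", "dtmf"):
            return "input-" + input_type
        return "unknown"
    return "other"


print(classify_event("START-OF-INPUT", {"Input-Type": ""}))        # request-started
print(classify_event("START-OF-INPUT", {"Input-Type": "speech"}))  # input-speech
print(classify_event("RECOGNITION-STARTED", {}))                   # request-started
```

Treating a missing Input-Type as "unknown" rather than as blank is a
deliberate choice in this sketch, so that a server which simply omits
the header is not mistaken for one signalling a start.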


-----Original Message-----
From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: Friday, July 21, 2006 10:46 PM
To: Reifenrath, Klaus, VF-Group; jakkis@huawei.com; speechsc@ietf.org
Cc: sarvi@cisco.com
Subject: Re: [Speechsc] Queuing in ASR


Thanks for the clarification.

Still a little puzzled as to the feature's merit. Couldn't you have
"Computer please check my inbox" as a hotword phrase and just have a
single recognition, especially as you can't make the second
recognition's grammar conditional on the first's result? I'm not sure
START-OF-SPEECH is so elegant here; e.g. you're not guaranteed it will
be generated if the user just says "Computer". Also, START-OF-SPEECH
doesn't correspond accurately to the PENDING to IN-PROGRESS transition;
it only tracks it to within a maximum error of No-Input-Timeout.

Dave

----- Original Message -----
From: Reifenrath, Klaus, VF-Group <mailto:Klaus.Reifenrath@vodafone.com>
To: Dave Burke <mailto:david.burke@voxpilot.com> ; jakkis@huawei.com ;
speechsc@ietf.org
Cc: sarvi@cisco.com
Sent: Friday, July 21, 2006 7:30 PM
Subject: RE: [Speechsc] Queuing in ASR

The queuing for speech recognition resources was introduced in the
context of hotword recognition. Very often you want to start a normal
recognition after a hotword recognition without losing audio blocks
between the two recognitions. E.g. a hotword recognition is used to
detect the wake-up word ("Computer") and a normal recognition to
recognize the following request ("Please check my inbox!"). If we
cannot queue recognition requests, the recognition resource will miss
meaningful audio blocks, because the client can start the next
recognition only after it has received the recognition result.

The START-OF-SPEECH event indicates that the next recognition has
started.

Klaus



  _____

From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: Friday, 21 July 2006 19:45
To: jakkis@huawei.com; speechsc@ietf.org
Cc: sarvi@cisco.com
Subject: Re: [Speechsc] Queuing in ASR


Good comments. I had also spotted the problem of notifying the PENDING
-> IN-PROGRESS transition for the speechrecog resource. My personal
preference is to simply remove the queuing of RECOGNIZE requests, as I
don't see any worthwhile value in this feature. It will also be quicker
than fixing it, though that's just a nice side-effect. Any vehement
disagreements?

Dave

----- Original Message -----

From: jakki sasidhar <mailto:jakkis@huawei.com>
To: speechsc@ietf.org
Cc: sarvi@cisco.com ; david.burke@voxpilot.com
Sent: Friday, July 21, 2006 2:10 PM
Subject: [Speechsc] Queuing in ASR

Hi,
    MRCP draft-ietf-speechsc-mrcpv2-10 is not clear about the queuing of
RECOGNIZE requests at the ASR resource. As per section 9.9, if the
server receives a RECOGNIZE request while the ASR resource is currently
active with another request, a 200 response with PENDING state will be
dispatched to the client. But once the ASR becomes free and starts
serving the queued request, it needs to indicate this to the client. In
the case of TTS, this is done using the SPEECH-MARKER event, which
carries a NULL marker and the state IN-PROGRESS. But for ASR, the spec
does not define any such mechanism.
    Also, if there are multiple requests pending at the client side, a
STOP request defined for TTS can carry a list of request ids to specify
which requests should be stopped. There is no such provision for the
STOP of RECOGNIZE requests.
    Any comments on this will be appreciated.

Thanks & Regards,
Sasidhar




Telecom Italia Group - subject to the direction and coordination of
Telecom Italia S.p.A.

================================================
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received the
message in error, be informed that any use of the content hereof is
prohibited. Please return it immediately to the sender and delete the
message. Should you have any questions, please send an e-mail to
webmaster@telecomitalia.it. Thank you.
www.loquendo.com
================================================

------_=_NextPart_001_01C6AF09.27649057
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<TITLE>Message</TITLE>

<META content=3D"MSHTML 6.00.2900.2802" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2>Hi,</FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT color=3D#0000ff><FONT face=3DArial><FONT size=3D2>regarding this =
topic, Issue 20=20
(<A href=3D"https://www.softarmor.com/roundup/speechsc/issue20"><FONT =
face=3DArial=20
size=3D2>https://www.softarmor.com/roundup/speechsc/issue20</FONT></A><FO=
NT=20
face=3DArial size=3D2>) </FONT>was raised 13 months&nbsp;<SPAN=20
class=3D843234909-24072006>ago</SPAN></FONT></FONT><FONT face=3DArial =
size=3D2>, but=20
unfortunately it is still in unread status. Anyway, we proposed a new=20
RECOGNITION-STARTED event, and a similar one for the verification=20
resource (VERIFICATION-STARTED), which has the same problem.</FONT></FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2>Apart from whether we introduce =
a new event=20
or re-use an existing one, we think this kind of signal is still=20
useful.</FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2>We are OK with extending the =
meaning of=20
START-OF-INPUT for this purpose, i.e. using a new or blank value for the =
Input-Type=20
header, similar to the way the synthesizer =
resource uses=20
the SPEECH-MARKER event with the Speech-Marker header.</FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2>Regards,</FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2>Patrizio Bergallo &amp; =
Vittorio=20
Manzone, Loquendo.<BR></FONT></DIV>
<BLOCKQUOTE dir=3Dltr=20
style=3D"PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px =
solid; MARGIN-RIGHT: 0px">
  <DIV></DIV>
  <DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr =
align=3Dleft><FONT=20
  face=3DTahoma size=3D2>-----Original Message-----<BR><B>From:</B> Dave =
Burke=20
  [mailto:david.burke@voxpilot.com] <BR><B>Sent:</B> Friday, July 21, =
2006 10:46=20
  PM<BR><B>To:</B> Reifenrath, Klaus, VF-Group; jakkis@huawei.com;=20
  speechsc@ietf.org<BR><B>Cc:</B> sarvi@cisco.com<BR><B>Subject:</B> Re: =

  [Speechsc] Queuing in ASR<BR><BR></FONT></DIV>
  <DIV><FONT face=3DArial size=3D2>Thanks for the =
clarification.</FONT></DIV>
  <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
  <DIV><FONT face=3DArial size=3D2>Still a little puzzled as to the =
feature's merit.=20
  Couldn't you have "Computer please check my inbox" as a hotword phrase =
and=20
  just have a single recognition, especially as you can't make the =
second=20
  recognition's grammar conditional on the first's result?&nbsp;I'm not =
sure=20
  START-OF-SPEECH is so elegant here - e.g. you're not guaranteed it =
will be=20
  generated if the user just says "Computer". Also,&nbsp;START-OF-SPEECH =
doesn't=20
  correspond accurately to the PENDING to IN-PROGRESS transition but =
rather=20
  with&nbsp;a max error of No-Input-Timeout.</FONT></DIV>
  <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
  <DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV>
  <BLOCKQUOTE=20
  style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
    <DIV style=3D"FONT: 10pt arial">----- Original Message ----- </DIV>
    <DIV=20
    style=3D"BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: =
black"><B>From:</B>=20
    <A title=3DKlaus.Reifenrath@vodafone.com=20
    href=3D"mailto:Klaus.Reifenrath@vodafone.com">Reifenrath, Klaus, =
VF-Group</A>=20
    </DIV>
    <DIV style=3D"FONT: 10pt arial"><B>To:</B> <A =
title=3Ddavid.burke@voxpilot.com=20
    href=3D"mailto:david.burke@voxpilot.com">Dave Burke</A> ; <A=20
    title=3Djakkis@huawei.com=20
    href=3D"mailto:jakkis@huawei.com">jakkis@huawei.com</A> ; <A=20
    title=3Dspeechsc@ietf.org=20
    href=3D"mailto:speechsc@ietf.org">speechsc@ietf.org</A> </DIV>
    <DIV style=3D"FONT: 10pt arial"><B>Cc:</B> <A =
title=3Dsarvi@cisco.com=20
    href=3D"mailto:sarvi@cisco.com">sarvi@cisco.com</A> </DIV>
    <DIV style=3D"FONT: 10pt arial"><B>Sent:</B> Friday, July 21, 2006 =
7:30=20
    PM</DIV>
    <DIV style=3D"FONT: 10pt arial"><B>Subject:</B> RE: [Speechsc] =
Queuing in=20
    ASR</DIV>
    <DIV><BR></DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2>The queuing for speech recognition =
resources was=20
    introduced in the context of hotword recognition. Very often you =
want to=20
    start a normal recognition&nbsp;after a hotword recognition without =
losing=20
    audio blocks between the two recognitions. E.g. a hotword =
recognition is=20
    used to detect the wake-up&nbsp; ("Computer") and a normal =
recognition to=20
    recognize the following request ("Please check my inbox!"). If we =
cannot=20
    queue recognition requests,&nbsp;the recognition resource will miss=20
    meaningful audio blocks, because the client can start the next =
recognition=20
    only&nbsp;after it received the recognition result. =
</FONT></SPAN></DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2>The START-OF-SPEECH&nbsp;event indicates =
that the next=20
    recognition has started.&nbsp;</FONT></SPAN></DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2>Klaus</FONT></SPAN></DIV>
    <DIV dir=3Dltr align=3Dleft><SPAN class=3D125260118-21072006><FONT =
face=3DArial=20
    color=3D#0000ff size=3D2></FONT></SPAN>&nbsp;</DIV><BR>
    <BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px">
      <DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr =
align=3Dleft>
      <HR tabIndex=3D-1>
      <FONT face=3DTahoma size=3D2><B>From:</B> Dave Burke=20
      [mailto:david.burke@voxpilot.com] <BR><B>Sent:</B> Friday, July 21, =
2006=20
      19:45<BR><B>To:</B> jakkis@huawei.com; =
speechsc@ietf.org<BR><B>Cc:</B>=20
      sarvi@cisco.com<BR><B>Subject:</B> Re: [Speechsc] Queuing in=20
      ASR<BR></FONT><BR></DIV>
      <DIV></DIV>
      <DIV><FONT face=3DArial size=3D2>Good comments. I had also spotted =
the problem=20
      of notifying PENDING -&gt; IN-PROGRESS for the speechrecog =
resource. My=20
      personal preference is to simply remove the queuing of RECOGNIZE =
requests=20
      as I don't see any worthwhile value to this feature. It will also =
be=20
      quicker than fixing it, though that's just a nice side-effect. Any =
vehement=20
      disagreements?</FONT></DIV>
      <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
      <DIV><FONT face=3DArial size=3D2>Dave</FONT></DIV>
      <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
      <DIV>----- Original Message ----- </DIV>
      <BLOCKQUOTE=20
      style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
        <DIV=20
        style=3D"BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: =
black"><B>From:</B>=20
        <A title=3Djakkis@huawei.com =
href=3D"mailto:jakkis@huawei.com">jakki=20
        sasidhar</A> </DIV>
        <DIV style=3D"FONT: 10pt arial"><B>To:</B> <A =
title=3Dspeechsc@ietf.org=20
        href=3D"mailto:speechsc@ietf.org">speechsc@ietf.org</A> </DIV>
        <DIV style=3D"FONT: 10pt arial"><B>Cc:</B> <A =
title=3Dsarvi@cisco.com=20
        href=3D"mailto:sarvi@cisco.com">sarvi@cisco.com</A> ; <A=20
        title=3Ddavid.burke@voxpilot.com=20
        =
href=3D"mailto:david.burke@voxpilot.com">david.burke@voxpilot.com</A>=20
        </DIV>
        <DIV style=3D"FONT: 10pt arial"><B>Sent:</B> Friday, July 21, =
2006 2:10=20
        PM</DIV>
        <DIV style=3D"FONT: 10pt arial"><B>Subject:</B> [Speechsc] =
Queuing in=20
        ASR</DIV>
        <DIV><BR></DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial=20
        size=3D2>Hi,</FONT></SPAN></DIV>
        <DIV><SPAN class=3D241065112-21072006>&nbsp;&nbsp;&nbsp; <FONT =
face=3DArial=20
        size=3D2>MRCP </FONT><FONT face=3DArial><FONT=20
        size=3D2>draft-ietf-speechsc-mrcpv2-1<SPAN =
class=3D241065112-21072006>0 is=20
        not clear about the queuing&nbsp;of&nbsp;RECOGNIZE requests at =
the ASR=20
        resource. As per section 9.9, if the server receives a RECOGNIZE =
request=20
        and the ASR resource is currently active with =
another&nbsp;request,=20
        then&nbsp;a 200 response with PENDING state will be dispatched =
to the=20
        client. But once the ASR&nbsp;becomes free and starts serving =
the queued=20
        request, it needs to indicate this&nbsp;to the=20
        client.&nbsp;</SPAN></FONT></FONT></SPAN><SPAN=20
        class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        class=3D241065112-21072006>In the case of TTS, this is done by =
using the=20
        SPEECH-MARKER event which carries a NULL marker and the state as =

        IN-PROGRESS. But for ASR, the spec does not define any such =
mechanism.=20
        </SPAN></FONT></FONT></SPAN></DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        class=3D241065112-21072006>&nbsp;&nbsp;&nbsp; Also, if there are =
multiple=20
        requests pending at the client side, a STOP request defined for =
TTS can=20
        carry a list of request ids to specify which requests should =
be=20
        stopped. There is no such provision in case of the STOP=20
        of&nbsp;RECOGNIZE requests.</SPAN></FONT></FONT></SPAN></DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        class=3D241065112-21072006>&nbsp;&nbsp;&nbsp; Any comments on =
this will be=20
        appreciated.</SPAN></FONT></FONT></SPAN></DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        =
class=3D241065112-21072006></SPAN></FONT></FONT></SPAN>&nbsp;</DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        class=3D241065112-21072006>Thanks &amp;=20
        Regards,</SPAN></FONT></FONT></SPAN></DIV>
        <DIV><SPAN class=3D241065112-21072006><FONT face=3DArial><FONT =
size=3D2><SPAN=20
        =
class=3D241065112-21072006>Sasidhar</SPAN></FONT></FONT></SPAN></DIV>
        <P>
        <HR>

        =
<P></P>_______________________________________________<BR>Speechsc=20
        mailing=20
        =
list<BR>Speechsc@ietf.org<BR>https://www1.ietf.org/mailman/listinfo/speec=
hsc<BR></BLOCKQUOTE></BLOCKQUOTE>
    <P>
    <HR>

    <P></P>_______________________________________________<BR>Speechsc =
mailing=20
    =
list<BR>Speechsc@ietf.org<BR>https://www1.ietf.org/mailman/listinfo/speec=
hsc<BR></BLOCKQUOTE></BLOCKQUOTE><BR>Gruppo=20
Telecom Italia - Direzione e coordinamento di Telecom Italia =
S.p.A.<BR><BR><FONT=20
size=3D3>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D</FONT><BR>CONFIDENTIALITY=20
NOTICE<BR>This message and its attachments are addressed solely to the=20
persons<BR>above and may contain confidential information. If you have=20
received<BR>the message in error, be informed that any use of the =
content=20
hereof<BR>is prohibited. Please return it immediately to the sender and=20
delete<BR>the message. Should you have any questions, please send an =
e_mail=20
to<BR>&lt;<A=20
href=3D"mailto:webmaster@telecomitalia.it">mailto:webmaster@telecomitalia=
.it</A>&gt;webmaster@telecomitalia.it.=20
Thank you<BR>&lt;<A=20
href=3D"http://www.loquendo.com">http://www.loquendo.com</A>&gt;www.loque=
ndo.com<BR><FONT=20
size=3D3>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D</FONT><BR></P></FONT>
</BODY></HTML>

------_=_NextPart_001_01C6AF09.27649057--


--===============0448460399==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============0448460399==--


From speechsc-bounces@ietf.org Mon Jul 24 10:57:42 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G51sL-00086N-F0; Mon, 24 Jul 2006 10:57:41 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G51sK-00086I-17
	for speechsc@ietf.org; Mon, 24 Jul 2006 10:57:40 -0400
Received: from g2.genesyslab.com ([198.49.180.210])
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G51sJ-0003hb-6E
	for speechsc@ietf.org; Mon, 24 Jul 2006 10:57:40 -0400
Received: from GIMLI.us.int.genesyslab.com ([192.168.20.233]) by
	g2.genesyslab.com with Microsoft SMTPSVC(6.0.3790.1830); 
	Mon, 24 Jul 2006 07:57:37 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: [Speechsc] Queuing in ASR
Date: Mon, 24 Jul 2006 07:57:36 -0700
Message-ID: <911B89A9FD71E649AA624FF24790D76F6745D3@GIMLI.us.int.genesyslab.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: [Speechsc] Queuing in ASR
Thread-Index: AcatCDum/WZw1YU7SGWux5YapDUE4ACIomDA
From: "Andrew Wahbe" <Andrew.Wahbe@genesyslab.com>
To: "Dave Burke" <david.burke@voxpilot.com>,
	"Reifenrath, Klaus, VF-Group" <Klaus.Reifenrath@vodafone.com>,
	<jakkis@huawei.com>, <speechsc@ietf.org>
X-OriginalArrivalTime: 24 Jul 2006 14:57:37.0603 (UTC)
	FILETIME=[80D04930:01C6AF31]
X-Spam-Score: 0.1 (/)
X-Scan-Signature: 876202f9cbc0933cffbc58102e40f8f2
Cc: sarvi@cisco.com
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============1151933250=="
Errors-To: speechsc-bounces@ietf.org

This is a multi-part message in MIME format.

--===============1151933250==
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C6AF31.81182F1C"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C6AF31.81182F1C
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Well, typically you would want to limit the size of your hotword grammar
to get good accuracy; throwing all the commands into the hotword grammar
would most likely limit its effectiveness. Additionally, the
min/max hotword utterance length features are based on the
assumption that you only have a few words in your grammar, allowing you
to further improve accuracy by restricting recognition to utterances
whose length is in the range you might expect for those words. If you
had a big grammar with all sorts of phrases in it, the length range
wouldn't be of much use. You also want to use short phrases/words for
hotword recognition, since the prompt is only stopped after the
utterance is complete.
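A minimal sketch of the length filter described above, assuming a hypothetical `Candidate` record and millisecond bounds that stand in for the min/max hotword utterance length settings (the names are illustrative, not taken from the draft):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A hypothetical hotword match reported by the recognizer."""
    phrase: str
    duration_ms: int  # length of the matched utterance


def accept_hotword(c: Candidate, min_ms: Optional[int], max_ms: Optional[int]) -> bool:
    """Keep a match only if its length lies inside the optional [min_ms, max_ms] window."""
    if min_ms is not None and c.duration_ms < min_ms:
        return False
    if max_ms is not None and c.duration_ms > max_ms:
        return False
    return True


# A short wake-up word fits a tight window; a long command phrase does not.
print(accept_hotword(Candidate("computer", 600), 300, 1200))
print(accept_hotword(Candidate("computer please check my inbox", 2500), 300, 1200))
```

With a big grammar of long phrases the window has to widen until it rejects almost nothing, which is why the feature assumes a small hotword grammar.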

I have to ask if this is the only use case for this feature, though. I
think it would be better implemented as a new recognition type, say
"wake-up" recognition, that specified a special wake-up-word grammar in
addition to the "normal" grammars. This mode could send a
START-OF-SPEECH when the wake-up word was detected (allowing the prompt
to be stopped early), and might better handle cases where there wasn't a
long pause between the wake-up word and the command (from an
end-of-speech detection perspective, not just a "dropping audio"
perspective). Anyway, an "add-on" draft could define the type, new
headers, etc. Queuing the recognition isn't a great solution IMO since,
as Dave mentioned, you can't specify that the recognition should only be
run if the former was successful. I suppose that you could STOP the
second recognition in those cases, but it still doesn't seem like a
great solution. Anyway, are there other uses for queuing recognitions?
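A rough sketch of the proposed "wake-up" mode, under heavy simplifying assumptions: audio is modeled as a list of already-decoded words, and the function name, grammars and result strings are hypothetical. It illustrates that one request covers both phases, so no audio is dropped between them and the command grammar is only consulted after a successful wake-up match:

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (event name, payload)


def wake_up_recognition(words: List[str], wake_words: Set[str], commands: Set[str]) -> List[Event]:
    """Hypothetical single-request 'wake-up' mode: wake-up word, then command."""
    events: List[Event] = []
    for i, word in enumerate(words):
        if word in wake_words:
            # Signal as soon as the wake-up word is heard so the client can stop the prompt.
            events.append(("START-OF-SPEECH", word))
            # The rest of the same audio stream is matched against the normal grammar,
            # so nothing is lost between the two phases.
            command = " ".join(words[i + 1:])
            events.append(("RECOGNITION-COMPLETE", command if command in commands else "no-match"))
            return events
    # The normal grammar is never consulted unless the wake-up word matched.
    events.append(("RECOGNITION-COMPLETE", "no-wake-up-word"))
    return events


print(wake_up_recognition("computer please check my inbox".split(),
                          {"computer"}, {"please check my inbox"}))
```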
An event to signal when recognition was started may still be a good idea
(though I'm not a big fan of re-using START-OF-SPEECH for this) if we
want to get timestamp information about when recognition started (see:
http://www1.ietf.org/mail-archive/web/speechsc/current/msg01925.html).
In that discussion, I was assuming that you could put the timestamp on
the response to the RECOGNIZE request, but that solution delays the
response until you have received the audio that you are going to
process. I don't think that's such a great idea; a separate event is
much better.
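A minimal sketch of what such a separate event might look like on the wire, assuming the MRCPv2 event-line layout (version, message-length, event name, request-id, request-state) as later published in RFC 6787. The event name RECOGNITION-STARTED and the Recognition-Start-Time header are hypothetical placeholders, not defined by the draft. The only subtle part is that message-length counts the whole message, including its own digits:

```python
from typing import List, Tuple


def build_event(event_name: str, request_id: int, state: str,
                headers: List[Tuple[str, str]]) -> str:
    """Serialize an MRCPv2-style event whose message-length covers the whole message."""
    # Header fields, then the empty line that ends the header section (no body here).
    tail = "".join(f"{name}:{value}\r\n" for name, value in headers) + "\r\n"

    def render(length: int) -> str:
        return f"MRCP/2.0 {length} {event_name} {request_id} {state}\r\n{tail}"

    # message-length includes its own digits, so iterate until the value is self-consistent.
    length = 0
    while True:
        message = render(length)
        actual = len(message.encode("ascii"))
        if actual == length:
            return message
        length = actual


msg = build_event(
    "RECOGNITION-STARTED",  # hypothetical event name
    543257,
    "IN-PROGRESS",
    [("Channel-Identifier", "32AECB23433801@speechrecog"),
     ("Recognition-Start-Time", "2006-07-24T14:57:37Z")],  # hypothetical header
)
print(msg)
```

Because the timestamp travels in its own event, the response to RECOGNIZE can still be returned immediately instead of waiting for the audio to arrive.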

Andrew

________________________________

From: Dave Burke [mailto:david.burke@voxpilot.com]
Sent: July 21, 2006 4:46 PM
To: Reifenrath, Klaus, VF-Group; jakkis@huawei.com; speechsc@ietf.org
Cc: sarvi@cisco.com
Subject: Re: [Speechsc] Queuing in ASR


Thanks for the clarification.

Still a little puzzled as to the feature's merit. Couldn't you have
"Computer please check my inbox" as a hotword phrase and just have a
single recognition - especially as you can't make the second
recognition's grammar conditional on the first's result? I'm not sure
START-OF-SPEECH is so elegant here - e.g. you're not guaranteed it will
be generated if the user just says "Computer". Also, START-OF-SPEECH
doesn't correspond accurately to the PENDING to IN-PROGRESS transition;
it can lag that transition by up to No-Input-Timeout.
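To put numbers on the No-Input-Timeout point, a small sketch assuming the no-input timer runs from the moment the request becomes IN-PROGRESS and ignoring speech-detector latency (the function name is hypothetical). Using START-OF-SPEECH as a start marker is off by anything from 0 up to No-Input-Timeout, or the event never arrives if the caller stays silent:

```python
from typing import Optional


def start_of_speech_lag_ms(started_at_ms: int, speech_onset_ms: Optional[int],
                           no_input_timeout_ms: int) -> Optional[int]:
    """How late START-OF-SPEECH fires after the request actually went IN-PROGRESS.

    Returns None when no START-OF-SPEECH is sent at all, because the caller
    stays silent until No-Input-Timeout expires.
    """
    if speech_onset_ms is None:
        return None
    lag = max(0, speech_onset_ms - started_at_ms)  # caller may already be talking
    if lag > no_input_timeout_ms:
        return None  # the request ends with a no-input timeout instead
    return lag


print(start_of_speech_lag_ms(1000, 4500, 5000))  # speech 3.5 s after the transition
print(start_of_speech_lag_ms(1000, None, 5000))  # silence: no event at all
```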

Dave

	----- Original Message -----
	From: Reifenrath, Klaus, VF-Group <mailto:Klaus.Reifenrath@vodafone.com>
	To: Dave Burke <mailto:david.burke@voxpilot.com>; jakkis@huawei.com; speechsc@ietf.org
	Cc: sarvi@cisco.com
	Sent: Friday, July 21, 2006 7:30 PM
	Subject: RE: [Speechsc] Queuing in ASR

	The queuing for speech recognition resources was introduced in
	the context of hotword recognition. Very often you want to start a
	normal recognition after a hotword recognition without losing audio
	blocks between the two recognitions. E.g. a hotword recognition is used
	to detect the wake-up word ("Computer") and a normal recognition to
	recognize the following request ("Please check my inbox!"). If we cannot
	queue recognition requests, the recognition resource will miss
	meaningful audio blocks, because the client can start the next
	recognition only after it has received the recognition result.
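A toy model of the queuing behaviour described here, as a sketch only: the class and field names are hypothetical, responses and events are plain tuples rather than wire messages, and RECOGNITION-STARTED stands in for whatever PENDING -> IN-PROGRESS notification the spec might define (the draft defines none for the recognizer):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

Message = Tuple[str, int, str]  # (response code or event name, request-id, request-state)


@dataclass
class RecognizerQueue:
    """Toy model of a recognizer resource that queues RECOGNIZE requests."""
    active: Optional[int] = None
    pending: Deque[int] = field(default_factory=deque)
    outbox: List[Message] = field(default_factory=list)

    def recognize(self, request_id: int) -> None:
        if self.active is None:
            self.active = request_id
            self.outbox.append(("200", request_id, "IN-PROGRESS"))
        else:
            # Resource busy: accept the request but report it as PENDING.
            self.pending.append(request_id)
            self.outbox.append(("200", request_id, "PENDING"))

    def complete_active(self) -> None:
        if self.active is None:
            raise RuntimeError("no active RECOGNIZE request")
        self.outbox.append(("RECOGNITION-COMPLETE", self.active, "COMPLETE"))
        self.active = self.pending.popleft() if self.pending else None
        if self.active is not None:
            # The queued request takes over the audio stream immediately (no gap),
            # but the client still has to be told PENDING -> IN-PROGRESS.
            self.outbox.append(("RECOGNITION-STARTED", self.active, "IN-PROGRESS"))


q = RecognizerQueue()
q.recognize(1)  # hotword recognition
q.recognize(2)  # normal recognition, queued behind it
q.complete_active()
for message in q.outbox:
    print(message)
```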

	The START-OF-SPEECH event indicates that the next recognition has
	started.

	Klaus



________________________________

		From: Dave Burke [mailto:david.burke@voxpilot.com]
		Sent: Friday, July 21, 2006 19:45
		To: jakkis@huawei.com; speechsc@ietf.org
		Cc: sarvi@cisco.com
		Subject: Re: [Speechsc] Queuing in ASR

		Good comments. I had also spotted the problem of
		notifying PENDING -> IN-PROGRESS for the speechrecog resource. My
		personal preference is to simply remove the queuing of RECOGNIZE
		requests, as I don't see any worthwhile value to this feature. It will
		also be quicker than fixing it, though that's just a nice side-effect.
		Any vehement disagreements?

		Dave

		----- Original Message -----

			From: jakki sasidhar <mailto:jakkis@huawei.com>
			To: speechsc@ietf.org
			Cc: sarvi@cisco.com; david.burke@voxpilot.com
			Sent: Friday, July 21, 2006 2:10 PM
			Subject: [Speechsc] Queuing in ASR

			Hi,
			    MRCP draft-ietf-speechsc-mrcpv2-10 is not
			clear about the queuing of RECOGNIZE requests at the ASR resource. As per
			section 9.9, if the server receives a RECOGNIZE request while the ASR
			resource is busy with another request, it sends the client a 200 response
			with request state PENDING. But once the ASR becomes free and starts
			serving the queued request, it needs to indicate this to the client. In
			the case of TTS, this is done with a SPEECH-MARKER event that carries a
			NULL marker and the request state IN-PROGRESS. For ASR, however, the
			spec does not define any such mechanism.
			    Also, if the client has multiple requests pending, a STOP
			request defined for TTS can carry a list of request IDs specifying
			which requests should be stopped. There is no such provision for
			STOP of RECOGNIZE requests.
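Extending the synthesizer's STOP semantics to queued RECOGNIZE requests could look like the following minimal sketch. It assumes the recognizer would honour an Active-Request-Id-List on STOP the same way the synthesizer does; that extension is hypothetical here, and the function names are illustrative:

```python
from typing import List, Optional, Sequence, Set, Tuple


def handle_stop(active: Optional[int], pending: Sequence[int],
                id_list: Optional[Set[int]] = None) -> Tuple[List[int], Optional[int], List[int]]:
    """Apply a STOP to a recognizer queue; returns (stopped, new active, new pending)."""
    queue = ([active] if active is not None else []) + list(pending)
    # Without an id list every outstanding request is stopped, as for the synthesizer.
    stopped = [r for r in queue if id_list is None or r in id_list]
    survivors = [r for r in queue if r not in stopped]
    # The first survivor is the (possibly newly promoted) active request; a promotion
    # would again need the PENDING -> IN-PROGRESS notification discussed above.
    new_active = survivors[0] if survivors else None
    return stopped, new_active, survivors[1:]


def active_request_id_list(stopped: List[int]) -> str:
    """Header field a STOP response could use to report what it terminated."""
    return "Active-Request-Id-List:" + ",".join(str(r) for r in stopped)


stopped, active, pending = handle_stop(1, [2, 3], {2})
print(stopped, active, pending)
print(active_request_id_list(stopped))
```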
			    Any comments on this will be appreciated.
			Thanks & Regards,
			Sasidhar



------_=_NextPart_001_01C6AF31.81182F1C--


--===============1151933250==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc

--===============1151933250==--


From speechsc-bounces@ietf.org Wed Jul 26 07:13:37 2006
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com)
	by megatron.ietf.org with esmtp (Exim 4.43)
	id 1G5hKb-0002qN-DV; Wed, 26 Jul 2006 07:13:37 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org)
	by megatron.ietf.org with esmtp (Exim 4.43) id 1G5hKZ-0002qC-Fu
	for speechsc@ietf.org; Wed, 26 Jul 2006 07:13:35 -0400
Received: from fw01.db01.voxpilot.com ([212.17.54.82] helo=mail.voxpilot.com)
	by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1G5hKW-0005zt-U0
	for speechsc@ietf.org; Wed, 26 Jul 2006 07:13:35 -0400
Received: by mail.voxpilot.com (Postfix, from userid 552)
	id 9B6DE214101; Wed, 26 Jul 2006 11:13:31 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on db01ms01
X-Spam-Status: No, score=-4.3 required=5.5 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from daburkewxp (unknown [10.0.0.102])
	by mail.voxpilot.com (Postfix) with ESMTP id D142D2140F9
	for <speechsc@ietf.org>; Wed, 26 Jul 2006 11:13:27 +0000 (GMT)
Message-ID: <054301c6b0a4$7ee1f320$6700000a@db01.voxpilot.com>
From: "Dave Burke" <david.burke@voxpilot.com>
To: <speechsc@ietf.org>
Date: Wed, 26 Jul 2006 12:13:17 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 244a2fd369eaf00ce6820a760a3de2e8
Subject: [Speechsc] Fw: [issue95] NULL and Unicode in the Speech-Marker
	header
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>,
	<mailto:speechsc-request@ietf.org?subject=subscribe>
Errors-To: speechsc-bounces@ietf.org


----- Original Message ----- 
From: "Dave Burke" <david.burke@voxpilot.com>
To: "Roundup issue tracker" <issue_tracker@softarmor.com>; 
<eburger@brooktrout.com>; <oran@cisco.com>; <sarvi@cisco.com>
Sent: Wednesday, July 26, 2006 11:45 AM
Subject: Re: [issue95] NULL and Unicode in the Speech-Marker header


> Mostly a style thing, but I wonder whether we should change the ABNF to
> omit the trailing ';' when there is no mark name:
>
>   speech-marker = "Speech-Marker" ":"
>                   "timestamp" "=" [time-stamp-value] [";" 1*UTFCHAR] CRLF
>
> The following are then valid:
>
>    Speech-Marker: timestamp=857206027059
>    Speech-Marker: timestamp=857206027059;mark_name
>    Speech-Marker: timestamp=;mark_name
>    Speech-Marker: timestamp= // an implementation probably wouldn't send this
>
> but not
>
>    Speech-Marker: timestamp=857206027059;
>
> Dave
>
> ----- Original Message ----- 
> From: "Dan Burnett" <issue_tracker@softarmor.com>
> To: <david.burke@voxpilot.com>; <eburger@brooktrout.com>; 
> <oran@cisco.com>; <sarvi@cisco.com>
> Sent: Wednesday, June 14, 2006 12:50 PM
> Subject: [issue95] NULL and Unicode in the Speech-Marker header
>
>
>
> New submission from Dan Burnett <dan_burnett2000@yahoo.com>:
>
> Resolution:  The ABNF will be updated as suggested.  Note that this issue
> is one of the cases in issue 57.  This one, however, requires the addition
> of a space character (0x20).
>
> The history/discussion of this issue can be found as Issue 1 in the thread
> beginning with 
> http://www1.ietf.org/mail-archive/web/speechsc/current/msg01788.html.
>
> ----------
> assignedto: dburnett
> messages: 156
> nosy: dburke, dburnett, eburger, oran, sarvi
> priority: bug
> status: in-progress
> title: NULL and Unicode in the Speech-Marker header
>
> ____________________________________________________
> Roundup issue tracker <issue_tracker@softarmor.com>
> <https://www.softarmor.com/roundup/speechsc/issue95>
> ____________________________________________________
> 
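A minimal Python sketch of how the grammar proposed above would accept and reject the listed header lines. It is only an illustration under stated assumptions, not text from the draft: time-stamp-value is taken to be 1*DIGIT, UTFCHAR is taken to be %x21-7E or any non-ASCII character (the input is assumed to be already decoded), optional spaces or tabs may follow the ":", and the trailing CRLF is assumed to be stripped. The names build_pattern, parse_speech_marker and the allow_space flag are hypothetical; allow_space adds the space character (0x20) mentioned in the issue-tracker resolution.

```python
import re
from typing import Optional, Tuple

# Hypothetical validator for the Speech-Marker grammar proposed in this thread:
#   speech-marker = "Speech-Marker" ":"
#                   "timestamp" "=" [time-stamp-value] [";" 1*UTFCHAR] CRLF
# Assumptions (not taken from the draft text):
#   - time-stamp-value is 1*DIGIT
#   - UTFCHAR is %x21-7E or any non-ASCII character (input already decoded)
#   - optional spaces/tabs may follow ":" and the CRLF has been stripped

# Mark-name characters without and with the space (0x20) from the resolution.
_MARK_CHAR = r"[\x21-\x7e\u0080-\U0010ffff]"
_MARK_CHAR_WITH_SPACE = r"[\x20-\x7e\u0080-\U0010ffff]"


def build_pattern(allow_space: bool = False) -> "re.Pattern[str]":
    """Compile a regex for one header line; allow_space adds %x20 to mark names."""
    mark_char = _MARK_CHAR_WITH_SPACE if allow_space else _MARK_CHAR
    return re.compile(
        r"Speech-Marker:[ \t]*timestamp="
        r"(?P<ts>[0-9]*)"                 # [time-stamp-value]: may be empty
        rf"(?:;(?P<mark>{mark_char}+))?"  # [";" 1*UTFCHAR]: ';' needs a name
    )


def parse_speech_marker(
    line: str, allow_space: bool = False
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (timestamp, mark_name) if the whole line matches, else None.

    timestamp is '' when the optional time-stamp-value is absent;
    mark_name is None when there is no ';' part.
    """
    match = build_pattern(allow_space).fullmatch(line)
    if match is None:
        return None
    return match.group("ts"), match.group("mark")


if __name__ == "__main__":
    examples = [
        "Speech-Marker: timestamp=857206027059",
        "Speech-Marker: timestamp=857206027059;mark_name",
        "Speech-Marker: timestamp=;mark_name",
        "Speech-Marker: timestamp=",
        "Speech-Marker: timestamp=857206027059;",  # trailing ';' is rejected
    ]
    for line in examples:
        print(f"{line!r:50} -> {parse_speech_marker(line)}")
```

Because the ";" and the mark name sit inside one optional group with a 1+ repetition, a bare trailing ';' cannot match, which is exactly the effect Dave's proposed ABNF is after. A production parser would also need to follow the MRCPv2 rules for header-name case and linear whitespace, which this regex sketch does not attempt.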

