
From root@core3.amsl.com  Tue Aug 11 11:30:01 2009
Return-Path: <root@core3.amsl.com>
X-Original-To: speechsc@ietf.org
Delivered-To: speechsc@core3.amsl.com
Received: by core3.amsl.com (Postfix, from userid 0) id AE6B43A6F70; Tue, 11 Aug 2009 11:30:01 -0700 (PDT)
From: Internet-Drafts@ietf.org
To: i-d-announce@ietf.org
Content-Type: Multipart/Mixed; Boundary="NextPart"
Mime-Version: 1.0
Message-Id: <20090811183001.AE6B43A6F70@core3.amsl.com>
Date: Tue, 11 Aug 2009 11:30:01 -0700 (PDT)
Cc: speechsc@ietf.org
Subject: [Speechsc] I-D Action:draft-ietf-speechsc-mrcpv2-20.txt
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Aug 2009 18:30:01 -0000

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Speech Services Control Working Group of the IETF.


	Title           : Media Resource Control Protocol Version 2 (MRCPv2)
	Author(s)       : S. Shanmugham, D. Burnett
	Filename        : draft-ietf-speechsc-mrcpv2-20.txt
	Pages           : 217
	Date            : 2009-08-11

The MRCPv2 protocol allows client hosts to control media service
resources such as speech synthesizers, recognizers, verifiers and
identifiers residing in servers on the network.  MRCPv2 is not a
"stand-alone" protocol - it relies on other protocols, such as
Session Initiation Protocol (SIP) to rendezvous MRCPv2 clients and
servers and manage sessions between them, and the Session Description
Protocol (SDP) to describe, discover and exchange capabilities.  It
also depends on SIP and SDP to establish the media sessions and
associated parameters between the media source or sink and the media
server.  Once this is done, the MRCPv2 protocol exchange operates
over the control session established above, allowing the client to
control the media processing resources on the speech resource server.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-20.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Message/External-body;
	name="draft-ietf-speechsc-mrcpv2-20.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID: <2009-08-11112058.I-D@ietf.org>


--NextPart--

From dburnett@voxeo.com  Tue Aug 11 11:35:00 2009
Return-Path: <dburnett@voxeo.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id D8E6F3A6A2F for <speechsc@core3.amsl.com>; Tue, 11 Aug 2009 11:35:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.001
X-Spam-Level: 
X-Spam-Status: No, score=0.001 tagged_above=-999 required=5 tests=[BAYES_50=0.001]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Cvk1+qvGGRIZ for <speechsc@core3.amsl.com>; Tue, 11 Aug 2009 11:35:00 -0700 (PDT)
Received: from voxeo.com (mmail.voxeo.com [66.193.54.208]) by core3.amsl.com (Postfix) with SMTP id 459363A6CE5 for <speechsc@ietf.org>; Tue, 11 Aug 2009 11:34:18 -0700 (PDT)
Received: from [76.111.40.166] (account dburnett HELO [192.168.15.123]) by voxeo.com (CommuniGate Pro SMTP 5.2.3) with ESMTPSA id 50022133 for speechsc@ietf.org; Tue, 11 Aug 2009 18:22:10 +0000
Message-Id: <DD418C36-481A-4426-B586-E5703AD61226@voxeo.com>
From: Dan Burnett <dburnett@voxeo.com>
To: "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Tue, 11 Aug 2009 14:22:10 -0400
X-Mailer: Apple Mail (2.930.3)
Subject: [Speechsc] draft-20 changes (in addition to those in Roni's email)
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Aug 2009 18:35:00 -0000

Here are the other changes in draft-20 in addition to those in Roni's  
email:


- Corrected the two typos that Arsen mentioned.

- In 11.5.2.8, clarified that the adapted element is only found within  
the first <voiceprint> element (based on Christian Groves' question).

- Corrected the remaining mistakes in the RNG schema pointed out  
recently by Christian Groves.



-- dan


From dburnett@voxeo.com  Tue Aug 11 11:35:01 2009
Return-Path: <dburnett@voxeo.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 7DB343A6CE5; Tue, 11 Aug 2009 11:35:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.299
X-Spam-Level: 
X-Spam-Status: No, score=-1.299 tagged_above=-999 required=5 tests=[AWL=1.299,  BAYES_00=-2.599, HTML_MESSAGE=0.001]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jqsZzR3Xhe2H; Tue, 11 Aug 2009 11:34:53 -0700 (PDT)
Received: from voxeo.com (mmail.voxeo.com [66.193.54.208]) by core3.amsl.com (Postfix) with SMTP id 400573A6B1D; Tue, 11 Aug 2009 11:34:14 -0700 (PDT)
Received: from [76.111.40.166] (account dburnett HELO [192.168.15.123]) by voxeo.com (CommuniGate Pro SMTP 5.2.3) with ESMTPSA id 50022126; Tue, 11 Aug 2009 18:21:49 +0000
Message-Id: <E2C626B8-8CA1-4A1D-A2CE-B6AB4B269DEE@voxeo.com>
From: Dan Burnett <dburnett@voxeo.com>
To: Roni Even <Even.roni@huawei.com>
In-Reply-To: <033101c9ff3a$cbe33160$63a99420$%roni@huawei.com>
Content-Type: multipart/alternative; boundary=Apple-Mail-125--1022714190
Mime-Version: 1.0 (Apple Message framework v930.3)
Date: Tue, 11 Aug 2009 14:21:48 -0400
References: <033101c9ff3a$cbe33160$63a99420$%roni@huawei.com>
X-Mailer: Apple Mail (2.930.3)
Cc: speechsc@ietf.org, sarvi@cisco.com, oran@cisco.com, rai@ietf.org
Subject: Re: [Speechsc] RAI review of draft-ietf-speechsc-mrcpv2-19
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Aug 2009 18:35:01 -0000

--Apple-Mail-125--1022714190
Content-Type: text/plain;
	charset=WINDOWS-1252;
	format=flowed;
	delsp=yes
Content-Transfer-Encoding: quoted-printable


On Jul 7, 2009, at 3:40 PM, Roni Even wrote:

> Hi,
>
> I was assigned to do a RAI review of the draft.  The draft looks
> ready for publication to me. I have some comments mostly editorial.
>
> The only issue I see that is not pure editorial is the issue of the
> different parameters like confidence threshold, sensitivity level
> (see comments 11, 13, 15, 16 and 17). I think that some
> clarification on the semantics and the scale (for example are the
> values linearly spaced) as well as when they are useful will be
> helpful to implementers.
>
> 1.       In figure 1 Expand the abbreviations TTS, ASR, SV , SI and
> how they are related to the media resource types in 3.1
>

Done.  Added some text explaining Figure 1 and enhanced Figure 1
slightly for clarification.

> 2.       In figure 1 there is a SIP dialog between the MRCPv2 client
> and the media source/sink, what is this dialog, I only saw in
> section 4 a dialog between the client and server.
>
Clarified in the first example of section 4.2 that the SIP dialog with
the media source/sink is not shown.

> 3.       In section 3.2 you have “For example:
> sip:mrcpv2@example.net” twice one after the other.
>
Fixed.

> 4.       In the example in section 4.2 you “a=cmid:1”, cmid is
> specified later in the document so maybe you can add some reference
> to where it is specified

Done.

>
> 5.       In the example is section 4.2 and in following examples you
> have “m=audio 49170 RTP/AVP 0 96” but do not have an rtpmap
> parameter for mapping 96 (dynamic payload type number) to a media
> encoding name.

It is not in the first or third examples (Synthesizer only), but it is
in the second example (Recognizer).  I have removed 96 as an option
for the Synthesizer-only examples but let it remain as an addition for
the Recognizer example.
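
As an illustration of the issue raised in comment 5 (not text from the draft), here is a minimal Python sketch that flags dynamic payload types (96 to 127) listed on an "m=" line without a matching "a=rtpmap" line. The function name and both sample SDP fragments are assumptions made up for this sketch; the telephone-event mapping in particular is only a plausible stand-in for what a Recognizer example might carry.

```python
def missing_rtpmaps(sdp: str) -> dict[int, list[int]]:
    """Map media-section index -> dynamic payload types that lack an a=rtpmap line."""
    media = []  # one (payload_types, mapped_types) pair per m= line
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            payload_types = [int(f) for f in fields[3:] if f.isdigit()]
            media.append((payload_types, set()))
        elif line.startswith("a=rtpmap:") and media:
            pt = line[len("a=rtpmap:"):].split()[0]
            if pt.isdigit():
                media[-1][1].add(int(pt))

    missing = {}
    for index, (payload_types, mapped) in enumerate(media):
        # Only the dynamic range needs an explicit mapping.
        gaps = [pt for pt in payload_types if 96 <= pt <= 127 and pt not in mapped]
        if gaps:
            missing[index] = gaps
    return missing


# Hypothetical fragments: 96 is offered but never mapped in the first one.
SYNTH_SDP = "v=0\nm=audio 49170 RTP/AVP 0 96\na=rtpmap:0 pcmu/8000\n"
RECOG_SDP = (
    "v=0\nm=audio 49170 RTP/AVP 0 96\n"
    "a=rtpmap:0 pcmu/8000\na=rtpmap:96 telephone-event/8000\n"
)

print(missing_rtpmaps(SYNTH_SDP))
print(missing_rtpmaps(RECOG_SDP))
```

Static payload types such as 0 are deliberately ignored, since they do not require an rtpmap line.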

>
> 6.       In section 4.3 “Also note that more that one media session
> can be associated with a single resource if need be, but this
> scenario is not useful for the current set of resources”. There is a
> typo the second “that” should be “than”. I am also not sure if the
> current syntax in this document can support the mode.
>
Fixed the typo.

>
> 7.       In section 4.3 “The formatting of the"cmid" attribute in
> SDP RFC3388 [RFC4566]”. I think you meant SDP grouping and need the
> reference to RFC 3388.
>
I removed the reference altogether because it already exists
(correctly) earlier in the paragraph.

>
> 8.       In section 5.1 “The message-length field specifies the
> length of the message, including the start-line” is the length in
> Bytes, there is no unit specified.

Changed "length of the message" to "length of the message in bytes".
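
Because the length is counted in bytes and includes the start-line that carries it, the value depends on its own number of digits. The Python sketch below shows one way to resolve that. The start-line layout ("MRCP/2.0", length, method, request-id) is assumed from the published MRCPv2 specification rather than from this thread, and the function name, header, and channel identifier are hypothetical.

```python
def frame_request(method: str, request_id: int, headers: dict[str, str],
                  body: bytes = b"") -> bytes:
    """Build a request whose message-length counts every byte, start-line included."""
    rest = b"".join(f"{k}: {v}\r\n".encode("ascii") for k, v in headers.items())
    rest += b"\r\n" + body

    length = 0
    while True:
        # The start-line contains the length, so its own size can change
        # when the digit count changes; iterate until the value is stable.
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start) + len(rest)
        if total == length:
            return start + rest
        length = total


# Hypothetical request; the channel identifier is made up for illustration.
msg = frame_request("SPEAK", 543257,
                    {"Channel-Identifier": "32AECB23433802@speechsynth"})
declared = int(msg.split(b" ")[1])
print(declared, len(msg))
```

The loop terminates because each pass can only keep or increase the length, and the digit count grows far more slowly than the value itself. A real message with a body would also need a Content-Length header, which this sketch leaves to the caller.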

>
> 9.       In section 6.3.1, typo you have “Verfication” instead of
> verification. It appears twice in the section.

Fixed.

>
> 10.   In the example in section 7 you have “m=audio 0 RTP/AVP 0 1 3”
> payload type 1 was deleted from the IANA registry, maybe have
> another payload type number.

I just removed that payload type.  It is not germane to the example.

>
> 11.   In section 9.4.1, 9.4.2 and 9.4.3 you specify confidence
> threshold, sensitivity level and speed vs accuracy. What is the
> scale here; is it linear between 0 and 1. What is the absolute value
> of the number, if you receive the same confidence level from two
> recognizers are they the same (e.g. when using context block to
> switch servers).  For the speed vs accuracy, how does the client
> know what is the relation between the value and the number of
> available sessions, since this seems to be the reason for using this
> parameter.
>
The interpretation of all of these parameters is
implementation-specific because the underlying technologies used to
implement them vary and can even be proprietary.  In practice the
speech recognition and synthesis and speaker authentication
communities have lived with this state of affairs for many years, and
users of other APIs for this technology are well aware of and have
built applications that accommodate this variability in
interpretation.  It is outside the scope of this specification to
attempt to standardize interpretations of these values.

> 12.   In 9.4.9 and in 10.4.8, 11.4.11 what are the values for
> media-type-value, you also mention audio and video but it looks to
> me that this document only discusses voice.

Yes.  Although the original intent was to record speech, application
authors today are beginning to look at ways to incorporate other audio
or video.  The intent of the sentences in these sections is to clarify
that the specification itself imposes no restriction on the types of
media that are allowed.

>
> 13.   In 9.4.35 and 9.4.36 what is the scale for the consistency
> here. How does one know what close means. What is the consistency
> between different recognizers.

The answer to question 11, above, applies here as well.

>
> 14.   In section 9.6.3.3 in the example (figure 2) confidence should
> be 0.75 and not 75

Fixed.

>
> 15.   In section 10.4.1 it is not clear how you measure the
> sensitivity in order to specify, is it based on some SNR translated
> to 0 to 1 scale?

The answer to question 11, above, applies here as well.

>
> 16.   In 11.4.6 the same issue with the scale, how does the client
> know how to set a value when working with different speaker
> verification servers.

Ditto.  I should point out that in all of these cases the parameters
are typically passed directly to the engine, and their interpretations
are defined (and described) in the vendors' documentation.  The most
common MRCPv2 server implementations are by the technology vendors
themselves (the providers of the synthesis, recognition, and
verification engines).  This is commonly understood in this technology
industry (meaning those who use this technology regularly).

>
> 17.   In 11.5.2.9 you state that the verification-score is not a
> probability, so what is it. How can the client decide if, for
> example, 0 is a good score for specifying the threshold.  I also
> noticed that the values in the example in section 11.5.2.10 are very
> precise like 0.98514 is this the expected precision. The examples
> here and in section 11.11 do not show the threshold, if the
> threshold is required for this flow why not show it in the example?

This parameter, as others mentioned above, has only a vendor-specific
interpretation.  In practice authors interpret these values based both
on guidance from the technology vendors and via experimentation on
large sets of recorded data.

The Min-Verification-Score threshold is not required to be set.  In
many cases the technology vendor has a fairly good understanding of
what the default threshold should be.  The verification-score is
returned, however, in case the application author determines (through
experimentation, as described above) that the default threshold is not
producing optimal results for the application.  In that case the
author can set the threshold to a different value or can set it to -1
and make the determination within the application itself based on the
verification-score values.
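
The last option above (threshold set to -1, decision made in the application) could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the function name, the 0.6 threshold, and the optional margin are hypothetical, and real values would come from vendor guidance and experimentation on recorded data as described above.

```python
def decide(verification_score: float, app_threshold: float,
           margin: float = 0.0) -> str:
    """Classify a returned verification-score against an application-chosen threshold.

    The score scale is vendor-specific; app_threshold is whatever the
    application author settled on through experimentation.
    """
    if verification_score >= app_threshold + margin:
        return "accepted"
    if verification_score <= app_threshold - margin:
        return "rejected"
    # Scores inside the margin band are left for the application to
    # handle, for example by asking the caller for another utterance.
    return "undecided"


# Hypothetical threshold of 0.6; 0.98514 is the example score quoted in comment 17.
print(decide(0.98514, 0.6))
print(decide(0.2, 0.6))
print(decide(0.62, 0.6, margin=0.05))
```

With the server-side threshold at -1 the engine does not reject on its own, so the application sees every score and applies its own policy.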

>
> 18.   In section 12.3 the suggestion is to use SRTP as the mandatory
> interoperability mode. If the reason for mandating SRTP is for a
> common mode you should also decide on a key exchange mechanism. I
> suggest you look at http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02
>  for discussion on media security.

Based on the discussion between you and Dan York on the list, I will
change this:

12.3. Media session protection
Sensitive data is also carried on media sessions terminating on MRCPv2
servers (the other end of a media channel may or may not be on the
MRCPv2 client). This data includes the user's spoken utterances and
the output of text-to-speech operations. MRCPv2 servers MUST support
SRTP for protection of audio media sessions. MRCPv2 clients that
originate or consume audio similarly MUST support SRTP. Alternative
media channel protection MAY be used if desired (e.g. IPSEC).

to this:

12.3. Media session protection
Sensitive data is also carried on media sessions terminating on MRCPv2
servers (the other end of a media channel may or may not be on the
MRCPv2 client). This data includes the user's spoken utterances and
the output of text-to-speech operations. MRCPv2 servers MUST support a
security mechanism for protection of audio media sessions. MRCPv2
clients that originate or consume audio similarly MUST support a
security mechanism for protection of the audio. If appropriate, usage
of the Secure Real-time Transport Protocol (SRTP) [RFC3711] is
recommended.
>
> 19.   In section13.7.2 you specify the attribute resource as session
> level yet in the example in section 4.2 it is a media level
> attribute. The same goes for the channel attribute

I have corrected both in section 13.7.2 to be media-level.
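
To make the session-level versus media-level distinction concrete, here is a small Python sketch that reports the level at which each SDP attribute appears. It relies only on the general SDP rule that "a=" lines before the first "m=" line are session-level and those after it belong to the preceding media section. The function name and the sample offer (addresses, ports, and identifiers) are made up for illustration and are not copied from the draft.

```python
def attribute_levels(sdp: str) -> dict[str, set[str]]:
    """Map each attribute name to the level(s) ('session' or 'media') where it occurs."""
    levels: dict[str, set[str]] = {}
    level = "session"
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            level = "media"  # everything after the first m= line is media-level
        elif line.startswith("a="):
            name = line[2:].split(":", 1)[0]
            levels.setdefault(name, set()).add(level)
    return levels


# Hypothetical offer: resource and cmid sit under the control m= line.
OFFER = """v=0
o=- 33580 111 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
"""

print(attribute_levels(OFFER)["resource"])
```

In this sample the resource attribute is reported as media-level, matching the corrected registration described above.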

>
> Thanks
>
> Roni Even
>
>


--Apple-Mail-125--1022714190
Content-Type: text/html;
	charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable

<html><body style=3D"word-wrap: break-word; -webkit-nbsp-mode: space; =
-webkit-line-break: after-white-space; "><br><div><div>On Jul 7, 2009, =
at 3:40 PM, Roni Even wrote:</div><br =
class=3D"Apple-interchange-newline"><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><p class=3D"MsoCommentText" =
style=3D"margin-top: 0in; margin-right: 0in; margin-bottom: 10pt; =
margin-left: 0in; line-height: 115%; font-size: 10pt; font-family: =
Calibri, sans-serif; "><span style=3D"font-size: 11pt; line-height: =
115%; ">Hi,<o:p></o:p></span></p><p class=3D"MsoCommentText" =
style=3D"margin-top: 0in; margin-right: 0in; margin-bottom: 10pt; =
margin-left: 0in; line-height: 115%; font-size: 10pt; font-family: =
Calibri, sans-serif; "><span style=3D"font-size: 11pt; line-height: =
115%; ">I was assigned to do a RAI review of the draft. &nbsp;The draft =
looks ready for publication to me. I have some comments mostly =
editorial.<o:p></o:p></span></p><p class=3D"MsoCommentText" =
style=3D"margin-top: 0in; margin-right: 0in; margin-bottom: 10pt; =
margin-left: 0in; line-height: 115%; font-size: 10pt; font-family: =
Calibri, sans-serif; "><span style=3D"font-size: 11pt; line-height: =
115%; ">The only issue I see that is not pure editorial is the issue of =
the different parameters like confidence threshold, sensitivity level =
(see comments 11, 13, 15, 16 and 17). I think that some clarification on =
the semantics and the scale (for example are the values linearly spaced) =
as well as when they are useful will be helpful to =
implementers.<o:p></o:p></span></p><p class=3D"MsoCommentText" =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-bottom: 10pt; margin-left: 0in; line-height: 115%; font-size: =
10pt; font-family: Calibri, sans-serif; "><span style=3D"font-size: =
11pt; line-height: 115%; "><span>1.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; line-height: 115%; =
">In figure 1 Expand the abbreviations TTS, ASR, SV , SI and how they =
are related to the media resource types in =
3.1</span></p></div></div></span></blockquote><div><br></div>Done. =
&nbsp;Added some text explaining Figure 1 and enhanced Figure 1 slightly =
for clarification.<br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><p class=3D"MsoCommentText" =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-bottom: 10pt; margin-left: 0in; line-height: 115%; font-size: =
10pt; font-family: Calibri, sans-serif; "><span style=3D"font-size: =
11pt; line-height: 115%; "><o:p></o:p></span></p><p =
class=3D"MsoCommentText" style=3D"text-indent: -0.25in; margin-top: 0in; =
margin-right: 0in; margin-bottom: 10pt; margin-left: 0in; line-height: =
115%; font-size: 10pt; font-family: Calibri, sans-serif; "><span =
style=3D"font-size: 11pt; line-height: 115%; "><span>2.<span =
style=3D"font: normal normal normal 7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; line-height: 115%; =
">In figure 1 there is a SIP dialog between the MRCPv2 client and the =
media source/sink, what is this dialog, I only saw in section 4 a dialog =
between the client and =
server.</span></p></div></div></span></blockquote><div>Clarified =
in&nbsp;the first example of section 4.2 that the SIP dialog with the =
media source/sink is not shown.</div><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><p class=3D"MsoCommentText" =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-bottom: 10pt; margin-left: 0in; line-height: 115%; font-size: =
10pt; font-family: Calibri, sans-serif; "><span style=3D"font-size: =
11pt; line-height: 115%; "><o:p></o:p></span></p><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>3.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 3.2 you have =93For example:<span =
class=3D"Apple-converted-space">&nbsp;</span><a =
href=3D"sip:mrcpv2@example.net" style=3D"color: blue; text-decoration: =
underline; "><span style=3D"color: windowtext; text-decoration: none; =
">sip:mrcpv2@example.net</span></a>=94 twice one after the =
other.<o:p></o:p></span></div><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; =
"><o:p>&nbsp;</o:p></span></div></div></div></span></blockquote>Fixed.</di=
v><div><br><blockquote type=3D"cite"><span class=3D"Apple-style-span" =
style=3D"border-collapse: separate; color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant: normal; =
font-weight: normal; letter-spacing: normal; line-height: normal; =
orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; =
white-space: normal; widows: 2; word-spacing: 0px; =
-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: =
0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><span>4.<span style=3D"font: normal normal normal 7pt/normal 'Times =
New Roman'; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In the example in section 4.2 you =93a=3Dcmid:1=94, cmid =
is specified later in the document so maybe you can add some reference =
to where it is =
specified</span></div></div></div></span></blockquote><div><br></div>Done.=
</div><div><br><blockquote type=3D"cite"><span class=3D"Apple-style-span" =
style=3D"border-collapse: separate; color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant: normal; =
font-weight: normal; letter-spacing: normal; line-height: normal; =
orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; =
white-space: normal; widows: 2; word-spacing: 0px; =
-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: =
0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>5.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In the example is section 4.2 and in following examples =
you have =93m=3Daudio 49170 RTP/AVP 0 96=94 but do not have an rtpmap =
parameter for mapping 96 (dynamic payload type number) to a media =
encoding =
name.</span></div></div></div></span></blockquote><div><br></div>It is =
not in the first or third examples (Synthesizer only), but it is in the =
second example (Recognizer). &nbsp;I have removed 96 as an option for =
the Synthesizer-only examples but let it remain as an addition for the =
Recognizer example.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>6.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 4.3 =93Also note that more that one media =
session can be associated with a single resource if need be, but this =
scenario is not useful for the current set of resources=94. There is a =
typo the second =93that=94 should be =93than=94. I am also not sure if =
the current syntax in this document can support the =
mode.<o:p></o:p></span></div><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; =
"><o:p>&nbsp;</o:p></span></div></div></div></span></blockquote>Fixed =
the typo.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>7.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 4.3 =93The formatting of the"cmid" attribute in =
SDP RFC3388 [RFC4566]=94. I think you meant SDP grouping and need the =
reference to RFC 3388.<o:p></o:p></span></div><div style=3D"margin-top: =
0in; margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; =
font-size: 10.5pt; font-family: Consolas; "><span style=3D"font-size: =
11pt; font-family: Calibri, sans-serif; =
"><o:p>&nbsp;</o:p></span></div></div></div></span></blockquote>I =
removed the reference altogether because it already exists (correctly) =
earlier in the paragraph.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>8.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 5.1 =93</span><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; ">The message-length field specifies =
the length of the message, including the start-line=94 is the length in =
bytes? There is no unit =
specified.</span></div></div></div></span></blockquote><div><br></div>Chan=
ged "length of the message" to "length of the message in =
bytes".</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>9.<span style=3D"font: normal normal normal =
7pt/normal 'Times New Roman'; =
">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 6.3.1, typo you have =93Verfication =93 instead =
of verification. It appears twice in the =
section.</span></div></div></div></span></blockquote><div><br></div>Fixed.=
</div><div><br><blockquote type=3D"cite"><span class=3D"Apple-style-span" =
style=3D"border-collapse: separate; color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant: normal; =
font-weight: normal; letter-spacing: normal; line-height: normal; =
orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; =
white-space: normal; widows: 2; word-spacing: 0px; =
-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: =
0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>10.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In the example in section 7 you have =93m=3Daudio 0 =
RTP/AVP 0 1 3=94. Payload type 1 was deleted from the IANA registry; =
maybe use another payload type =
number.</span></div></div></div></span></blockquote><div><br></div>I =
just removed that payload type. &nbsp;It is not germane to the =
example.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>11.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 9.4.1, 9.4.2 and 9.4.3 you specify confidence =
threshold, sensitivity level and speed vs accuracy. What is the scale =
here; is it linear between 0 and 1? What is the absolute value of the =
number? If you receive the same confidence level from two recognizers, =
are they the same (e.g. when using context block to switch =
servers)?&nbsp; For the speed vs accuracy, how does the client know what =
is the relation between the value and the number of available sessions, =
since this seems to be the reason for using this =
parameter?<o:p></o:p></span></div><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; =
"><o:p>&nbsp;</o:p></span></div></div></div></span></blockquote>The =
interpretation of all of these parameters is implementation-specific =
because the underlying technologies used to implement them vary and can =
even be proprietary. &nbsp;In practice the speech recognition and =
synthesis and speaker authentication communities have lived with this =
state of affairs for many years, and users of other APIs for this =
technology are well aware of and have built applications that =
accommodate this variability in interpretation. &nbsp;It is outside the =
scope of this specification to attempt to standardize interpretations of =
these values.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><span>12.<span style=3D"font: normal normal normal 7pt/normal 'Times =
New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In 9.4.9 and in 10.4.8, 11.4.11 what are the values for =
media-type-value? You also mention audio and video, but it looks to me =
that this document only discusses =
voice.</span></div></div></div></span></blockquote><div><br></div>Yes. =
&nbsp;Although the original intent was to record speech, application =
authors today are beginning to look at ways to incorporate other audio =
or video. &nbsp;The intent of the sentences in these sections is to =
clarify that the specification itself imposes no restriction on the =
types of media that are allowed.</div><div><br><blockquote =
type=3D"cite"><span class=3D"Apple-style-span" style=3D"border-collapse: =
separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; =
font-style: normal; font-variant: normal; font-weight: normal; =
letter-spacing: normal; line-height: normal; orphans: 2; text-align: =
auto; text-indent: 0px; text-transform: none; white-space: normal; =
widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>13.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In 9.4.35 and 9.4.36 what is the scale for the consistency =
here? How does one know what close means? What is the consistency =
between different =
recognizers?</span></div></div></div></span></blockquote><div><br></div>Th=
e answer to question 11, above, applies here as =
well.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>14.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 9.6.3.3 in the example (figure 2) confidence =
should be 0.75 and not =
75</span></div></div></div></span></blockquote><div><br></div>Fixed.</div>=
<div><br><blockquote type=3D"cite"><span class=3D"Apple-style-span" =
style=3D"border-collapse: separate; color: rgb(0, 0, 0); font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant: normal; =
font-weight: normal; letter-spacing: normal; line-height: normal; =
orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; =
white-space: normal; widows: 2; word-spacing: 0px; =
-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: =
0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>15.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 10.4.1 it is not clear how you measure the =
sensitivity in order to specify it; is it based on some SNR translated =
to a 0 to 1 =
scale?</span></div></div></div></span></blockquote><div><br></div>The =
answer to question 11, above, applies here as =
well.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>16.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In 11.4.6 the same issue with the scale: how does the =
client know how to set a value when working with different speaker =
verification =
servers?</span></div></div></div></span></blockquote><div><br></div>Ditto.=
 &nbsp;I should point out that in all of these cases the parameters are =
typically passed directly to the engine, and their interpretations are =
defined (and described) in the vendors' documentation. &nbsp;The most =
common MRCPv2 server implementations are by the technology vendors =
themselves (the providers of the synthesis, recognition, and =
verification engines). &nbsp;This is commonly understood in this =
technology industry (meaning those who use this technology =
regularly).</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>17.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In 11.5.2.9 you state that the verification-score is not a =
probability, so what is it? How can the client decide if, for example, 0 =
is a good score for specifying the threshold?&nbsp; I also noticed that =
the values in the example in section 11.5.2.10 are very precise, like =
0.98514; is this the expected precision? The examples here and in section =
11.11 do not show the threshold, if the threshold is required for this =
flow why not show it in the =
example?</span></div></div></div></span></blockquote><div><br></div>This =
parameter, as others mentioned above, has only a vendor-specific =
interpretation. &nbsp;In practice authors interpret these values based =
both on guidance from the technology vendors and via experimentation on =
large sets of recorded data.</div><div><br></div><div>The =
Min-Verification-Score threshold is not required to be set. &nbsp;In =
many cases the technology vendor has a fairly good understanding of what =
the default threshold should be. &nbsp;The verification-score is =
returned, however, in case the application author determines (through =
experimentation, as described above) that the default threshold is not =
producing optimal results for the application. &nbsp;In that case the =
author can set the threshold to a different value or can set it to -1 =
and make the determination within the application itself based on the =
verification-score values.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>18.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 12.3 the suggestion is to use SRTP as the =
mandatory interoperability mode. If the reason for mandating SRTP is for =
a common mode you should also decide on a key exchange mechanism. I =
suggest you look at<span class=3D"Apple-converted-space">&nbsp;</span><a =
href=3D"http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02" =
style=3D"color: blue; text-decoration: underline; =
">http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02</a><span=
 class=3D"Apple-converted-space">&nbsp;</span>for discussion on media =
security.</span></div></div></div></span></blockquote><div><br></div>Based=
 on the discussion between you and Dan York on the list, I will change =
this:</div><div><br></div><div><pre><font class=3D"Apple-style-span" =
face=3D"Helvetica" size=3D"3"><span class=3D"Apple-style-span" =
style=3D"white-space: normal; ">12.3. Media session =
protection&nbsp;</span></font></pre><pre><font class=3D"Apple-style-span" =
face=3D"Helvetica" size=3D"3"><span class=3D"Apple-style-span" =
style=3D"font-size: 12px; white-space: normal;">Sensitive data is also =
carried on media sessions terminating on
   MRCPv2 servers (the other end of a media channel may or may not be on
   the MRCPv2 client).  This data includes the user's spoken utterances
   and the output of text-to-speech operations.  MRCPv2 servers MUST
   support SRTP for protection of audio media sessions.  MRCPv2 clients
   that originate or consume audio similarly MUST support SRTP.
   Alternative media channel protection MAY be used if desired (e.g.
   IPSEC).</span></font></pre></div><div><br></div><div>to =
this:</div><div><br></div><div><pre><font class=3D"Apple-style-span" =
face=3D"Helvetica" size=3D"3"><span class=3D"Apple-style-span" =
style=3D"font-size: 12px; white-space: normal;">12.3. Media session =
protection&nbsp;</span></font></pre><pre><font class=3D"Apple-style-span" =
face=3D"Helvetica" size=3D"3"><span class=3D"Apple-style-span" =
style=3D"font-size: 12px; white-space: normal;">Sensitive data is also =
carried on media sessions terminating on MRCPv2 servers (the other end =
of a media channel may or may not be on the MRCPv2 client). This data =
includes the user's spoken utterances and the output of text-to-speech =
operations. MRCPv2 servers MUST support a security mechanism for =
protection of audio media sessions. MRCPv2 clients that originate or =
consume audio similarly MUST support a security mechanism for protection =
of the audio. If appropriate,&nbsp;usage of the Secure Real-time =
Transport Protocol (SRTP)&nbsp;[RFC3711] is =
recommended.</span></font></pre></div><div><blockquote type=3D"cite"><span=
 class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"text-indent: -0.25in; margin-top: 0in; margin-right: 0in; =
margin-left: 0in; margin-bottom: 0.0001pt; font-size: 10.5pt; =
font-family: Consolas; "><span style=3D"font-size: 11pt; font-family: =
Calibri, sans-serif; "><span>19.<span style=3D"font: normal normal =
normal 7pt/normal 'Times New Roman'; ">&nbsp;&nbsp;<span =
class=3D"Apple-converted-space">&nbsp;</span></span></span></span><span =
dir=3D"LTR"></span><span style=3D"font-size: 11pt; font-family: Calibri, =
sans-serif; ">In section 13.7.2 you specify the attribute resource as =
session level yet in the example in section 4.2 it is a media level =
attribute. The same goes for the channel =
attribute.</span></div></div></div></span></blockquote><div><br></div>I =
have corrected both in section 13.7.2 to be =
media-level.</div><div><br><blockquote type=3D"cite"><span =
class=3D"Apple-style-span" style=3D"border-collapse: separate; color: =
rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: =
normal; font-variant: normal; font-weight: normal; letter-spacing: =
normal; line-height: normal; orphans: 2; text-align: auto; text-indent: =
0px; text-transform: none; white-space: normal; widows: 2; word-spacing: =
0px; -webkit-border-horizontal-spacing: 0px; =
-webkit-border-vertical-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: =
auto; -webkit-text-stroke-width: 0; "><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple"><div class=3D"Section1"><div style=3D"text-indent: =
-0.25in; margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
"><o:p></o:p></span></div><div style=3D"margin-top: 0in; margin-right: =
0in; margin-left: 0.5in; margin-bottom: 0.0001pt; font-size: 11pt; =
font-family: Calibri, sans-serif; "><o:p>&nbsp;</o:p></div><div =
style=3D"margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
">Thanks<o:p></o:p></span></div><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 10.5pt; font-family: Consolas; =
"><span style=3D"font-size: 11pt; font-family: Calibri, sans-serif; =
">Roni Even<o:p></o:p></span></div><div style=3D"margin-top: 0in; =
margin-right: 0in; margin-left: 0in; margin-bottom: 0.0001pt; font-size: =
10.5pt; font-family: Consolas; "><span style=3D"font-size: 11pt; =
font-family: Calibri, sans-serif; "><o:p>&nbsp;</o:p></span></div><div =
style=3D"margin-top: 0in; margin-right: 0in; margin-left: 0in; =
margin-bottom: 0.0001pt; font-size: 11pt; font-family: Calibri, =
sans-serif; =
"><o:p>&nbsp;</o:p></div></div></div></span></blockquote></div><br></body>=
</html>=

--Apple-Mail-125--1022714190--

From Even.roni@huawei.com  Wed Aug 12 00:22:37 2009
Return-Path: <Even.roni@huawei.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 234F73A6A38; Wed, 12 Aug 2009 00:22:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.998
X-Spam-Level: 
X-Spam-Status: No, score=-1.998 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, J_CHICKENPOX_16=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id rjkPyqdd83d1; Wed, 12 Aug 2009 00:22:24 -0700 (PDT)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [58.251.152.64]) by core3.amsl.com (Postfix) with ESMTP id 00ACD3A63D3; Wed, 12 Aug 2009 00:22:23 -0700 (PDT)
Received: from huawei.com (szxga01-in [172.24.2.3]) by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 2.14 (built Aug 8 2006)) with ESMTP id <0KO9002ZQ5NB8U@szxga01-in.huawei.com>; Wed, 12 Aug 2009 15:18:47 +0800 (CST)
Received: from huawei.com ([172.24.1.6]) by szxga01-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 2.14 (built Aug 8 2006)) with ESMTP id <0KO900K8S5MYYY@szxga01-in.huawei.com>; Wed, 12 Aug 2009 15:18:46 +0800 (CST)
Received: from windows8d787f9 (bzq-79-180-57-195.red.bezeqint.net [79.180.57.195]) by szxml02-in.huawei.com (iPlanet Messaging Server 5.2 HotFix 2.14 (built Aug 8 2006)) with ESMTPA id <0KO90078S5M7EF@szxml02-in.huawei.com>; Wed, 12 Aug 2009 15:18:41 +0800 (CST)
Date: Wed, 12 Aug 2009 10:15:20 +0300
From: Roni Even <Even.roni@huawei.com>
In-reply-to: <E2C626B8-8CA1-4A1D-A2CE-B6AB4B269DEE@voxeo.com>
To: 'Dan Burnett' <dburnett@voxeo.com>
Message-id: <027801ca1b1c$c2e8ee80$48bacb80$%roni@huawei.com>
MIME-version: 1.0
X-Mailer: Microsoft Office Outlook 12.0
Content-type: multipart/alternative; boundary="Boundary_(ID_VAnI9RDuzfD30+DX7QbxZw)"
Content-language: en-us
Thread-index: AcoasgmJVUUpROjvQ5a54Hub9X2DzAAZj4yw
References: <033101c9ff3a$cbe33160$63a99420$%roni@huawei.com> <E2C626B8-8CA1-4A1D-A2CE-B6AB4B269DEE@voxeo.com>
X-Mailman-Approved-At: Wed, 12 Aug 2009 08:12:42 -0700
Cc: speechsc@ietf.org, sarvi@cisco.com, oran@cisco.com, rai@ietf.org
Subject: Re: [Speechsc] RAI review of draft-ietf-speechsc-mrcpv2-19
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Aug 2009 07:22:37 -0000

This is a multi-part message in MIME format.

--Boundary_(ID_VAnI9RDuzfD30+DX7QbxZw)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT

Hi Dan,

I understand your explanation about all these "vendor-specific" parameters. I
think that since this is a standards-track document, there should be some text
explaining the usage of these parameters, as well as a note that, because the
values are vendor-specific, values coming from different vendors cannot be
compared.

 

 

As for my comment number 5 on payload type 96: my point was that if the
m-line lists payload type number 96, you must have an a=rtpmap line mapping
96 to a specific subtype name, whereas for PCMU an a=rtpmap line (like the
one in your examples) is not mandatory, since payload type number 0 is a
static payload type number assigned to PCMU.
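
The rule above can be sketched as a small check. This is only an
illustrative Python sketch, not text from the draft: the function name
unmapped_dynamic_payloads is hypothetical, and the dynamic payload type range
96-127 is assumed from the RTP/AVP profile.

```python
# Dynamic payload types have no fixed meaning; they need an a=rtpmap line.
# Static ones (such as 0 for PCMU) are pre-assigned by the RTP/AVP profile.
DYNAMIC_RANGE = range(96, 128)


def unmapped_dynamic_payloads(sdp):
    """Return dynamic payload types listed on an m= line without an a=rtpmap
    line in the same media section (hypothetical helper for illustration)."""
    sections = []  # one (payload_types, mapped_types) pair per m= line
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            fmts = [int(f) for f in fields[3:] if f.isdigit()]
            sections.append((fmts, set()))
        elif line.startswith("a=rtpmap:") and sections:
            parts = line[len("a=rtpmap:"):].split()
            if parts and parts[0].isdigit():
                sections[-1][1].add(int(parts[0]))
    missing = []
    for fmts, mapped in sections:
        missing.extend(pt for pt in fmts
                       if pt in DYNAMIC_RANGE and pt not in mapped)
    return missing
```

Called on an SDP body containing only "m=audio 49170 RTP/AVP 0 96", the
sketch reports 96 as unmapped; payload type 0 is never reported because it is
static.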

 

 

Roni Even 

 

From: Dan Burnett [mailto:dburnett@voxeo.com] 
Sent: Tuesday, August 11, 2009 9:22 PM
To: Roni Even
Cc: sarvi@cisco.com; oran@cisco.com; 'Eric Burger'; speechsc@ietf.org;
rai@ietf.org
Subject: Re: RAI review of draft-ietf-speechsc-mrcpv2-19

 

 

On Jul 7, 2009, at 3:40 PM, Roni Even wrote:





Hi,

I was assigned to do a RAI review of the draft.  The draft looks ready for
publication to me. I have some comments, mostly editorial.

The only issue I see that is not purely editorial is the issue of the
different parameters like confidence threshold and sensitivity level (see
comments 11, 13, 15, 16 and 17). I think that some clarification of the
semantics and the scale (for example, are the values linearly spaced?), as well
as of when they are useful, would be helpful to implementers.

1.       In figure 1, expand the abbreviations TTS, ASR, SV and SI, and explain
how they are related to the media resource types in 3.1.

 

Done.  Added some text explaining Figure 1 and enhanced Figure 1 slightly
for clarification.



2.       In figure 1 there is a SIP dialog between the MRCPv2 client and the
media source/sink. What is this dialog? I only saw in section 4 a dialog
between the client and server.

Clarified in the first example of section 4.2 that the SIP dialog with the
media source/sink is not shown.

3.       In section 3.2 you have "For example: sip:mrcpv2@example.net"
twice, one after the other.

 

Fixed.





4.       In the example in section 4.2 you have "a=cmid:1". cmid is specified
later in the document, so maybe you can add a reference to where it is
specified.

 

Done.





 

5.       In the example in section 4.2 and in the following examples you have
"m=audio 49170 RTP/AVP 0 96" but do not have an rtpmap attribute for mapping
96 (a dynamic payload type number) to a media encoding name.

 

It is not in the first or third examples (Synthesizer only), but it is in
the second example (Recognizer).  I have removed 96 as an option for the
Synthesizer-only examples but let it remain as an addition for the
Recognizer example.





 

6.       In section 4.3: "Also note that more that one media session can be
associated with a single resource if need be, but this scenario is not
useful for the current set of resources". There is a typo: the second "that"
should be "than". I am also not sure whether the current syntax in this
document can support this mode.

 

Fixed the typo.





 

7.       In section 4.3 "The formatting of the"cmid" attribute in SDP
RFC3388 [RFC4566]". I think you meant SDP grouping and need the reference to
RFC 3388.

 

I removed the reference altogether because it already exists (correctly)
earlier in the paragraph.





 

8.       In section 5.1: "The message-length field specifies the length of
the message, including the start-line". Is the length in bytes? No unit is
specified.

 

Changed "length of the message" to "length of the message in bytes".
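
Because the byte count includes the start-line, which itself carries the
count, a sender has to settle on a value that is consistent with its own
digits. The following Python sketch is only an illustration under stated
assumptions: the start-line layout "MRCP/2.0 <length> <method> <request-id>",
the CRLF line endings and the function name frame_mrcp_message are assumed
here, not quoted from this thread.

```python
def frame_mrcp_message(method, request_id, headers, body=b""):
    """Build a request whose message-length equals its total size in bytes,
    start-line included (hypothetical helper for illustration)."""
    # Everything after the start-line: header lines, blank line, optional body.
    header_text = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    rest = header_text.encode("utf-8") + b"\r\n" + body

    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start) + len(rest)
        if total == length:
            # The declared length now matches the real byte count.
            return start + rest
        # Adding digits to the length field can grow the message; try again.
        length = total
```

The loop ends after a few rounds because the candidate length never
decreases and the digit count of the field grows only logarithmically.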





 

9.       In section 6.3.1 there is a typo: you have "Verfication" instead of
"Verification". It appears twice in the section.

 

Fixed.





 

10.   In the example in section 7 you have "m=audio 0 RTP/AVP 0 1 3". Payload
type 1 was deleted from the IANA registry; maybe use another payload type
number.

 

I just removed that payload type.  It is not germane to the example.





 

11.   In sections 9.4.1, 9.4.2 and 9.4.3 you specify confidence threshold,
sensitivity level and speed vs accuracy. What is the scale here? Is it
linear between 0 and 1? What is the absolute meaning of the number: if you
receive the same confidence level from two recognizers, do they mean the same
thing (e.g. when using a context block to switch servers)?  For speed vs
accuracy, how does the client know the relation between the value and the
number of available sessions, since this seems to be the reason for using
this parameter?

 

The interpretation of all of these parameters is implementation-specific
because the underlying technologies used to implement them vary and can even
be proprietary.  In practice the speech recognition and synthesis and
speaker authentication communities have lived with this state of affairs for
many years, and users of other APIs for this technology are well aware of
and have built applications that accommodate this variability in
interpretation.  It is outside the scope of this specification to attempt to
standardize interpretations of these values.





12.   In 9.4.9, 10.4.8 and 11.4.11, what are the values for
media-type-value? You also mention audio and video, but it looks to me that
this document only discusses voice.

 

Yes.  Although the original intent was to record speech, application authors
today are beginning to look at ways to incorporate other audio or video.
The intent of the sentences in these sections is to clarify that the
specification itself imposes no restriction on the types of media that are
allowed.





 

13.   In 9.4.35 and 9.4.36, what is the scale for the consistency? How
does one know what "close" means? What is the consistency between different
recognizers?

 

The answer to question 11, above, applies here as well.





 

14.   In section 9.6.3.3, in the example (figure 2), confidence should be
0.75, not 75.

 

Fixed.





 

15.   In section 10.4.1 it is not clear how you measure the sensitivity in
order to specify it. Is it based on some SNR translated to a 0 to 1 scale?

 

The answer to question 11, above, applies here as well.





 

16.   In 11.4.6 there is the same issue with the scale: how does the client
know how to set a value when working with different speaker verification
servers?

 

Ditto.  I should point out that in all of these cases the parameters are
typically passed directly to the engine, and their interpretations are
defined (and described) in the vendors' documentation.  The most common
MRCPv2 server implementations are by the technology vendors themselves (the
providers of the synthesis, recognition, and verification engines).  This is
commonly understood in this technology industry (meaning those who use this
technology regularly).





 

17.   In 11.5.2.9 you state that the verification-score is not a
probability, so what is it? How can the client decide whether, for example, 0
is a good score for specifying the threshold?  I also noticed that the values
in the example in section 11.5.2.10 are very precise, like 0.98514. Is this
the expected precision? The examples here and in section 11.11 do not show
the threshold. If the threshold is required for this flow, why not show it in
the example?

 

This parameter, as others mentioned above, has only a vendor-specific
interpretation.  In practice authors interpret these values based both on
guidance from the technology vendors and via experimentation on large sets
of recorded data.

 

The Min-Verification-Score threshold is not required to be set.  In many
cases the technology vendor has a fairly good understanding of what the
default threshold should be.  The verification-score is returned, however,
in case the application author determines (through experimentation, as
described above) that the default threshold is not producing optimal results
for the application.  In that case the author can set the threshold to a
different value or can set it to -1 and make the determination within the
application itself based on the verification-score values.
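
The application-side approach described above (tuning a threshold on
recorded data, then deciding within the application) might look like the
following Python sketch. It is an assumption-laden illustration only: the
function names choose_threshold and accept_speaker are hypothetical, and
minimizing the simple sum of false rejects and false accepts is just one
possible tuning criterion, not one named in this thread.

```python
def choose_threshold(genuine_scores, impostor_scores):
    """Pick the candidate threshold with the fewest total errors on labeled,
    previously recorded verification scores (hypothetical helper)."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    best, best_errors = None, None
    for threshold in candidates:
        # A genuine speaker scoring below the threshold is a false reject.
        false_rejects = sum(1 for s in genuine_scores if s < threshold)
        # An impostor scoring at or above the threshold is a false accept.
        false_accepts = sum(1 for s in impostor_scores if s >= threshold)
        errors = false_rejects + false_accepts
        if best_errors is None or errors < best_errors:
            best, best_errors = threshold, errors
    return best


def accept_speaker(verification_score, app_threshold):
    """Application-level decision on a returned verification-score."""
    return verification_score >= app_threshold
```

Since the scores are vendor-specific, a threshold chosen this way applies
only to the engine that produced the recorded scores and would need to be
re-tuned for any other engine.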





 

18.   In section 12.3 the suggestion is to use SRTP as the mandatory
interoperability mode. If the reason for mandating SRTP is for a common mode
you should also decide on a key exchange mechanism. I suggest you look at
http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02 for
discussion on media security.

 

Based on the discussion between you and Dan York on the list, I will change
this:

 

12.3. Media session protection 
Sensitive data is also carried on media sessions terminating on MRCPv2
servers (the other end of a media channel may or may not be on the MRCPv2
client). This data includes the user's spoken utterances and the output of
text-to-speech operations. MRCPv2 servers MUST support SRTP for protection
of audio media sessions. MRCPv2 clients that originate or consume audio
similarly MUST support SRTP. Alternative media channel protection MAY be
used if desired (e.g. IPSEC).

 

to this:

 

12.3. Media session protection 
Sensitive data is also carried on media sessions terminating on MRCPv2
servers (the other end of a media channel may or may not be on the MRCPv2
client). This data includes the user's spoken utterances and the output of
text-to-speech operations. MRCPv2 servers MUST support a security mechanism
for protection of audio media sessions. MRCPv2 clients that originate or
consume audio similarly MUST support a security mechanism for protection of
the audio. If appropriate, usage of the Secure Real-time Transport Protocol
(SRTP) [RFC3711] is recommended.

 

19.   In section 13.7.2 you specify the attribute resource as session-level,
yet in the example in section 4.2 it is a media-level attribute. The same
goes for the channel attribute.

 

I have corrected both in section 13.7.2 to be media-level.





 

Thanks

 

Roni Even

 

 

 


--Boundary_(ID_VAnI9RDuzfD30+DX7QbxZw)
Content-type: text/html; charset=us-ascii
Content-transfer-encoding: 7BIT

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
 /* Font Definitions */
 @font-face
	{font-family:Helvetica;
	panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
	{font-family:Tahoma;
	panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
	{font-family:Consolas;
	panose-1:2 11 6 9 2 2 4 3 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:12.0pt;
	font-family:"Times New Roman","serif";}
p.MsoCommentText, li.MsoCommentText, div.MsoCommentText
	{mso-style-priority:99;
	mso-style-link:"Comment Text Char";
	mso-margin-top-alt:auto;
	margin-right:0in;
	mso-margin-bottom-alt:auto;
	margin-left:0in;
	font-size:12.0pt;
	font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
	{mso-style-priority:99;
	color:blue;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{mso-style-priority:99;
	color:purple;
	text-decoration:underline;}
pre
	{mso-style-priority:99;
	mso-style-link:"HTML Preformatted Char";
	margin:0in;
	margin-bottom:.0001pt;
	font-size:10.0pt;
	font-family:"Courier New";}
span.apple-style-span
	{mso-style-name:apple-style-span;}
span.CommentTextChar
	{mso-style-name:"Comment Text Char";
	mso-style-priority:99;
	mso-style-link:"Comment Text";
	font-family:"Calibri","sans-serif";}
span.apple-converted-space
	{mso-style-name:apple-converted-space;}
span.HTMLPreformattedChar
	{mso-style-name:"HTML Preformatted Char";
	mso-style-priority:99;
	mso-style-link:"HTML Preformatted";
	font-family:Consolas;}
span.EmailStyle23
	{mso-style-type:personal-reply;
	font-family:"Calibri","sans-serif";
	color:#1F497D;}
.MsoChpDefault
	{mso-style-type:export-only;
	font-size:10.0pt;}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
	{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1" />
 </o:shapelayout></xml><![endif]-->
</head>

<body lang=EN-US link=blue vlink=purple style='word-wrap: break-word;
-webkit-nbsp-mode: space;-webkit-line-break: after-white-space'>

<div class=Section1>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Hi Dan,<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>I understand your explanation about all these &quot;vendor-specific&quot;
parameters. I think that since this is a standards-track document, there should
be some text explaining the usage of these parameters, as well as a note that,
because the values are vendor-specific, values coming from different vendors
cannot be compared.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>As for my comment number 5 on payload type 96: my point was
that if the m-line lists payload type number 96, you must have an a=rtpmap
line mapping 96 to a specific subtype name, whereas for PCMU an a=rtpmap line
(like the one in your examples) is not mandatory, since payload type number 0
is a static payload type number assigned to PCMU.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Roni Even <o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<div style='border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt'>

<div>

<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>

<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Dan Burnett
[mailto:dburnett@voxeo.com] <br>
<b>Sent:</b> Tuesday, August 11, 2009 9:22 PM<br>
<b>To:</b> Roni Even<br>
<b>Cc:</b> sarvi@cisco.com; oran@cisco.com; 'Eric Burger'; speechsc@ietf.org;
rai@ietf.org<br>
<b>Subject:</b> Re: RAI review of draft-ietf-speechsc-mrcpv2-19<o:p></o:p></span></p>

</div>

</div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<div>

<div>

<p class=MsoNormal>On Jul 7, 2009, at 3:40 PM, Roni Even wrote:<o:p></o:p></p>

</div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<p class=MsoCommentText style='mso-margin-top-alt:0in;margin-right:0in;
margin-bottom:10.0pt;margin-left:0in;line-height:115%'><span style='font-size:
11.0pt;line-height:115%;font-family:"Calibri","sans-serif";color:black'>Hi,</span><span
style='font-size:10.0pt;line-height:115%;font-family:"Calibri","sans-serif";
color:black'><o:p></o:p></span></p>

<p class=MsoCommentText style='mso-margin-top-alt:0in;margin-right:0in;
margin-bottom:10.0pt;margin-left:0in;line-height:115%'><span style='font-size:
11.0pt;line-height:115%;font-family:"Calibri","sans-serif";color:black'>I was
assigned to do a RAI review of the draft. &nbsp;The draft looks ready for
publication to me. I have some comments mostly editorial.</span><span
style='font-size:10.0pt;line-height:115%;font-family:"Calibri","sans-serif";
color:black'><o:p></o:p></span></p>

<p class=MsoCommentText style='mso-margin-top-alt:0in;margin-right:0in;
margin-bottom:10.0pt;margin-left:0in;line-height:115%'><span style='font-size:
11.0pt;line-height:115%;font-family:"Calibri","sans-serif";color:black'>The
only issue I see that is not pure editorial is the issue of the different
parameters like confidence threshold, sensitivity level (see comments 11, 13,
15, 16 and 17). I think that some clarification on the semantics and the scale
(for example are the values linearly spaced) as well as when they are useful
will be helpful to implementers.</span><span style='font-size:10.0pt;
line-height:115%;font-family:"Calibri","sans-serif";color:black'><o:p></o:p></span></p>

<p class=MsoCommentText style='mso-margin-top-alt:0in;margin-right:0in;
margin-bottom:10.0pt;margin-left:0in;text-indent:-.25in;line-height:115%'><span
style='font-size:11.0pt;line-height:115%;font-family:"Calibri","sans-serif";
color:black'>1.</span><span style='font-size:7.0pt;line-height:115%;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
line-height:115%;font-family:"Calibri","sans-serif";color:black'>In figure 1
Expand the abbreviations TTS, ASR, SV , SI and how they are related to the
media resource types in 3.1</span><span style='font-size:10.0pt;line-height:
115%;font-family:"Calibri","sans-serif";color:black'><o:p></o:p></span></p>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Done. &nbsp;Added some text explaining Figure 1 and enhanced
Figure 1 slightly for clarification.<br>
<br>
<o:p></o:p></p>

<div>

<div>

<p class=MsoCommentText style='mso-margin-top-alt:0in;margin-right:0in;
margin-bottom:10.0pt;margin-left:0in;text-indent:-.25in;line-height:115%'><span
style='font-size:11.0pt;line-height:115%;font-family:"Calibri","sans-serif";
color:black'>2.</span><span style='font-size:7.0pt;line-height:115%;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
line-height:115%;font-family:"Calibri","sans-serif";color:black'>In figure 1
there is a SIP dialog between the MRCPv2 client and the media source/sink, what
is this dialog, I only saw in section 4 a dialog between the client and server.</span><span
style='font-size:10.0pt;line-height:115%;font-family:"Calibri","sans-serif";
color:black'><o:p></o:p></span></p>

</div>

</div>

<div>

<p class=MsoNormal>Clarified in&nbsp;the first example of section 4.2 that the
SIP dialog with the media source/sink is not shown.<o:p></o:p></p>

</div>

<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'>

<div>

<div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>3.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 3.2 you have
&#8220;For example:<span class=apple-converted-space>&nbsp;</span><a
href="sip:mrcpv2@example.net"><span style='color:windowtext;text-decoration:
none'>sip:mrcpv2@example.net</span></a>&#8221; twice one after the other.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

</blockquote>

<p class=MsoNormal>Fixed.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>4.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In the example in section 4.2
you have &#8220;a=cmid:1&#8221;. cmid is specified later in the document, so maybe
you can add a reference to where it is specified.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Done.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>5.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In the example in section 4.2
and in the following examples you have &#8220;m=audio 49170 RTP/AVP 0 96&#8221; but
do not have an rtpmap attribute for mapping 96 (a dynamic payload type number) to
a media encoding name.</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>It is not in the first or third examples (Synthesizer only),
but it is in the second example (Recognizer). &nbsp;I have removed 96 as an
option for the Synthesizer-only examples but let it remain as an addition for
the Recognizer example.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>6.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 4.3 &#8220;Also note
that more that one media session can be associated with a single resource if
need be, but this scenario is not useful for the current set of
resources&#8221;. There is a typo the second &#8220;that&#8221; should be
&#8220;than&#8221;. I am also not sure if the current syntax in this document
can support the mode.</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<p class=MsoNormal>Fixed the typo.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>7.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 4.3 &#8220;The
formatting of the&quot;cmid&quot; attribute in SDP RFC3388 [RFC4566]&#8221;. I
think you meant SDP grouping and need the reference to RFC 3388.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<p class=MsoNormal>I removed the reference altogether because it already exists
(correctly) earlier in the paragraph.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>8.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 5.1: &#8220;The
message-length field specifies the length of the message, including the
start-line&#8221;. Is the length in bytes? No unit is specified.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Changed &quot;length of the message&quot; to &quot;length of
the message in bytes&quot;.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>9.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 6.3.1 there is a typo: you have
&#8220;Verfication&#8221; instead of &#8220;Verification&#8221;. It appears twice in the
section.</span><span style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Fixed.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>10.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In the example in section 7 you
have &#8220;m=audio 0 RTP/AVP 0 1 3&#8221;. Payload type 1 was deleted from the
IANA registry; maybe use another payload type number.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>I just removed that payload type. &nbsp;It is not germane to
the example.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>11.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 9.4.1, 9.4.2 and
9.4.3 you specify confidence threshold, sensitivity level and speed vs
accuracy. What is the scale here; is it linear between 0 and 1. What is the
absolute value of the number, if you receive the same confidence level from two
recognizers are they the same (e.g. when using context block to switch
servers).&nbsp; For the speed vs accuracy, how does the client know what is the
relation between the value and the number of available sessions, since this
seems to be the reason for using this parameter.</span><span style='font-size:
10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<p class=MsoNormal>The interpretation of all of these parameters is
implementation-specific because the underlying technologies used to implement
them vary and can even be proprietary. &nbsp;In practice the speech recognition
and synthesis and speaker authentication communities have lived with this state
of affairs for many years, and users of other APIs for this technology are well
aware of and have built applications that accommodate this variability in
interpretation. &nbsp;It is outside the scope of this specification to attempt
to standardize interpretations of these values.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>12.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In 9.4.9, 10.4.8 and 11.4.11,
what are the values for media-type-value? You also mention audio and video, but
it looks to me that this document only discusses voice.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Yes. &nbsp;Although the original intent was to record
speech, application authors today are beginning to look at ways to incorporate
other audio or video. &nbsp;The intent of the sentences in these sections is to
clarify that the specification itself imposes no restriction on the types of media
that are allowed.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>13.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In 9.4.35 and 9.4.36, what is
the scale for the consistency here? How does one know what "close" means? What is
the consistency between different recognizers?</span><span style='font-size:
10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>The answer to question 11, above, applies here as well.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>14.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 9.6.3.3 in the
example (figure 2) confidence should be 0.75 and not 75</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Fixed.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>15.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 10.4.1 it is not
clear how you measure the sensitivity in order to specify it. Is it based on some
SNR translated to a 0 to 1 scale?</span><span style='font-size:10.5pt;font-family:
Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>The answer to question 11, above, applies here as well.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>16.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In 11.4.6, the same issue with
the scale: how does the client know how to set a value when working with
different speaker verification servers?</span><span style='font-size:10.5pt;
font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Ditto. &nbsp;I should point out that in all of these cases
the parameters are typically passed directly to the engine, and their
interpretations are defined (and described) in the vendors' documentation.
&nbsp;The most common MRCPv2 server implementations are by the technology
vendors themselves (the providers of the synthesis, recognition, and
verification engines). &nbsp;This is commonly understood in this technology
industry (meaning those who use this technology regularly).<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>17.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In 11.5.2.9 you state that the
verification-score is not a probability, so what is it? How can the client
decide if, for example, 0 is a good score for specifying the threshold?&nbsp; I
also noticed that the values in the example in section 11.5.2.10 are very
precise, like 0.98514. Is this the expected precision? The examples here and in
section 11.11 do not show the threshold. If the threshold is required for this
flow, why not show it in the example?</span><span style='font-size:10.5pt;
font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>This parameter, like the others mentioned above, has only a
vendor-specific interpretation. &nbsp;In practice authors interpret these
values based on both guidance from the technology vendors and
experimentation on large sets of recorded data.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<div>

<p class=MsoNormal>The Min-Verification-Score threshold is not required to be
set. &nbsp;In many cases the technology vendor has a fairly good understanding
of what the default threshold should be. &nbsp;The verification-score is
returned, however, in case the application author determines (through
experimentation, as described above) that the default threshold is not
producing optimal results for the application. &nbsp;In that case the author
can set the threshold to a different value or can set it to -1 and make the
determination within the application itself based on the verification-score
values.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>18.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 12.3 the suggestion
is to use SRTP as the mandatory interoperability mode. If the reason for
mandating SRTP is for a common mode you should also decide on a key exchange
mechanism. I suggest you look at<span class=apple-converted-space>&nbsp;</span><a
href="http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02">http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02</a><span
class=apple-converted-space>&nbsp;</span>for discussion on media security.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>Based on the discussion between you and Dan York on the
list, I will change this:<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<div><pre><span class=apple-style-span><span style='font-size:12.0pt;
font-family:"Helvetica","sans-serif"'>12.3. Media session protection&nbsp;</span></span><o:p></o:p></pre><pre><span
class=apple-style-span><span style='font-size:9.0pt;font-family:"Helvetica","sans-serif"'>Sensitive data is also carried on media sessions terminating on MRCPv2 servers (the other end of a media channel may or may not be on the MRCPv2 client). This data includes the user's spoken utterances and the output of text-to-speech operations. MRCPv2 servers MUST support SRTP for protection of audio media sessions. MRCPv2 clients that originate or consume audio similarly MUST support SRTP. Alternative media channel protection MAY be used if desired (e.g. IPSEC).</span></span><o:p></o:p></pre></div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<div>

<p class=MsoNormal>to this:<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<div><pre><span class=apple-style-span><span style='font-size:9.0pt;font-family:
"Helvetica","sans-serif"'>12.3. Media session protection&nbsp;</span></span><o:p></o:p></pre><pre><span
class=apple-style-span><span style='font-size:9.0pt;font-family:"Helvetica","sans-serif"'>Sensitive data is also carried on media sessions terminating on MRCPv2 servers (the other end of a media channel may or may not be on the MRCPv2 client). This data includes the user's spoken utterances and the output of text-to-speech operations. MRCPv2 servers MUST support a security mechanism for protection of audio media sessions. MRCPv2 clients that originate or consume audio similarly MUST support a security mechanism for protection of the audio. If appropriate,&nbsp;usage of the Secure Real-time Transport Protocol (SRTP)&nbsp;[RFC3711] is recommended.</span></span><o:p></o:p></pre></div>

<div>

<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'>

<div>

<div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal style='text-indent:-.25in'><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>19.</span><span
style='font-size:7.0pt;color:black'>&nbsp;&nbsp;<span
class=apple-converted-space>&nbsp;</span></span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:black'>In section 13.7.2 you specify
the attribute resource as session level, yet in the example in section 4.2 it is
a media level attribute. The same goes for the channel attribute.</span><span
style='font-size:10.5pt;font-family:Consolas;color:black'><o:p></o:p></span></p>

</div>

</div>

</div>

</blockquote>

<div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

<p class=MsoNormal>I have corrected both in section 13.7.2 to be media-level.<o:p></o:p></p>

</div>

<div>

<p class=MsoNormal><br>
<br>
<o:p></o:p></p>

<div>

<div>

<div style='margin-left:.5in'>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;<o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>Thanks</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>Roni Even</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;</span><span style='font-size:10.5pt;font-family:Consolas;
color:black'><o:p></o:p></span></p>

</div>

<div>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:black'>&nbsp;<o:p></o:p></span></p>

</div>

</div>

</div>

</div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

</div>

</div>

</body>

</html>

--Boundary_(ID_VAnI9RDuzfD30+DX7QbxZw)--

From Christian.Groves@nteczone.com  Mon Aug 17 23:00:06 2009
Return-Path: <Christian.Groves@nteczone.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 452A33A68AA for <speechsc@core3.amsl.com>; Mon, 17 Aug 2009 23:00:06 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.605
X-Spam-Level: 
X-Spam-Status: No, score=-2.605 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_LOW=-1, RELAY_IS_203=0.994]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id rJMrAEipXv7f for <speechsc@core3.amsl.com>; Mon, 17 Aug 2009 23:00:05 -0700 (PDT)
Received: from ipmail02.adl6.internode.on.net (ipmail02.adl6.internode.on.net [203.16.214.140]) by core3.amsl.com (Postfix) with ESMTP id 104DB3A6B60 for <speechsc@ietf.org>; Mon, 17 Aug 2009 23:00:04 -0700 (PDT)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApYBAKrdiUp5LMMG/2dsb2JhbAAI1xCEGQU
X-IronPort-AV: E=Sophos;i="4.43,401,1246804200"; d="scan'208";a="23188982"
Received: from ppp121-44-195-6.lns10.mel4.internode.on.net (HELO [127.0.0.1]) ([121.44.195.6]) by ipmail02.adl6.internode.on.net with ESMTP; 18 Aug 2009 15:30:00 +0930
Message-ID: <4A8A434D.9020900@nteczone.com>
Date: Tue, 18 Aug 2009 15:59:41 +1000
From: Christian Groves <Christian.Groves@nteczone.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: Dan Burnett <dburnett@voxeo.com>
References: <DD418C36-481A-4426-B586-E5703AD61226@voxeo.com>
In-Reply-To: <DD418C36-481A-4426-B586-E5703AD61226@voxeo.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "IETF SPEECHSC \(E-mail\)" <speechsc@ietf.org>
Subject: Re: [Speechsc] draft-20 changes (in addition to those in Roni's email)
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Aug 2009 06:00:06 -0000

G'Day Dan,

Thanks for the updates. It clarifies my concerns.

Regards, Christian

Dan Burnett wrote:
> Here are the other changes in draft-20 in addition to those in Roni's 
> email:
>
>
> - Corrected the two typos that Arsen mentioned.
>
> - In 11.5.2.8, clarified that the adapted element is only found within 
> the first <voiceprint> element (based on Christian Groves' question).
>
> - Corrected the remaining mistakes in the RNG schema pointed out 
> recently by Christian Groves.
>
>
>
> -- dan
>
>

From corbya@microsoft.com  Fri Aug 21 18:13:26 2009
Return-Path: <corbya@microsoft.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 1293C3A69B5 for <speechsc@core3.amsl.com>; Fri, 21 Aug 2009 18:13:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.598
X-Spam-Level: 
X-Spam-Status: No, score=-10.598 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VGeRr-a27y11 for <speechsc@core3.amsl.com>; Fri, 21 Aug 2009 18:13:23 -0700 (PDT)
Received: from smtp.microsoft.com (mailc.microsoft.com [131.107.115.214]) by core3.amsl.com (Postfix) with ESMTP id 334F93A6838 for <speechsc@ietf.org>; Fri, 21 Aug 2009 18:13:23 -0700 (PDT)
Received: from TK5EX14HUBC101.redmond.corp.microsoft.com (157.54.7.153) by TK5-EXGWY-E803.partners.extranet.microsoft.com (10.251.56.169) with Microsoft SMTP Server (TLS) id 8.2.176.0; Fri, 21 Aug 2009 18:13:28 -0700
Received: from TK5EX14MBXC116.redmond.corp.microsoft.com ([169.254.7.27]) by TK5EX14HUBC101.redmond.corp.microsoft.com ([157.54.7.153]) with mapi; Fri, 21 Aug 2009 18:13:22 -0700
From: Corby Anderson <corbya@microsoft.com>
To: "speechsc@ietf.org" <speechsc@ietf.org>
Thread-Topic: Confusuion with INTERPRET
Thread-Index: Acoixbz/oqwD7Tk/Sv6x0aOqfZ5kDw==
Date: Sat, 22 Aug 2009 01:13:20 +0000
Message-ID: <EF149B22CD1213419BF4DFE038422CAC6D6C23@TK5EX14MBXC116.redmond.corp.microsoft.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Content-Type: multipart/alternative; boundary="_000_EF149B22CD1213419BF4DFE038422CAC6D6C23TK5EX14MBXC116red_"
MIME-Version: 1.0
X-Mailman-Approved-At: Sat, 22 Aug 2009 08:38:25 -0700
Subject: [Speechsc] Confusuion with INTERPRET
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Aug 2009 01:15:30 -0000

--_000_EF149B22CD1213419BF4DFE038422CAC6D6C23TK5EX14MBXC116red_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Does section 9.20 INTERPRET need some clarification? 9.20 states that INTER=
PRET should return an INTERPRETATION-COMPLETE event (as described in 9.21),=
 but the example in section 9.20 shows the following response:

   S->C:    MRCP/2.0 49 543267 200 COMPLETE
           Channel-Identifier:32AECB23433801@speechrecog
           Completion-Cause:000 success
           Content-Type:application/nlsml+xml
           Content-Length:...

That S->C format is for responses (5.3), not events (5.5).  Contrast this w=
ith the RECOGNITION-COMPLETE event to RECOGNIZE:

   S->C:MRCP/2.0 486 RECOGNITION-COMPLETE 543260 COMPLETE
   Channel-Identifier:32AECB23433801@speechrecog
   Completion-Cause:000 success
   Waveform-URI:<http://web.media.com/session123/audio.wav>;
                size=3D124535;duration=3D2340
   Content-Type:applicationt/x-nlsml
   Content-Length:...

Shouldn't the first line of the INTERPRETATION-COMPLETE event be something =
like the following?
   S->C:    MRCP/2.0 49 INTERPRETATION-COMPLETE 543267 COMPLETE
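
The two start-line shapes can be told apart mechanically. The sketch below
is illustrative only: the function name is made up, and the field layouts
are a reading of 5.3 (responses) and 5.5 (events), plus the request layout,
so treat them as assumptions rather than normative text.

```python
# Classify an MRCPv2 start line by the shape of its fields.
# Assumed layouts (a reading of the draft, not normative):
#   request : MRCP/2.0 length method-name request-id
#   response: MRCP/2.0 length request-id status-code request-state
#   event   : MRCP/2.0 length event-name request-id request-state


def classify_start_line(line):
    """Return 'request', 'response', 'event' or 'invalid'."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "MRCP/2.0":
        return "invalid"
    if not fields[1].isdigit():
        return "invalid"  # message-length must be numeric
    third = fields[2]
    fourth = fields[3]
    if len(fields) == 4 and fourth.isdigit() and not third.isdigit():
        return "request"
    if len(fields) == 5 and third.isdigit() and fourth.isdigit():
        return "response"  # request-id followed by status-code
    if len(fields) == 5 and fourth.isdigit():
        return "event"  # event-name followed by request-id
    return "invalid"
```

Under these assumptions, the 9.20 example line "MRCP/2.0 49 543267 200
COMPLETE" classifies as a response, while the proposed line "MRCP/2.0 49
INTERPRETATION-COMPLETE 543267 COMPLETE" classifies as an event.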

The only mentions of INTERPRETATION-COMPLETE in the spec are in
* table of contents
* 9.3 Recognizer events
* 9.21 where it's described
* 13.1.2 MRCPv2 methods and events
* 15 Normative definition

I found no usage examples for INTERPRETATION-COMPLETE; most notably not in =
9.20



Also, section 9.9 states
   For the recognizer resource, RECOGNIZE is the only request that
   returns a request-state of IN-PROGRESS, meaning that recognition is
   in progress.

But the example in 9.20 for INTERPRET shows
   S->C:    MRCP/2.0 49 543266 200 IN-PROGRESS
           Channel-Identifier:32AECB23433801@speechrecog

Is the recognizer resource the resource that performs interpretation?  If s=
o, then the text in 9.9 should be changed to say the following:
   For the recognizer resource, RECOGNIZE and INTERPRET are the only
   requests that return a request-state of IN-PROGRESS, meaning that
   recognition or interpretation is in progress.


Corby Anderson


--_000_EF149B22CD1213419BF4DFE038422CAC6D6C23TK5EX14MBXC116red_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-micr=
osoft-com:office:office" xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" xmlns=3D"http:=
//www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Dus-ascii">
<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">
<style>
<!--
 /* Font Definitions */
 @font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:11.0pt;
	font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
	{mso-style-priority:99;
	color:blue;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{mso-style-priority:99;
	color:purple;
	text-decoration:underline;}
span.EmailStyle17
	{mso-style-type:personal-compose;
	font-family:"Calibri","sans-serif";
	color:windowtext;}
.MsoChpDefault
	{mso-style-type:export-only;}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
	{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext=3D"edit">
  <o:idmap v:ext=3D"edit" data=3D"1" />
 </o:shapelayout></xml><![endif]-->
</head>

<body lang=3DEN-US link=3Dblue vlink=3Dpurple>

<div class=3DSection1>

<p class=3DMsoNormal>Does section 9.20 INTERPRET need some clarification? 9=
.20
states that INTERPRET should return an INTERPRETATION-COMPLETE event (=
as
described in 9.21), but the example in section 9.20 shows the following
response:<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; S-&gt;C:&nbsp;&nbsp;&nbsp; MRCP/2.0 49 54=
3267
200 COMPLETE<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;
Channel-Identifier:32AECB23433801@speechrecog<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;
Completion-Cause:000 success<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;
Content-Type:application/nlsml+xml<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;
Content-Length:...<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>That S-&gt;C format is for responses (5.3), not events=
 (5.5).&nbsp;
Contrast this with the RECOGNITION-COMPLETE event to RECOGNIZE:<o:p></o:p><=
/p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; S-&gt;C:MRCP/2.0 486 RECOGNITION-COMPLETE
543260 COMPLETE<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; Channel-Identifier:32AECB23433801@speechr=
ecog<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; Completion-Cause:000 success<o:p></o:p></=
p>

<p class=3DMsoNormal>&nbsp;&nbsp;
Waveform-URI:&lt;http://web.media.com/session123/audio.wav&gt;;<o:p></o:p><=
/p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
size=3D124535;duration=3D2340<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; Content-Type:applicationt/x-nlsml<o:p></o=
:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; Content-Length:...<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>Shouldn&#8217;t the first line of the INTERPRETATION-C=
OMPLETE
event be something like the following?<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; S-&gt;C:&nbsp;&nbsp;&nbsp; MRCP/2.0 49 IN=
TERPRETATION-COMPLETE
543267 COMPLETE<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>The only mentions of INTERPRETATION-COMPLETE in the sp=
ec are in<o:p></o:p></p>

<p class=3DMsoNormal>* table of contents<o:p></o:p></p>

<p class=3DMsoNormal>* 9.3 Recognizer events<o:p></o:p></p>

<p class=3DMsoNormal>* 9.21 where it&#8217;s described<o:p></o:p></p>

<p class=3DMsoNormal>* 13.1.2 MRCPv2 methods and events<o:p></o:p></p>

<p class=3DMsoNormal>* 15 Normative definition<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>I found no usage examples for INTERPRETATION-COMPLETE;=
 most
notably not in 9.20<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>Also, section 9.9 states<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; For the recognizer resource, RECOGNIZE is=
 the
only request that<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; returns a request-state of IN-PROGRESS, m=
eaning
that recognition is<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; in progress.<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>But the example in 9.20 for INTERPRET shows<o:p></o:p>=
</p>

<p class=3DMsoNormal>&nbsp;&nbsp; S-&gt;C:&nbsp;&nbsp;&nbsp; MRCP/2.0 49 54=
3266
200 IN-PROGRESS<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;
Channel-Identifier:32AECB23433801@speechrecog<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>Is the recognizer resource the resource that performs
interpretation?&nbsp; If so, then the text in 9.9 should be changed to say =
the
following:<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp;&nbsp; For the recognizer resource, RECOGNIZE an=
d
INTERPRET are the only<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp; &nbsp;requests that return a request-state of
IN-PROGRESS, meaning that<o:p></o:p></p>

<p class=3DMsoNormal>&nbsp; &nbsp;recognition or interpretation is in progr=
ess.<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

<p class=3DMsoNormal>Corby Anderson<o:p></o:p></p>

<p class=3DMsoNormal><o:p>&nbsp;</o:p></p>

</div>

</body>

</html>

--_000_EF149B22CD1213419BF4DFE038422CAC6D6C23TK5EX14MBXC116red_--

From arthur@istnetworks.com  Mon Aug 24 03:17:56 2009
Return-Path: <arthur@istnetworks.com>
X-Original-To: speechsc@core3.amsl.com
Delivered-To: speechsc@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 129203A6E19 for <speechsc@core3.amsl.com>; Mon, 24 Aug 2009 03:17:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.001
X-Spam-Level: *
X-Spam-Status: No, score=1.001 tagged_above=-999 required=5 tests=[BAYES_50=0.001, J_BACKHAIR_33=1]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id U0cTu9DsNOcb for <speechsc@core3.amsl.com>; Mon, 24 Aug 2009 03:17:55 -0700 (PDT)
Received: from Out001.Mi8.com (out001.mi8.com [216.166.12.74]) by core3.amsl.com (Postfix) with ESMTP id 3F35C3A69FA for <speechsc@ietf.org>; Mon, 24 Aug 2009 03:17:54 -0700 (PDT)
Received: from AUSP02VMBX01.Mi8.com ([10.4.8.6]) by Out001.Mi8.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 24 Aug 2009 06:17:59 -0400
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Date: Mon, 24 Aug 2009 06:18:09 -0400
Message-ID: <19EAD33AD254324284516DA3335C8FEA960203@AUSP02VMBX01.Mi8.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Semantic Scripts - Arrays
Thread-Index: AcokpC3iUckZGeaIT2WOssle7gDrPA==
From: "Arthur Vernon" <arthur@istnetworks.com>
To: <speechsc@ietf.org>
X-OriginalArrivalTime: 24 Aug 2009 10:17:59.0703 (UTC) FILETIME=[27F42A70:01CA24A4]
Subject: [Speechsc] Semantic Scripts - Arrays
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/speechsc>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Aug 2009 10:17:56 -0000

In "Speech Processing for IP Networks" there is a reference to how
arrays of values are to be handled.

The gist of it is:=20

"It is possible for the rule variable of the root rule to be of type
Array (e.g. x[0], x[1], etc) or indeed
for one of its properties to be of type Array."

I cannot see a concrete example in the specification of how this
information is supposed to be communicated from the server to the
client.

The MRCP 2 server reports this...
<stock confidence=3D"0.687036">
   BHP
   RIO
</stock>

As you can see, there is notional identification of the two items I
spoke by way of a CR/LF.

The MRCP 2 client reports receiving this:

<stock confidence=3D"0.687036">
   BHP
   RIO
</stock>


On the document server, this is getting passed back to me as stock:
"BRI"

Further reading of the logs on the MRCP client (produced using)

<log>The type of stock is <value
expr=3D"typeof(Buy$.interpretation.stock)"/></log>
<log>The value of stock is <value
expr=3D"Buy$.interpretation.stock"/></log>

reveals:

The type of stock is string
The value of stock is BRI

I would have expected the type of stock to be array (containing
BHP and RIO).
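
Until that is clarified, one possible client-side stopgap is to split the
element text on whitespace. This is only a sketch: it assumes the server
keeps emitting one value per line inside <stock>, that no value contains a
space, and the helper name stock_values is hypothetical (it is not part of
MRCPv2, NLSML, or any client library).

```python
import xml.etree.ElementTree as ET


def stock_values(fragment):
    """Split the text of a <stock> element into a list of values.

    Assumption: values are separated by whitespace (here CR/LF) and
    never contain spaces themselves.
    """
    element = ET.fromstring(fragment)
    return (element.text or "").split()


# The element as reported by the MRCP 2 server above.
sample = '<stock confidence="0.687036">\n   BHP\n   RIO\n</stock>'
print(stock_values(sample))  # prints ['BHP', 'RIO']
```

A value such as "Rio tinto" would be split in two by this approach, which is
why an explicit array representation in the result would be preferable.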


My question (from a user perspective) is: who do I complain to in order to
seek a resolution to the problem?

Can the specification clarify this matter, please (or maybe I have just
missed this bit)?

Here is the (edited and relevant bits of the) grammar used to generate
this:

<rule id=3D"generic"> Buy
    <tag>out._service =3D "StockBuy"; out.stock =3D new Array();
         var index =3D 0;</tag>
    <item repeat=3D"0-2">
        <ruleref uri=3D"#company"/>
        <tag>out.stock[index++] =3D "" + rules.company;</tag>
    </item>
</rule>
<rule id=3D"company">
    <item>
        <one-of>
            <item>BHP <tag>out =3D "BHP";</tag></item>
            <item>Rio tinto <tag>out =3D "RIO";</tag></item>
        </one-of>
    </item>
</rule>

It is a bit hacked because in the process I have tried numerous ways to
realise a working array.


Kind Regards,
Arthur Vernon - Lead Software Architect, IST Networks

AUSTRALIA
(m) +61 411 336 176
(w) +61 (8) 6380 2058
Unit 5B, 1 Station St, Subiaco WA 6008=20


