
Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id GAA04344 for ietf-xml-mime-bks; Fri, 24 Mar 2000 06:26:15 -0800 (PST)
Received: from prserv.net (out1.prserv.net [32.97.166.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA04340 for <ietf-xml-mime@imc.org>; Fri, 24 Mar 2000 06:26:13 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.179]) by prserv.net (out1) with SMTP id <2000032414280525202g3emde>; Fri, 24 Mar 2000 14:28:05 +0000
Message-Id: <200003241427.AA02055@t3knz.attglobal.net>
From: MURATA Makoto <muraw3c@attglobal.net>
Date: Fri, 24 Mar 2000 23:27:55 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <Pine.GSO.4.21.0003232257550.21042-100000@gate>
References: <Pine.GSO.4.21.0003232257550.21042-100000@gate>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...

 >> HTTP already has the accept-charset field.  I do not understand your claim.
 >
 >If I am using a DOM parser, I cannot ask it "what encodings do you
 >support?" If I am using SAX in Java, I can assume that the encodings
 >underlying Java are the ones available. I don't recall any C or C++ XML
 >parser that exposes this information: I don't think Expat does, for
 >example. 

Although such information is described in their manuals, I do not 
think that they have any APIs for "what encodings do you
support?".  I agree.

 >If my browser cannot ask its XML processor "what encodings do you
 >support?" in order to perform content negotiation for XML, then either the
 >poor user must configure it themselves or the HTTP code has to take on the
 >responsibility for providing transcoding services itself (perhaps not a
 >bad thing for the future).  And configuration has to be done
 >application-by-application: for example James Clark's vanilla Expat did
 >not accept Big5, so every XML application built on it was pretty unusable
 >for Traditional Chinese here.  

Ideally, XML processors should silently provide the accept-charset field.  

In addition to document entities, an XML processor may silently fetch external 
parsed entities, external parameter entities, and external DTD subsets.  
(I know expat doesn't, but other parsers do.)  Since application 
programmers cannot control such fetching, the best solution is to hardcode 
the accept-charset field in the XML processor.  Certainly, the person who 
writes the XML processor knows which encodings are supported.  (It would be 
great if we could register callback routines for unsupported charsets.)
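A minimal sketch of what such hardcoding could look like. It assumes a hypothetical processor whose supported encodings are fixed when it is built (SUPPORTED_ENCODINGS below is an illustrative list, not taken from any real parser). Python's urllib.request is used only to show where the header would be attached. The descending q-value scheme is an arbitrary choice.

```python
import urllib.request

# Hypothetical: the encodings this particular XML processor was built to decode.
SUPPORTED_ENCODINGS = ["utf-8", "utf-16", "iso-8859-1", "big5"]

def accept_charset_value(encodings):
    """Build an Accept-Charset value; earlier entries get higher q-values."""
    parts = []
    for i, name in enumerate(encodings):
        q = max(0.1, 1.0 - 0.1 * i)
        # The first entry keeps the implicit q=1.0.
        parts.append(name if i == 0 else f"{name};q={q:.1f}")
    return ", ".join(parts)

def make_request(url):
    """Attach the processor's own Accept-Charset to every fetch it makes,
    including external entities and DTD subsets the application never sees."""
    req = urllib.request.Request(url)
    req.add_header("Accept-Charset", accept_charset_value(SUPPORTED_ENCODINGS))
    return req
```

Because the list lives inside the processor, the same header goes out for document entities and for the external entities it fetches on its own. The application programmer does not need to know which encodings are available.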

 >
 >Even if you are using DOM (or
 >SAX) it is quite possible that the system integrator has chosen to use a
 >different implementation from the one which the software developer used.
 >So you cannot ask DOM, and the programmer cannot be sure which
 >implementation is being used. 

Right.  But we can always assume that the same UCS characters will be 
received.

 >So it seems to me that content negotiation of character encoding for XML
 >is a bit of a myth: it requires that the user test applications rather
 >than it being transparent. 

If content negotiation is hardcoded in XML processors, application programmers 
do not have to worry.

 >That is an unreasonable and unworkable
 >requirement. At the moment, the browser has to guess which encodings are
 >available, or the poor user has to test if the local encoding is
 >supported. (I suppose systems could also have some automated system which
 >requested a big5 XML file and then tried to parse it. Not really
 >elegant. Presumably the XML file would have to be sourced internally. )

I am afraid that I do not fully understand your claim.  Could you 
try again?

 >For content negotiation of MIME types, a browser knows which content-types
 >have handlers. But it doesn't know this information for character-encoding
 >for the XML applications it has. That is why I 

Probably, you sent this mail before you finished the last sentence.  

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA13627 for ietf-xml-mime-bks; Thu, 23 Mar 2000 09:10:57 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA13623 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 09:10:56 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA08084; Thu, 23 Mar 2000 12:12:49 -0500
Message-ID: <38DA5072.61415F6F@w3.org>
Date: Thu, 23 Mar 2000 18:12:18 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38D903E1.7E6BC41F@w3.org> <200003230919.AA02031@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
> 
>  >Thus, for any XML file which is not encoded in US-ASCII, text/xml is an
>  >inappropriate choice of MIME type. Silent data corruption can and will
>  >occur.
> 
> If I use UTF-8 or UTF-16 and provide the charset parameter, no data
> corruption will occur.

Provided that the fallback to text/plain does not occur, which cannot be
guaranteed.

>  >For all international xml files (noting that in this context, the USA is
>  >international too due to the widespread use of Spanish, and the wide number
>  >of other languages in use), a type such as application/xml is the correct
>  >choice, unless there is a more specific non-text type available.
> 
> If an XML document is readable for casual users, satisfies
> restrictions of the top-level media type "text", and does not require
> special dispatching, "text/xml" is the most appropriate media type.

That's a lot of ifs.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA12658 for ietf-xml-mime-bks; Thu, 23 Mar 2000 08:11:46 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA12654 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 08:11:44 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id AAA22642 for <ietf-xml-mime@imc.org>; Fri, 24 Mar 2000 00:13:34 +0800 (CST)
Date: Fri, 24 Mar 2000 00:13:33 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003170736.AA01945@t3knz.attglobal.net>
Message-ID: <Pine.GSO.4.21.0003232257550.21042-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Fri, 17 Mar 2000, MURATA Makoto wrote:

> Rick Jelliffe wrote...

>  >We need a way to ensure end-to-end integrity. 
> 
> I do not agree.  Why?

By "we" I mean my employers and I, not "we" as in ietf-xml-mime. 
We need to be able to send and receive Big5-encoded XML where we have
no control over the behaviour of intermediate systems. 

>  >(and, in any case,
>  >there is no mechanism currently for an XML parser to feed information
>  >about which encodings it accepts to the HTTP system to set up the 
>  >preferences in the first place.)
> 
> HTTP already has the accept-charset field.  I do not understand your claim.

If I am using a DOM parser, I cannot ask it "what encodings do you
support?" If I am using SAX in Java, I can assume that the encodings
underlying Java are the ones available. I don't recall any C or C++ XML
parser that exposes this information: I don't think Expat does, for
example. 

If my browser cannot ask its XML processor "what encodings do you
support?" in order to perform content negotiation for XML, then either the
poor user must configure it themselves or the HTTP code has to take on the
responsibility for providing transcoding services itself (perhaps not a
bad thing for the future).  And configuration has to be done
application-by-application: for example James Clark's vanilla Expat did
not accept Big5, so every XML application built on it was pretty unusable
for Traditional Chinese here.  

Even if you are using DOM (or
SAX) it is quite possible that the system integrator has chosen to use a
different implementation from the one which the software developer used.
So you cannot ask DOM, and the programmer cannot be sure which
implementation is being used. 

So it seems to me that content negotiation of character encoding for XML
is a bit of a myth: it requires that the user test applications rather
than it being transparent. That is an unreasonable and unworkable
requirement. At the moment, the browser has to guess which encodings are
available, or the poor user has to test if the local encoding is
supported. (I suppose systems could also have some automated system which
requested a big5 XML file and then tried to parse it. Not really
elegant. Presumably the XML file would have to be sourced internally. )
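Here is a rough sketch of the internally sourced probe described above. It assumes Python's pyexpat binding stands in for "the XML processor". The probe builds a tiny document in the candidate encoding and reports whether the parser accepts it. Python's expat binding may reject multi-byte encodings that expat does not know natively, so the answer describes only this particular parser build. That is exactly the application-by-application problem being raised.

```python
import xml.parsers.expat

def parser_accepts(encoding: str) -> bool:
    """Probe: can this expat-based parser read a tiny document in `encoding`?"""
    sample = '<?xml version="1.0" encoding="%s"?><probe>A</probe>' % encoding
    try:
        data = sample.encode(encoding)
    except (LookupError, UnicodeError):
        return False  # Python has no usable codec by that name, so no test file
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except (xml.parsers.expat.ExpatError, ValueError, LookupError):
        return False  # the parser refused the declared encoding or the bytes
    return True

for name in ("utf-8", "iso-8859-1", "big5"):
    print(name, parser_accepts(name))
```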

For content negotiation of MIME types, a browser knows which content-types
have handlers. But it doesn't know this information for character-encoding
for the XML applications it has. That is why I 

Rick Jelliffe



Received: by ns.secondary.com (8.9.3/8.9.3) id GAA09850 for ietf-xml-mime-bks; Thu, 23 Mar 2000 06:01:05 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA09846 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 06:01:04 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id JAA21493; Thu, 23 Mar 2000 09:02:36 -0500
Message-ID: <38DA23F5.1C519480@w3.org>
Date: Thu, 23 Mar 2000 15:02:29 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Meida types and stylesheet linking
References: <38D8FD3C.E5ADD63C@w3.org> <200003230219.AA02016@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Meida types and stylesheet linking",
> Chris Lilley wrote...
>  >Can you point me to some documentation that says that this usage of a MIME
>  >type label, outside of a content-type header, is somehow invalid or
>  >incorrect?
> 
> In RFC 2045, we have the following:
> 
>    The purpose of the Content-Type field is to describe the data
>    contained in the body fully enough that the receiving user agent can
>    pick an appropriate agent or mechanism to present the data to the
>    user, or otherwise deal with the data in an appropriate manner. The
>    value in this field is called a media type.
> 
> So, if something is not specified in the content-type field, it is
> not a media type :-)

No, sorry, the paragraph you quote does not establish commutativity.

The value in that field is a media type, yes. That clearly establishes that
it is an error to put something else in that field. 

It does not establish that it is an error to use that type elsewhere in
other contexts, and that is what people have been doing to good effect. The
alternative being to invent a new and parallel labelling scheme, which is
just a waste.
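Judging by the subject line, one such other context is the type pseudo-attribute of the xml-stylesheet processing instruction. The following is a minimal sketch that assumes Python's expat binding. It shows a consumer reading that media type label straight out of the document rather than from a Content-Type header. The regular expression simplifies the pseudo-attribute grammar and is not a complete parser for it.

```python
import re
import xml.parsers.expat

# Simplified pseudo-attribute syntax: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def stylesheet_types(xml_bytes: bytes):
    """Collect the media type labels used in xml-stylesheet PIs."""
    found = []

    def on_pi(target, data):
        if target != "xml-stylesheet":
            return
        attrs = {}
        for m in PSEUDO_ATTR.finditer(data):
            attrs[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)
        found.append(attrs.get("type"))  # e.g. "text/css", used outside any header

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_bytes, True)
    return found
```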

So the answer to my original question above is "no, I cannot" 

--
Chris


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id BAA29074 for ietf-xml-mime-bks; Thu, 23 Mar 2000 01:21:17 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA29069 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 01:21:16 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.172]) by prserv.net (out4) with SMTP id <2000032309231023902likc8e>; Thu, 23 Mar 2000 09:23:11 +0000
Message-Id: <200003230923.AA02033@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 18:23:03 +0900
To: Chris Lilley <chris@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38D906A7.8837412F@w3.org>
References: <38D906A7.8837412F@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >

 >> Unfortunately, we do not have "fairly good state of encoding declaration
 >> of XML files".  People generate XML documents by XSLT or their own programs,
 >> and fail to specify the correct charset.
 >
 >That is not a problem. Such files will not be well formed and thus, will
 >fail to parse.

You are saying that the omission of the charset parameter is a problem, 
and that incorrect encoding PIs are not problems.  I do not know why.  

Many Japanese users have failed to specify correct encoding PIs, and many   
Japanese programmers have failed to generate them correctly.  I also heard 
that users in developing countries copy ISO-8859-1 HTML files and mistakenly 
put incorrect meta tags.  The same thing will happen to XML.  In-band encoding 
is not free from errors.

 >> I think that you are not paying attention to other textual formats. 
 >
 >Oh I am, but not on this list where it is off topic.

In RFC 2318 (text/css), which you co-authored, the charset parameter is 
described as follows:

       The syntax of CSS is expressed in US-ASCII, but a CSS file can
       contain strings which may use any Unicode character. Any charset
       that is a superset of US-ASCII may be used; US-ASCII, iso-8859-X
       and utf-8 are recommended.

RFC 2616 (HTTP) is a draft standard and defines the default as follows:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

Thus, the default value of the charset parameter of text/css is 
ISO-8859-1.  I know that CSS recommendations are different.   But 
in the realm of IETF, the default value is ISO-8859-1.
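The rule just quoted can be sketched in a few lines. This version uses Python's email.message.Message as a convenient Content-Type parser. It covers only the HTTP case. MIME mail has its own defaults (RFC 2046), which this sketch does not model.

```python
from email.message import Message
from typing import Optional

def effective_charset(content_type: str) -> Optional[str]:
    """Charset a receiver should assume for a body received via HTTP."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased charset parameter, or None
    if charset is not None:
        return charset  # an explicit label always wins
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # RFC 2616 section 3.7.1 default for text/* over HTTP
    return None  # no external label; e.g. application/xml falls back to in-band detection
```

Under this rule an unlabelled text/css body is ISO-8859-1, whatever the CSS specification itself says.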

 >> I would
 >> like XML to be a good citizen of the WWW and to establish a good practise
 >
 >As would I. I don't consider the propagation of known faults to be "good
 >practice".

Sorry, but I have to trust W3C I18N WG, etc. 


 >>  The charset parameter
 >> is not a historical requirement.  Rather, it is the right solution,
 >> which is just about to take off.  I think that we are wasting our
 >> limited resources by repeating old discussion rather than doing more
 >> implementations.
 >
 >You consistently fail to address the issue of file system processing of
 >XML, and instead characterise all opposition to your proposal as "time
 >wasting". I will be happy to characterise it as that once you have given a
 >satisfactory response to the questions I pose.

The long-term goal is to make the file systems of operating systems 
provide the charset parameter.  Encoding declarations are interim 
solutions.

I am not insisting on my proposal.  I am insisting on the rough 
consensus achieved in the past.  Since the I18N WG asked the XML Syntax WG 
not to change the precedence of the charset parameter, I am extremely 
reluctant to do such changes.  Up to now, the only change I can support 
is to mandate the charset parameter of text/xml.

 >> Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
 >> legacy encodings does not look very attractive. 
 >
 >I agree that such transcoding is unattractive, but you seem to want to bias
 >the XML MIME specification to supporting such transcoding whatever the cost
 >to other sorts of processing.

The only "other sorts of processing" I can imagine is to provide the charset 
parameter.  I understand that it is not very easy at present, but WWW servers 
are getting better.  You think that the cost of developing and using XML-aware 
transcoders and the cost of inventing different in-band encoding for 
different textual formats is not a big deal.  I do not agree.


 >However, something that converts an XML file from 8859-1 to UTF-8 and
 >leaves the encoding declaration saying 8859-1 is not useful. It has not
 >generated XML. It has made a thing which will fail to parse.

Since the charset parameter is now authoritative, such documents will parse.
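At the parser level, "authoritative" can be illustrated with Python's expat binding. There, ParserCreate accepts an external encoding that overrides the document's own declaration. This sketch assumes the charset parameter value is simply handed to that argument. The sample is the document Chris describes: UTF-8 bytes that still declare 8859-1.

```python
import xml.parsers.expat
from typing import Optional

def element_text(data: bytes, charset: Optional[str] = None) -> str:
    """Parse `data`, letting an external charset (if given) override the declaration."""
    parser = xml.parsers.expat.ParserCreate(encoding=charset)
    chunks = []
    parser.CharacterDataHandler = chunks.append  # may be called several times
    parser.Parse(data, True)
    return "".join(chunks)

# Transcoded to UTF-8, but the declaration was left saying ISO-8859-1.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'

print(element_text(doc, "UTF-8"))  # charset parameter wins: one character, e-acute
print(element_text(doc))           # declaration wins: two Latin-1 characters instead
```

Neither call fails to parse. Without the external label, though, the text is silently misread. The dispute in this thread is about which of those two outcomes should be the default.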

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id BAA29066 for ietf-xml-mime-bks; Thu, 23 Mar 2000 01:21:14 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA29062 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 01:21:14 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.172]) by prserv.net (out4) with SMTP id <2000032309230823902likc7e>; Thu, 23 Mar 2000 09:23:09 +0000
Message-Id: <200003230919.AA02031@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 18:19:50 +0900
To: Chris Lilley <chris@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38D903E1.7E6BC41F@w3.org>
References: <38D903E1.7E6BC41F@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...

 >Thus, for any XML file which is not encoded in US-ASCII, text/xml is an
 >inappropriate choice of MIME type. Silent data corruption can and will
 >occur. 

If I use UTF-8 or UTF-16 and provide the charset parameter, no data 
corruption will occur.

 >For all international xml files (noting that in this context, the USA is
 >international too due to the widespread use of Spanish, and the wide number
 >of other languages in use), a type such as application/xml is the correct
 >choice, unless there is a more specific non-text type available.

If an XML document is readable for casual users, satisfies 
restrictions of the top-level media type "text", and does not require 
special dispatching, "text/xml" is the most appropriate media type.

Cheers,

----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id BAA29060 for ietf-xml-mime-bks; Thu, 23 Mar 2000 01:21:13 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA29056 for <ietf-xml-mime@imc.org>; Thu, 23 Mar 2000 01:21:11 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.172]) by prserv.net (out4) with SMTP id <2000032309230623902likc6e>; Thu, 23 Mar 2000 09:23:07 +0000
Message-Id: <200003230922.AA02032@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 18:22:07 +0900
To: Chris Lilley <chris@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38D902E0.8E2DA0E9@w3.org>
References: <38D902E0.8E2DA0E9@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >

 >> I think that such a transcoder is very helpful because it works for
 >> all textual formats and also because it is very efficient.
 >
 >No, it is not helpful, because it makes life a lot more difficult for
 >everyone else and leads to data corruption.

Apparently, Martin and I do not agree with you.

 >Incidentally I don't see an answer to my question about what such an
 >XML-unaware transcoder would do when converting down from UTF-8 or UTF-16
 >to some 8-bit charset with all the unrepresentable characters. Since it
 >doesn't know XML it can't use NCRs. What does it do, silently replace these
 >characters with question marks? And that is somehow OK? 

In the case of XML, conversion from Unicode to legacy encodings is not very 
useful.  Even when such conversion is requested, transcoders can give up 
transcoding, when they encounter something unrepresentable.
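The two behaviours under discussion can be contrasted in a few lines, using Python's standard codecs as a stand-in for a generic transcoder. "Giving up" corresponds to a strict encode that raises. The "xmlcharrefreplace" error handler emits NCRs, but it does so blindly. It also fires in element names, comments, and CDATA sections, where XML does not recognise character references, so doing this safely requires an XML-aware transcoder. Neither variant updates the encoding declaration.

```python
from typing import Optional

def transcode_or_give_up(text: str, target: str) -> Optional[bytes]:
    """XML-unaware transcoder that refuses rather than silently substituting."""
    try:
        return text.encode(target)  # strict: raises on unrepresentable characters
    except UnicodeEncodeError:
        return None  # give up; the caller keeps the original Unicode form

def transcode_with_ncrs(text: str, target: str) -> bytes:
    """Replace unrepresentable characters with &#NNNN; (safe in content only)."""
    return text.encode(target, "xmlcharrefreplace")

doc = "<p>\u4e2d</p>"
print(transcode_or_give_up(doc, "iso-8859-1"))  # None
print(transcode_with_ncrs(doc, "iso-8859-1"))   # b'<p>&#20013;</p>'
```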


 >>  >> The charset parameter is such a solution.
 >>  >
 >>  >It is one such solution. There are better ones, and indeed a much better
 >>  >one in the XML specification.
 >> 
 >> It works only for XML.  It is not bad, when the MIME header is not available.
 >> But when it is available, we must rely on the charset parameter.
 >
 >For text/*, yes, we have to. Luckily there is application/* and model/* and
 >image/* and so forth for people using XML who care about data integrity and
 >don't want cheap text processing tools playing fast and loose with their
 >data.

I believe that "always the charset parameter" is the recommendation shown 
in RFC 2130 and the public page of W3C I18N WG.

In my message "History of the charset issue", I tried to summarize 
my understanding of the history.  I am unable to ignore the consensus 
of W3C I18N WG, W3C XML Syntax WG, and W3C XML SIG&WG, and the recommendation 
shown in RFC 2130.

 >> You are advocating different in-band encoding signatures for different
 >> formats.  I think that this is a significant burden to users and specification
 >> developers.
 >
 >You are advocating different out-of-band or in-band or mixed signatures for
 >different protocols.

The long-term goal is to make the file systems of operating systems aware of the charset 
parameter.  Editors know the charset, and they store the charset information 
in the file system.  This info is then passed to WWW servers and further passed 
to WWW browsers.  The charset info is completely hidden from users and 
everything is automatic.  There will be no data corruption.

As of today, we need in-band signature and some tricks to keep out-of-band 
signature and in-band signature consistent.

Most modern WWW servers provide the charset parameter.  We only have to 
encourage them without repeating old arguments.

 > A solution that requires every "save as" of an XML
 >file to rewrite the (incorrect, but overridden by a MIME charset parameter)
 >encoding declaration, which was only incorrect because one of your "I know
 >how to fiddle with all text files" transcoders silently broke it in the
 >first place. This places, as you say, an intolerable burden on users.

You are confusing XML-unaware transcoders and XML-aware programs which 
save XML documents into files.

 >One of the things about XML, which differs from HTML, is typical patterns
 >of use. XML transmitted over HTTP is likely to be extensively manipulated
 >from the filesystem of both the server and the client, a common operation
 >which your proposal makes much more difficult, just to allow people who
 >write simple text processing tools to not add XML support. As a trade off,
 >I hope it is obvious to everyone else why this is such a bad idea.

I agree on the first and second sentence and completely disagree with 
the last sentence.


----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA17163 for ietf-xml-mime-bks; Wed, 22 Mar 2000 21:45:22 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA17154 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 21:45:19 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id OAA08218; Thu, 23 Mar 2000 14:46:37 +0900 (JST)
Message-Id: <4.2.0.58.J.20000323143207.0331d9c0@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Thu, 23 Mar 2000 14:45:48 +0900
To: Tim Bray <tbray@textuality.com>, John Cowan <jcowan@reutershealth.com>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: UTF-16, the BOM, and media types
Cc: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
In-Reply-To: <3.0.32.20000322135249.0295ba70@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00/03/22 13:52 -0800, Tim Bray wrote:
>At 04:34 PM 3/22/00 -0500, John Cowan wrote:
> >> Section 4.3.3 of XML 1.0 says
> >>  "Entities encoded in UTF-16 must begin with the Byte Order Mark described
> >>   by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK
> >>   SPACE character, #xFEFF)."
> >
> >That describes entities encoded in the charset called "UTF-16".  It says
> >nothing about entities encoded in the charsets "UTF-16BE" and "UTF-16LE"
> >or for that matter charset "x-focs".
>
>Yep, if you hold your head at just the right angle, and don't think of
>the word "rhinoceros", you can convince yourself that the 16[BL]E
>encodings are really different things entirely, just happen to share
>a few characters with That Other Encoding's name, just close personal
>friends, etc...

Well, we know they are closely related, but what do processors do?
No XML processor is supposed or even allowed to assume that e.g.
iso-8859-1 and iso-8859-15 are closely related, or that e.g.
iso-8859-1 and windows-1252 are even more closely related.
There is no way for a processor to figure this out. Trying to guess
at that level in a non-interactive environment is doomed to
fail. Trying to guess on prefixes of names is of course crazy.

So if something comes in with a label of UTF-16BE, then an XML
processor can either say 'sorry, don't know UTF-16BE', or it
can know it and interpret it accordingly. Every XML processor
has to understand UTF-16, but supporting UTF-16LE is not
required. If you don't like UTF-16LE for XML, just don't
support it.
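Python's codecs make the practical difference between the labels easy to see, and they follow the RFC 2781 distinction. Under the label UTF-16, a leading FE FF is consumed as a byte order mark. Under UTF-16BE, the same two bytes are an ordinary U+FEFF (ZERO WIDTH NO-BREAK SPACE) in the text.

```python
# "<a/>" in big-endian UTF-16, preceded by the two bytes FE FF.
data = b"\xfe\xff" + "<a/>".encode("utf-16-be")

as_utf16 = data.decode("utf-16")       # label UTF-16: FE FF read as BOM and dropped
as_utf16be = data.decode("utf-16-be")  # label UTF-16BE: FE FF kept as U+FEFF

print(repr(as_utf16))    # '<a/>'
print(repr(as_utf16be))  # '\ufeff<a/>'
```

A processor that knows the UTF-16BE label therefore sees a character before the first "<". A processor that does not know the label simply rejects the entity, as described above.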

And please note the following erratum to the XML spec:
http://www.w3.org/XML/xml-19980210-errata#E44

New:
       00 3C ## ##,
       00 25 ## ##,
       00 20 ## ##,
       00 09 ## ##,
       00 0D ## ## or
       00 0A ## ##: Big-endian UTF-16 or ISO-10646-UCS-2. Note that, absent
                    an encoding declaration, these cases are strictly
                    speaking in error.
       3C 00 ## ##,
       25 00 ## ##,
       20 00 ## ##,
       09 00 ## ##,
       0D 00 ## ## or
       0A 00 ## ##: Little-endian UTF-16 or ISO-10646-UCS-2. Note that, absent
                    an encoding declaration, these cases are strictly
                    speaking in error.

old:
       00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus,
                    strictly speaking, in error)
       3C 00 3F 00: UTF-16, little-endian, no Byte Order Mark (and thus,
                    strictly speaking, in error)

The new text is quite a bit clearer. But if it's not clear enough,
then we'll have to make it even clearer.
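Here is a sketch of the two-byte check in the new erratum text, with the byte order mark rows of XML 1.0 Appendix F added in front. It deliberately ignores the four-byte UCS-4 rows of Appendix F, so input that might be UCS-4 would need to be screened first. The function name and return shape are illustrative only.

```python
from typing import Optional, Tuple

# Bytes that can begin an XML entity: '<', '%', space, tab, CR, LF.
ASCII_STARTS = {0x3C, 0x25, 0x20, 0x09, 0x0D, 0x0A}

def sniff_utf16(prefix: bytes) -> Optional[Tuple[str, bool]]:
    """Return (byte_order, has_bom) for a UTF-16/UCS-2 guess, else None."""
    if prefix[:2] == b"\xfe\xff":
        return ("big-endian", True)
    if prefix[:2] == b"\xff\xfe":
        return ("little-endian", True)
    if len(prefix) >= 2:
        if prefix[0] == 0x00 and prefix[1] in ASCII_STARTS:
            # 00 3C ## ## etc.: in error unless an encoding declaration follows
            return ("big-endian", False)
        if prefix[1] == 0x00 and prefix[0] in ASCII_STARTS:
            # 3C 00 ## ## etc.: same caveat, little-endian
            return ("little-endian", False)
    return None
```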


Regards,   Martin.


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA17162 for ietf-xml-mime-bks; Wed, 22 Mar 2000 21:45:22 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA17155 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 21:45:19 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id OAA08215; Thu, 23 Mar 2000 14:46:35 +0900 (JST)
Message-Id: <4.2.0.58.J.20000323141220.0331ede0@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Thu, 23 Mar 2000 14:48:02 +0900
To: Tim Bray <tbray@textuality.com>, John Cowan <jcowan@reutershealth.com>, MURATA Makoto <muraw3c@attglobal.net>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: UTF-16, the BOM, and media types
Cc: ietf-xml-mime@imc.org
In-Reply-To: <3.0.32.20000322130928.015296f0@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I'm surprised this comes up again.

At 00/03/22 13:09 -0800, Tim Bray wrote:
>At 02:57 PM 3/22/00 -0500, John Cowan wrote:
> >> UTF-16le and UTF-16be cannot be used for XML.  XML mandates
> >> the BOM for utf-16.  Meanwhile, utf-16le and utf-16be cannot
> >> have the BOM.  More about this, see RFC 2781.
> >
> >I do not understand this from the text of XML 1.0.  Clause 4.3.3 only says
> >that if there is no encoding declaration, then either:
>
>Section 4.3.3 of XML 1.0 says
>  "Entities encoded in UTF-16 must begin with the Byte Order Mark described
>   by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK
>   SPACE character, #xFEFF)."
>
>Thus in my view the RFC is correct,

Sorry, which RFC? If you mean RFC 2781, then I just checked
and I didn't find the string 'XML' there.

And it's not supposed to show up. RFCs about character encodings
are not supposed to say things about various document formats,
and document formats, for the largest part, are not supposed to
say something about specific character encodings.


>and thus 16BE and 16LE are not useful
>for XML.  It is good practice, whenever you store anything in UTF-16, to
>put a BOM in, and XML makes that good practice compulsory, which is pretty
>painless since it seems that virtually all software that writes UTF-16 does
>so anyhow. The cost of a BOM is zilch.  The benefit in data survival in the
>face of stupid byte order tricks (yes, they still happen), is immense.
>
>Martin Duerst, a smart guy whom I respect, invested several hours in
>trying to convince me that the 16[BL]E variants with forbidden-BOM had
>some real-world justification,

Well, that was mainly because you insisted that they needed to be
forbidden unless there was some real-world justification. But that's
not the real issue. The real issue is that each spec should stick to its
own business.

XML does that most of the time, and requiring a BOM for UTF-16
in the XML spec makes sense in the context of requiring all
XML processors to accept UTF-8 and UTF-16 even without any
encoding information. Otherwise, it wouldn't always be possible
to distinguish them. Apart from that, it does not make sense
to use the XML spec to try to legislate on any character encodings.
It would be very surprising if e.g. the XML spec said that
EUC-JP is okay, but Shift_JIS is not okay, or Shift_JIS is
okay, except for half-width kana, and so on.

What you think is suitable for XML is another thing, that can
go into tutorials, books, and so on. Some people would claim
that only UTF-8 is suitable, others would claim whatever they
want. Some get it right, and others get it wrong. It's not
for the spec to judge.


Regards,   Martin.

P.S.: If you wonder where UTF-16BE/LE could be of use in the context
       of XML, there was recently a discussion in the XML Signature
       WG about the use of XPath. The BOM confused a lot of people
       on that group.


Received: by ns.secondary.com (8.9.3/8.9.3) id SAA04284 for ietf-xml-mime-bks; Wed, 22 Mar 2000 18:23:33 -0800 (PST)
Received: from prserv.net (out2.prserv.net [32.97.166.32]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA04278 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 18:23:32 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.2]) by prserv.net (out2) with SMTP id <2000032302251322901cm4sle>; Thu, 23 Mar 2000 02:25:14 +0000
Message-Id: <200003230219.AA02016@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 11:19:18 +0900
To: Chris Lilley <chris@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: Meida types and stylesheet linking
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38D8FD3C.E5ADD63C@w3.org>
References: <38D8FD3C.E5ADD63C@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Meida types and stylesheet linking",
Chris Lilley wrote...
 >Can you point me to some documentation that says that this usage of a MIME
 >type label, outside of a content-type header, is somehow invalid or
 >incorrect?

In RFC 2045, we have the following:

   The purpose of the Content-Type field is to describe the data
   contained in the body fully enough that the receiving user agent can
   pick an appropriate agent or mechanism to present the data to the
   user, or otherwise deal with the data in an appropriate manner. The
   value in this field is called a media type.

So, if something is not specified in the content-type field, it is 
not a media type :-)

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id NAA27175 for ietf-xml-mime-bks; Wed, 22 Mar 2000 13:51:10 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id NAA27171 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 13:51:09 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id NAA18264; Wed, 22 Mar 2000 13:51:28 -0800
Message-Id: <3.0.32.20000322135249.0295ba70@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Wed, 22 Mar 2000 13:52:51 -0800
To: John Cowan <jcowan@reutershealth.com>
From: Tim Bray <tbray@textuality.com>
Subject: Re: UTF-16, the BOM, and media types
Cc: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 04:34 PM 3/22/00 -0500, John Cowan wrote:
>> Section 4.3.3 of XML 1.0 says
>>  "Entities encoded in UTF-16 must begin with the Byte Order Mark described
>>   by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK
>>   SPACE character, #xFEFF)."
>
>That describes entities encoded in the charset called "UTF-16".  It says
>nothing about entities encoded in the charsets "UTF-16BE" and "UTF-16LE"
>or for that matter charset "x-focs".

Yep, if you hold your head at just the right angle, and don't think of
the word "rhinoceros", you can convince yourself that the 16[BL]E
encodings are really different things entirely, just happen to share
a few characters with That Other Encoding's name, just close personal
friends, etc...

But I just don't get it.  This feels perverse.  I repeat, anyone using
any flavor of UTF-16 can (and usually does) put in a BOM, if only on a
belt-and-suspenders basis; and for this reason, de facto, the 16[BL]E
media types, which forbid this practice, are simply not in practical terms 
usable for XML.

I'm not denying that these things exist.  Just asserting that the RFC is
correct in saying they don't work with XML. -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id NAA27112 for ietf-xml-mime-bks; Wed, 22 Mar 2000 13:47:35 -0800 (PST)
Received: from mauve.innosoft.com (DSL107-055.brandx.net [209.55.107.55]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id NAA27101 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 13:47:33 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JNBW2V7NN40000IV@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Wed, 22 Mar 2000 13:49:21 -0800 (PST)
Date: Wed, 22 Mar 2000 13:47:51 -0800 (PST)
Subject: Re: Finishing(!) the XML-tagging discussion
In-reply-to: "Your message dated Wed, 22 Mar 2000 20:00:39 +0000" <4.2.2.20000322195726.00bfa2c0@pop.dial.pipex.com>
To: Graham Klyne <GK@dial.pipex.com>
Cc: "Martin J. Duerst" <duerst@w3.org>, ietf-xml-mime@imc.org
Message-id: <01JNC3GA5EDC0000IV@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii; format=flowed
References: <01JNAEGY3OOG00004D@MAUVE.INNOSOFT.COM> <4.2.2.20000321081348.00a75320@pop.dial.pipex.com> <4.2.0.58.J.20000322115620.03467dc0@sh.w3.mag.keio.ac.jp>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> At 11:59 AM 3/22/00 +0900, Martin J. Duerst wrote:
> >should definitely be
> >
> >>accept-features: (&((language=de),(language=fr)))

> A nit, before this example becomes received reality:  the way to express
> this in CONNEG is:

>    accept-features: (| (language=de) (language=fr) )

> (i.e. use logical OR, fewer brackets in this case, and no comma between
> options.  The basic syntax is lifted from LDAP search filters.)

Um, the point of the example was to have something that _doesn't_ exist.
Hence the use of AND, not OR.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id NAA26889 for ietf-xml-mime-bks; Wed, 22 Mar 2000 13:32:18 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id NAA26885 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 13:32:17 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id QAA19198; Wed, 22 Mar 2000 16:34:03 -0500 (EST)
Message-ID: <38D93C7A.7EB52152@reutershealth.com>
Date: Wed, 22 Mar 2000 16:34:50 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Tim Bray <tbray@textuality.com>
CC: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Subject: Re: UTF-16, the BOM, and media types
References: <3.0.32.20000322130928.015296f0@pop.intergate.ca>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Tim Bray wrote:

> Section 4.3.3 of XML 1.0 says
>  "Entities encoded in UTF-16 must begin with the Byte Order Mark described
>   by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK
>   SPACE character, #xFEFF)."

That describes entities encoded in the charset called "UTF-16".  It says
nothing about entities encoded in the charsets "UTF-16BE" and "UTF-16LE"
or for that matter charset "x-focs".

> It is good practice, whenever you store anything in UTF-16, to
> put a BOM in,

I don't deny it.  But the letter of the specification permits UTF-16[BL]E,
as long as an explicit encoding declaration is present.

> and XML makes that good practice compulsory,

I am not convinced.

> Martin Duerst, a smart guy whom I respect, invested several hours in
> trying to convince me that the 16[BL]E variants with forbidden-BOM had
> some real-world justification, but I forget what it is...

The main issue is that people actually do use them.  Charset names, like media
types, are intended to permit labeling of what actually exists.  The existence
of a charset name does not mean that anybody thinks the corresponding charset
is a Good Thing.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: by ns.secondary.com (8.9.3/8.9.3) id NAA26461 for ietf-xml-mime-bks; Wed, 22 Mar 2000 13:08:44 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id NAA26456 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 13:08:43 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id NAA18175; Wed, 22 Mar 2000 13:08:27 -0800
Message-Id: <3.0.32.20000322130928.015296f0@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Wed, 22 Mar 2000 13:09:49 -0800
To: John Cowan <jcowan@reutershealth.com>, MURATA Makoto <muraw3c@attglobal.net>
From: Tim Bray <tbray@textuality.com>
Subject: UTF-16, the BOM, and media types
Cc: ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 02:57 PM 3/22/00 -0500, John Cowan wrote:
>> UTF-16le and UTF-16be cannot be used for XML.  XML mandates
>> the BOM for utf-16.  Meanwhile, utf-16le and utf-16be cannot
>> have the BOM.  More about this, see RFC 2781.
>
>I do not understand this from the text of XML 1.0.  Clause 4.3.3 only says
>that if there is no encoding declaration, then either:

Section 4.3.3 of XML 1.0 says
 "Entities encoded in UTF-16 must begin with the Byte Order Mark described 
  by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK 
  SPACE character, #xFEFF)."

Thus in my view the RFC is correct, and thus 16BE and 16LE are not useful
for XML.  It is good practice, whenever you store anything in UTF-16, to 
put a BOM in, and XML makes that good practice compulsory, which is pretty 
painless since it seems that virtually all software that writes UTF-16 does 
so anyhow. The cost of a BOM is zilch.  The benefit in data survival in the 
face of stupid byte order tricks (yes, they still happen), is immense.
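A small Python sketch of the byte-order point above. It uses only the standard codecs module; the document string and the swap_bytes helper are illustrative. It contrasts the generic "utf-16" decoder, which reads and strips a BOM, with "utf-16-be", which treats the same two bytes as an ordinary U+FEFF character. The latter is roughly the RFC 2781 UTF-16BE behaviour being debated in this thread.

```python
import codecs

DOC = "<?xml version='1.0'?><a/>"

def swap_bytes(data: bytes) -> bytes:
    """Swap every pair of bytes, simulating a careless byte-order 'fix'."""
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)

# UTF-16 with a BOM (big-endian here), as XML 1.0 section 4.3.3 requires.
with_bom = codecs.BOM_UTF16_BE + DOC.encode("utf-16-be")
# The same characters with no BOM, as the UTF-16BE label requires.
without_bom = DOC.encode("utf-16-be")

# After a byte swap the BOM reads FF FE, which tells the "utf-16" codec
# the new byte order, so the text survives intact.
assert swap_bytes(with_bom).decode("utf-16") == DOC

# Without a BOM the swapped bytes still decode without any error, but as
# the wrong characters: '<' (U+003C) comes back as U+3C00.
garbled = swap_bytes(without_bom).decode("utf-16-be")
assert garbled != DOC and garbled[0] == "\u3c00"

# Reading BOM-prefixed bytes with the UTF-16BE codec keeps the BOM as a
# ZERO WIDTH NO-BREAK SPACE at the start of the text.
assert with_bom.decode("utf-16-be")[0] == "\ufeff"
```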

Martin Duerst, a smart guy whom I respect, invested several hours in
trying to convince me that the 16[BL]E variants with forbidden-BOM had
some real-world justification, but I forget what it is... and I remain
convinced that they are simply not suitable for use with XML. -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id MAA25573 for ietf-xml-mime-bks; Wed, 22 Mar 2000 12:18:27 -0800 (PST)
Received: from hose.pipex.net (hose.news.pipex.net [158.43.128.58]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id MAA25569 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 12:18:25 -0800 (PST)
Received: from GK-VAIO (userk545.uk.uudial.com [193.149.70.121]) by hose.pipex.net (Postfix) with ESMTP id 4CE8D451D; Wed, 22 Mar 2000 20:20:14 +0000 (GMT)
Message-Id: <4.2.2.20000322195726.00bfa2c0@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Wed, 22 Mar 2000 20:00:39 +0000
To: "Martin J. Duerst" <duerst@w3.org>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing(!) the XML-tagging discussion
Cc: ietf-xml-mime@imc.org
In-Reply-To: <4.2.0.58.J.20000322115620.03467dc0@sh.w3.mag.keio.ac.jp>
References: <01JNAEGY3OOG00004D@MAUVE.INNOSOFT.COM> <"Your message dated Tue, 21 Mar 2000 08:26:34 -0500" <200003211326.IAA19007@astro.cs.utk.edu> <4.2.2.20000321081348.00a75320@pop.dial.pipex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:59 AM 3/22/00 +0900, Martin J. Duerst wrote:
>should definitely be
>
>>accept-features: (&((language=de),(language=fr)))

A nit, before this example becomes received reality:  the way to express 
this in CONNEG is:

   accept-features: (| (language=de) (language=fr) )

(i.e. use logical OR, fewer brackets in this case, and no comma between 
options.  The basic syntax is lifted from LDAP search filters.)
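To make the syntax concrete, here is a toy evaluator for the subset of feature expressions used in this exchange. It supports "|" (OR), "&" (AND), "!" (NOT) and name=value atoms, each in parentheses with no commas. It is an illustrative sketch, not an implementation of RFC 2533. It assumes every feature has a single string value and ignores numeric ranges, quoted strings and sets. Under that single-value assumption an AND of two different languages can never match.

```python
import re
from typing import Dict, List, Tuple

# One token: a parenthesis, an operator, or a "name=value" atom.
TOKEN = re.compile(r"\s*([()|&!]|[^\s()|&!]+)")

def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    text = text.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(tokens: List[str], pos: int,
             features: Dict[str, str]) -> Tuple[bool, int]:
    """Evaluate the filter starting at tokens[pos]; return (result, next pos)."""
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' but found {tokens[pos]!r}")
    op = tokens[pos + 1]
    pos += 2
    if op in ("|", "&"):
        results = []
        while tokens[pos] == "(":          # one or more sub-filters
            result, pos = evaluate(tokens, pos, features)
            results.append(result)
        value = any(results) if op == "|" else all(results)
    elif op == "!":
        result, pos = evaluate(tokens, pos, features)
        value = not result
    elif "=" in op:                        # simple name=value comparison
        name, _, wanted = op.partition("=")
        value = features.get(name) == wanted
    else:
        raise ValueError(f"unexpected token {op!r}")
    if tokens[pos] != ")":
        raise ValueError(f"expected ')' but found {tokens[pos]!r}")
    return value, pos + 1

def matches(expression: str, features: Dict[str, str]) -> bool:
    tokens = tokenize(expression)
    try:
        value, end = evaluate(tokens, 0, features)
    except IndexError:
        raise ValueError("unbalanced parentheses") from None
    if end != len(tokens):
        raise ValueError("trailing tokens after filter")
    return value

receiver = {"language": "de"}
print(matches("(| (language=de) (language=fr) )", receiver))  # True
print(matches("(& (language=de) (language=fr) )", receiver))  # False
```

In this toy parser, the comma-separated form (&((language=de),(language=fr))) is rejected with a ValueError rather than silently accepted.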

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA25180 for ietf-xml-mime-bks; Wed, 22 Mar 2000 11:54:50 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA25176 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 11:54:49 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id OAA18233; Wed, 22 Mar 2000 14:56:40 -0500 (EST)
Message-ID: <38D925A5.48188F12@reutershealth.com>
Date: Wed, 22 Mar 2000 14:57:25 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <Pine.GSO.4.21.0003152145500.28051-100000@gate> <200003221650.AA02009@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:

> UTF-16le and UTF-16be cannot be used for XML.  XML mandates
> the BOM for utf-16.  Meanwhile, utf-16le and utf-16be cannot
> have the BOM.  More about this, see RFC 2781.

I do not understand this from the text of XML 1.0.  Clause 4.3.3 only says
that if there is no encoding declaration, then either:

	a BOM is present, and the encoding is UTF-16, or

	no BOM is present, and the encoding is UTF-8.

If a proper encoding declaration is present, then any charset may be
used; however, parsers are only required to handle UTF-8 and UTF-16.
(In practice, all parsers known to me also accept US-ASCII and ISO-8859-1.)

For example, a file beginning with the characters

	<?xml version='1.0' encoding='x-focs'?>

encoded in Finagle's Own Character Set is perfectly legal, and will be parsed
successfully by any parser with an x-focs conversion table.  This is true even if
x-focs is a multi-byte character set.
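A minimal Python sketch of how a processor might read the encoding name from the XML declaration. It assumes the declaration bytes are ASCII-compatible; for UTF-16 or EBCDIC input the processor must first guess the encoding family from the leading bytes. The regular expression is a simplified version of the XMLDecl/EncodingDecl productions, not a full parser, and the function name is hypothetical.

```python
import re
from typing import Optional

# Simplified XMLDecl with an EncodingDecl; EncName in XML 1.0 is
# [A-Za-z] ([A-Za-z0-9._] | '-')*. Only meaningful for ASCII-compatible bytes.
ENCODING_DECL = re.compile(
    rb"<\?xml\s+version\s*=\s*(['\"])[^'\"]*\1"
    rb"\s+encoding\s*=\s*(['\"])([A-Za-z][A-Za-z0-9._-]*)\2"
)

def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding name from the XML declaration, or None if absent."""
    m = ENCODING_DECL.match(data)
    return m.group(3).decode("ascii") if m else None

print(declared_encoding(b"<?xml version='1.0' encoding='x-focs'?><doc/>"))  # x-focs
print(declared_encoding(b"<?xml version='1.0'?><doc/>"))                    # None
```

Knowing the name is only half the job: actually decoding x-focs still needs a conversion table, and Python's codecs.lookup("x-focs") would raise LookupError.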

I see absolutely no reason why UTF-16BE and UTF-16LE should be excluded from
the list of acceptable charsets.  It is true that Appendix F claims that a text beginning with the bytes 00 3C 00 3F or 3C 00 3F 00 is "strictly speaking,
in error", but Appendix F is marked "non-normative", and this text is
qualified in E44 anyhow.
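For reference, a sketch of the leading-byte detection described in Appendix F. It covers only a subset of that table (the UCS-4 rows are omitted) and falls back to the section 4.3.3 default of UTF-8 when no BOM or recognised signature is present. The labels are descriptive strings chosen for this example, not IANA charset names. The BOM-less 00 3C 00 3F and 3C 00 3F 00 rows are the ones under discussion here.

```python
# (leading bytes, guessed encoding family): a subset of XML 1.0 Appendix F.
SIGNATURES = [
    (b"\xfe\xff", "UTF-16 with BOM (big-endian)"),
    (b"\xff\xfe", "UTF-16 with BOM (little-endian)"),
    (b"\x00\x3c\x00\x3f", "16-bit big-endian, no BOM (e.g. UTF-16BE)"),
    (b"\x3c\x00\x3f\x00", "16-bit little-endian, no BOM (e.g. UTF-16LE)"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible; read the encoding declaration"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC; read the encoding declaration"),
]

def sniff(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes."""
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            return family
    # Section 4.3.3: no BOM and no encoding declaration means UTF-8.
    return "UTF-8 (default)"

print(sniff("<?xml".encode("utf-16-be")))  # 16-bit big-endian, no BOM (e.g. UTF-16BE)
print(sniff(b"<doc/>"))                    # UTF-8 (default)
```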

> I see no reasons for preserving byte sequences.  We only have to
> preserve XML information sets.

Almost, since strictly speaking the charset is part of the information set.

> Existing programming languages do not support Unicode very well, as
> I see it.

Except Java, Javascript, Ada 95, Dylan ....
 
-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id JAA22818 for ietf-xml-mime-bks; Wed, 22 Mar 2000 09:43:29 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA22814 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 09:43:28 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA12822; Wed, 22 Mar 2000 12:45:16 -0500
Message-ID: <38D906A7.8837412F@w3.org>
Date: Wed, 22 Mar 2000 18:45:11 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CF9D82.F4420ED4@w3.org> <200003221650.AA02008@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
>  >and attempt to
>  >stretch this to make loose and wooly the current, fairly good state of
>  >encoding declaration of XML files?
> 
> Unfortunately, we do not have "fairly good state of encoding declaration
> of XML files".  People generate XML documents by XSLT or their own programs,
> and fail to specify the correct charset.

That is not a problem. Such files will not be well formed and thus will
fail to parse.


>  Encoding PIs are not bad when
> the MIME header is absent.  But mistakes do happen.
> 
>  >Several people have pointed out that I am focussing on XML here. I would
>  >refer them to the name and scope of the mailing list.
> 
> I think that you are not paying attention to other textual format. 

Oh I am, but not on this list where it is off topic.

> I would
> like XML to be a good citizen of the WWW and to establish a good practise

As would I. I don't consider the propagation of known faults to be "good
practice".


>  >Incidentally, XML is probably not best described as a textual format. It is
>  >a data format, which can among other things be used to describe
>  >international text. I am aware that the text/* media types have some
>  >historical requirements regarding 'character set'; this is sufficient that
>  >my opinion is that text/* should not be used for XML in general.
>  >Application/xml has no such problems (though it seems that people propose
>  >to propagate these problems there).
> 
> I think that many XML documents are readable for casual users and that
> the top-level type "text" is most appropriate.

As long as they only use US-ASCII.

>  The charset parameter
> is not a historical requirement.  Rather, it is the right solution,
> which is just about to take off.  I think that we are wasting our
> limited resources by repeating old discussion rather than doing more
> implementations.

You consistently fail to address the issue of file system processing of
XML, and instead characterise all opposition to your proposal as "time
wasting". I will be happy to characterise it as that once you have given a
satisfactory response to the questions I pose.


>  >It is possible for example to take a payload of image/svg-xml and alter it
>  >from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
>  >declaration and insertion of NCRs for any characters outside the repertoire
>  >of 8859-15). I would be most upset, as would every decoder on the planet,
>  >if the same conversion was performed on image/png.
> 
> Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
> legacy encodings does not look very attractive. 

I agree that such transcoding is unattractive, but you seem to want to bias
the XML MIME specification to supporting such transcoding whatever the cost
to other sorts of processing.

>  What is needed is the
> other way around: conversion from legacy encodings to Unicode.  Such transcoders
> do not need character references by numbers.

Thanks. That is the first time that I saw you limit these "all text" 
transcoders to somewhere that they might at least be useful and be able to
represent all the characters.

However, something that converts an XML file from 8859-1 to UTF-8 and
leaves the encoding declaration saying 8859-1 is not useful. It has not
generated XML. It has made a thing which will fail to parse. A transcoder
that *knows* it is converting from (list of legacy 8-bit charsets) to (UTF-8
or UTF-16) can always do the right thing by always emitting an XML
declaration without an encoding declaration (or better, one that says which
of UTF-8 or UTF-16 is used). This is one line of code. So then all it needs
to do is strip out any existing XML declaration. That is pretty trivial,
too. I mean, grep -v will do in a pinch ;-)
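A minimal Python sketch of the XML-aware transcoder described above. It decodes from a known legacy charset, drops any existing XML declaration, and emits one that names UTF-8. The function name is hypothetical. It assumes the input is a document entity (an external parsed entity would need a text declaration instead) whose declaration contains no '>' character.

```python
import re

# An XML declaration at the very start of the text. Requiring whitespace
# after "xml" avoids matching a PI such as <?xml-stylesheet ...?>.
XML_DECL = re.compile(r"^<\?xml\s[^>]*\?>")

def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Transcode an XML document entity from a legacy charset to UTF-8."""
    text = data.decode(source_charset)
    body = XML_DECL.sub("", text, count=1)  # old declaration names the old charset
    return ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")

src = "<?xml version='1.0' encoding='ISO-8859-1'?><p>caf\u00e9</p>".encode("iso-8859-1")
print(to_utf8(src, "iso-8859-1"))
```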

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA22557 for ietf-xml-mime-bks; Wed, 22 Mar 2000 09:31:36 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA22553 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 09:31:35 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA11222; Wed, 22 Mar 2000 12:33:26 -0500
Message-ID: <38D903E1.7E6BC41F@w3.org>
Date: Wed, 22 Mar 2000 18:33:21 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CE7404.316E69DB@w3.org> <200003221650.AA02006@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
>  >That was what I was suggesting, changing it for text/xml only.
> 
> If we consider e-mail and fallback to text/plain, there is not
> much we can do to text/xml.  We can mandate the charset parameter.
> That's all what we can do.  We cannot change the default.

Correct. The text/plain fallback for HTTP and for email differs, and thus,
only the intersection of these two defaults can be relied upon.


Thus, for any XML file which is not encoded in US-ASCII, text/xml is an
inappropriate choice of MIME type. Silent data corruption can and will
occur. 

For all international XML files (noting that in this context, the USA is
international too due to the widespread use of Spanish, and the wide number
of other languages in use), a type such as application/xml is the correct
choice, unless there is a more specific non-text type available.
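The "silent data corruption" can be illustrated in a few lines of Python. The sketch assumes the two defaults usually cited in this discussion: ISO-8859-1 for text/* received over HTTP (RFC 2616, section 3.7.1) and US-ASCII for text/* in MIME mail (RFC 2046). The same UTF-8 bytes are garbled without any error under the first default and rejected under the second.

```python
payload = "<p>Se\u00f1or M\u00fcller</p>".encode("utf-8")

# HTTP-style default: decodes "successfully" into the wrong characters.
as_latin1 = payload.decode("iso-8859-1")
print(as_latin1)  # <p>SeÃ±or MÃ¼ller</p>

# Mail-style default: the same bytes are rejected outright.
try:
    payload.decode("us-ascii")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)  # True
```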

--
Chris


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id JAA22500 for ietf-xml-mime-bks; Wed, 22 Mar 2000 09:28:21 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA22495 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 09:28:19 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA10571; Wed, 22 Mar 2000 12:29:08 -0500
Message-ID: <38D902E0.8E2DA0E9@w3.org>
Date: Wed, 22 Mar 2000 18:29:04 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CE65A3.1BFE9ECF@w3.org> <200003221650.AA02005@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
>  >Yes - such behaviour is clearly broken. Since a transcoder is changing many
>  >or all the other bytes in the file, expecting it to also correctly update
>  >the encoding declaration rather than leaving it broken is not asking too
>  >much.
> 
> I think that such a transcoder is very helpful because it works for
> all textual formats and also because it is very efficient.

No, it is not helpful, because it makes life a lot more difficult for
everyone else and leads to data corruption.

Incidentally, I don't see an answer to my question about what such an
XML-unaware transcoder would do when converting down from UTF-8 or UTF-16
to some 8-bit charset with all the unrepresentable characters. Since it
doesn't know XML, it can't use NCRs (numeric character references). What
does it do, silently replace these characters with question marks? And
that is somehow OK?

What about if this makes the file no longer well formed - that is OK too?
And all to save half an hour of developer time in a transcoder, at the
expense of silently corrupted data and lots more work for developers of
tools and for users, to patch up the errors that such tools introduce. This
is a really bad idea!
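Python's codecs happen to offer both behaviours discussed here, which makes the difference easy to see. The "replace" error handler does what an XML-unaware transcoder would do and substitutes '?'. The "xmlcharrefreplace" handler emits numeric character references instead. This is only a sketch. Applied blindly to the whole text, it is correct only when the unrepresentable characters occur in character data or attribute values, since NCRs are not recognised in names, comments or CDATA sections. A real transcoder would also have to rewrite the encoding declaration.

```python
# "Zurich (with u-umlaut), en dash, euro sign": the last two are not in ISO-8859-1.
text = "<title>Z\u00fcrich \u2013 \u20ac5</title>"

# XML-unaware: unrepresentable characters silently become '?'.
naive = text.encode("iso-8859-1", errors="replace")
# XML-aware alternative: numeric character references.
aware = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(naive)  # b'<title>Z\xfcrich ? ?5</title>'
print(aware)  # b'<title>Z\xfcrich &#8211; &#8364;5</title>'
```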

>  >> The charset parameter is such a solution.
>  >
>  >It is one such solution. There are better ones, and indeed a much better
>  >one in the XML specification.
> 
> It works only for XML.  It is not bad, when the MIME header is not available.
> But when it is available, we must rely on the charset parameter.

For text/*, yes, we have to. Luckily there is application/* and model/* and
image/* and so forth for people using XML who care about data integrity and
don't want cheap text processing tools playing fast and loose with their
data.
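A sketch of the precedence being argued for, using Python's email.message to parse the header. A charset parameter wins when present. A text/* type without one falls back to us-ascii, the RFC 2376 rule for text/xml. Otherwise the encoding declaration is consulted, and None means "fall back to autodetection". The function name and the simplified declaration regex are assumptions for illustration. The last usage line shows Chris's concern: the charset actually used can disagree with what the file itself declares once it is saved to disk.

```python
import re
from email.message import Message
from typing import Optional

# Simplified: encoding="..." inside a leading, ASCII-compatible XML declaration.
DECL = re.compile(rb"<\?xml\s[^>]*?\bencoding\s*=\s*['\"]([A-Za-z][A-Za-z0-9._-]*)['\"]")

def effective_charset(content_type: str, body: bytes) -> Optional[str]:
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is not None:
        return charset
    if msg.get_content_maintype() == "text":
        return "us-ascii"  # RFC 2376 default for text/xml without charset
    m = DECL.match(body)
    return m.group(1).decode("ascii") if m else None

doc = b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"
print(effective_charset("application/xml", doc))          # ISO-8859-1
print(effective_charset("text/xml; charset=utf-8", doc))  # utf-8 (declaration disagrees)
```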


>  >> We should not try to bend
>  >> specifications only to invent an ad-hoc solution for a particular format.
>  >
>  >I can only agree with that sentence by replacing "format" with "protocol".
> 
> You are advocating different in-band encoding signatures for different
> formats.  I think that this is a significant burden to users and specification
> developers.

You are advocating different out-of-band or in-band or mixed signatures for
different protocols: a solution that requires every "save as" of an XML
file to rewrite the (incorrect, but overridden by a MIME charset parameter)
encoding declaration, which was only incorrect because one of your "I know
how to fiddle with all text files" transcoders silently broke it in the
first place. This places, as you say, an intolerable burden on users.

One way in which XML differs from HTML is in its typical patterns
of use. XML transmitted over HTTP is likely to be extensively manipulated
from the filesystem of both the server and the client, a common operation
which your proposal makes much more difficult, just to spare people who
write simple text processing tools from adding XML support. As a trade-off,
I hope it is obvious to everyone else why this is such a bad idea.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA22042 for ietf-xml-mime-bks; Wed, 22 Mar 2000 09:03:16 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA22037 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 09:03:15 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA05349; Wed, 22 Mar 2000 12:05:05 -0500
Message-ID: <38D8FD3C.E5ADD63C@w3.org>
Date: Wed, 22 Mar 2000 18:05:00 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Meida types and stylesheet linking
References: <38D8CE54.6CC45E61@w3.org> <200003221446.AA01978@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Meida types and stylesheet linking",
> Chris Lilley wrote...
>  >The media type is being used as a
>  >label. But it is not being used as a value for the Content-Type mail or
>  >HTTP header, so is not asserting that the entire resource is of this type.
> 
> Historically, HTML has (ab)used media types even when no MIME entities are
> involved.  For example, embedded stylesheets represented by <style> of
> HTML always have media types.

Can you point me to some documentation that says that this usage of a MIME
type label, outside of a content-type header, is somehow invalid or
incorrect?

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21524 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:53:07 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21520 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:53:06 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216545823901e0040e>; Wed, 22 Mar 2000 16:54:58 +0000
Message-Id: <200003221650.AA02014@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:41 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <200003171444.JAA28170@hesketh.net>
References: <200003171444.JAA28170@hesketh.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Simon St.Laurent wrote...
 >>HTTP already has the accept-charset field.  I do not understand your claim.
 >
 >I think Rick may just be saying that the couplings between XML parsers and
 >HTTP handlers are pretty loose right now, and it's up to programmers to
 >tighten those connections.  While it's easy to switch out, say Xerces, in
 >favor of Aelfred, the HTTP end of the connection isn't going to change its
 >accept-charset settings automatically.  It probably won't even notice the
 >change.

I think that the accept-charset field should be hardcoded in XML processors.  
If an XML processor supports only a few charsets, it is probably a good idea 
to hardcode the accept-charset field for these charsets.
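A minimal Python sketch of what such hardcoding could look like, assuming a hypothetical XML processor whose supported charsets are fixed at build time. `SUPPORTED_CHARSETS`, `accept_charset_value` and `build_request` are illustrative names, not part of any real parser API; only `urllib.request.Request` is standard library, and the request is built but never sent.

```python
import urllib.request

# Hypothetical: the charsets one particular XML processor was built to decode.
SUPPORTED_CHARSETS = ["utf-8", "utf-16", "iso-8859-1"]


def accept_charset_value(charsets):
    """Build an Accept-Charset value, giving earlier entries higher q-values."""
    parts = []
    for i, name in enumerate(charsets):
        q = max(1.0 - 0.1 * i, 0.1)
        # A missing q-value means q=1, so the first entry needs none.
        parts.append(name if i == 0 else f"{name};q={q:.1f}")
    return ", ".join(parts)


def build_request(url, charsets=SUPPORTED_CHARSETS):
    """Create (but do not send) a request advertising only those charsets."""
    return urllib.request.Request(
        url, headers={"Accept-Charset": accept_charset_value(charsets)}
    )


req = build_request("http://example.org/doc.xml")
# urllib stores header names capitalized as "Accept-charset".
print(req.get_header("Accept-charset"))  # utf-8, utf-16;q=0.9, iso-8859-1;q=0.8
```

Tying the header to the processor's own list means that swapping parsers changes what the HTTP side advertises, which is the coupling Simon describes as missing.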

On the other hand, programmers who use existing XML processors should be 
liberated from encoding issues.

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21513 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:53:05 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21509 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:53:04 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216545523901e003ve>; Wed, 22 Mar 2000 16:54:56 +0000
Message-Id: <200003221650.AA02013@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:39 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38D0D07C.C9464048@w3.org>
References: <38D0D07C.C9464048@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...

 >Yes. Though it is not clear what a non-XML-aware transcoder would do when
 >swizzling an XML document between encodings. Since it can't use entities or
 >NCRs, what does it use for characters that do not fall within the
 >repertoire of the encoding it is converting to? Question marks?

I believe that most transcoding of XML documents will be from legacy 
encodings to Unicode, since every XML processor can handle Unicode.  
If this is the case, we do not need character references or CDATA 
sections.
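Transcoding in this direction can be sketched in Python as follows. The sketch assumes the source charset is already known (for example from a charset parameter) and that the input is well-formed; the regular expression only looks for an encoding declaration at the very start of the document. Because every character of a legacy charset has a Unicode equivalent (mapping-table ambiguities aside), no character references are produced. `transcode_to_utf8` is an illustrative name.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
_ENCODING_DECL = re.compile(
    r'^(<\?xml[^>]*?\sencoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def transcode_to_utf8(data: bytes, source_charset: str) -> bytes:
    """Re-encode an XML entity from a legacy charset into UTF-8."""
    # Strict decoding: fail loudly rather than silently corrupt the data.
    text = data.decode(source_charset)
    # Make the declaration agree with the new encoding. If there is no
    # declaration, nothing is needed: UTF-8 is the XML 1.0 default.
    text = _ENCODING_DECL.sub(
        lambda m: m.group(1) + m.group(2) + "UTF-8" + m.group(2), text, count=1
    )
    return text.encode("utf-8")


sjis = '<?xml version="1.0" encoding="Shift_JIS"?><p>日本語</p>'.encode("shift_jis")
print(transcode_to_utf8(sjis, "shift_jis").decode("utf-8"))
```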

 >> It should always be an error of some kind if the charset parameter
 >> does not agree with the encoding attribute (or other Appendix F
 >> mechanism).  Only given that constraint is it useful to make the
 >> charset parameter significant.
 >
 >I agree, but then given that constraint the charset parameter is
 >superfluous since it adds no new information. However, I might be prepared
 >to concede that it is not too harmful as long as it is constrained to say
 >what the XML encoding says, and for it to be an error for these to differ.

I do not understand why a difference between the charset parameter 
and the encoding PI is harmful.  The usual argument against the charset 
parameter is that it is often omitted.  If it is omitted, no inconsistency 
can arise.  If it is provided, it is authoritative.
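The two positions here differ only when both labels are present and disagree. A hedged Python sketch that follows Murata's rule (a charset parameter, when present, wins) while still reporting the disagreement that Chris would treat as an error. The function names and the return shape are illustrative; `codecs.lookup` is used only to compare charset names case-insensitively and across aliases such as `latin1` and `ISO-8859-1`.

```python
import codecs
import re
from typing import Optional, Tuple

_DECL = re.compile(rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def _canonical(name: str) -> str:
    """Normalize a charset name; names Python does not know are just lower-cased."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding declaration of an ASCII-compatible XML entity, if any."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii") if m else None


def choose_encoding(charset: Optional[str], body: bytes) -> Tuple[Optional[str], bool]:
    """Return (encoding to use, whether the two labels disagree)."""
    declared = declared_encoding(body)
    if charset is None:
        # No charset parameter: no inconsistency can arise.
        return declared, False
    disagree = declared is not None and _canonical(declared) != _canonical(charset)
    return charset, disagree
```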



----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21504 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:53:03 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21500 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:53:02 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216545423901e003ue>; Wed, 22 Mar 2000 16:54:54 +0000
Message-Id: <200003221650.AA02012@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:38 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <Pine.GSO.4.21.0003161206400.27383-100000@gate>
References: <Pine.GSO.4.21.0003161206400.27383-100000@gate>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...

 >I take your point, but it sidesteps the issue of
 >what someone is supposed to do when they type in Yen and
 >their "UTF-*"-labelled data comes out with the codepoint
 >for "/" being used. In that case, strictly they have
 >used the wrong mapping table and they have corrupted their
 >data; but if we can give them a way to escape into the
 >bliss of standard Unicode by labelling the variant encoding
 >they have effectively used. 

This issue is very annoying, and I have failed to find 
any good solutions.  I think that the best solution is to migrate 
to Unicode.

 >The proposed Japanese Profile for XML, of which Murata-san 
 >has been the leading light, says that there needs to be
 >extra IANA-registered sets to cover this problem. 

Actually, although some names are used in the document, they 
are not intended to be submitted to the ietf-charset list.


----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21495 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:53:01 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21491 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:53:00 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216545223901e003te>; Wed, 22 Mar 2000 16:54:52 +0000
Message-Id: <200003221650.AA02011@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:36 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <200003160241.VAA08781@astro.cs.utk.edu>
References: <200003160241.VAA08781@astro.cs.utk.edu>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Keith Moore wrote...
 >> I don't think the 'charset' parameter works that way.
 >> Either you use it, the way it's defined, or you don't.
 >
 >charset is defined for text.  it's not defined for other types.
 >application/* types can define a charset parameter any way they want.
 >(so, for that matter, could audio/*, video/*, or model/*, but 
 >that seems less likely)

Right.  But RFC 2046 mentions the possibility of using the charset 
parameter for other top-level media types.

----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21486 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:53:00 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21482 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:52:59 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216544923901e003se>; Wed, 22 Mar 2000 16:54:49 +0000
Message-Id: <200003221650.AA02009@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:32 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <Pine.GSO.4.21.0003152145500.28051-100000@gate>
References: <Pine.GSO.4.21.0003152145500.28051-100000@gate>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...
 >What about this?
 >
 >	1) In all cases, charset parameter is required.
 >	There is no default. Failure is an unrecoverable
 >	error, for general applications. Detection is
 >	mandatory.

This is a change I can agree on.

 >	2) In all cases, all code sequences in
 >	the document must match code sequences allowed
 >	by the encoding specified by the charset parameter.
 >	Failure is an unrecoverable error, for general
 >	applications. Detection is not mandatory.

Agreed.  I think that this is not an issue of RFC 2376 but an 
issue of XML 1.0.
	
 >	3) In all cases, if the document starts with a BOM,
 >	the charset parameter must indicate which flavour
 >	of UTF-16 is being used. There is no default.
 >	Failure is an unrecoverable error, for general
 >	applications. Detection is not mandatory, but should
 >	be made so at some future date.

UTF-16LE and UTF-16BE cannot be used for XML.  XML mandates 
the BOM for UTF-16.  Meanwhile, text labelled UTF-16LE or UTF-16BE 
cannot have the BOM.  For more about this, see RFC 2781.
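The conflict can be checked mechanically. A minimal Python sketch, assuming the label comes from a charset parameter and the body is the raw entity; `utf16_label_problem` is an illustrative name, and only the two rules stated above (a BOM is required for `utf-16` in XML, forbidden for `utf-16be`/`utf-16le` by RFC 2781) are checked.

```python
import codecs
from typing import Optional

_BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def utf16_label_problem(label: str, body: bytes) -> Optional[str]:
    """Describe a conflict between a UTF-16 label and the BOM, or return None."""
    has_bom = body[:2] in _BOMS
    name = label.lower()
    if name == "utf-16" and not has_bom:
        return "XML entities labelled utf-16 must begin with a BOM"
    if name in ("utf-16be", "utf-16le") and has_bom:
        return f"{label} data must not begin with a BOM (RFC 2781)"
    return None
```

Decoding such data with Python's `utf-16-le` codec keeps the BOM as a leading U+FEFF character instead of consuming it, which is one concrete reason the two labels cannot be mixed.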

 >	4) If the document is sent text/xml, the encoding
 >	parameter of the XML header is not checked. However,
 >	well-behaved systems should rewrite the encoding
 >	attribute of the XML header to agree with charset 
 >	parameter. 

When the recipient discards the MIME header, it has to rewrite 
the encoding PI so that it agrees with the charset parameter.  
I believe that RFC 2376 already covers this.

 >	5) If the data is sent application/xml then
 >	the charset parameter must agree with the
 >	encoding attribute of the XML header. Failure is
 >	an unrecoverable error, for general applications.
 >	Detection is not mandatory.

In other words, you are proposing that XML-unaware transcoders 
should not be used for application/xml.  Since I would like to encourage 
efficient and generic transcoders, I am reluctant to accept this.

 >	6) The rules above can be bent or strengthened for
 >	specialist applications, by specific agreement between
 >	the recipient and sending parties. The main 
 >	alteration envisaged would be to allow, as an 
 >	obvious error-recovery strategy, that if the 
 >	charset parameter is missing, the encoding attribute
 >	of the XML header can be used. Another alteration
 >	envisaged is for some defaulting to be used.
 >	However, specialist applications which require this
 >	behaviour should not, in general be using text/xml*
 >	or application/xml*.

Some restrictions are useful for some XML-based media types.  For 
example, application/iotp-xml might allow Unicode only.   I am 
willing to mention such restrictions in the I-D. 

 >Discussion:
 >
 >The reason for 1) is that we have a clash between user expectations
 >(iso8859-1), RFCs (US-ASCII) and XML defaults (UTF-8). There is
 >no winnable solution to defaults. 

I am personally happy to mandate the charset parameter.  

When RFC 2376 was sent to the IAB, the default for text/xml in the 
case of HTTP was 8859-1.  The IAB suggested US-ASCII.

 >The reason for 2) is simply to state clearly that error-recovery
 >from corrupted data is not the norm.
 >
 >The reason for 3) is that, as Murata-san's proposed
 >Japanese Profile of XML makes clear, there are Japanese flavours
 >of Unicode floating about.

As Martin has clarified, it is the conversion tables that are ambiguous.  
There are no flavors of Unicode.

 >The reason for 5) is that the reason why we have application/xml
 >as well as text/xml is to prevent point-to-point manipulation of
 >the data. It should be treated like a binary file. It should 
 >allow end-to-end data integrity. 

I do not understand why we have to prohibit transcoding that 
does not rewrite encoding declarations.  The main argument against 
the charset parameter is that it is often missing or incorrect.  
Application/xml allows the omission of the charset parameter.  
If it is omitted, we rely on the autodetection described in XML 1.0.  
I believe that it was Martin who proposed this compromise in the 
W3C XML SIG and everybody can live with it.
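The compromise for application/xml can be sketched in Python: use the charset parameter if present, otherwise fall back to a simplified form of the XML 1.0 Appendix F heuristics. This is an illustration under stated assumptions, not a complete implementation: UCS-4 and EBCDIC detection are omitted, and the no-BOM 16-bit cases are reported as UTF-16BE/UTF-16LE without reading the declaration. The function names are hypothetical.

```python
import codecs
import re
from typing import Optional

_DECL = re.compile(rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def autodetect_xml_encoding(data: bytes) -> str:
    """Simplified XML 1.0 Appendix F detection (no UCS-4, no EBCDIC)."""
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"  # 16-bit code units, big-endian, no BOM
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"  # 16-bit code units, little-endian, no BOM
    m = _DECL.match(data)
    if m:
        # ASCII-compatible encoding: the encoding declaration can be read directly.
        return m.group(1).decode("ascii")
    return "UTF-8"  # the XML 1.0 default


def effective_encoding(charset: Optional[str], data: bytes) -> str:
    """application/xml: a charset parameter wins; otherwise autodetect."""
    return charset if charset else autodetect_xml_encoding(data)
```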

I see no reason to preserve byte sequences.  We only have to 
preserve XML information sets.

 >(There is a fundamental weak point in point-to-point charset 
 >parameter transmission: there is no standard mechanism for 
 >registering the character set of individual files which a 
 >webserver can pick up: furthermore, some programming languages 

Apache's AddType and AddCharset directives allow registration for 
each directory.  We can also use conventions for file extensions.

It would be great if the W3C team further enhanced Apache.

 >such as C do not have a character type but operate on storage types, 
 >so the encoding data is not available automatically anyway; 

Existing programming languages do not support Unicode very well, as 
I see it.

 >also, on UNIX systems using pipes, there is no parallel channel 
 >available for out-of-band information between the processes on 
 >either side of the pipe, so encoding information may be
 >difficult to propagate automatically. 

This is true, but programs interchange DOM data rather than textual 
XML.



----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21476 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:52:57 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21471 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:52:56 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216544623901e003re>; Wed, 22 Mar 2000 16:54:47 +0000
Message-Id: <200003221650.AA02008@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:30 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38CF9D82.F4420ED4@w3.org>
References: <38CF9D82.F4420ED4@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >
 >
 >MURATA Makoto wrote:
 >> 
 >> In message "Re: Some text that may be useful for the update of RFC 2376",
 >> Martin J. Duerst wrote...
 >>  >CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
 >>  >served as application/... If they don't have a 'charset' parameter,
 >>  >and they don't have any internal way to indicate the encoding,
 >>  >that's the problem of these registrations, not our problem.
 >> 
 >> Are you saying that each format should invent their own rules for
 >> indicating the charset?  My understanding was (and still is) that
 >> you as an I18n guy at W3C are promoting a single generalized solution
 >> for all textual formats.
 >
 >Are you saying that each transport protocol (which formally includes direct
 >filesystem access) should have their own, sometimes contradictory,
 >overrides and defaults and assumptions?

RFC 2130 clearly recommends the use of the MIME header and its charset 
parameter.

 >Or that we should take the current,
 >lowest-common-denominator, fails far more often than it works charset
 >parameter of two particular protocols (each of which has a different
 >default, and neither of which is implemented consistently) 

As for text/xml and application/xml, the default does not depend on 
the protocol.  

 >and attempt to
 >stretch this to make loose and wooly the current, fairly good state of
 >encoding declaration of XML files?

Unfortunately, we do not have "fairly good state of encoding declaration 
of XML files".  People generate XML documents by XSLT or their own programs, 
and fail to specify the correct charset.  Encoding PIs are not bad when 
the MIME header is absent.  But mistakes do happen.

 >Several people have pointed out that I am focussing on XML here. I would
 >refer them to the name and scope of the mailing list.

I think that you are not paying attention to other textual formats.  I would 
like XML to be a good citizen of the WWW and to establish a good practice 
for every textual format.

 >Incidentally, XML is probably not best described as a textual format. It is
 >a data format, which can among other things be used to describe
 >international text. I am aware that the text/* media types have some
 >historical requirements regarding 'character set'; this is sufficient that
 >my opinion is that text/* should not be used for XML in general.
 >Application/xml has no such problems (though it seems that people propose
 >to propagate these problems there).

I think that many XML documents are readable for casual users and that 
the top-level type "text" is most appropriate.  The charset parameter 
is not a historical requirement.  Rather, it is the right solution, 
which is just about to take off.  I think that we are wasting our 
limited resources by repeating old discussions rather than doing more 
implementations.

 >It is possible for example to take a payload of image/svg-xml and alter it
 >from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
 >declaration and insertion of NCRs for any characters outside the repertoire
 >of 8859-15). I would be most upset, as would every decoder on the planet,
 >if the same conversion was performed on image/png.

Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to 
legacy encodings does not look very attractive.  What is needed is the 
other way around: conversion from legacy encodings to Unicode.  Such transcoders 
do not need numeric character references.
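The opposite direction, which Chris describes, is where an XML-aware transcoder has to do extra work. A minimal Python sketch, assuming a UTF-16 input that begins with a BOM and ISO-8859-15 as the target: Python's built-in `xmlcharrefreplace` error handler writes numeric character references for characters outside the target repertoire. This is only valid where character references are allowed (character data and attribute values); such a character in an element name, comment, processing instruction or CDATA section cannot be escaped this way, and the sketch does not check for that. `transcode_utf16_to_latin9` is an illustrative name.

```python
import re

_ENCODING_DECL = re.compile(
    r'^(<\?xml[^>]*?\sencoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def transcode_utf16_to_latin9(data: bytes) -> bytes:
    """Re-encode a UTF-16 XML entity as ISO-8859-15, escaping what does not fit."""
    text = data.decode("utf-16")  # the codec consumes the BOM
    text = _ENCODING_DECL.sub(
        lambda m: m.group(1) + m.group(2) + "ISO-8859-15" + m.group(2), text, count=1
    )
    # Characters outside ISO-8859-15 become &#NNNN; references.
    return text.encode("iso-8859-15", errors="xmlcharrefreplace")


src = '<?xml version="1.0" encoding="UTF-16"?><t>€ 日</t>'.encode("utf-16")
print(transcode_utf16_to_latin9(src))
# b'<?xml version="1.0" encoding="ISO-8859-15"?><t>\xa4 &#26085;</t>'
```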

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21465 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:52:54 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21459 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:52:53 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216544523901e003qe>; Wed, 22 Mar 2000 16:54:45 +0000
Message-Id: <200003221650.AA02007@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:29 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <2339B88D6AA6D31187A80008C7E6F6722D910B@daemsg01.software-ag.de>
References: <2339B88D6AA6D31187A80008C7E6F6722D910B@daemsg01.software-ag.de>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "RE: Some text that may be useful for the update of RFC 2376",
Langer, Paul wrote...

 >But there is an open issue with XML (media type "text/xml") via HTTP:

I think that RFC 2376 covers all issues about the charset.  Your point 
is not an open issue but rather a proposal to revisit past decisions.

 >There are systems out there now (e.g. IE5, Netscape 4.7) that send
 >XML documents with correct encoding declaration as media type "text/xml"
 >without charset parameter.
 >If the document arrives without a charset parameter in the Content-Type
 >header at the XML processor's site, the processor does not know whether
 >there was a transcoder involved or not and has to use encoding "us-ascii"
 >for this document.

Yes, the charset parameter is often missing or incorrect.  But if we 
change the default, all existing MIME engines will fail to work.  We cannot sacrifice 
conformant implementations to rescue non-conformant ones.
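How a conformant recipient resolves a missing charset under RFC 2376 can be sketched in Python with the standard `email.message.Message` class, which parses the Content-Type value. `charset_for` is a hypothetical helper, and only text/xml and application/xml are handled.

```python
from email.message import Message
from typing import Optional


def charset_for(content_type: str) -> Optional[str]:
    """Return the charset to use, or None to defer to XML 1.0 autodetection."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is not None:
        return charset
    if msg.get_content_type() == "text/xml":
        # RFC 2376: without a charset parameter, text/xml is US-ASCII,
        # whatever the encoding declaration inside the document says.
        return "us-ascii"
    # application/xml without a charset: rely on the document itself.
    return None
```

This is why the documents Paul describes, sent as text/xml with a correct encoding declaration but no charset parameter, are read as US-ASCII by a conformant processor.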

----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21454 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:52:52 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21449 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:52:52 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216544323901e003pe>; Wed, 22 Mar 2000 16:54:43 +0000
Message-Id: <200003221650.AA02006@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:26 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38CE7404.316E69DB@w3.org>
References: <38CE7404.316E69DB@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >
 >
 >MURATA Makoto wrote:
 >> 
 >> In message "Re: Some text that may be useful for the update of RFC 2376",
 >> Chris Lilley wrote...
 >> 
 >>  >> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):
 >>  >
 >>  >as text/"anything" in other words
 >> 
 >> I think that RFC 2046 covers text/* in general.  RFC 2376 cannot
 >> change the default of HTTP (i.e., 8859-1).  The IAB allowed
 >> RFC 2376 to change the default for text/xml only.
 >
 >That was what I was suggesting, changing it for text/xml only.

If we consider e-mail and fallback to text/plain, there is not 
much we can do about text/xml.  We can mandate the charset parameter.  
That's all we can do.  We cannot change the default.





----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21444 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:52:51 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21440 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:52:50 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216544123901e003oe>; Wed, 22 Mar 2000 16:54:42 +0000
Message-Id: <200003221650.AA02005@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:50:24 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38CE65A3.1BFE9ECF@w3.org>
References: <38CE65A3.1BFE9ECF@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >
 >
 >MURATA Makoto wrote:

 >> We are all aware of this problem.  We are also aware of transcoders
 >> which change the charset parameter but do not rewrite encoding
 >> declarations.
 >
 >Yes - such behaviour is clearly broken. Since a transcoder is changing many
 >or all the other bytes in the file, expecting it to also correctly update
 >the encoding declaration rather than leaving it broken is not asking too
 >much.

I think that such a transcoder is very helpful because it works for 
all textual formats and also because it is very efficient.
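A sketch of such a generic transcoder in Python, using the standard `email.message.Message` class: it treats the body as opaque text and rewrites only the charset parameter, so an XML encoding declaration is left stale. Whether that is acceptable is exactly the point under dispute; under the rule that a present charset parameter is authoritative, the result is still correctly labelled. `generic_transcode` is an illustrative name.

```python
from email.message import Message
from typing import Tuple


def generic_transcode(content_type: str, body: bytes, target: str = "utf-8") -> Tuple[str, bytes]:
    """Transcode any charset-labelled text entity, updating only the MIME label.

    Format-specific labels such as an XML encoding declaration are
    deliberately left untouched.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    source = msg.get_content_charset()
    if source is None:
        raise ValueError("no charset parameter; refusing to guess")
    text = body.decode(source)
    msg.set_param("charset", target)  # rewrites the Content-Type value
    return msg["Content-Type"], text.encode(target)
```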

 >> The charset parameter is such a solution. 
 >
 >It is one such solution. There are better ones, and indeed a much better
 >one in the XML specification.

It works only for XML.  It is not bad, when the MIME header is not available.  
But when it is available, we must rely on the charset parameter.

 >> We should not try to bend
 >> specifications only to invent an ad-hoc solution for a particular format.
 >
 >I can only agree with that sentence by replacing "format" with "protocol".

You are advocating different in-band encoding signatures for different 
formats.  I think that this is a significant burden on users and specification 
developers.


----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21311 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:46:01 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21306 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:46:00 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216475223900t6q0re>; Wed, 22 Mar 2000 16:47:52 +0000
Message-Id: <200003221647.AA02001@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:47:22 +0900
To: ietf-xml-mime@imc.org
Subject: Fwd: Re: Default of the charset parameter
From: MURATA Makoto <muraw3c@attglobal.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

This is a forwarded message...

MURATA Makoto wrote...

> Proposal:
> 
> The default of the charset parameter of text/xml and
> application/xml is UTF-8 rather than US-ASCII (RFC 2045) or
> ISO-8859-1 (RFC 2068 [HTTP/1.1]).

I believe that UTF-8 is a better default than ISO-8859-1 or US-ASCII.  
But if HTTP people and MIME people do not accept such a revision, 
I would propose that the charset parameter of text/xml and application/xml 
be mandatory.  In other words, there will be no default.
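The two alternatives can be compared in a small Python sketch built on the standard `email.message.Message` class. `resolve_xml_charset` and its `default` argument are hypothetical: `default="utf-8"` models the UTF-8-default proposal, and `default=None` models the fallback in which the charset parameter is mandatory.

```python
from email.message import Message
from typing import Optional


def resolve_xml_charset(content_type: str, default: Optional[str] = None) -> str:
    """Resolve the charset of text/xml or application/xml under a chosen policy."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is not None:
        return charset
    if default is None:
        # Mandatory-parameter policy: there is no default to fall back on.
        raise ValueError(f"{msg.get_content_type()}: charset parameter is mandatory")
    return default
```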



----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21304 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:46:00 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21299 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:45:59 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216475023900t6q0qe>; Wed, 22 Mar 2000 16:47:50 +0000
Message-Id: <200003221647.AA02000@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:47:20 +0900
To: ietf-xml-mime@imc.org
Subject: Fwd: Default of the charset parameter
From: MURATA Makoto <muraw3c@attglobal.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In the W3C XML SIG, Kurt Conrad and I wrote this summary for the 
discussion of XML media types.


Kurt Conrad wrote...
Proposal:

The default of the charset parameter of text/xml and
application/xml is UTF-8 rather than US-ASCII (RFC 2045) or
ISO-8859-1 (RFC 2068 [HTTP/1.1]).

Criteria:

The default of this parameter is an interesting issue.
There are conflicting RFCs and a W3C Recommendation (HTML 4.0).  

In RFC 2046 (MIME: Media types), the default is US-ASCII.

> 4.1.2.  Charset Parameter [RFC 2046]
[snip]
>   Unlike some other parameter values, the values of the charset
>   parameter are NOT case sensitive.  The default character set, which
>   must be assumed in the absence of a charset parameter, is US-ASCII.

In RFC 2068 (HTTP/1.1), the default is ISO-8859-1.

>3.7.1 Canonicalization and Text Defaults [RFC 2068]
[snip]
>   The "charset" parameter is used with some media types to define the
>   character set (section 3.4) of the data. When no explicit charset
>   parameter is provided by the sender, media subtypes of the "text"
>   type are defined to have a default charset value of "ISO-8859-1" when
>   received via HTTP. Data in character sets other than "ISO-8859-1" or
>   its subsets MUST be labeled with an appropriate charset value.

HTML 4.0 further overrides this decision.

>5.2.2 Specifying the character encoding  [HTML 4.0]
[snip]
>The HTTP protocol ([RFC2068], section 3.7.1) mentions
>ISO-8859-1 as a default character encoding when the
>"charset" parameter is absent from the "Content-Type" header
>field. In practice, this recommendation has proved useless
>because some servers don't allow a "charset" parameter to be
>sent, and others may not be configured to send the
>parameter. Therefore, user agents must not assume any
>default value for the "charset" parameter.
>
>To address server or configuration limitations, HTML
>documents may include explicit information about the
>document's character encoding; the META element can be used
>to provide user agents with this information.
>
>For example, to specify that the character encoding of the
>current document is "EUC-JP", a document should include the
>following META declaration:

><META http-equiv="Content-Type" content="text/html;
>charset=EUC-JP"> The META declaration must only be used when
>the character encoding is organized such that ASCII
>characters stand for themselves (at least until the META
>element is parsed). META declarations should appear as early
>as possible in the HEAD element.
>
>For cases where neither the HTTP protocol nor the META
>element provides information about the character encoding of
>a document, HTML also provides the charset attribute on
>several elements. By combining these mechanisms, an author
>can greatly improve the chances that, when the user
>retrieves a resource, the user agent will recognize the
>character encoding.

RFC 2130 (The Report of the IAB Character Set Workshop)
provides a guideline for the use of character sets on the
Internet.  RFC 2130 recommends UTF-8 as the default for new
protocols.

>0: Executive summary [RFC 2130]
>   This report recommends the use of ISO 10646 as the default Coded
>   Character Set, and UTF-8 as the default Character Encoding Scheme in
>   the creation of new protocols or new version of old protocols which
>   transmit text. These defaults do not deprecate the use of other
>   character sets when and where they are needed; they are simply
>   intended to provide guidance and a specification for interoperability.

Since XML is a new application on the Internet, the best
default is UTF-8, as recommended by RFC2130.  There is no
need to change existing HTTP/1.1 Web servers.  There is no
need to consider backward compatibility of already installed
XML documents.  We can start from scratch.

One potential drawback is fallback to text/plain.  Since the
default of HTTP/1.1 is ISO-8859-1, fallback to text/plain
might cause corrupted data.  However, we do not think that 
this is a major problem.  
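The kind of corruption meant here is easy to reproduce: bytes written under the proposed UTF-8 default, then read under the HTTP/1.1 ISO-8859-1 default after a fallback to text/plain. ISO-8859-1 maps every byte to some character, so nothing fails loudly; ASCII markup survives and only non-ASCII text is garbled. `read_as_latin1_fallback` is an illustrative name.

```python
def read_as_latin1_fallback(xml_bytes: bytes) -> str:
    """Simulate a recipient that treats the entity as text/plain over HTTP/1.1,
    where a missing charset defaults to ISO-8859-1."""
    return xml_bytes.decode("iso-8859-1")


utf8_doc = "<p>café</p>".encode("utf-8")
print(read_as_latin1_fallback(utf8_doc))  # <p>cafÃ©</p>
```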


References:

HTML 4.0 Specification
   http://www.w3.org/TR/REC-html40/

RFC 2130
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2130.txt

RFC 2045
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2045.txt

RFC 2068
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt



----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21297 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:45:58 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21289 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:45:57 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216474723900t6q0pe>; Wed, 22 Mar 2000 16:47:48 +0000
Message-Id: <200003221647.AA02003@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:47:29 +0900
To: ietf-xml-mime@imc.org
Subject: Fwd: Text/xml vs application/xml
From: MURATA Makoto <muraw3c@attglobal.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In the W3C XML SIG, Kurt Conrad and I wrote this summary for the 
discussion of XML media types.


Kurt Conrad wrote...
Proposal:

This RFC will introduce both text/xml and application/xml.
Text/xml is recommended for entities that would be meaningful
to a human being without XML processing.  (Thus, text/xml is
always appropriate for external DTD subsets and external
parameter entities.)  Application/xml is recommended for all
others.

Transmission of XML documents encoded in UTF-16 or UCS-2 via
the SMTP protocol is a special case.  For this purpose, we
cannot use text/xml, because of the line termination rule of
MIME.  Application/xml is recommended instead.  (Note that
the XML PR needs a slight revision if this proposed decision
is accepted.)
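The proposed rule can be written down as a small decision function.  This is only a Python sketch: the name choose_xml_media_type, its three inputs, and the set of charset labels treated as UTF-16/UCS-2 are all assumptions, and judging whether an entity is "meaningful to a human being without XML processing" is left to the caller.

```python
# Assumed set of charset labels for two-octet encodings (lower-cased).
_TWO_OCTET = {'utf-16', 'utf-16be', 'utf-16le', 'ucs-2', 'iso-10646-ucs-2'}

def choose_xml_media_type(human_readable: bool, charset: str, transport: str) -> str:
    """Hypothetical helper: sketch of the proposed text/xml vs application/xml rule."""
    # Special case: UTF-16 or UCS-2 over SMTP cannot meet MIME's
    # line-break rule for "text", so application/xml is used.
    if transport.lower() == 'smtp' and charset.lower() in _TWO_OCTET:
        return 'application/xml'
    # Entities a person can read without XML processing (external DTD
    # subsets and external parameter entities always qualify).
    if human_readable:
        return 'text/xml'
    return 'application/xml'

print(choose_xml_media_type(True, 'utf-8', 'http'))    # text/xml
print(choose_xml_media_type(True, 'UTF-16', 'smtp'))   # application/xml
print(choose_xml_media_type(False, 'utf-8', 'http'))   # application/xml
```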


Criteria:

RFC 2046 provides the definition of top-level media 
types "text" and "application".  The definition of 
"text" is as below:

>3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
>   The five discrete top-level media types are:
>    (1)   text -- textual information.  The subtype "plain" in
>          particular indicates plain text containing no
>          formatting commands or directives of any sort. Plain
>          text is intended to be displayed "as-is". No special
>          software is required to get the full meaning of the
>          text, aside from support for the indicated character
>          set. Other subtypes are to be used for enriched text in
>          forms where application software may enhance the
>          appearance of the text, but such software must not be
>          required in order to get the general idea of the
>          content.  Possible subtypes of "text" thus include any
>          word processor format that can be read without
>          resorting to software that understands the format.  In
>          particular, formats that employ embeddded binary
>          formatting information are not considered directly
>          readable. A very simple and portable subtype,
>          "richtext", was defined in RFC 1341, with a further
>          revision in RFC 1896 under the name "enriched".

[snip]
>4.1.  Text Media Type [RFC 2046]
>
>   The "text" media type is intended for sending material which is
>   principally textual in form.  A "charset" parameter may be used to
>   indicate the character set of the body text for "text" subtypes,
>   notably including the subtype "text/plain", which is a generic
>   subtype for plain text.  Plain text does not provide for or allow
>   formatting commands, font attribute specifications, processing
>   instructions, interpretation directives, or content markup.  Plain
>   text is seen simply as a linear sequence of characters, possibly
>   interrupted by line breaks or page breaks.  Plain text may allow the
>   stacking of several characters in the same position in the text.
>   Plain text in scripts like Arabic and Hebrew may also include
>   facilitites that allow the arbitrary mixing of text segments with
>   opposite writing directions.
>
>   Beyond plain text, there are many formats for representing what might
>   be known as "rich text".  An interesting characteristic of many such
>   representations is that they are to some extent readable even without
>   the software that interprets them.  It is useful, then, to
>   distinguish them, at the highest level, from such unreadable data as
>   images, audio, or text represented in an unreadable form. In the
>   absence of appropriate interpretation software, it is reasonable to
>   show subtypes of "text" to the user, while it is not reasonable to do
>   so with most nontextual data. Such formatted textual data should be
>   represented using subtypes of "text".
>
>4.1.1.  Representation of Line Breaks [RFC 2046]
>
>[snip]


It is quite clear that most XML documents belong to the 
"text" type.

Meanwhile, the top-level type "application" is defined as
below:

>3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
[snip]
>(5)   application -- some other kind of data, typically
>          either uninterpreted binary data or information to be
>          processed by an application.  The subtype "octet-
>          stream" is to be used in the case of uninterpreted
>          binary data, in which case the simplest recommended
>          action is to offer to write the information into a file
>          for the user.  The "PostScript" subtype is also defined
>          for the transport of PostScript material.  Other
>          expected uses for "application" include spreadsheets,
>          data for mail-based scheduling systems, and languages
>          for "active" (computational) messaging, and word
>          processing formats that are not directly readable.
>          Note that security considerations may exist for some
>          types of application data, most notably
>          "application/PostScript" and any form of active
>          messaging.  These issues are discussed later in this
>          document.
[snip]
>4.5.  Application Media Type [RFC 2046]
>   The "application" media type is to be used for discrete data which do
>   not fit in any of the other categories, and particularly for data to
>   be processed by some type of application program.  This is
>   information which must be processed by an application before it is
>   viewable or usable by a user.  Expected uses for the "application"
>   media type include file transfer, spreadsheets, data for mail-based
>   scheduling systems, and languages for "active" (computational)
>   material.  (The latter, in particular, can pose security problems
>   which must be understood by implementors, and are considered in
>   detail in the discussion of the "application/PostScript" media type.)
>   For example, a meeting scheduler might define a standard
>   representation for information about proposed meeting dates.  An
>   intelligent user agent would use this information to conduct a dialog
>   with the user, and might then send additional material based on that
>   dialog.  More generally, there have been several "active" messaging
>   languages developed in which programs in a suitably specialized
>   language are transported to a remote location and automatically run
>   in the recipient's environment.
>   Such applications may be defined as subtypes of the "application"
>   media type. This document defines two subtypes:
>   octet-stream, and PostScript.
>   The subtype of "application" will often be either the name or include
>   part of the name of the application for which the data are intended.
>   This does not mean, however, that any application program name may be
>   used freely as a subtype of "application".


Some XML data probably belong to this class.  This is
one reason to introduce application/xml.

Another reason for application/xml is the delivery of XML
documents in UTF-16 via SMTP.  RFC 2046 has a very strict
rule for line termination in "text" bodies, which makes it
impossible to use UTF-16 with text/xml.  Although HTTP
loosens this rule, SMTP does not.  Thus, the only choice
is application/xml.
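The conflict with the line-break rule is easy to see at the octet level.  The Python sketch below (line_break_report is a hypothetical helper) encodes the same two-line text in UTF-8 and in UTF-16 (big-endian, without a byte order mark, chosen for simplicity) and counts CRLF pairs, CR or LF octets that are not part of a CRLF pair, and NUL octets.

```python
def line_break_report(text: str, charset: str) -> dict:
    """Encode text and check the octets against MIME's text/* line-break rule."""
    data = text.encode(charset)
    # Indexes of CR (0x0D) or LF (0x0A) octets that are not part of a CRLF pair.
    lone = [
        i for i, b in enumerate(data)
        if (b == 0x0D and data[i + 1:i + 2] != b'\n')
        or (b == 0x0A and data[i - 1:i] != b'\r')
    ]
    return {
        'crlf_pairs': data.count(b'\r\n'),
        'lone_cr_or_lf': len(lone),
        'nul_octets': data.count(0),
    }

SAMPLE = '<a>x</a>\r\n<b>y</b>\r\n'
print(line_break_report(SAMPLE, 'utf-8'))      # 2 CRLF pairs, no lone CR/LF, no NULs
print(line_break_report(SAMPLE, 'utf-16-be'))  # 0 CRLF pairs, 4 lone CR/LF, 20 NULs
```

In UTF-16 each line break becomes the octets 00 0D 00 0A, so CR and LF never appear as a pair, and every ASCII character contributes a NUL octet.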


References:

RFC 1896
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1896.txt

RFC 1341
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1341.txt

RFC 2046
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt



----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA21296 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:45:58 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21287 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:45:56 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216474423900t6q0oe>; Wed, 22 Mar 2000 16:47:45 +0000
Message-Id: <200003221647.AA02002@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:47:27 +0900
To: ietf-xml-mime@imc.org
Subject: Fwd: Determination of encoding/character set
From: MURATA Makoto <muraw3c@attglobal.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In the W3C XML SIG, Kurt Conrad and I wrote this summary for the 
discussion of XML media types.


Kurt Conrad wrote...
Proposal:

MIME types (text/xml and application/xml) for XML documents
have the charset parameter.  The encoding method is determined
by this parameter only.  All other information is for error
recovery only.
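A minimal Python sketch of what "determined by this parameter only" means for a receiver.  It uses the standard email.message.Message class purely as a Content-Type parser; decode_xml_entity is a hypothetical name, and the behaviour when charset is absent is deliberately left out.

```python
from email.message import Message

def decode_xml_entity(content_type: str, body: bytes) -> str:
    """Hypothetical helper: decode an XML MIME entity using only its charset parameter."""
    header = Message()
    header['Content-Type'] = content_type
    charset = header.get_content_charset()   # lower-cased, or None if absent
    if charset is None:
        # Rules for an absent charset are out of scope for this sketch.
        raise ValueError('no charset parameter in ' + repr(content_type))
    # Strict decoding: a mislabelled entity fails loudly here.  Looking at
    # the BOM or the encoding declaration instead would be error recovery.
    return body.decode(charset)

# The declaration inside the document disagrees with the MIME label;
# under this proposal the label wins.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><a>café</a>'.encode('utf-8')
print(decode_xml_entity('text/xml; charset=utf-8', doc))
```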


Criteria:

This issue has been very controversial.  Which should determine
encoding, the charset parameter of the MIME header, or the encoding
PI (or BOM)?  

This issue is very closely related to the next issue
(text/xml and/or application/xml?), as the top-level type
"text" provides the charset parameter.

There are two relevant RFCs, namely RFC 2130 (The Report of the IAB
Character Set Workshop) and RFC 2046 (MIME Part Two: Media Types).

RFC 2130 provides a guideline for the use of character sets on the
Internet.  For XML to be a good citizen on the Internet, we have to
follow this guideline wherever possible.

In RFC 2130, determination of character encoding is a
protocol issue.  RFC 2130 clearly recommends the use of MIME
headers to determine character encoding (Character Encoding
Scheme in the terminology of RFC 2130).

>3.3:  Determining which values of CCS, CES, and TES are used [RFC 2130]
>
>   To completely specify which CCS, CES, and TES are used in a specific
>   text transmission, there needs to be a consistent set of labels for
>   specifying which CCS, CES, and TES are used.  Once the appropriate
>   mechanisms have been selected, there are six techniques for attaching
>   these labels to the data.
>
>   The labels themselves are named and registered, either with IANA
>   [IANA] or with some other registry.  Ideally, their definitions are
>   retrievable from some registration authority.
>
>   Labels may be determined in one of the following ways:
>
>   -  Determined by guessing, where the receiver of the text has to
>      guess the values of the CCS, CES, and TES. For example: "I got
>      this from Sweden so it's probably  ISO-8859-1."  This is
>      obviously not a very foolproof way to decode text.
>   -  Determined by the standard, where the protocol used to transmit
>      the data has made documented choices of CCS, CES, and TES in the
>      standard. Thus, the encodings used are known through the
>      access protocol, for example HTTP [HTTP] uses (but is not
>      limited to) ISO-8859-1, SMTP uses US-ASCII.
>   -  Attached to the transfer envelope, where the descriptive labels are
>      attached to the wrapper placed around the text for transport.
>      MIME headers are a good example of this technique.
>   -  Included in the data stream, where the data stream itself has
>      been encoded in such a way as to signal the character set used.
>      For example, ISO-2022 encodes the data with escape sequences to
>      provide information on the character subset currently being used.
>   -  Agreed by prior bilateral agreement, where some out-of-band
>      negotiation has allowed the text transmitter and receiver to
>      determine the CCS, CES, and  TES for the transmitted text.
>   -  Agreed to by negotiation during some phase, typically
>      initialization of the protocol.
>
>3.3.1:  Recommendations for value specification mechanisms [RFC 2130]
>
>   While each of these techniques (with the  exception of guessing) is
>   useful in particular situations, interoperability requires a more
>   consistent set of techniques.  Thus, we recommend that MIME
>   registered values be used for all tagging of character sets and
>   languages UNLESS there is an existing mechanism for determining the
>   required information using one of the other techniques (except
>   guessing).  This recommendation will require a fair bit of work on
>   the part of protocol designers, implementors, the IETF, the IESG, and
>   the IAB.

The top-level media type "text" already provides the charset
parameter (RFC 2046).  Thus, if we use text/*, the encoding
should be determined by this parameter only.

>4.1.2.  Charset Parameter [RFC 2046]
>
>   A critical parameter that may be specified in the Content-Type field
>   for "text/plain" data is the character set.  This is specified with a
>   "charset" parameter, as in:
>
>     Content-type: text/plain; charset=iso-8859-1
>
>   Unlike some other parameter values, the values of the charset
>   parameter are NOT case sensitive.  The default character set, which
>   must be assumed in the absence of a charset parameter, is US-ASCII.
>
>   The specification for any future subtypes of "text" must specify
>   whether or not they will also utilize a "charset" parameter, and may
>   possibly restrict its values as well.  For other subtypes of "text"
>   than "text/plain", the semantics of the "charset" parameter should be
>   defined to be identical to those specified here for "text/plain",
>   i.e., the body consists entirely of characters in the given charset.
>   In particular, definers of future "text" subtypes should pay close
>   attention to the implications of multioctet character sets for their
>   subtype definitions.
>
>   The charset parameter for subtypes of "text" gives a name of a
>   character set, as "character set" is defined in RFC 2045.  The rules
>   regarding line breaks detailed in the previous section must also be
>   observed -- a character set whose definition does not conform to
>   these rules cannot be used in a MIME "text" subtype.
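Two rules from the passage above lend themselves to a quick check: charset values are case-insensitive, and a text/* entity without a charset parameter defaults to US-ASCII.  The Python sketch below uses the standard email.message.Message class as the header parser; effective_charset is a hypothetical helper, and returning None for non-text types is an assumption that leaves those types to their own rules.

```python
from email.message import Message

def effective_charset(content_type: str):
    """Hypothetical helper: charset of a Content-Type value per RFC 2046 4.1.2."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() lower-cases the value, which matches the rule
    # that charset values are not case sensitive.
    charset = msg.get_content_charset()
    if charset is None and msg.get_content_maintype() == 'text':
        return 'us-ascii'   # RFC 2046 default for "text" with no charset
    return charset          # None for non-text types without a charset

print(effective_charset('text/plain; charset=ISO-8859-1'))  # iso-8859-1
print(effective_charset('text/xml'))                        # us-ascii
print(effective_charset('application/xml'))                 # None
```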

We have to use the top-level type "application" for
transmitting XML documents in UTF-16 or UCS-2 via the SMTP
protocol, because of the line termination rule of MIME.
However, even in this case, RFC 2046 suggests the charset
parameter (4.1.2).

>   Other media types than subtypes of "text" might choose to employ the
>   charset parameter as defined here, but with the CRLF/line break
>   restriction removed.  Therefore, all character sets that conform to
>   the general definition of "character set" in RFC 2045 can be
>   registered for MIME use.
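Put together, a UTF-16 XML document can travel by SMTP as application/xml while still carrying a charset parameter.  The sketch below uses Python's standard email package to build such a MIME part; the sample document is made up, and the use of base64 (the default transfer encoding of MIMEApplication) is one reasonable choice for keeping NUL and lone CR/LF octets intact, not something the text above mandates.

```python
from email.mime.application import MIMEApplication

# Hypothetical document; its encoding declaration matches the label.
xml_text = '<?xml version="1.0" encoding="UTF-16"?>\r\n<greeting>hello</greeting>\r\n'
payload = xml_text.encode('utf-16')   # UTF-16 with a byte order mark

# application/xml is not bound by the CRLF rule of "text", and the
# charset parameter can still be supplied.  MIMEApplication applies
# base64 by default, so the raw octets survive mail transport.
part = MIMEApplication(payload, _subtype='xml', charset='utf-16')

print(part.get_content_type())             # application/xml
print(part.get_content_charset())          # utf-16
print(part['Content-Transfer-Encoding'])   # base64
```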

HTML 4.0 already uses the charset parameter.

>5.2 Character encodings [HTML 4.0]
>
>What this specification calls a character encoding is known
>by different names in other specifications (which may cause
>some confusion). However, the concept is largely the same
>across the Internet. Also, protocol headers, attributes, and
>parameters referring to character encodings share the same
>name -- "charset" -- and use the same values from the [IANA]
>registry (see [CHARSETS] for a complete list).
>
>The "charset" parameter identifies a character encoding,
>which is a method of converting a sequence of bytes into a
>sequence of characters. This conversion fits naturally with
>the scheme of Web activity: servers send HTML documents to
>user agents as a stream of bytes; user agents interpret them
>as a sequence of characters. The conversion method can range
>from simple one-to-one correspondence to complex switching
>schemes or algorithms.
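The range described above, from one-to-one mappings to switching schemes, can be seen with three encodings Python ships with.  This is an illustration using Python's codecs, not part of the HTML specification: ISO-8859-1 maps each character to one octet, UTF-8 uses multi-octet sequences, and ISO-2022-JP switches character sets with escape sequences.  round_trip is a hypothetical helper.

```python
def round_trip(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode text, check it decodes back, return the octets."""
    data = text.encode(charset)
    assert data.decode(charset) == text   # the conversion is reversible
    return data

print(round_trip('é', 'iso-8859-1'))   # b'\xe9'      (one octet per character)
print(round_trip('é', 'utf-8'))        # b'\xc3\xa9'  (multi-octet sequence)

# ISO-2022-JP: ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
print(round_trip('日本', 'iso-2022-jp'))
```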


How do we specify the charset parameter?  HTML 4.0 
talks about server configuration.

>5.2.2 Specifying the character encoding [HTML 4.0]
>
>How does a server determine which character encoding applies
>for a document it serves? Some servers examine the first few
>bytes of the document, or check against a database of known
>files and encodings. Many modern servers give Web masters
>more control over charset configuration than old servers do. 
>Web masters should use these mechanisms to send out a
>"charset" parameter whenever possible, but should take care
>not to identify a document with the wrong "charset"
>parameter value.
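As a rough illustration of "examine the first few bytes", here is one way a server might guess a charset label for an XML file.  This is a hypothetical Python sketch, not how any particular server works: it checks for a UTF-8 or UTF-16 byte order mark, then for an encoding declaration, and otherwise returns None, in line with the advice not to send a wrong charset.  It does not distinguish UTF-32 byte order marks, and the regular expression is a simplification of the XML grammar.

```python
import re
from typing import Optional

# XML encoding declaration at the very start of the entity (simplified).
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def sniff_xml_charset(prefix: bytes) -> Optional[str]:
    """Hypothetical helper: guess a charset label from the first bytes of an XML file."""
    if prefix.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'            # UTF-8 byte order mark
    if prefix.startswith((b'\xfe\xff', b'\xff\xfe')):
        return 'utf-16'           # UTF-16 byte order mark, either byte order
    match = _ENCODING_DECL.match(prefix)
    if match:
        return match.group(1).decode('ascii').lower()
    # No reliable evidence: better to send no charset than a wrong one.
    return None

print(sniff_xml_charset(b'<?xml version="1.0" encoding="EUC-JP"?><a/>'))  # euc-jp
print(sniff_xml_charset(b'\xef\xbb\xbf<a/>'))                              # utf-8
print(sniff_xml_charset(b'<a/>'))                                          # None
```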

It has been argued that casual users cannot set the charset
parameter.  However, the most popular WWW server, Apache,
allows casual users to set the charset parameter easily.  A
casual user only has to create a file named .htaccess in his
or her directory and add a line like the following:

	AddType  'text/xml; charset=utf-8'    xml

(See http://www.apache.org/docs/mod/mod_mime.html#addtype).

Some WWW servers do not provide this feature (.htaccess),
but it is usually possible to use file extensions to specify
the charset parameter.  For example, the file extension
"xml8" specifies the charset parameter "utf-8" if the WWW
server configuration file contains the following line:

   type="text/xml; charset=utf-8" exts=xml8
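The same idea, an extension table whose entries already carry the charset, can be tried out with Python's standard mimetypes module.  This is only an analogue chosen for illustration: mimetypes stands in for the server's own table, and the line above is server configuration, not Python.

```python
import mimetypes

# A private extension table, so the global defaults are left untouched.
table = mimetypes.MimeTypes()

# Map the ".xml8" extension to a type that includes the charset,
# mirroring the "exts=xml8" configuration line.
table.add_type('text/xml; charset=utf-8', '.xml8')

media_type, encoding = table.guess_type('report.xml8')
print(media_type)   # text/xml; charset=utf-8
```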


References:

Apache HTTP Server Version 1.3 / Module mod_mime / Directive AddType
   http://www.apache.org/docs/mod/mod_mime.html#addtype

HTML 4.0 Specification
   http://www.w3.org/TR/REC-html40/

RFC 2130
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2130.txt

RFC 2045
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2045.txt

RFC 2046
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt




----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA21286 for ietf-xml-mime-bks; Wed, 22 Mar 2000 08:45:56 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA21281 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 08:45:55 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.183]) by prserv.net (out4) with SMTP id <2000032216474223900t6q0ne>; Wed, 22 Mar 2000 16:47:43 +0000
Message-Id: <200003221647.AA01997@t3knz.attglobal.net>
Date: Thu, 23 Mar 2000 01:47:12 +0900
To: ietf-xml-mime@imc.org
Subject: History of the charset issue
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <200003151641.AA01930@t3knz.attglobal.net>
References: <200003151641.AA01930@t3knz.attglobal.net>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

The charset issue is very old.  I started to participate in its
discussion in 1997, but it began long before that.  In this
message, I cover the history of XML media types, not that of
HTML media types.

In 1996, the IAB held a workshop on character set issues.  RFC 2130
is the report of this workshop, and RFC 2277 is based on it.  RFC 2130
makes the following recommendation:

	The recommended specification scheme is the MIME "charset"
	specification, using the IANA "charset" specifications.

The W3C I18N WG also recommends the use of the charset parameter and
is trying to encourage WWW server developers to support it.

	http://www.w3.org/International/O-HTTP-charset.html
	http://www.w3.org/International/O-charset.html

After a long discussion in the W3C XML SIG, RFC 2376 was published.
I am forwarding some of the summaries Kurt and I wrote together.  I
believe that the discussion at that time covered all the points that
have been raised recently on this mailing list.

The I18N WG asked the XML Syntax WG not to change the determination
of the charset as described in RFC 2130, and the XML Syntax WG replied
that there were no plans to change it.

	http://www.w3.org/International/Group/1998/12/NOTE-i18n-rev-xml-19981221
	http://lists.w3.org/Archives/Member/w3c-i18n-ig/1999Apr/0006.html

Although no new arguments have appeared, new WWW servers have.
IIS 4.0 and 5.0 support the charset parameter.  (I am forwarding
some information written in Japanese to Martin.)  In my
understanding, the W3C I18N WG is encouraging such WWW servers.

I think we should just encourage conformant implementations without
changing the specification.  Since new WWW servers provide better
support, we only have to improve them and use them correctly.  The
advent of XML and Unicode provides a perfect chance.

The charset issue has burned people out.  I believe that
implementation and experiments are more important than repeating
the same argument.

Cheers,

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id GAA19132 for ietf-xml-mime-bks; Wed, 22 Mar 2000 06:45:09 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA19127 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 06:45:08 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.97]) by prserv.net (out4) with SMTP id <2000032214465823900r6gige>; Wed, 22 Mar 2000 14:46:59 +0000
Message-Id: <200003221446.AA01978@t3knz.attglobal.net>
From: MURATA Makoto <muraw3c@attglobal.net>
Date: Wed, 22 Mar 2000 23:46:54 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Meida types and stylesheet linking
In-Reply-To: <38D8CE54.6CC45E61@w3.org>
References: <38D8CE54.6CC45E61@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Meida types and stylesheet linking",
Chris Lilley wrote...
 >> Suppose that a stylesheet-linking PI uses a fragment identifier
 >> (e.g., XPointer) so as to reference to an embedded stylesheet.  We
 >> are no longer forced to specify some media type.
 >
 >But not prevented from doing so either. 

This is correct.

 >The media type is being used as a
 >label. But it is not being used as a value for the Content-Type mail or
 >HTTP header, so is not asserting that the entire resource is of this type.

Historically, HTML has (ab)used media types even when no MIME entities are
involved.  For example, the <style> element of HTML 4.0, which holds an
embedded stylesheet, always carries a media type in its type attribute.

Cheers,

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id FAA17990 for ietf-xml-mime-bks; Wed, 22 Mar 2000 05:43:08 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id FAA17986 for <ietf-xml-mime@imc.org>; Wed, 22 Mar 2000 05:43:07 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id IAA15821; Wed, 22 Mar 2000 08:44:58 -0500
Message-ID: <38D8CE54.6CC45E61@w3.org>
Date: Wed, 22 Mar 2000 14:44:52 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Meida types and stylesheet linking
References: <200003220621.AA01975@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> The XML Core WG of W3C recently announced an erratum for "Associating
> Style Sheets with XML documents Version 1.0".
> 
> http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/errata
> -----------------------------------------------
>         Description: The pseudo-attribute "type" declared with type
>         CDATA #REQUIRED is mandatory.
> 
>         Correction: The declaration is changed to: type CDATA #IMPLIED
>         so that it is optional.
> 
>         Rationale: This pseudo-attribute is adopted from HTML 4.0, where
>         it is optional.
> -------------------------------------------------------------------
> 
> Suppose that a stylesheet-linking PI uses a fragment identifier
> (e.g., XPointer) so as to reference to an embedded stylesheet.  We
> are no longer forced to specify some media type.

But not prevented from doing so either. The media type is being used as a
label. But it is not being used as a value for the Content-Type mail or
HTTP header, so is not asserting that the entire resource is of this type.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id WAA27596 for ietf-xml-mime-bks; Tue, 21 Mar 2000 22:20:15 -0800 (PST)
Received: from prserv.net (out1.prserv.net [32.97.166.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id WAA27592 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 22:20:14 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.196]) by prserv.net (out1) with SMTP id <2000032206215425200hp4kne>; Wed, 22 Mar 2000 06:21:55 +0000
Message-Id: <200003220621.AA01975@t3knz.attglobal.net>
From: MURATA Makoto <muraw3c@attglobal.net>
Date: Wed, 22 Mar 2000 15:21:59 +0900
To: ietf-xml-mime@imc.org
Subject: Meida types and stylesheet linking
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

The XML Core WG of W3C recently announced an erratum for "Associating 
Style Sheets with XML documents Version 1.0".

http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/errata
-----------------------------------------------
	Description: The pseudo-attribute "type" declared with type 
	CDATA #REQUIRED is mandatory. 

	Correction: The declaration is changed to: type CDATA #IMPLIED 
	so that it is optional. 

	Rationale: This pseudo-attribute is adopted from HTML 4.0, where 
	it is optional. 
-------------------------------------------------------------------

Suppose that a stylesheet-linking PI uses a fragment identifier
(e.g., XPointer) to reference an embedded stylesheet.  We are
no longer forced to specify a media type.
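With the erratum applied, a reader of the PI must treat the type pseudo-attribute as optional.  The Python sketch below shows a stylesheet PI that points at an embedded stylesheet by fragment identifier and omits type.  stylesheet_links is a hypothetical helper, the sample document is invented, and the regular expression is a simplification of the pseudo-attribute syntax in the Recommendation, not a full implementation.

```python
import re
from xml.dom import minidom

# Simplified pseudo-attribute syntax: name="value" or name='value'.
_PSEUDO_ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*(["\'])(.*?)\2')

def stylesheet_links(xml_text: str) -> list:
    """Hypothetical helper: pseudo-attributes of each xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == 'xml-stylesheet'):
            attrs = {name: value for name, _, value in _PSEUDO_ATTR.findall(node.data)}
            links.append(attrs)
    return links

SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet href="#style1"?>\n'
    '<doc><style id="style1">para { color: red }</style><para>Hi</para></doc>'
)

for link in stylesheet_links(SAMPLE):
    # With type optional, a missing type is not an error.
    print(link.get('href'), link.get('type', '(no type given)'))
```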

Cheers,







----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA16264 for ietf-xml-mime-bks; Tue, 21 Mar 2000 19:06:05 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA16251; Tue, 21 Mar 2000 19:05:38 -0800 (PST)
Received: from thinkpad (slip-32-100-198-128.ca.us.prserv.net [32.100.198.128]) by hesketh.net (8.9.3/8.9.3) with SMTP id WAA08573; Tue, 21 Mar 2000 22:07:15 -0500
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
Message-Id: <4.0.1.20000321123919.0116a8b0@216.27.10.33>
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Tue, 21 Mar 2000 12:45:58 -0500
To: Valdis.Kletnieks@vt.edu
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003210834.e2L8YaC22436@black-ice.cc.vt.edu>
References: <Your message of "Mon, 20 Mar 2000 17:59:13 EST."             <200003202258.RAA27431@hesketh.net> <Your message of "Mon, 20 Mar 2000 15:00:27 EST." <200003201959.OAA17209@hesketh.net> <200003202258.RAA27431@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:34 AM 3/21/00 -0500, Valdis.Kletnieks@vt.edu wrote:
>If you haven't been in the industry long enough that it takes you more
>than 30 seconds to think of at least 5 vendors(*) of software who aren't
>dissuaded from doing something just because the standard says explicitly
>not to do that, you haven't been around enough to see *seriously* broken
>software.

I share your concerns with vendors who stray from standards, and spend far
too much of my time testing implementation conformance.

However, this paranoia about vendors doesn't mean that we shouldn't create
new standards or add new features to existing approaches.  It means that we
should be as clear as possible about what is specified, and make certain
that those new features will fit well within the context of those existing
approaches.

Vendors will stray, but at least we'll have something unambiguous to
compare them against.

So far as I can tell, this suffix is actually less prone to 'standards
drift' and has a greater chance of being adopted in the form in which it is
proposed than any of the parameter-based options presented as alternatives.

The existence of '*seriously* broken software' doesn't seem like an
argument against this particular innovation.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA16224 for ietf-xml-mime-bks; Tue, 21 Mar 2000 19:04:37 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA16219 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 19:04:35 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id MAA01707; Wed, 22 Mar 2000 12:06:00 +0900 (JST)
Message-Id: <4.2.0.58.J.20000322115620.03467dc0@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Wed, 22 Mar 2000 11:59:20 +0900
To: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Finishing(!) the XML-tagging discussion
Cc: Graham Klyne <GK@dial.pipex.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
In-Reply-To: <01JNAEGY3OOG00004D@MAUVE.INNOSOFT.COM>
References: <"Your message dated Tue, 21 Mar 2000 08:26:34 -0500" <200003211326.IAA19007@astro.cs.utk.edu> <4.2.2.20000321081348.00a75320@pop.dial.pipex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00/03/21 08:03 -0800, ned.freed@INNOSOFT.COM wrote:

I understand this was a strawman, too, but just to make sure:

>accept-features: (&((language=ger),(language=fre)))

should definitely be

>accept-features: (&((language=de),(language=fr)))


RFC 1766, referenced from draft-hoffman-char-lang-media-02.txt,
does not currently permit three-letter language codes,
and there is no plan to use them where two-letter codes
are already available.
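The point about "ger"/"fre" versus "de"/"fr" can be checked mechanically.  The Python sketch below follows my reading of the RFC 1766 syntax (a primary tag and optional subtags, each 1 to 8 letters; two-letter primaries are ISO 639 codes; "i" and "x" are reserved).  rfc1766_primary_ok is a hypothetical helper, and it does not reflect later revisions of the language-tag rules.

```python
import re

# RFC 1766 syntax: Primary-tag *( "-" Subtag ), each 1 to 8 letters.
_LANGUAGE_TAG = re.compile(r'[A-Za-z]{1,8}(-[A-Za-z]{1,8})*')

def rfc1766_primary_ok(tag: str) -> bool:
    """Hypothetical helper: rough check of a language tag's primary subtag under RFC 1766."""
    if not _LANGUAGE_TAG.fullmatch(tag):
        return False
    primary = tag.split('-', 1)[0].lower()
    # Two-letter primaries are ISO 639 codes; "i" (IANA-registered) and
    # "x" (private use) are reserved.  Anything else, such as "ger" or
    # "fre", is not available without revising RFC 1766.
    return len(primary) == 2 or primary in ('i', 'x')

for tag in ('de', 'fr', 'ger', 'fre'):
    print(tag, rfc1766_primary_ok(tag))
```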


Regards, Martin.



Received: by ns.secondary.com (8.9.3/8.9.3) id KAA03202 for ietf-xml-mime-bks; Tue, 21 Mar 2000 10:39:57 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA03180; Tue, 21 Mar 2000 10:39:14 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Tue, 21 Mar 2000 10:40:40 -0800 (PST)
Date: Tue, 21 Mar 2000 10:39:34 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 11:43:23 -0500" <38D7A6AB.C9FEE176@reutershealth.com>
To: John Cowan <jcowan@reutershealth.com>
Cc: ned.freed@INNOSOFT.COM, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JNAIKZSJ4O00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <200003201814.NAA26399@astro.cs.utk.edu> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> <01JNAB29SEMK00004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> ned.freed@INNOSOFT.COM wrote:

> > AFAIK, no. I don't see a reason to change an existing registration in any
> > case -- when we've made changes to the media types system in the past
> > we've grandfathered old registrations that don't "fit" the new rules.

> Not quite the issue.  VRML, which is not XML-compatible, is application/vrml.
> If VRML is reformulated to be XML-compatible (in a new version), do we
> then register application/vrml-xml?  I think we do.

Sure, and this is entirely consistent since what you're describing is a new
media type. (An old VRML handler certainly won't be able to process it...)

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA03151 for ietf-xml-mime-bks; Tue, 21 Mar 2000 10:37:49 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA03147 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 10:37:47 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Tue, 21 Mar 2000 10:39:17 -0800 (PST)
Date: Tue, 21 Mar 2000 10:35:53 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 08:43:50 -0800" <3.0.32.20000321084346.013211b0@pop.intergate.ca>
To: Tim Bray <tbray@textuality.com>
Cc: John Cowan <jcowan@reutershealth.com>, Graham Klyne <GK@dial.pipex.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Message-id: <01JNAIJ9O2UE00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > (b) register a conneg feature tag to indicate XML content;  I would suggest
> > > something along the lines of (XML-namespace="<URI>"), where <URI> is an XML
> > > namespace identifier, as that would allow negotiation to specific XML
> > > applications.

> > Which namespace?  It is already common for documents to refer to more than
> > one XML namespace.

> Hmmm, if you believe this is a good idea, the answer is obvious: the
> namespace of the root element.  The notion of content-negotiation based
> on namespaces had never crossed my mind, but it has a very powerful feel
> to it; surely somebody has thought of this?  Particularly in b2b-land,
> where it seems inevitable that there are going to be several competing
> ways to encode an invoice or a catalogue-enquiry or whatever, each ID'ed
> by namespace. -Tim

This also illustrates another characteristic of conneg -- the simple labels
that work pretty well as a dispatch mechanism in MIME are necessary but
nowhere near sufficient. In this particular case this feature tag is
becoming something that's skewed away from what the -xml suffix provides.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA02796 for ietf-xml-mime-bks; Tue, 21 Mar 2000 10:29:41 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA02792 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 10:29:40 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Tue, 21 Mar 2000 10:31:11 -0800 (PST)
Date: Tue, 21 Mar 2000 10:25:10 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 10:27:49 -0500" <38D794F5.30432767@reutershealth.com>
To: John Cowan <jcowan@reutershealth.com>
Cc: Graham Klyne <GK@dial.pipex.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Message-id: <01JNAI97UWWW00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Graham Klyne wrote:

> > (b) register a conneg feature tag to indicate XML content;  I would suggest
> > something along the lines of (XML-namespace="<URI>"), where <URI> is an XML
> > namespace identifier, as that would allow negotiation to specific XML
> > applications.

> Which namespace?  It is already common for documents to refer to more than
> one XML namespace.

Why not just list them all? Feature values can be an enumerated list. (Although
I don't quite see how these get compared... Neither RFC 2533 nor RFC 2738 seems
to cover this...)
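To make the comparison question concrete, here is a sketch of two plausible readings of a list-valued namespace tag. Both the tag and the function are hypothetical; neither reading is taken from RFC 2533 or RFC 2738, and the point is only that they give different answers for the same inputs.

```python
def namespaces_match(doc_namespaces, accepted, rule="all"):
    """Compare a document's namespace list with a recipient's list.

    Hypothetical readings:
      'all': every namespace in the document is accepted by the recipient;
      'any': at least one of them is.
    """
    doc, ok = set(doc_namespaces), set(accepted)
    if rule == "all":
        return doc <= ok
    if rule == "any":
        return not doc.isdisjoint(ok)
    raise ValueError(f"unknown rule {rule!r}")
```

An XHTML document carrying Dublin Core metadata, offered to a recipient that lists only XHTML, fails under "all" but passes under "any".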

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA01420 for ietf-xml-mime-bks; Tue, 21 Mar 2000 09:19:25 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA01416 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 09:19:24 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id JAA14553; Tue, 21 Mar 2000 09:18:52 -0800
Message-Id: <3.0.32.20000321092005.01337d20@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Tue, 21 Mar 2000 09:20:06 -0800
To: John Cowan <jcowan@reutershealth.com>, "ietf-xml-mime@imc.org" <ietf-xml-mime@imc.org>
From: Tim Bray <tbray@textuality.com>
Subject: foo/bar and foo/bar-xml
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 12:12 PM 3/21/00 -0500, John Cowan wrote:
>I certainly agree that foo/bar and foo/bar-xml should not both exist if
>they are synonymous.  But if the VRML folks find it necessary to abandon
>the name VRML, that ought not to be driven by the adoption of a new
>syntax internally.

I think this isn't as off-topic as it sounds.  X3D is *not* just a 
restatement of VRML in XML syntax, so it really is a new kind of thing
with a new media type.  In fact, the notion of taking an existing 
vocabulary and making no changes to it aside from XML syntax strikes
me as a rare and unusual thing; thus one of the things I *don't* worry
about is the coexistence of foo/bar and foo/bar-xml.

Is XHTML an exception?  I don't think so - the differences between XHTML
and what you can get away with in text/html run surprisingly deep. -Tim

PS: See?  I changed the subject line!  Wow.


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA01313 for ietf-xml-mime-bks; Tue, 21 Mar 2000 09:13:23 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA01309 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 09:13:22 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id JAA14538; Tue, 21 Mar 2000 09:15:10 -0800
Message-Id: <3.0.32.20000321091622.01310560@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Tue, 21 Mar 2000 09:16:25 -0800
To: John Cowan <jcowan@reutershealth.com>, "ietf-xml-mime@imc.org" <ietf-xml-mime@imc.org>
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 12:10 PM 3/21/00 -0500, John Cowan wrote:
>> Hmmm, if you believe this is a good idea, the answer is obvious: the
>> namespace of the root element.
>
>Insufficient.  Suppose we have an XHTML-1.1 based document, and you
>are trying to negotiate depending on some non-HTML content in it
>(metadata, e.g.)  The root element's namespace is not particularly
>privileged, and may be the least important namespace in the document.

I think that content-negotiation is probably way out of its depth when trying 
to deal with seriously-compound documents.  Maybe this isn't a good idea after 
all.  At the end of the day, I suspect that there should be a simple (ideally 
one-to-one) mapping between namespaces and media-types.  Which would make the 
namespace idea superfluous. -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA01227 for ietf-xml-mime-bks; Tue, 21 Mar 2000 09:09:42 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA01223 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 09:09:41 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id MAA09245 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 12:11:28 -0500 (EST)
Message-ID: <38D7AD70.EAAC7CDE@reutershealth.com>
Date: Tue, 21 Mar 2000 12:12:16 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "ietf-xml-mime@imc.org" <ietf-xml-mime@imc.org>
Subject: Re: Finishing the XML-tagging discussion
References: <3.0.32.20000321085449.013374c0@pop.intergate.ca>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Tim Bray wrote:

> If/when that happens, it's not going to be called VRML any more; it's going
> to be a new kind of thing.  I believe currently they go by the internal
> monicker of "x3d"; so that would be
> 
> model/x3d-xml
> 
> The idea is to avoid the existence of both foo/bar and foo/bar-xml - SVG,
> for example, plans to (already has?) register only image/svg-xml. -Tim

I certainly agree that foo/bar and foo/bar-xml should not both exist if
they are synonymous.  But if the VRML folks find it necessary to abandon
the name VRML, that ought not to be driven by the adoption of a new
syntax internally.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA01194 for ietf-xml-mime-bks; Tue, 21 Mar 2000 09:07:36 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA01190 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 09:07:35 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id MAA09218 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 12:09:22 -0500 (EST)
Message-ID: <38D7ACF2.DE58C5B4@reutershealth.com>
Date: Tue, 21 Mar 2000 12:10:10 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "ietf-xml-mime@imc.org" <ietf-xml-mime@imc.org>
Subject: Re: Finishing the XML-tagging discussion
References: <3.0.32.20000321084346.013211b0@pop.intergate.ca>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Tim Bray wrote:

> Hmmm, if you believe this is a good idea, the answer is obvious: the
> namespace of the root element.

Insufficient.  Suppose we have an XHTML-1.1 based document, and you
are trying to negotiate depending on some non-HTML content in it
(metadata, e.g.)  The root element's namespace is not particularly
privileged, and may be the least important namespace in the document.
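A small sketch of the distinction John draws, using Python's standard ElementTree. The function names and the toy document are illustrative only (the document is not meant to be valid XHTML 1.1); it shows that the root element's namespace and the set of namespaces actually used can differ.

```python
import xml.etree.ElementTree as ET


def _ns(name):
    # ElementTree spells a namespaced name as "{uri}local".
    return name[1:].split("}", 1)[0] if name.startswith("{") else None


def root_namespace(xml_text):
    return _ns(ET.fromstring(xml_text).tag)


def used_namespaces(xml_text):
    """Namespaces of all elements and attributes, in document order."""
    seen = []
    for elem in ET.fromstring(xml_text).iter():
        for name in (elem.tag, *elem.attrib):
            ns = _ns(name)
            if ns is not None and ns not in seen:
                seen.append(ns)
    return seen


DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head><dc:creator>J. Cowan</dc:creator></head>
  <body/>
</html>"""
```

Here root_namespace(DOC) reports only XHTML, while used_namespaces(DOC) also reports the Dublin Core namespace that a negotiation on metadata would care about.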

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA00929 for ietf-xml-mime-bks; Tue, 21 Mar 2000 08:52:36 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA00919; Tue, 21 Mar 2000 08:52:34 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id IAA14469; Tue, 21 Mar 2000 08:53:37 -0800
Message-Id: <3.0.32.20000321085449.013374c0@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Tue, 21 Mar 2000 08:54:51 -0800
To: John Cowan <jcowan@reutershealth.com>, ned.freed@INNOSOFT.COM
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:43 AM 3/21/00 -0500, John Cowan wrote:
>Not quite the issue.  VRML, which is not XML-compatible, is application/vrml.
>If VRML is reformulated to be XML-compatible (in a new version), do we
>then register application/vrml-xml?  I think we do.

If/when that happens, it's not going to be called VRML any more; it's going
to be a new kind of thing.  I believe currently they go by the internal
monicker of "x3d"; so that would be

model/x3d-xml

The idea is to avoid the existence of both foo/bar and foo/bar-xml - SVG,
for example, plans to (already has?) register only image/svg-xml. -Tim
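As an illustration of the dispatch the "-xml" suffix is meant to enable, a hypothetical check a generic handler might apply. It assumes only the naming convention discussed in this thread plus the generic text/xml and application/xml types.

```python
def is_xml_based(media_type):
    """Hypothetical dispatch rule: the generic XML types, or any subtype
    ending in the '-xml' suffix discussed in this thread."""
    base = media_type.split(";", 1)[0].strip().lower()
    if base in ("text/xml", "application/xml"):
        return True
    _, _, subtype = base.partition("/")
    return subtype.endswith("-xml")
```

With only model/x3d-xml and image/svg-xml registered (and no plain foo/bar twin), the suffix alone tells a generic processor that the content can be parsed as XML; application/vrml stays outside it.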


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA00755 for ietf-xml-mime-bks; Tue, 21 Mar 2000 08:42:52 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA00728; Tue, 21 Mar 2000 08:42:24 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id LAA08956; Tue, 21 Mar 2000 11:42:35 -0500 (EST)
Message-ID: <38D7A6AB.C9FEE176@reutershealth.com>
Date: Tue, 21 Mar 2000 11:43:23 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: ned.freed@INNOSOFT.COM
CC: Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion
References: <200003201814.NAA26399@astro.cs.utk.edu> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> <01JNAB29SEMK00004D@MAUVE.INNOSOFT.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

ned.freed@INNOSOFT.COM wrote:

> AFAIK, no. I don't see a reason to change an existing registration in any
> case -- when we've made changes to the media types system in the past
> we've grandfathered old registrations that don't "fit" the new rules.

Not quite the issue.  VRML, which is not XML-compatible, is application/vrml.
If VRML is reformulated to be XML-compatible (in a new version), do we
then register application/vrml-xml?  I think we do.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA00745 for ietf-xml-mime-bks; Tue, 21 Mar 2000 08:42:46 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA00740 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 08:42:45 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id IAA14412; Tue, 21 Mar 2000 08:42:36 -0800
Message-Id: <3.0.32.20000321084346.013211b0@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Tue, 21 Mar 2000 08:43:50 -0800
To: John Cowan <jcowan@reutershealth.com>, Graham Klyne <GK@dial.pipex.com>
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 10:27 AM 3/21/00 -0500, John Cowan wrote:
>Graham Klyne wrote:
>> (b) register a conneg feature tag to indicate XML content;  I would suggest
>> something along the lines of (XML-namespace="<URI>"), where <URI> is an XML
>> namespace identifier, as that would allow negotiation to specific XML
>> applications.
>Which namespace?  It is already common for documents to refer to more than
>one XML namespace.

Hmmm, if you believe this is a good idea, the answer is obvious: the
namespace of the root element.  The notion of content-negotiation based
on namespaces had never crossed my mind, but it has a very powerful feel
to it; surely somebody has thought of this?  Particularly in b2b-land,
where it seems inevitable that there are going to be several competing
ways to encode an invoice or a catalogue-enquiry or whatever, each ID'ed
by namespace. -Tim



Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA00699 for ietf-xml-mime-bks; Tue, 21 Mar 2000 08:41:23 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA00695 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 08:41:22 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Tue, 21 Mar 2000 08:42:53 -0800 (PST)
Date: Tue, 21 Mar 2000 08:03:29 -0800 (PST)
Subject: Re: Finishing(!) the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 08:26:34 -0500" <200003211326.IAA19007@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: Graham Klyne <GK@dial.pipex.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Message-id: <01JNAEGY3OOG00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <4.2.2.20000321081348.00a75320@pop.dial.pipex.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

(Apologies in advance for the syntax of my examples here. I'm not bothering
to check them carefully.)

> my position is that requiring a separate label to facilitate
> content negotiation makes it entirely too likely that the
> separate label will be incorrect or omitted.  so any proposal
> for an xml frob that says "don't use this frob for content-negotiation"
> is a non-starter - it's headed 180 degrees in the wrong direction.

Keith, with all due respect, this strikes me as a fine academically
pure view, but unfortunately one that's entirely divorced from reality.

The reality is that feature expressions and their associated tags are
being used in at least four different ways:

(1) As a standalone negotiation mechanism.
(2) As an adjunct to an existing negotiation mechanism.
(3) As a standalone means of labelling objects.
(4) As an adjunct to an existing object labelling mechanism.

These multiple different uses make it inevitable that tags will be registered
that result in silly states. Consider, for example, (1) and (2). (1) as
manifested in Internet FAX makes it necessary to register things like media
types and charsets as feature tags. But (2) as manifested in HTTP leads to the
possibility of:

   accept-charset: UTF-8
   accept-features: (charset=ISO-2022-JP)

Or consider (3) and (4). (3) again requires registration of things like
media types. But (4) in the case of email makes it possible to say:

   content-features: (media-type="text/plain")
   content-type: application/octet-stream
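A minimal sketch of detecting the two silly states above, where a header label and a single feature predicate disagree. It handles only one equality predicate of the form shown in the examples, not full RFC 2533 syntax, and all names are hypothetical.

```python
import re

# One equality predicate in the style of the examples above, e.g.
# (charset=ISO-2022-JP) or (media-type="text/plain"). Not a full RFC 2533 parser.
_PRED_RE = re.compile(r'^\(\s*([A-Za-z0-9.\-]+)\s*=\s*"?([^")]+?)"?\s*\)$')


def parse_predicate(expr):
    m = _PRED_RE.match(expr.strip())
    if not m:
        raise ValueError(f"not a simple predicate: {expr!r}")
    return m.group(1).lower(), m.group(2).strip()


def header_tokens(value):
    """Values in a header such as 'UTF-8, ISO-8859-1;q=0.5' or
    'text/plain; charset=us-ascii', lower-cased, with ;parameters dropped."""
    return {part.split(";", 1)[0].strip().lower()
            for part in value.split(",") if part.strip()}


def silly_state(header_value, feature_expr, tag):
    """True if feature_expr pins `tag` to a value the header does not list."""
    name, value = parse_predicate(feature_expr)
    return name == tag and value.lower() not in header_tokens(header_value)
```

Both pairs above come out as silly states, while a content-features value that agrees with the content-type does not. Such a check only catches conflicts between labels someone thought to compare, which is part of Ned's point.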

Now, the only ways I can think of to address this problem are:

(a) Prohibit the registration of tags that duplicate other labels. Problem
    is, this breaks (1) and (3) completely, and also breaks (2) and (4)
    in cases where some labels from other contexts aren't available. So
    this dog doesn't hunt.
(b) Prohibit the use of tags in a given context that duplicate labels used
    in that context. Problem is that such a rule is quite difficult to
    write given that the existing label set is a moving target in many
    contexts, and effectively unenforceable in any case. So this dog doesn't
    hunt either.
(c) Prohibit the use of tags whose values conflict with the labels in a
    given context. But this amounts to saying, "Don't produce stuff with silly
    states"; it doesn't provide the guarantee you're demanding.
(d) Have different sorts of feature expressions for the different cases.
    But this multiplies the number of mechanisms to a ridiculous degree.

Moreover, given the complexity of feature expressions it actually is possible
to have silly states within a single expression:

accept-features: (&((charset=UTF-8),(charset=ISO-2022-JP)))

accept-features: (&((language=ger),(language=fre)))

And ironically, determining that an arbitrary feature expression actually
describes a silly state turns out to be NP-hard! (It's obviously equivalent to
the classic satisfiability problem.)
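A brute-force sketch of the check Ned describes, restricted to equality predicates combined with and, or, and not. Expressions are written as nested tuples rather than RFC 2533 text (an assumption for brevity), and each feature tag takes exactly one value. Every tag is tried against the values the expression mentions plus one stand-in for "anything else", so the search grows exponentially with the number of tags.

```python
from itertools import product

# Expressions as nested tuples (an assumption, not RFC 2533 syntax):
#   ("=", tag, value)   ("!", e)   ("&", [e, ...])   ("|", [e, ...])


def _values(expr, acc):
    """Collect, per feature tag, every value the expression compares it with."""
    op = expr[0]
    if op == "=":
        acc.setdefault(expr[1], set()).add(expr[2])
    elif op == "!":
        _values(expr[1], acc)
    else:
        for sub in expr[1]:
            _values(sub, acc)
    return acc


def _holds(expr, env):
    op = expr[0]
    if op == "=":
        return env[expr[1]] == expr[2]
    if op == "!":
        return not _holds(expr[1], env)
    if op == "&":
        return all(_holds(sub, env) for sub in expr[1])
    if op == "|":
        return any(_holds(sub, env) for sub in expr[1])
    raise ValueError(f"unknown operator {op!r}")


_OTHER = object()  # stands for every value the expression never mentions


def satisfiable(expr):
    """Try every combination of values, one value per tag."""
    domains = _values(expr, {})
    tags = sorted(domains)
    for combo in product(*(sorted(domains[t]) + [_OTHER] for t in tags)):
        if _holds(expr, dict(zip(tags, combo))):
            return True
    return False
```

Both conjunctions above come out unsatisfiable, since a single charset (or language) cannot equal two different values at once; the same pair joined with "|" is satisfiable.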

The bottom line here is that silly states abound in the world of content
negotiation. That's the nature of the beast, and there's nothing you can do to
change it. You are tilting at windmills here.

This is why I've always considered the criteria for viable content negotiation
labelling to be totally separate from the straightforward labelling we've
been talking about here. Trying to combine the two inevitably creates
requirements that are, so to speak, unsatisfiable.

Now, let's return for a moment to the issue at hand (please note I didn't use
XML once in any of the above examples). I've said previously that your proposal
of a content-type parameter leads directly and inevitably to a new feature tag
being needed, so if new feature tags are unacceptable to you your own proposal
has to be unacceptable as well. It occurs to me that some of the readers of
this thread may not understand why this is so. The reason is simple: The
defined mechanism for putting content-type information in feature expressions
(draft-ietf-conneg-content-features-02.txt) provides a means of checking media
types in feature expressions but no means of checking that a given parameter is
present on an arbitrary media type. And the current doctrine is that rather
than developing a mechanism for checking arbitrary media type parameters, those
parameters important enough to check will receive their own tag. See, for
example, draft-hoffman-char-lang-media-02.txt, where one such tag is
proposed.
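A sketch of the consequence Ned describes: when a Content-Type is mapped into feature tags, only the media type and those parameters that have their own tag survive. The tag names ("type", "charset") and the set of tagged parameters are illustrative assumptions, and the parsing uses Python's standard email.message.Message.

```python
from email.message import Message

# Illustrative assumption: only these parameters have their own feature tag.
TAGGED_PARAMS = {"charset"}


def features_from_content_type(header):
    """Map a Content-Type value to feature-tag/value pairs. Parameters
    without a dedicated tag are dropped: they cannot be tested."""
    msg = Message()
    msg["Content-Type"] = header
    features = {"type": msg.get_content_type()}
    for name, value in msg.get_params()[1:]:
        if name in TAGGED_PARAMS:
            features[name] = value
    return features
```

For 'text/plain; charset="us-ascii"; format=flowed' the format parameter disappears, so a Content-Type parameter marking XML would likewise need its own tag before a feature expression could test for it.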

Amusingly enough, I sent a note to the IESG arguing that the omission
of a facility to check MIME parameters in feature expressions was an
oversight that should be corrected. But I was alone in believing this, so
it didn't happen. (You were, as I recall, supportive of the document as-is
and silent on this particular point.)

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA00600 for ietf-xml-mime-bks; Tue, 21 Mar 2000 08:34:03 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA00596 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 08:34:01 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4b16c71427@msw.mimesweeper.com>; Tue, 21 Mar 2000 16:38:12 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id HDT541ZZ; Tue, 21 Mar 2000 16:37:02 -0000
Message-Id: <4.2.2.20000321163546.00b0d930@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Tue, 21 Mar 2000 16:37:16 +0000
To: John Cowan <jcowan@reutershealth.com>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: ietf-xml-mime@imc.org
In-Reply-To: <38D794F5.30432767@reutershealth.com>
References: <Your message of "Mon, 20 Mar 2000 18:48:36 EST." <200003202347.SAA29764@hesketh.net> <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 10:27 AM 3/21/00 -0500, John Cowan wrote:
>Graham Klyne wrote:
>
> > (b) register a conneg feature tag to indicate XML content;  I would suggest
> > something along the lines of (XML-namespace="<URI>"), where <URI> is an XML
> > namespace identifier, as that would allow negotiation to specific XML
> > applications.
>
>Which namespace?  It is already common for documents to refer to more than
>one XML namespace.

All of them.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id HAA29275 for ietf-xml-mime-bks; Tue, 21 Mar 2000 07:26:18 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA29270 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 07:26:17 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Tue, 21 Mar 2000 07:27:47 -0800 (PST)
Date: Tue, 21 Mar 2000 07:23:35 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 08:23:01 -0500" <200003211323.IAA18995@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: Graham Klyne <GK@dial.pipex.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Message-id: <01JNABTUWEPY00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I really don't like having a separate label for this.  Anytime you
> have two different labels for the same thing, you should be wary,
> because it's inevitable that the two labels will get out of sync.
> This has far greater potential for silly states than the concerns
> that Ned expressed about a separate content-type parameter.

The problem is that the only way you can avoid having a separate tag in
feature expressions is to make the feature expression label the _only_ label
saying this. That's just the nature of the beast. Your parameter proposal
ends up needing a separate feature expression tag, just as the suffix
proposal does.

> Also, all of the deployment arguments about a separate
> content-type parameter are at least as applicable for
> a separate content-feature header - if composers can't even
> add a parameter to the content-type line, how likely is it that
> they can reliably add a completely separate header?

And this is exactly why having only the feature expression label isn't
enough -- it doesn't solve the immediate problem for anyone.

				Ned



Received: by ns.secondary.com (8.9.3/8.9.3) id HAA29251 for ietf-xml-mime-bks; Tue, 21 Mar 2000 07:25:31 -0800 (PST)
Received: from mail.reutershealth.com (mail.reutershealth.com [204.243.9.36]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA29247 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 07:25:29 -0800 (PST)
Received: from reutershealth.com (IDENT:cowan@skunk.reutershealth.com [204.243.9.153]) by mail.reutershealth.com (Pro-8.9.3/8.9.3) with ESMTP id KAA08230; Tue, 21 Mar 2000 10:27:02 -0500 (EST)
Message-ID: <38D794F5.30432767@reutershealth.com>
Date: Tue, 21 Mar 2000 10:27:49 -0500
From: John Cowan <jcowan@reutershealth.com>
Organization: Reuters Health Information
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Graham Klyne <GK@dial.pipex.com>
CC: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Subject: Re: Finishing the XML-tagging discussion
References: <Your message of "Mon, 20 Mar 2000 18:48:36 EST." <200003202347.SAA29764@hesketh.net> <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Graham Klyne wrote:

> (b) register a conneg feature tag to indicate XML content;  I would suggest
> something along the lines of (XML-namespace="<URI>"), where <URI> is an XML
> namespace identifier, as that would allow negotiation to specific XML
> applications.

Which namespace?  It is already common for documents to refer to more than
one XML namespace.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA28764 for ietf-xml-mime-bks; Tue, 21 Mar 2000 07:04:41 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA28739; Tue, 21 Mar 2000 07:04:05 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Tue, 21 Mar 2000 07:05:32 -0800 (PST)
Date: Tue, 21 Mar 2000 06:41:09 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 03:22:22 -0500" <200003210822.e2L8MMC27568@black-ice.cc.vt.edu>
To: Valdis.Kletnieks@vt.edu
Cc: ned.freed@INNOSOFT.COM, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JNAB29SEMK00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <200003201814.NAA26399@astro.cs.utk.edu> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> On Mon, 20 Mar 2000 10:48:40 PST, ned.freed@innosoft.com said:
> > I note in passing that we've already registered some XML-based types
> > with an "XML" suffix, and no problems have been reported with this
> > usage.

> Has there been any registration of a 'foobar-xml' that duplicates
> an existing 'foobar', but with an XML wrapping?

AFAIK, no. I don't see a reason to change an existing registration in any
case -- when we've made changes to the media types system in the past
we've grandfathered old registrations that don't "fit" the new rules.

> If so, have any
> user agents been known to have been modified to use the -xml as the
> sort of trigger we have been discussing?

I always try to code stuff like this before making assertions as to its
ease of implementation. In this case I did so with the user agent I
use (PMDF MAIL). The changes for -xml handling were very simple and seemed
to work just fine.
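The PMDF MAIL changes themselves aren't shown here. What follows is a hypothetical Python sketch of the kind of lookup the -xml convention permits. An exact media type match always wins. An unknown type whose subtype ends in "-xml" is handed to a generic XML handler. Anything else gets the agent's usual default. The handler table and names are invented for illustration.

```python
def find_handler(media_type, handlers, generic_xml_handler=None):
    """Choose a handler for media_type, using the -xml suffix as a fallback hint.

    handlers maps lower-case "type/subtype" names to handler objects.
    """
    # Type and subtype names are case-insensitive; drop any parameters.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in handlers:
        return handlers[base]          # an exact match always wins
    _type, _, subtype = base.partition("/")
    if subtype.endswith("-xml") and generic_xml_handler is not None:
        return generic_xml_handler     # unknown type, but labelled as XML
    return None                        # caller falls back to its usual default


handlers = {"text/plain": "pager", "application/xml": "xml-viewer"}
print(find_handler("application/svg-xml", handlers, "xml-viewer"))
print(find_handler("Text/Plain; charset=us-ascii", handlers))
```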

I also attempted to code the changes for dispatching off of a global parameter
and adding a global parameter when sending. The former turns out to be doable
but kinda ugly; the latter isn't doable in my UA short of a new product release
because the format of a critical data file has to change to accommodate it. (I
don't have a complicated GUI to contend with either.)
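At the header level alone, the two halves of the parameter approach look roughly like this sketch, which uses Python's standard email package. The parameter name "representation" is borrowed from the strawman `Content-type: image/svg; representation="xml"` example discussed elsewhere in this thread. It is not a registered parameter. The sketch does not capture the real cost Ned describes, which is in the agent's stored data formats rather than in header syntax.

```python
from email.message import Message

XML_PARAM = "representation"  # hypothetical parameter name


def is_xml_by_parameter(content_type):
    """Receiving side: true if the Content-Type carries representation=xml."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param(XML_PARAM)   # None when the parameter is absent
    return isinstance(value, str) and value.lower() == "xml"


def tag_as_xml(msg):
    """Sending side: add the parameter to an existing Content-Type header."""
    msg.set_param(XML_PARAM, "xml")
    return msg


print(is_xml_by_parameter('image/svg; representation="xml"'))  # True
out = Message()
out["Content-Type"] = "image/svg"
print(tag_as_xml(out)["Content-Type"])
```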

FWIW, this also got me started looking at implementation of content-features
support in our software. This is, well, interesting. The facility is well
specified as far as I can tell, but part of it involves implementation of an
expression canonicalization routine. Now, I used to work on the symbolic
algebra system in a package called MATHLIB, so this is known territory for me,
but I wonder how well someone who's never done anything like this would handle
it. It certainly is way more complex than anything else we've discussed,
although its addition doesn't present a problem from an interface or file
compatibility standpoint.
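To give a feel for the canonicalization step, here is a minimal Python sketch of its boolean core. It rewrites a combination of feature predicates into disjunctive normal form (an OR of ANDs). Expressions are nested tuples ("and", ...), ("or", ...), ("not", x), with atoms as opaque strings such as "(pix-x=800)". This is an assumption-laden simplification of the conneg (RFC 2533) procedure. The full procedure also has to reason about the comparisons and value ranges inside the predicates, which this sketch ignores.

```python
from itertools import product


def to_nnf(expr):
    """Push "not" down to the atoms using De Morgan's laws."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op != "not":
        return (op, *[to_nnf(a) for a in args])
    (inner,) = args
    if isinstance(inner, str):
        return expr                                   # negated atom: done
    inner_op, *inner_args = inner
    if inner_op == "not":
        return to_nnf(inner_args[0])                  # double negation
    flipped = "or" if inner_op == "and" else "and"
    return (flipped, *[to_nnf(("not", a)) for a in inner_args])


def to_dnf(expr):
    """Return a set of conjunctions; each is a frozenset of (atom, positive) pairs."""
    expr = to_nnf(expr)
    if isinstance(expr, str):
        return {frozenset([(expr, True)])}
    op, *args = expr
    if op == "not":                                   # only atoms remain under "not"
        return {frozenset([(args[0], False)])}
    parts = [to_dnf(a) for a in args]
    if op == "or":
        return set().union(*parts)
    result = set()
    for combo in product(*parts):                     # distribute "and" over "or"
        conj = frozenset().union(*combo)
        if not any((atom, not pos) in conj for atom, pos in conj):
            result.add(conj)                          # skip A-and-not-A terms
    return result


expr = ("and", "(pix-x=800)", ("not", ("and", "(color=grey)", "(dpi=300)")))
for conj in sorted(to_dnf(expr), key=sorted):
    print(sorted(conj))
```

Even this stripped-down version needs De Morgan rewriting, distribution, and contradiction pruning. That supports the observation that this is unfamiliar territory for most UA implementors.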

And there's also the problem of generating feature expressions describing stuff
that varies from document to document -- as I see it this can be done either by
a generic analysis step or else by adding some communications channels that
composition programs can use. Both of these amount to pushing the problem off
to some other component, and I have to wonder whether composition agents
will actually evolve to the point of providing such information, or whether
analysis programs of this sort will be written.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id FAA27019 for ietf-xml-mime-bks; Tue, 21 Mar 2000 05:26:48 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id FAA27015 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 05:26:47 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id IAA19007; Tue, 21 Mar 2000 08:26:34 -0500 (EST)
Message-Id: <200003211326.IAA19007@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Graham Klyne <GK@dial.pipex.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Subject: Re: Finishing(!) the XML-tagging discussion 
In-reply-to: Your message of "Tue, 21 Mar 2000 08:27:04 GMT." <4.2.2.20000321081348.00a75320@pop.dial.pipex.com> 
Date: Tue, 21 Mar 2000 08:26:34 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Graham, 

my position is that requiring a separate label to facilitate
content negotiation makes it entirely too likely that the
separate label will be incorrect or omitted.  so any proposal
for an xml frob that says "don't use this frob for content-negotiation" 
is a non-starter - it's headed 180 degrees in the wrong direction.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id FAA26975 for ietf-xml-mime-bks; Tue, 21 Mar 2000 05:23:16 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id FAA26971 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 05:23:15 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id IAA18995; Tue, 21 Mar 2000 08:23:01 -0500 (EST)
Message-Id: <200003211323.IAA18995@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Graham Klyne <GK@dial.pipex.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Tue, 21 Mar 2000 08:49:23 GMT." <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com> 
Date: Tue, 21 Mar 2000 08:23:01 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> (c) use the proposed Content-feature tag for labeling MIME content.

I really don't like having a separate label for this.  Anytime you
have two different labels for the same thing, you should be wary,
because it's inevitable that the two labels will get out of sync.
This has far greater potential for silly states than the concerns
that Ned expressed about a separate content-type parameter.

Also, all of the deployment arguments about a separate
content-type parameter are at least as applicable for
a separate content-feature header - if composers can't even
add a parameter to the content-type line, how likely is it that
they can reliably add a completely separate header?

Keith



Received: by ns.secondary.com (8.9.3/8.9.3) id BAA12901 for ietf-xml-mime-bks; Tue, 21 Mar 2000 01:21:09 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA12896 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 01:21:08 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4b153ab1c4@msw.mimesweeper.com>; Tue, 21 Mar 2000 09:25:15 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id HDT54DWD; Tue, 21 Mar 2000 09:24:06 -0000
Message-Id: <4.2.2.20000321081348.00a75320@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Tue, 21 Mar 2000 08:27:04 +0000
To: Keith Moore <moore@cs.utk.edu>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing(!) the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003201950.OAA26918@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 13:51:42 EST." <200003201850.NAA13040@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 02:50 PM 3/20/00 -0500, Keith Moore wrote:
>or to put it another way, if we're going to invent a new syntactic
>convention for describing content-type characteristics, let's
>pick one that actually works with content negotiation
>rather than one which requires drastic changes to content
>negotiation frameworks.

>in a sense, it doesn't break anything.  in another sense, it
>creates a new feature of content-type names that can be exploited,
>and people will demand that they be exploited.  so it requires
>changes of existing implementations.  and if the -xml feature catches
>on then it will creep into content-type negotiation schemes.  this is
>not an intended consequence, but it is a likely one.

Your concern here, then, is what people *might* do to abuse the feature in 
future?

Given that the *intent* of the '-xml' suffix seems to be quite benign, and is
perceived by many to be quite useful, may I suggest that:

(a) the proposal is presented as nothing more than a naming *convention*, 
plastered with health warnings about NOT using it for content negotiation, 
or any purpose other than local identification of XML for possible 
fall-back submission to a generic XML handler, and

(b) a proposal is introduced at the same time that will allow people to
definitively tag XML for the purposes of content negotiation, and (a) refers
to this proposal.

My thinking is a "carrot and stick" approach to resisting the harmful usage 
that concerns you:  be very clear that bad things will happen if the 
feature is abused, AND provide a mechanism to achieve the desired results 
(for content negotiation, etc.) without incurring the harm.


I think there are two distinct sets of requirements here:

(a) is a local convention that can be used transparently as far as existing 
agents are concerned,

and

(b) relates to new capabilities (hence new/upgraded software) that require 
the active cooperation of two or more communicating parties.

I think much of the present debate arises because these requirements are 
being confused.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id BAA12892 for ietf-xml-mime-bks; Tue, 21 Mar 2000 01:21:08 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA12884 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 01:21:06 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4b153ab0e7@msw.mimesweeper.com>; Tue, 21 Mar 2000 09:25:15 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id HDT54DWB; Tue, 21 Mar 2000 09:24:06 -0000
Message-Id: <4.2.2.20000321080405.00a41c50@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Tue, 21 Mar 2000 08:09:58 +0000
To: "Marshall Rose" <mrose+mtr.netnews@dbc.mtview.ca.us>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: <ietf-xml-mime@imc.org>
In-Reply-To: <003801bf929f$e7887b80$6d61fea9@dbc.mtview.ca.us>
References: <38D61D2B.A71DC71C@w3.org> <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM> <01JN94T4BSEM00004D@MAUVE.INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:10 AM 3/20/00 -0800, Marshall Rose wrote:
> > >     Content-type: image/svg; representation="xml"
> > If we absolutely have to do this with a separate piece of information, I would
> > opt for a content-feature tag. That way there's a clear delineation between
> > when feature information is or is not present, and we don't mess up MIME
> > parameter space. And we need the feature tag anyway for negotiation purposes.
>
>hmmm.  my view of the example above is that XML is being used as the syntax
>but the semantics of the blob being passed are still SVG semantics.

A closer MIME analogue to this would appear to me to be
Content-transfer-encoding.  Thus, precedent suggests a separate header
being the way to go *IF* a separate piece of information is required.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id BAA12897 for ietf-xml-mime-bks; Tue, 21 Mar 2000 01:21:08 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA12890 for <ietf-xml-mime@imc.org>; Tue, 21 Mar 2000 01:21:07 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4b153ab318@msw.mimesweeper.com>; Tue, 21 Mar 2000 09:25:15 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id HDT54DWF; Tue, 21 Mar 2000 09:24:07 -0000
Message-Id: <4.2.2.20000321083733.00a399c0@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Tue, 21 Mar 2000 08:49:23 +0000
To: Keith Moore <moore@cs.utk.edu>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003210016.TAA28135@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 18:48:36 EST." <200003202347.SAA29764@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 07:16 PM 3/20/00 -0500, Keith Moore wrote:
> > Requesting yet again.  Could we please have details of this horrible
> > problem inflicted by the -xml suffix?
>
>could we please have details of how conneg expressions or HTTP
>Accept headers could be used to say things like
>"I accept XML documents"?

Here is a strawman ...

(a) revise RFC 2295 (currently experimental) so that rather than use its 
own notations for describing resource variants and client features it uses 
conneg expressions.  RFC2295 introduces an "Accept-features:" header that 
is provided by a client requesting a resource.

(b) register a conneg feature tag to indicate XML content;  I would suggest 
something along the lines of (XML-namespace="<URI>"), where <URI> is an XML 
namespace identifier, as that would allow negotiation to specific XML 
applications.  A distinguished value, say "*", would be defined to indicate 
a generic XML handling capability.  An empty string could indicate XML 
handling without recognition of namespaces (i.e. "classic" XML).

(c) use the proposed Content-feature tag for labeling MIME content.

I believe this approach takes far greater account of the way the XML and 
XML applications are being developed by W3C than simply having a parameter 
that says "this is XML".
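The strawman leaves open how a client's advertised values should be matched against a document that uses several namespaces. John Cowan raises that question elsewhere in this thread. The Python sketch below makes one possible choice explicit. "*" accepts any XML, "" covers namespace-unaware XML, and otherwise every namespace the resource uses must have been advertised. The XML-namespace tag, the matching rule, and the function name are all hypothetical.

```python
def accepts(advertised, resource_namespaces):
    """Decide whether a client can take a resource under the XML-namespace strawman.

    advertised: XML-namespace values the client listed (hypothetical feature tag);
      "*" means generic XML handling, "" means namespace-unaware ("classic") XML.
    resource_namespaces: namespace URIs the resource uses (empty if none).
    """
    if "*" in advertised:
        return True                       # a generic XML handler takes anything
    if not resource_namespaces:
        return "" in advertised           # plain XML with no namespaces
    # Design choice (not in the strawman): every namespace must be supported.
    return set(resource_namespaces) <= set(advertised)
```

An "any namespace matches" rule would be just as easy to write. Choosing between the two is the real design decision, and the strawman does not yet make it.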

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id AAA11082 for ietf-xml-mime-bks; Tue, 21 Mar 2000 00:32:56 -0800 (PST)
Received: from black-ice.cc.vt.edu (root@black-ice.cc.vt.edu [128.173.14.71]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id AAA11060; Tue, 21 Mar 2000 00:32:52 -0800 (PST)
From: Valdis.Kletnieks@vt.edu
Received: from black-ice.cc.vt.edu (valdis@LOCALHOST [127.0.0.1]) by black-ice.cc.vt.edu (8.10.1.Alpha0/8.10.1.Alpha0) with ESMTP id e2L8YaC22436; Tue, 21 Mar 2000 03:34:36 -0500
Message-Id: <200003210834.e2L8YaC22436@black-ice.cc.vt.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 17:59:13 EST." <200003202258.RAA27431@hesketh.net> 
X-URL: http://black-ice.cc.vt.edu/~valdis/
X-Face: 34C9$Ewd2zeX+\!i1BA\j{ex+$/V'JBG#;3_noWWYPa"|,I#`R"{n@w>#:{)FXyiAS7(8t( ^*w5O*!8O9YTe[r{e%7(yVRb|qxsRYw`7J!`AM}m_SHaj}f8eb@d^L>BrX7iO[<!v4-0bVIpaxF#-) %9#a9h6JXI|T|8o6t\V?kGl]Q!1V]GtNliUtz:3},0"hkPeBuu%E,j(:\iOX-P,t7lRR#
References: <Your message of "Mon, 20 Mar 2000 15:00:27 EST." <200003201959.OAA17209@hesketh.net> <200003202258.RAA27431@hesketh.net>
Date: Tue, 21 Mar 2000 03:34:35 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Mon, 20 Mar 2000 17:59:13 EST, "Simon St.Laurent" said:
> It's a plausible risk only if HTTP 1.1 or other protocols are explicitly
> reopened and modified.  Otherwise, it's pretty easy to point to the spec
> and say "No.  That's forbidden."  It might be a good idea to present these

If it takes you more than 30 seconds to think of at least 5 vendors(*) of
software who aren't dissuaded from doing something just because the standard
explicitly says not to do it, you haven't been around long enough to see
*seriously* broken software.

Careful reading of the MIME specs will reveal that a *large* portion of
all the prohibitions against this, that, and the other thing aren't
there to prevent a good programmer from implementing something known to
be impossible.  The prohibitions are there to stop poor programmers
from implementing possible things incorrectly.

/Valdis

(*) Conglomerates such as IBM and Microsoft only count as one vendor
each, no matter how many offending companies/divisions they own.



Received: by ns.secondary.com (8.9.3/8.9.3) id AAA09953 for ietf-xml-mime-bks; Tue, 21 Mar 2000 00:20:47 -0800 (PST)
Received: from black-ice.cc.vt.edu (root@black-ice.cc.vt.edu [128.173.14.71]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id AAA09943; Tue, 21 Mar 2000 00:20:43 -0800 (PST)
From: Valdis.Kletnieks@vt.edu
Received: from black-ice.cc.vt.edu (valdis@LOCALHOST [127.0.0.1]) by black-ice.cc.vt.edu (8.10.1.Alpha0/8.10.1.Alpha0) with ESMTP id e2L8MMC27568; Tue, 21 Mar 2000 03:22:22 -0500
Message-Id: <200003210822.e2L8MMC27568@black-ice.cc.vt.edu>
To: ned.freed@INNOSOFT.COM
cc: ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 10:48:40 PST." <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> 
X-URL: http://black-ice.cc.vt.edu/~valdis/
X-Face: 34C9$Ewd2zeX+\!i1BA\j{ex+$/V'JBG#;3_noWWYPa"|,I#`R"{n@w>#:{)FXyiAS7(8t( ^*w5O*!8O9YTe[r{e%7(yVRb|qxsRYw`7J!`AM}m_SHaj}f8eb@d^L>BrX7iO[<!v4-0bVIpaxF#-) %9#a9h6JXI|T|8o6t\V?kGl]Q!1V]GtNliUtz:3},0"hkPeBuu%E,j(:\iOX-P,t7lRR#
References: <200003201814.NAA26399@astro.cs.utk.edu> <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM>
Date: Tue, 21 Mar 2000 03:22:22 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Mon, 20 Mar 2000 10:48:40 PST, ned.freed@innosoft.com said:
> I note in passing that we've already registered some XML-based types
> with an "XML" suffix, and no problems have been reported with this
> usage.

Has there been any registration of a 'foobar-xml' that duplicates
an existing 'foobar', but with an XML wrapping?  If so, have any
user agents been known to have been modified to use the -xml as the
sort of trigger we have been discussing?

				Valdis Kletnieks
				Operating Systems Analyst
				Virginia Tech


Received: by ns.secondary.com (8.9.3/8.9.3) id AAA08633 for ietf-xml-mime-bks; Tue, 21 Mar 2000 00:00:17 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id XAA08589; Mon, 20 Mar 2000 23:59:55 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id RAA28410; Tue, 21 Mar 2000 17:01:30 +0900 (JST)
Message-Id: <4.2.0.58.J.20000321165838.03456bd0@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Tue, 21 Mar 2000 16:59:24 +0900
To: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Finishing the XML-tagging discussion
Cc: ned.freed@INNOSOFT.COM, "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <01JN9SDVLGFW0000WW@MAUVE.INNOSOFT.COM>
References: <"Your message dated Tue, 21 Mar 2000 00:56:53 -0500" <200003210556.AAA29417@astro.cs.utk.edu> <4.2.0.58.J.20000321123528.03489100@sh.w3.mag.keio.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00/03/20 22:06 -0800, ned.freed@innosoft.com wrote:
> > now you're moving from the realm of specifying 'features' or
> > 'default handling' of the object into specifying layered encodings.
> > and this was something that we were very unwilling to do when
> > we defined MIME.  The feeling was that having an arbitrary
> > number of cascaded encodings was likely to be detrimental to
> > interoperability.
>
>I thought this too until I realized these were simply strawman names in
>Martin's example.

Exactly. Sorry for not having been clear enough.

Regards,   Martin.


Received: by ns.secondary.com (8.9.3/8.9.3) id WAA02166 for ietf-xml-mime-bks; Mon, 20 Mar 2000 22:09:12 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id WAA02149; Mon, 20 Mar 2000 22:09:05 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9RMXRRE80000WW@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 22:10:29 -0800 (PST)
Date: Mon, 20 Mar 2000 22:06:41 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Tue, 21 Mar 2000 00:56:53 -0500" <200003210556.AAA29417@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: "Martin J. Duerst" <duerst@w3.org>, ned.freed@INNOSOFT.COM, "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9SDVLGFW0000WW@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <4.2.0.58.J.20000321123528.03489100@sh.w3.mag.keio.ac.jp>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> now you're moving from the realm of specifying 'features' or
> 'default handling' of the object into specifying layered encodings.
> and this was something that we were very unwilling to do when
> we defined MIME.  The feeling was that having an arbitrary
> number of cascaded encodings was likely to be detrimental to
> interoperability.

I thought this too until I realized these were simply strawman names in
Martin's example. I wish he had used other names since the ones he chose invite
criticism of this sort. OTOH, we've been unable to come up with a good example
of a possible suffix that would be orthogonal to -xml...

In any case, MIME flatly forbids the specification of encodings as content
types, so this is a moot point.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA01580 for ietf-xml-mime-bks; Mon, 20 Mar 2000 21:57:06 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA01561; Mon, 20 Mar 2000 21:56:57 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id AAA29417; Tue, 21 Mar 2000 00:56:53 -0500 (EST)
Message-Id: <200003210556.AAA29417@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Martin J. Duerst" <duerst@w3.org>
cc: ned.freed@INNOSOFT.COM, "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Tue, 21 Mar 2000 12:40:24 +0900." <4.2.0.58.J.20000321123528.03489100@sh.w3.mag.keio.ac.jp> 
Date: Tue, 21 Mar 2000 00:56:53 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

now you're moving from the realm of specifying 'features' or
'default handling' of the object into specifying layered encodings.
and this was something that we were very unwilling to do when
we defined MIME.  The feeling was that having an arbitrary
number of cascaded encodings was likely to be detrimental to 
interoperability.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA28354 for ietf-xml-mime-bks; Mon, 20 Mar 2000 21:03:56 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA28328; Mon, 20 Mar 2000 21:03:43 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id OAA27768; Tue, 21 Mar 2000 14:04:44 +0900 (JST)
Message-Id: <4.2.0.58.J.20000321123528.03489100@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Tue, 21 Mar 2000 12:40:24 +0900
To: ned.freed@INNOSOFT.COM, "Simon St.Laurent" <simonstl@simonstl.com>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Finishing the XML-tagging discussion
Cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <01JN9FKA1MRU0000MM@MAUVE.INNOSOFT.COM>
References: <"Your message dated Mon, 20 Mar 2000 18:48:36 -0500" <200003202347.SAA29764@hesketh.net> <200003202030.PAA27040@astro.cs.utk.edu> <200003202258.RAA27431@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00/03/20 15:57 -0800, ned.freed@INNOSOFT.COM wrote:


>Let me try and state what I think the problems that have been brought up are:
>
>(1) We have to deal with what happens should another one of these suffixes
>     be defined at some point. Now, I have pointed out that we can define
>     things in such a way that this is deferred until #2 comes along. But
>     if we want to tackle it now we certainly can. The simplest option is
>     to insist on these tags being in alphabetical order in any media type
>     name.

An alternative would be to state that these suffixes are to be
taken inside-out, or outside-in. Assuming outside-in, and
assuming 'charset', generic encodings, and RDF as strawman
examples, the sequence would be something like

text/example-rdf-xml-shift_jis-gzip

i.e. this stuff is RDF expressed in XML encoded in Shift_JIS
and compressed with gzip, or outside-in: To get at this stuff,
first decompress it with g(un)zip, then interpret it as
a sequence of characters encoded in Shift_JIS, then hand it
to an XML parser, and then to an RDF engine.
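Read outside-in, such a name is peeled from the right. The Python sketch below recovers the processing order described above. It illustrates the strawman ordering only. The suffix set is a hypothetical registry, and, as noted elsewhere in this thread, MIME does not allow encodings to be specified as content types.

```python
def peel_layers(media_type, known_suffixes):
    """Split a strawman suffixed media type into outside-in processing layers.

    Returns (layers, base): layers lists the suffixes from the outermost
    (right-most) inwards; base is what remains once they are removed.
    """
    top, _, subtype = media_type.lower().partition("/")
    parts = subtype.split("-")
    layers = []
    # Keep at least one component so the base subtype is never empty.
    while len(parts) > 1 and parts[-1] in known_suffixes:
        layers.append(parts.pop())
    return layers, top + "/" + "-".join(parts)


SUFFIXES = {"gzip", "shift_jis", "xml", "rdf"}   # hypothetical registry
layers, base = peel_layers("text/example-rdf-xml-shift_jis-gzip", SUFFIXES)
print(layers)  # ['gzip', 'shift_jis', 'xml', 'rdf']
print(base)    # text/example
```

Splitting on "-" misreads any base subtype whose own hyphenated parts happen to match a suffix name. That ambiguity is part of why the ordering rule would have to be nailed down.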


Regards,   Martin.


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA23830 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:55:35 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA23810; Mon, 20 Mar 2000 19:55:28 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id WAA07000; Mon, 20 Mar 2000 22:57:11 -0500
Message-Id: <200003210357.WAA07000@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 22:58:05 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003210328.WAA28889@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 20:49:03 EST."             <200003210148.UAA02623@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 10:28 PM 3/20/00 -0500, Keith Moore wrote:
>> Again, please describe the impact.
>
>I have done so.  Repeatedly.  

Perhaps I am merely illiterate, but I haven't seen a single instance
proposed where the existence of the suffix, in the contexts proposed by the
I-D, actually breaks _anything_.  

If they're in the archives and I'm just incapable of finding it, I'd
appreciate a reference.  Pointing to them offlist is perfectly fine,
perhaps even preferable at this point.

In the meantime, I don't think you've proven your case of 'pollution'.  You
may well have proved that -xml doesn't solve all problems all the time (it
isn't intended to), but I don't think you've demonstrated that it causes
harm.  

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA22053 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:28:25 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA22043; Mon, 20 Mar 2000 19:28:23 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id WAA28889; Mon, 20 Mar 2000 22:28:27 -0500 (EST)
Message-Id: <200003210328.WAA28889@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 20:49:03 EST." <200003210148.UAA02623@hesketh.net> 
Date: Mon, 20 Mar 2000 22:28:27 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Again, please describe the impact.

I have done so.  Repeatedly.  


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA21982 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:27:20 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA21971; Mon, 20 Mar 2000 19:27:17 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with ESMTP id MAA27557; Tue, 21 Mar 2000 12:28:38 +0900 (JST)
Message-Id: <4.2.0.58.J.20000321120725.03432830@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Tue, 21 Mar 2000 12:22:15 +0900
To: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Finishing the XML-tagging discussion
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <01JN9ERAX62O0000MM@MAUVE.INNOSOFT.COM>
References: <"Your message dated Mon, 20 Mar 2000 15:28:51 -0500" <200003202028.PAA27011@astro.cs.utk.edu> <01JN94K31VY200004D@MAUVE.INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00/03/20 15:21 -0800, ned.freed@INNOSOFT.COM wrote:



> > let me see if I understand you correctly: what you are saying is that
> > people will expect the new convention (whatever it is) to work and
> > control the recipient's MIME reader's fallback behavior even in
> > the presence of a vast installed base that neither understands this
> > convention nor XML?
>
>No, that is not what I'm saying. What I'm saying is that a new agent, one that
>depends on the parameter being present, only works if it is present. And this
>then implies that a substantial number of agents need to send the tag in order
>for it to be useful. This won't happen soon if at all, so my agent that I wrote
>which depends on the tag we've called for ends up not working.

All the experience that I have had and seen reported with the 'charset'
parameter suggests that this is indeed a problem. Adding another
parameter, and getting it to work, would take years. It is only
recently that e.g. Apache was updated to include an 'AddCharset'
directive to make it easier to set the charset on the Content-Type.

With -xml, you get immediate benefits. Webmasters will set the type
to image/svg-xml just because there is nothing else. Webmasters
won't be much inclined to add a parameter that looks obvious anyway.

Although I do not claim that 'charset' should have been a type
suffix (the benefits of having it orthogonal are big enough in
that case), I think all the experience from 'charset' clearly
points to -xml as the solution.


Regards,   Martin.


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA21815 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:21:16 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA21795; Mon, 20 Mar 2000 19:21:12 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 16:30:39 -0800 (PST)
Date: Mon, 20 Mar 2000 16:28:08 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 19:16:41 -0500" <200003210016.TAA28135@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9GIKJ75E0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <200003202347.SAA29764@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > Requesting yet again.  Could we please have details of this horrible
> > problem inflicted by the -xml suffix?

> could we please have details of how conneg expressions or HTTP
> Accept headers could be used to say things like
> "I accept XML documents"?

accept-features: (syntax=xml)

My syntax is probably wrong but I'm sure you get the idea.
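A minimal Python sketch of how a recipient-side matcher might evaluate a single predicate like the one above. It assumes the parenthesized (tag=value) form used in conneg feature expressions, handles only one predicate with simple token values (no & or | combinations, no quoted strings), and treats "syntax" as a hypothetical feature tag; none of these names is a registered tag or an existing library API.

```python
import re
from typing import Dict, Tuple

# One parenthesized "(tag=value)" predicate with simple token values.
# Compound expressions such as "(& ...)" and quoted strings are not handled.
_PREDICATE = re.compile(r"^\(\s*([A-Za-z0-9.\-]+)\s*=\s*([A-Za-z0-9.\-]+)\s*\)$")


def parse_predicate(expr: str) -> Tuple[str, str]:
    """Split '(syntax=xml)' into ('syntax', 'xml'); raise ValueError otherwise."""
    match = _PREDICATE.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported feature expression: {expr!r}")
    return match.group(1).lower(), match.group(2).lower()


def satisfies(features: Dict[str, str], expr: str) -> bool:
    """True if a content's declared feature set satisfies the predicate."""
    tag, value = parse_predicate(expr)
    return features.get(tag, "").lower() == value


# Hypothetical feature sets for two pieces of content. "syntax" is the tag
# floated in this thread, not a registered conneg feature tag.
svg_features = {"syntax": "xml"}
png_features: Dict[str, str] = {}

print(satisfies(svg_features, "(syntax=xml)"))  # True
print(satisfies(png_features, "(syntax=xml)"))  # False
```

The point of the sketch is that the match happens on a feature tag, independent of whatever the media type happens to be called.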

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA21806 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:21:15 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA21791; Mon, 20 Mar 2000 19:21:11 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 16:03:47 -0800 (PST)
Date: Mon, 20 Mar 2000 15:57:29 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 18:48:36 -0500" <200003202347.SAA29764@hesketh.net>
To: "Simon St.Laurent" <simonstl@simonstl.com>
Cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9FKA1MRU0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <200003202030.PAA27040@astro.cs.utk.edu> <200003202258.RAA27431@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Requesting yet again.  Could we please have details of this horrible
> problem inflicted by the -xml suffix?

Let me try to state what I think the problems that have been brought up are:

(1) We have to deal with what happens should another one of these suffixes
    be defined at some point. Now, I have pointed out that we can define
    things in such a way that this is deferred until a second suffix comes
    along. But if we want to tackle it now we certainly can. The simplest
    option is to insist on these tags being in alphabetical order in any
    media type name.

(2) There is concern that this will lead to use of stuff like */*-xml in
    HTTP accept fields even if we specifically disallow it. I would like to
    address this by defining a conneg feature tag that says "the underlying
    syntax here is xml". This would be in addition to the "-xml" suffix, and
    would apply even to cases where media types aren't specified or where
    the -xml suffix isn't used.
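A rough sketch of the alphabetical-order rule from item (1), assuming a hypothetical registry of suffixes: only "xml" is actually on the table, and "zip" is an invented placeholder for a second suffix defined later. Only known suffixes are peeled off the end of the subtype, so hyphens that are simply part of a name are left alone.

```python
from typing import List, Tuple

# Hypothetical suffix registry: "xml" is the suffix under discussion, and
# "zip" is an invented placeholder for a second suffix defined later.
KNOWN_SUFFIXES = {"xml", "zip"}


def split_suffixes(media_type: str) -> Tuple[str, List[str]]:
    """Return (base subtype, trailing known suffixes), e.g.
    'image/svg-xml' -> ('svg', ['xml']).

    Only known suffixes are peeled off the end, so hyphens that are just part
    of a name, as in 'x-www-form-urlencoded', are left alone.
    """
    _top, _, subtype = media_type.strip().lower().partition("/")
    parts = subtype.split("-")
    suffixes: List[str] = []
    while len(parts) > 1 and parts[-1] in KNOWN_SUFFIXES:
        suffixes.insert(0, parts.pop())
    return "-".join(parts), suffixes


def suffixes_in_order(media_type: str) -> bool:
    """Item (1): any suffixes must appear in alphabetical order."""
    _base, suffixes = split_suffixes(media_type)
    return suffixes == sorted(suffixes)


print(split_suffixes("image/svg-xml"))                      # ('svg', ['xml'])
print(split_suffixes("application/x-www-form-urlencoded"))  # ('x-www-form-urlencoded', [])
print(suffixes_in_order("application/foo-xml-zip"))         # True
print(suffixes_in_order("application/foo-zip-xml"))         # False
```

With a fixed order, "foo-xml-zip" is the only legal spelling, so two senders cannot label the same content two different ways.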

If there are any other concrete problems I've missed them and would 
appreciate hearing about them.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id TAA21101 for ietf-xml-mime-bks; Mon, 20 Mar 2000 19:08:01 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id TAA21090; Mon, 20 Mar 2000 19:07:55 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 19:09:20 -0800 (PST)
Date: Mon, 20 Mar 2000 19:05:30 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 20:12:34 -0500" <200003210112.UAA28371@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9M2AMI2O0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01JN9GIKJ75E0000MM@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > accept-features: (syntax=xml)

> yes, but you can't do that without either

> a) adding yet another frob to the object for this purpose
>    (which is separate from the content-type frob), or

Yep, which is why I've repeatedly proposed adding a feature tag for this
purpose in addition to the -xml suffix. 

Also note that adding a general content-type parameter doesn't work with conneg
either, since conneg only deals with media types, not their parameters. (This
is another one of those cases you don't seem to be worried about.) So this frob
is needed for conneg to work even with your proposal.

> b) teaching the conneg recognizer about the xml frob

Can't teach it about either the -xml suffix or the content-type parameter
very easily if at all. Conneg just doesn't work this way.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id SAA20652 for ietf-xml-mime-bks; Mon, 20 Mar 2000 18:59:40 -0800 (PST)
Received: from mgate-01.teledesic.com (mgate-01.teledesic.com [216.190.22.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA20648; Mon, 20 Mar 2000 18:59:39 -0800 (PST)
Received: by mgate-01.teledesic.com with Internet Mail Service (5.5.2448.0) id <HKNL3RR7>; Mon, 20 Mar 2000 18:56:58 -0800
Message-ID: <25D0C66E6D25D311B2AC0008C7913EE0517E0F@tdmail2.teledesic.com>
From: Dan Kohn <dan@teledesic.com>
To: ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: RE: Finishing the XML-tagging discussion
Date: Mon, 20 Mar 2000 18:50:06 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="windows-1252"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

>> but I have to wonder - how likely is it that such XML-ish things
>> will be generated by vanilla MIME user agents anyway?  isn't it
>> more likely that they will be generated by things that are XML-aware?

>They will be generated by XML-generation thingies but labelled by 
>MIME-generation thingies. I don't have to have a new user agent with
>XML-generation capabilities to use XML heavily.

As a specific example (which this thread has been conspicuously lacking), if
I attach an image/svg-xml file to this message and include the instructions
"Please copy the French text on the road sign", someone with an XML-aware
MIME client and an XML browser but no support for SVG can probably still
open the file and copy the text.  By contrast, I, the sender, have no way of
adding $superclass support to my existing mailer, let alone the receiver's.
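The fallback described above might look roughly like this on the receiving side. This is a sketch only: the handler functions and the table are hypothetical stand-ins for a mailcap-style configuration, and the lookup uses only the type/subtype, ignoring parameters.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[bytes], str]


def generic_xml_viewer(body: bytes) -> str:
    # Stand-in for "an XML browser": shows any XML document generically.
    return f"generic XML view of {len(body)} bytes"


def png_viewer(body: bytes) -> str:
    return f"PNG image of {len(body)} bytes"


# Hypothetical mailcap-style table. Note there is no entry for SVG at all.
HANDLERS: Dict[str, Handler] = {
    "image/png": png_viewer,
    "text/xml": generic_xml_viewer,
    "application/xml": generic_xml_viewer,
}


def choose_handler(content_type: str) -> Optional[Handler]:
    # Only the type/subtype is used for dispatch; parameters are dropped.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in HANDLERS:  # an exact match always wins
        return HANDLERS[media_type]
    if media_type.endswith("-xml"):  # unknown type, but labelled as XML
        return HANDLERS["application/xml"]
    return None  # nothing usable; e.g. offer to save the file


handler = choose_handler("image/svg-xml; charset=utf-8")
print(handler(b"<svg/>") if handler else "no handler")  # generic XML view of 6 bytes
```

The exact-match step is also where a specific handler for image/svg would go if a recipient chose to add one by hand.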

I think the thread has probably laid clear most of the issues involved.
However, we can see from new people popping in mid-thread that these will
all be re-raised if and when we ever go to last call.

Therefore, I'm going to work with Simon and Murata-san to create an -03
draft that summarizes the alternatives to -xml and issues with them.  This
would most likely make sense as an appendix to the current draft.

Please let me know if you are unwilling for me to use text posted to the
list in these explanations (Ned, that especially means you).  Also, although
we'll announce a new draft on ietf-822 as well, I'd suggest that
ietf-xml-mime would be the best list on which to continue this thread, given
that it was expressly created for that purpose.  People on ietf-822 are
probably (far too) aware of the situation by now.

		- dan
--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-602-6222  fax:+1-425-602-6223
http://www.dankohn.com 


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA14787 for ietf-xml-mime-bks; Mon, 20 Mar 2000 17:46:33 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA14776; Mon, 20 Mar 2000 17:46:30 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id UAA02623; Mon, 20 Mar 2000 20:48:12 -0500
Message-Id: <200003210148.UAA02623@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 20:49:03 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003210129.UAA28580@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 19:38:54 EST."             <200003210038.TAA32233@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:29 PM 3/20/00 -0500, Keith Moore wrote:
>> I'm still looking for damage, though I'm concluding more and more that
>> there simply isn't any, and that this discussion is only circling.
>
>let me put it this way: some people are quite happy to pollute the MIME 
>content-type space, and to alter existing dispatching mechanisms and 
>existing content-negotiation mechanisms, in order to establish an ad hoc 
>recognition scheme for XML objects as quickly as possible.  others are 
>more concerned about the longevity of the overall MIME architecture and 
>want to see that the likely effects of such changes are carefully 
>considered before such changes are widely adopted or endorsed.

Carefully considered concerns are one thing, but evidence of actual damage
is another. Your bias is obvious, yet your concerns are unsubstantiated.

Do you have substantial evidence of real 'pollution'?  I'm quite happy to
carefully consider the impact of such pollution, but I have yet to see such
impact described in any credible detail.

Again, please describe the impact.

Thank you.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA14275 for ietf-xml-mime-bks; Mon, 20 Mar 2000 17:29:37 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA14256; Mon, 20 Mar 2000 17:29:19 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id UAA28580; Mon, 20 Mar 2000 20:29:24 -0500 (EST)
Message-Id: <200003210129.UAA28580@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 19:38:54 EST." <200003210038.TAA32233@hesketh.net> 
Date: Mon, 20 Mar 2000 20:29:24 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I'm still looking for damage, though I'm concluding more and more that
> there simply isn't any, and that this discussion is only circling.

let me put it this way: some people are quite happy to pollute the MIME 
content-type space, and to alter existing dispatching mechanisms and 
existing content-negotiation mechanisms, in order to establish an ad hoc 
recognition scheme for XML objects as quickly as possible.  others are 
more concerned about the longevity of the overall MIME architecture and 
want to see that the likely effects of such changes are carefully 
considered before such changes are widely adopted or endorsed.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA13756 for ietf-xml-mime-bks; Mon, 20 Mar 2000 17:12:36 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA13745; Mon, 20 Mar 2000 17:12:33 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id UAA28371; Mon, 20 Mar 2000 20:12:34 -0500 (EST)
Message-Id: <200003210112.UAA28371@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 16:28:08 PST." <01JN9GIKJ75E0000MM@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 20:12:34 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> accept-features: (syntax=xml)

yes, but you can't do that without either
a) adding yet another frob to the object for this purpose
   (which is separate from the content-type frob), or
b) teaching the conneg recognizer about the xml frob

Keith


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id RAA13533 for ietf-xml-mime-bks; Mon, 20 Mar 2000 17:05:57 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA13519; Mon, 20 Mar 2000 17:05:44 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 17:07:06 -0800 (PST)
Date: Mon, 20 Mar 2000 16:32:26 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 19:15:25 -0500" <200003210015.TAA28120@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9HRRTLJ20000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01JN9ERAX62O0000MM@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > > An advisory parameter of this sort is a worthless parameter. If you cannot
> > > > depend on it appearing for every instance of a given content type you
> > > > cannot use it for anything.
> >
> > > that's not true.   if the parameter is missing then some opportunities
> > > to evaluate the object (in the absence of knowledge of the complete type)
> > > will be lost.  but that's not the same thing as "you cannot use it for
> > > anything".
> >
> > Something that I cannot count on being there is of no use to me, and, I
> > suspect, to anyone else who cares about this stuff. This has to be
> > dependable to be useful.

> by that argument nothing in MIME would be useful except the ability
> to handle text/plain; charset=us-ascii, since that's all that was
> there in RFC 822 and all that could be expected out of email recipients
> at the time MIME was deployed.

That is exactly how I would characterize MIME if it weren't the dominant format
in use today, and the only standardized format for multimedia in RFC822 mail.
Having more than one such format clearly would not be useful. (Indeed, this
exact argument has been used to keep other mechanisms at bay.)

> and I fail to see how special recognition of the -xml frob is any
> more likely than special treatment of a $superclass parameter.
> by your criteria above neither of these will be useful.

It's more likely to work because it's built into the media type, and generation
of correct media types is a solved problem.

> > > how is this kind of "failure" any different than the kind of failure
> > > that exists when a reader doesn't understand image/svg-xml and
> > > fails to recognize that the -xml suffix means that it can be
> > > treated as generic xml?
> >
> > It is different because I can do something about it -- I can upgrade
> > my software. Whereas in order to get benefit from your proposal I have
> > to upgrade the software that created the object I received -- software I
> > do not control. And in some cases I even have to upgrade the protocols
> > in use.

> okay.  I see that there is a higher barrier to adoption of
> $superclass than for -xml because the sender is more likely to
> be able to generate -xml than $superclass.

Finally!

> but I still think you overstate the hazard.  if someone sends you
> an image/svg content without the $superclass label you can still
> fix the problem on your end merely by defining a specific handler
> for image/svg.  true, it doesn't solve the problem for how to deal
> with large numbers of unanticipated XML-based types, but it will
> work on a small scale.

> but I have to wonder - how likely is it that such XML-ish things
> will be generated by vanilla MIME user agents anyway?  isn't it
> more likely that they will be generated by things that are XML-aware?

They will be generated by XML-generation thingies but labelled by 
MIME-generation thingies. I don't have to have a new user agent with
XML-generation capabilities to use XML heavily.

> > Then let's by all means add a content-features tag as well. Negotiation
> > problem solved.

> so you want to put the xml frob in two different places?

I'm not wild about having it in two places either, but we've already accepted
the fact that conneg duplicates a large number of other mechanisms, so this is
in some sense inevitable. For example, content-features potentially duplicates
content-type and accept-features potentially duplicates accept and
accept-charset. (But then again, accept-charset duplicates accept in some
cases.)

I guess my position is that since negotiation facilities came along late in the
game some level of duplication in them is acceptable, and since the handling of
such things is yet to be coded we can accommodate such duplication as need be.

> seems like that introduces far more silly states than putting
> it in either of them.

On the other hand, there are places where either the content-type or
the features are unavailable.

> banning content negotiation's use of this information hardly seems like a
> constructive way forward.  what I'd like to see is a proposal for  adding
> these frobs that anticipates the needs of content negotiation rather than
> pretending that they don't exist.

> (this can still be done even if they are bundled in the content type name,
> but it probably needs a slightly different syntax; for one thing,
> "-" already appears in a number of content-type names without being
> intended as a feature separator.)

I actually don't have a problem with this approach either, but it seems
to me it should be part of the content negotiation framework rather than
an isolated, ad-hoc extension to HTTP alone.

> my guess is that superclasses will often be omitted, but will rarely
> be incorrect when they are present.

I'm pretty much convinced this isn't going to be the case. I've seen several
instances of type transformation that are parameter-preserving. We're protected
from this now by the near-orthogonality of media type parameters, but this
would change with a superclass parameter of the sort you propose.
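To make the parameter-preserving concern concrete, here is a deliberately naive converter that swaps the media type but copies parameters across unchanged. "superclass" stands in for the thread's $superclass placeholder, and the Content-Type parser ignores quoting, so treat this as an illustration, not a real MIME parser.

```python
from typing import Dict, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Naive parser: 'type/sub; a=b; c=d' -> ('type/sub', {'a': 'b', 'c': 'd'}).
    Quoted parameter values are not handled."""
    head, *rest = value.split(";")
    params = {}
    for item in rest:
        name, _, val = item.partition("=")
        params[name.strip().lower()] = val.strip()
    return head.strip().lower(), params


def format_content_type(media_type: str, params: Dict[str, str]) -> str:
    return "; ".join([media_type] + [f"{k}={v}" for k, v in params.items()])


def convert_naively(content_type: str, new_type: str) -> str:
    """Swap the media type but carry every parameter over unchanged."""
    _old_type, params = parse_content_type(content_type)
    return format_content_type(new_type, params)


# Rasterizing an SVG drawing to PNG with a parameter-preserving converter:
print(convert_naively("image/svg; superclass=xml", "image/png"))
# image/png; superclass=xml   (now false: PNG is not XML)
print(convert_naively("image/svg-xml", "image/png"))
# image/png                   (the suffix vanished along with the old name)
```

The stale superclass=xml survives onto the PNG, whereas a suffix in the type name disappears together with the name itself.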

> > No, that is not what I'm saying. What I'm saying is that a new agent, one that
> > depends on the parameter being present, only works if it is present.

> why should an agent depend on the parameter being present (if it knows
> that the specific type is XML-based)?

Keith, you're the one that claims that the instant a new suffix is defined
people will start trying to use it in accept fields, even if such usage
is explicitly disallowed. Whereas all I'm claiming is that people might
just possibly try and use a field we standardize for its intended purpose, and
rely on it when they cannot.

You're the one who's making an apples and oranges argument here.

> > People do expect it. We have argued long and hard that a filename isn't
> > sufficient, and we've mostly won. Now that we've won we want to say,
> > "Surprise, we've changed our minds, you now need to generate these
> > parameters for things to work". This is unacceptable.

> we've always said that the content-type specification included the parameters,
> that the content-type name without the parameters was incomplete.

Unfortunately, while we've won the media type battle, this one we've lost. (It
probably would have helped if we'd had some instances early on where the
parameters were clearly vital in the handling of some atomic type or if we'd
defined the mailcap format so that it was clear that parameters are an
essential part of a media type. Gotta eat your own dog food...)

> so I don't
> see how we're changing our minds by defining new types that happen to have
> parameters - even if some of those parameters are shared with other types.

Basic existentialism here, I'm afraid -- our intentions might have been
good, but they are irrelevant at this point.

> anyway the content negotiation argument isn't about the particular syntax
> that is used. it's about whether, having defined this new frob for
> content-types, we then have to define new content negotiation
> mechanisms to recognize those frobs.  if we can come up with something
> that can be made to more-or-less work with existing HTTP accept and/or
> conneg expressions, that would seem like a win.

Given the rarity of parameters in accept expressions, I have to wonder how
well they actually work... 

But in any case, I again repeat that the suffix proposal is not intended
by anybody to be a replacement for the definition of a negotiation mechanism.
If having such a mechanism is a condition of acceptance of the proposal,
I'm sure we can define one that will work.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id QAA12674 for ietf-xml-mime-bks; Mon, 20 Mar 2000 16:36:22 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA12662; Mon, 20 Mar 2000 16:36:18 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id TAA32233; Mon, 20 Mar 2000 19:38:01 -0500
Message-Id: <200003210038.TAA32233@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 19:38:54 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003210016.TAA28135@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 18:48:36 EST."             <200003202347.SAA29764@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 07:16 PM 3/20/00 -0500, Keith Moore wrote:
>> Requesting yet again.  Could we please have details of this horrible
>> problem inflicted by the -xml suffix?
>
>could we please have details of how conneg expressions or HTTP
>Accept headers could be used to say things like
>"I accept XML documents"?

It's a good question, except (critically) that the suffix under discussion
doesn't provide any mechanisms for making such requests. 

All this suffix lets us do is sort out the XML from the non-XML without
sniffing.  As has been stated repeatedly for the last few days, sniffing is
neither reliable nor fun, and doesn't take advantage of commodity components.
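A small illustration of why sniffing is unreliable compared with reading the label, assuming the -xml convention under discussion. The sniffer here checks only for a leading XML declaration, which is optional in XML 1.0, and it is also fooled by a UTF-16 byte order mark; the label check trusts text/xml, application/xml, and any subtype ending in -xml.

```python
def sniff_is_xml(body: bytes) -> bool:
    """Naive sniffing: treat the body as XML only if it opens with '<?xml'."""
    return body.lstrip().startswith(b"<?xml")


def label_is_xml(content_type: str) -> bool:
    """Label-based check using the -xml convention under discussion."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/xml", "application/xml") or media_type.endswith("-xml")


# Well-formed XML without the (optional) XML declaration: sniffing misses it.
no_decl = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
# UTF-16 with a byte order mark: the bytes do not begin with b"<?xml" either.
utf16 = '<?xml version="1.0"?><doc/>'.encode("utf-16")

print(sniff_is_xml(no_decl), label_is_xml("image/svg-xml"))  # False True
print(sniff_is_xml(utf16))                                   # False
```

A smarter sniffer can patch each of these cases, but every patch is more code that the label would have made unnecessary.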

I am quite content to have meaningfully labeled information even in the
absence of detailed discussions about those labels.  

I'm still looking for damage, though I'm concluding more and more that
there simply isn't any, and that this discussion is only circling.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id QAA12245 for ietf-xml-mime-bks; Mon, 20 Mar 2000 16:17:57 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA12233; Mon, 20 Mar 2000 16:17:53 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id TAA28146; Mon, 20 Mar 2000 19:17:56 -0500 (EST)
Message-Id: <200003210017.TAA28146@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 15:45:53 PST." <01JN9FBH0UEQ0000MM@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 19:17:56 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> They would matter if and only if there wasn't a separate viable solution to 
> the negotiation problem. But there is such a solution. Why not simply 
> use it?

because *that* solution really does introduce more silly states.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id QAA12208 for ietf-xml-mime-bks; Mon, 20 Mar 2000 16:16:40 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA12198; Mon, 20 Mar 2000 16:16:37 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id TAA28135; Mon, 20 Mar 2000 19:16:41 -0500 (EST)
Message-Id: <200003210016.TAA28135@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 18:48:36 EST." <200003202347.SAA29764@hesketh.net> 
Date: Mon, 20 Mar 2000 19:16:41 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Requesting yet again.  Could we please have details of this horrible
> problem inflicted by the -xml suffix?

could we please have details of how conneg expressions or HTTP
Accept headers could be used to say things like
"I accept XML documents"?

Keith


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id QAA12174 for ietf-xml-mime-bks; Mon, 20 Mar 2000 16:15:39 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA12160; Mon, 20 Mar 2000 16:15:29 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id TAA28120; Mon, 20 Mar 2000 19:15:25 -0500 (EST)
Message-Id: <200003210015.TAA28120@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 15:21:06 PST." <01JN9ERAX62O0000MM@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 19:15:25 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > An advisory parameter of this sort is a worthless parameter. If you cannot
> > > depend on it appearing for every instance of a given content type you
> > > cannot use it for anything.
> 
> > that's not true.   if the parameter is missing then some opportunities
> > to evaluate the object (in the absence of knowledge of the complete type)
> > will be lost.  but that's not the same thing as "you cannot use it for
> > anything".
> 
> Something that I cannot count on being there is of no use to me, and, I
> suspect, to anyone else who cares about this stuff. This has to be
> dependable to be useful.

by that argument nothing in MIME would be useful except the ability
to handle text/plain; charset=us-ascii, since that's all that was
there in RFC 822 and all that could be expected out of email recipients
at the time MIME was deployed.  

and I fail to see how special recognition of the -xml frob is any 
more likely than special treatment of a $superclass parameter.
by your criteria above neither of these will be useful.

> > > (Or worse, you will use it and fail when it isn't present.)
> 
> > how is this kind of "failure" any different than the kind of failure
> > that exists when a reader doesn't understand image/svg-xml and
> > fails to recognize that the -xml suffix means that it can be
> > treated as generic xml?
> 
> It is different because I can do something about it -- I can upgrade
> my software. Whereas in order to get benefit from your proposal I have
> to upgrade the software that created the object I received -- software I 
> do not control. And in some cases I even have to upgrade the protocols
> in use.

okay.  I see that there is a higher barrier to adoption of 
$superclass than for -xml because the sender is more likely to
be able to generate -xml than $superclass.  

but I still think you overstate the hazard.  if someone sends you
an image/svg content without the $superclass label you can still
fix the problem on your end merely by defining a specific handler
for image/svg.  true, it doesn't solve the problem for how to deal
with large numbers of unanticipated XML-based types, but it will 
work on a small scale.
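Here is a minimal sketch of the recipient-side behaviour described above. It assumes the `$superclass` parameter proposed in this thread, which is not a registered MIME parameter, and it uses made-up handler names. A handler for the exact type always wins. The advisory parameter is consulted only when no such handler exists. Anything else is saved as application/octet-stream. The parameter parser is deliberately simplified: it handles no quoted semicolons and no RFC 2231 encoding.

```python
def parse_content_type(value):
    """Split a Content-Type value into (type/subtype, {param: value}).

    Simplified: assumes no quoted semicolons. Type and parameter names
    are case-insensitive, so both are lower-cased.
    """
    parts = [p.strip() for p in value.split(";")]
    ctype = parts[0].lower()
    params = {}
    for p in parts[1:]:
        if "=" in p:
            name, val = p.split("=", 1)
            params[name.strip().lower()] = val.strip().strip('"')
    return ctype, params


# Hypothetical handler table, keyed by exact media type.
HANDLERS = {
    "image/svg": "svg-renderer",        # the "specific handler" fix, done locally
    "text/xml": "generic-xml-viewer",
}


def choose_handler(content_type):
    ctype, params = parse_content_type(content_type)
    if ctype in HANDLERS:                       # a specific handler always wins
        return HANDLERS[ctype]
    superclass = params.get("$superclass")      # advisory hint, may be absent
    if superclass and superclass.lower() in HANDLERS:
        return HANDLERS[superclass.lower()]
    return "save-as-octet-stream"               # the usual MIME last resort


print(choose_handler("image/svg"))                             # svg-renderer
print(choose_handler('image/x-foo; $superclass="text/xml"'))   # generic-xml-viewer
print(choose_handler("image/x-foo"))                           # save-as-octet-stream
```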

but I have to wonder - how likely is it that such XML-ish things 
will be generated by vanilla MIME user agents anyway?  isn't it
more likely that they will be generated by things that are XML-aware?

> > > This is a complete red herring. Nobody is proposing that the suffix be
> > > used for negotiation purposes. Negotiation is a different problem than
> > > labelling.
> 
> > yes, they are different problems, but if we establish this syntactic
> > convention, people will want to use that convention in content
> > negotiation.  or to put it another way, a content-type feature
> > labelling convention that works hand in hand with content negotiation
> > is vastly more useful than one which does not.
> 
> Then let's by all means add a content-features tag as well. Negotiation
> problem solved.

so you want to put the xml frob in two different places?
seems like that introduces far more silly states than putting
it in either of them. 

> > > Again, an advisory parameter of this sort accomplishes nothing.
> 
> > "nothing" seems to me like a gross exaggeration.
> 
> I see it as an understatement.

fine, but it's not exactly an illuminating explanation.

> > > > To put it another way, if it's absolutely important that every instance
> > > > of image/svg be externally labelled as XML then I'd agree with you.
> > > > But I don't see this as absolutely important.
> 
> > > I do.
> 
> > okay, why?
> 
> Because I like to build software that actually works most of the time.

funny, I do too.  OTOH, I realize that when I send someone a new
or unusual kind of MIME object that there's a chance they won't
be able to interpret it, and that I won't find this out immediately,
and I accept that as a fundamental limitation of MIME.

> > okay, how many HTTP servers can deal with
> 
> > Accept: */*-xml
> 
> Any compliant HTTP server can deal with it. Now, it may not give you the
> result you want, but so what? This is nothing but a strawman argument; the
> proposal at hand doesn't specify any extensions to the accept field in HTTP. 
> If you think it is such a risk that such usage will appear then we can
> specifically ban it.

banning content negotiation's use of this information hardly seems like a 
constructive way forward.  what I'd like to see is a proposal for adding
these frobs that anticipates the needs of content negotiation rather than 
pretending that they don't exist.

(this can still be done even if they are bundled in the content type name,
but it probably needs a slightly different syntax; for one thing,
"-" already appears in a number of content-type names without being
intended as a feature separator.)
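To make the point about "-" concrete: a recipient that treats hyphen-separated pieces of a subtype as feature suffixes cannot tell a deliberate suffix from a hyphen that is simply part of a name. The sketch below contrasts that naive rule with one that splits only on a separator reserved for the purpose. The choice of '+' as that separator is an assumption for illustration, not something this thread has agreed on. application/vnd.ms-excel and application/x-www-form-urlencoded are existing type names.

```python
def naive_features(subtype):
    """Treat every '-'-separated piece after the first as a feature suffix."""
    return subtype.split("-")[1:]


def reserved_sep_features(subtype, sep="+"):
    """Treat only pieces after a reserved separator as feature suffixes.

    The '+' separator is an assumption for this sketch.
    """
    return subtype.split(sep)[1:]


# Hyphens that are just part of a name yield bogus "features":
print(naive_features("vnd.ms-excel"))           # ['excel']
print(naive_features("x-www-form-urlencoded"))  # ['www', 'form', 'urlencoded']
print(reserved_sep_features("vnd.ms-excel"))    # []
print(reserved_sep_features("svg+xml"))         # ['xml']
```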

> > > Thus far all you have cited are easily surmountable
> > > problems, like the ordering of future additional suffixes (assuming there
> > > ever are any).
> 
> > yes, if we decide to use frobs on the content-type name it's not too
> > difficult to define a canonical ordering such that there's one unique
> > way of spelling any content-type name.  but if we do go in that direction
> > then I'd like us to go ahead and define that syntactic convention,
> > including the ordering
> 
> Fine with me.
> 
> > > > a concrete example of something that this breaks would be helpful
> > > > in getting me to understand your concerns.
> > >
> > > It breaks so many things in so many ways... Some examples:
> > >
> > > (1) Silly state problems. Consider the possible effect of image/jpeg;
> > >     $superclass=text/xml on a handler only prepared to accept XML text.
> > >     (And compare it with the effect of image/jpeg; charset=us-ascii on
> > >     any existing handler.)
> 
> > we need to compare apples to apples.  if the handler is only prepared
> > to accept XML text and the image/jpeg arrives with a $superclass=text/xml
> > type then it gets handed to the XML layer and that layer says
> > "invalid XML content" (and ideally the recipient gets a chance to save it).
> > if it sees image/jpeg; charset=us-ascii then it gets treated as
> > application/octet-stream and the recipient gets a chance to save it.
> 
> I'm sorry, but I _am_ comparing apples with apples. I've seen way too much
> software that simply crashes under such circumstances. (And so, I suspect, 
> have you.) Whereas the latter case causes no harm at all.

okay, I'm making assumptions about XML here - which is that XML readers
actually are built on parsing technology and can thus supply reasonable
error messages a high percentage of the time.  and that there are
cookies within XML that the XML parser will use to quickly distinguish
most obviously-not-valid-xml content from maybe-valid-xml content.
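The assumption in the paragraph above can be sketched in two steps. First, a cheap check of the leading bytes (a byte-order mark, or a '<' after optional whitespace) rejects most obviously-non-XML content, such as JPEG data. Second, a real parser reports malformed input as an error instead of crashing. This minimal Python sketch uses the standard library's xml.etree.ElementTree. The sniffing heuristic is an assumption, not something the XML specification mandates.

```python
import xml.etree.ElementTree as ET

BOMS = (b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")  # UTF-8, UTF-16 BE, UTF-16 LE


def looks_like_xml(data):
    """Cheap heuristic: could this byte string plausibly be XML?"""
    if data.startswith(BOMS):
        return True                      # encoded text; let the real parser decide
    return data.lstrip().startswith(b"<")


def handle_as_xml(data):
    """Return ('ok', root tag) or ('rejected', reason) instead of raising."""
    if not looks_like_xml(data):
        return ("rejected", "does not look like XML")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return ("rejected", "invalid XML content: %s" % err)
    return ("ok", root.tag)


print(handle_as_xml(b"\xff\xd8\xff\xe0JFIF"))          # ('rejected', 'does not look like XML')
print(handle_as_xml(b'<?xml version="1.0"?><svg/>'))   # ('ok', 'svg')
```

Ned's objection is that many deployed handlers do not behave this well. The sketch shows only the behaviour being assumed here.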

now if we were talking about subclasses of e.g. postscript or pdf I
would indeed expect to see gobbledegook and crashes.  (since interpreters
for these languages often seem to crash even on perfectly valid documents)

and of course this would be a general purpose mechanism, not one specific
to xml.

but given that there's already a great deal of potential for mislabelling
in MIME, does this additional opportunity really increase the risk
that an object will be mislabelled?  i.e. how much more likely is it that 

a) the content-type is correct, but a $superclass parameter is present 
   and incorrect, than
b) the content-type is incorrect?

my guess is that superclasses will often be omitted, but will rarely
be incorrect when they are present.

> 
> > > (2) Problems with sending agents not including the tag. Suppose an application
> > >     is deployed that depends on the superclass tag. (This is inevitable once
> > >     the tag is defined, you can call it advisory until you are blue in the
> > >     face but if it is used at all it won't be taken as such.)
> 
> > let me see if I understand you correctly: what you are saying is that
> > people will expect the new convention (whatever it is) to work and
> > control the recipient's MIME reader's fallback behavior even in
> > the presence of a vast installed base that neither understands this
> > convention nor XML?
> 
> No, that is not what I'm saying. What I'm saying is that a new agent, one that
> depends on the parameter being present, only works if it is present. 

why should an agent depend on the parameter being present (if it knows
that the specific type is XML-based)?

> And this then implies that a substantial number of agents need to send the 
> tag in order for it to be useful. This won't happen soon if at all, so my 
> agent that I wrote which depends on the tag we've called for ends up not 
> working.
> 
> > or that this would be like the user agents that could recognize
> > filename suffixes but refused to look at the content-type?
> > people would expect their content to be read by the recipient
> > even if they could not label it correctly?
> 
> People do expect it. We have argued long and hard that a filename isn't
> sufficient, and we've mostly won. Now that we've won we want to say, 
> "Surprise, we've changed our minds, you now need to generate these 
> parameters for things to work". This is unacceptable.

we've always said that the content-type specification included the parameters,
that the content-type name without the parameters was incomplete.  so I don't
see how we're changing our minds by defining new types that happen to have 
parameters - even if some of those parameters are shared with other types.

> > (the latter strikes me as an argument against any sort of XML
> > convention at all - since you seem to be saying that if the convention
> > does exist that it will be (mis)used in preference to the primary
> > content-type )
> 
> The nice thing about the tag being part of the media type is that it leverages
> off of our years of insisting that proper media type labels be used. We define
> the convention and the rest of the problem takes care of itself.
> 
> > and how many user agents does this affect?  e.g., what percentage
> > of UAs cannot send out a text/plain attachment with a charset label?
> 
> A pretty large number, actually.
> 
> > > (3) Problems with places where parameters aren't expected/allowed. Once
> > >     the tag is required there will be pressure to generate it. This in turn
> > >     will lead to sending agents upgrading to produce it and thereby cranking
> > >     out parameters for the first time. Some of these agents are now used to
> > >     generate values for fields that don't allow parameters. The upgrade will
> > >     cause these fields to become syntactically invalid.
> 
> > such as in accept headers?
> 
> Strawman again. There's nothing syntactically invalid about putting funny
> wildcards in an accept header.

no, but as you point out, that doesn't have the desired result.

anyway the content negotiation argument isn't about the particular syntax 
that is used. it's about whether, having defined this new frob for
content-types, we then have to define new content negotiation
mechanisms to recognize those frobs.  if we can come up with something
that can be made to more-or-less work with existing HTTP accept and/or 
conneg expressions, that would seem like a win.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA10551 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:55:25 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA10541; Mon, 20 Mar 2000 15:55:18 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 15:56:42 -0800 (PST)
Date: Mon, 20 Mar 2000 15:45:53 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 18:38:21 -0500" <200003202338.SAA27934@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9FBH0UEQ0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01JN9E2T6ZYU0000MM@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > Whereas I believe your proposal leapfrogs shortsighted into the realm
> > of lunacy.

> hey, I'm quite willing to entertain criticism and feedback, but
> a label like "lunacy" seems completely unjustified.  nor does
> it tell me much about the problems you see with it.  (and I
> am trying to understand these)  the most I've been able to glean
> thus far is that people will misuse parameters in various ways.
> even granted that this will happen, I still don't see that it's
> "lunacy" to consider that path.

Of course it isn't wrong to consider it. But make no mistake about it: You are
opening up MIME and redesigning it in a fundamental way here. Even if the
change was a sensible one this would be *extremely* dangerous, and I would
probably oppose it on these grounds even if I thought it was a good idea.

But IMO this change isn't even remotely sensible -- and I have explained why I
believe this in great detail already. But you continue to advocate it in spite
of all this. This I consider to be lunacy. You may not like this
characterization, but I call them as I see them, and that's how I see this.
(Actually, lunacy is the kindest term I could come up with.)

> and I'm also willing to consider alternative ways of doing this,
> including the -xml frob.

That is not what you have indicated.

> but I think it's worth looking at
> the implications of these alternatives with respect to content
> negotiation.  and I completely don't buy the arguments that people
> aren't going to want to recognize these frobs in content negotiation
> mechanisms, or that these considerations don't matter because they're
> not part of the current proposal.

They would matter if and only if there wasn't a separate viable solution to
the negotiation problem. But there is such a solution. Why not simply use it?

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA10343 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:46:03 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA10333; Mon, 20 Mar 2000 15:46:00 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id SAA29764; Mon, 20 Mar 2000 18:47:44 -0500
Message-Id: <200003202347.SAA29764@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 18:48:36 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003202258.RAA27431@hesketh.net>
References: <200003202030.PAA27040@astro.cs.utk.edu> <Your message of "Mon, 20 Mar 2000 15:00:27 EST."             <200003201959.OAA17209@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Requesting yet again.  Could we please have details of this horrible
problem inflicted by the -xml suffix?

Thanks.  Without details, we can only circle.

>KM>>or to put it another way, if we're going to invent a new syntatical
>KM>>convention for describing content-type characteristics, let's 
>KM>>pick one that actually works with content negotiation 
>KM>>rather than one which requires drastic changes to content 
>KM>>negotiation frameworks.
>>
>SSL>I have yet to hear what those 'drastic changes' involve.  More details
>SSL>would be greatly appreciated.
>
>I would still appreciate details describing the consequences of this risk.
>
>Thanks!

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA10165 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:43:12 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA10155; Mon, 20 Mar 2000 15:43:09 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 15:44:34 -0800 (PST)
Date: Mon, 20 Mar 2000 15:43:29 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 18:28:29 -0500" <200003202328.SAA27887@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9EWFO4HG0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <200003202258.RAA27431@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > It's a plausible risk only if HTTP 1.1 or other protocols are explicitly
> > reopened and modified.  Otherwise, it's pretty easy to point to the spec
> > and say "No.  That's forbidden."

> that's not the risk I'm referring to.  The risk I'm talking about is the
> risk that people will want to reopen and modify those protocols for such
> purposes, but they will be left without a clean way of doing so.

Again, by all means specify a "this is XML" content feature tag of some sort.
This solves the negotiation problem without the need to open up anything.
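Here is a rough sketch of the suggestion above. It assumes a hypothetical boolean feature tag named `xml` that travels alongside the media type rather than inside its name. Feature sets are modelled as plain dictionaries and the receiver's capability as a predicate. This mirrors the idea behind media feature expressions but is not the conneg expression syntax itself.

```python
def sender_features(media_type, is_xml):
    """Describe an object by its media type plus a hypothetical 'xml' tag."""
    return {"type": media_type, "xml": is_xml}


def accepts(features, known_types):
    """Receiver policy: handle known types directly, else anything tagged as XML."""
    return features["type"] in known_types or features.get("xml", False)


known = {"text/plain", "text/xml"}
print(accepts(sender_features("image/svg-xml", True), known))   # True
print(accepts(sender_features("image/jpeg", False), known))     # False
print(accepts(sender_features("text/plain", False), known))     # True
```

The media type name itself is left untouched. Negotiation keys only on the separate tag.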

				Ned


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id PAA10072 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:39:07 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA10060; Mon, 20 Mar 2000 15:39:04 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 15:40:26 -0800 (PST)
Date: Mon, 20 Mar 2000 15:21:06 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 15:28:51 -0500" <200003202028.PAA27011@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9ERAX62O0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01JN94K31VY200004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > > Second, the definition of a global parameter namespace is far from the worst
> > > > thing about this. The biggest problem this approach has is that now you have
> > > > what amounts to a mandatory parameter, one which has to appear with every
> > > > appearance of a given media type and must have a specific value in those
> > > > cases.
> >
> > > actually, I don't view the parameter as mandatory; I view it as advisory.
> > > The right way to handle this is to treat it as an image/svg.  Depending
> > > on how you look at it, the information in the $superclass parameter is
> > > either a fallback way to handle the type if you don't know how to handle
> > > image/svg, or additional information that can be used by generic "index
> > > everything that is XML" processors.
> >
> > An advisory parameter of this sort is a worthless parameter. If you cannot
> > depend on it appearing for every instance of a given content type you
> > cannot use it for anything.

> that's not true.   if the parameter is missing then some opportunities
> to evaluate the object (in the absence of knowledge of the complete type)
> will be lost.  but that's not the same thing as "you cannot use it for
> anything".

Something that I cannot count on being there is of no use to me, and, I
suspect, to anyone else who cares about this stuff. This has to be
dependable to be useful.

> > (Or worse, you will use it and fail when it isn't present.)

> how is this kind of "failure" any different than the kind of failure
> that exists when a reader doesn't understand image/svg-xml and
> fails to recognize that the -xml suffix means that it can be
> treated as generic xml?

It is different because I can do something about it -- I can upgrade
my software. Whereas in order to get benefit from your proposal I have
to upgrade the software that created the object I received -- software I 
do not control. And in some cases I even have to upgrade the protocols
in use.

> > > > This is completely contrary to the entire intent of the parameter
> > > > space: Parameters were intended to convey information that isn't
> > > > invariant with the media type.
> >
> > > it's true that parameters were intended to convey type-specific information,
> > > but I don't see the harm in defining another set of parameters (disjoint
> > > from the type-specific parameters), which convey other information about
> > > the media type.
> >
> > Well, all I can say is that I do see the potential for tremendous harm in
> > doing it, many times more harm than I see in the suffix that you seem to
> > think is so bad.

> again, a concrete example would help.
 
I gave you three.

> > In order for this to be useful they have to affect every agent.

> again, how is this different than the -xml convention?

I've explained the differences repeatedly. I'm not going to do it again.

> > This is a complete red herring. Nobody is proposing that the suffix be
> > used for negotiation purposes. Negotiation is a different problem than
> > labelling.

> yes, they are different problems, but if we establish this syntactic
> convention, people will want to use that convention in content
> negotiation.  or to put it another way, a content-type feature
> labelling convention that works hand in hand with content negotiation
> is vastly more useful than one which does not.

Then let's by all means add a content-features tag as well. Negotiation
problem solved.

> > Again, an advisory parameter of this sort accomplishes nothing.

> "nothing" seems to me like a gross exaggeration.

I see it as an understatement.

> > > To put it another way, if it's absolutely important that every instance
> > > of image/svg be externally labelled as XML then I'd agree with you.
> > > But I don't see this as absolutely important.

> > I do.

> okay, why?

Because I like to build software that actually works most of the time.

> > > > In summary, what this amounts to is nothing less than a complete
> > > > redesign of how MIME works.
> >
> > > "complete redesign" seems a bit overstated.  It is a slightly different
> > > way of using parameters than originally envisioned.  But it doesn't
> > > interfere with the existing parameter space.
> >
> > Actually, I think I seriously understated it. It isn't just a complete
> > redesign, it is a revamping of the underlying principles and agreements
> > MIME is based on.

> I don't see how reserving a space of parameter names across the board
> amounts to a revamping of these underlying principles.

> (to clarify: I did not intend that such parameters would automagically
> apply to all content-types, but rather, that such names would be
> reserved, much as "charset" is reserved for text/* types.
> so that if a parameter with a reserved name were defined for
> a content-type its use would have to be consistent with the use
> already established for the reserved parameter name).
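The clarification above amounts to a consistency check. A reserved parameter name is optional for any given type, but when a type uses it, its value must follow the reserved meaning. Here that meaning is taken to be a syntactically valid type/subtype. The `$superclass` name and the rule are assumptions drawn from this thread. Parsing uses the standard library's email.message.Message. Because "/" is a tspecial in RFC 2045, the value strictly has to be quoted, as in `$superclass="text/xml"`.

```python
import re
from email.message import Message

# RFC 2045 token characters (printable ASCII minus tspecials).
TOKEN = r"[A-Za-z0-9!#$%&'*+.^_`{|}~-]+"
TYPE_SUBTYPE = re.compile("^" + TOKEN + "/" + TOKEN + "$")

# Hypothetical reserved parameter names and the syntax each value must obey.
RESERVED = {"$superclass": TYPE_SUBTYPE}


def check_reserved_params(content_type):
    """Return a list of problems with reserved parameters (empty if none)."""
    msg = Message()
    msg["Content-Type"] = content_type
    problems = []
    for name, pattern in RESERVED.items():
        value = msg.get_param(name)          # None if the parameter is absent
        if value is not None and not pattern.match(value):
            problems.append("%s=%r is not a type/subtype" % (name, value))
    return problems


print(check_reserved_params('image/svg; $superclass="text/xml"'))  # []
print(check_reserved_params('image/svg; $superclass="xml"'))
```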

> > > And while I understand that many implementations don't have the ability
> > > to generate or the ability to interpret content-type parameters, I don't
> > > see how this breaks anything that works now.  by contrast, the -xml frob
> > > coupled with the desire for content-negotiation languages (http accept
> > > and conneg) to recognize anything that is XML does break things.
> > > I'm just trying to minimize breakage.
> >
> > Please cite a single, solitary example of where it breaks things. I'd like
> > to hear of one.

> okay, how many HTTP servers can deal with

> Accept: */*-xml

Any compliant HTTP server can deal with it. Now, it may not give you the
result you want, but so what? This is nothing but a strawman argument; the
proposal at hand doesn't specify any extensions to the accept field in HTTP. If
you think it is such a risk that such usage will appear then we can
specifically ban it.
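The point that a compliant server "can deal with" `Accept: */*-xml` without producing the hoped-for result shows up clearly in a simplified matcher. HTTP/1.1 (RFC 2616) defines only `*/*` and `type/*` as wildcards. `*/*-xml` therefore parses as an ordinary type/subtype, with type `*` and subtype `*-xml`, and no real content type equals it. This minimal sketch ignores q-values and parameters, so a `q=0` entry would wrongly count as acceptable.

```python
def parse_accept(header):
    """Return the media ranges in an Accept header, dropping parameters and q-values."""
    ranges = []
    for item in header.split(","):
        media_range = item.split(";", 1)[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def range_matches(media_range, content_type):
    """RFC 2616-style matching: only '*/*' and 'type/*' are wildcards."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    ctype, _, csub = content_type.lower().partition("/")
    if rsub == "*":
        return rtype == ctype
    return rtype == ctype and rsub == csub      # everything else is literal


def acceptable(header, content_type):
    return any(range_matches(r, content_type) for r in parse_accept(header))


print(acceptable("*/*-xml", "image/svg-xml"))              # False
print(acceptable("image/*", "image/svg-xml"))              # True
print(acceptable("*/*-xml, */*;q=0.1", "image/svg-xml"))   # True, via the catch-all
```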

> > Thus far all you have cited are easily surmountable
> > problems, like the ordering of future additional suffixes (assuming there
> > ever are any).

> yes, if we decide to use frobs on the content-type name it's not too
> difficult to define a canonical ordering such that there's one unique
> way of spelling any content-type name.  but if we do go in that direction
> then I'd like us to go ahead and define that syntactic convention,
> including the ordering

Fine with me.
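If frobs are appended to content-type names, a canonical ordering gives each combination exactly one spelling, so plain string comparison keeps working. This minimal sketch assumes a hypothetical registry that fixes the order of suffixes. Only `xml` is under discussion here; `zip` is an invented second suffix for illustration. It keeps the proposal's '-' separator and peels off only trailing pieces that are registered, so ordinary hyphens in names are left alone.

```python
# Hypothetical registry: each suffix's position fixes the canonical order.
SUFFIX_ORDER = ["xml", "zip"]


def canonical_subtype(subtype):
    """Reorder trailing registered suffixes into the registry's order."""
    pieces = subtype.lower().split("-")
    suffixes = []
    while len(pieces) > 1 and pieces[-1] in SUFFIX_ORDER:
        suffixes.append(pieces.pop())
    suffixes.sort(key=SUFFIX_ORDER.index)
    return "-".join(pieces + suffixes)


print(canonical_subtype("svg-zip-xml"))   # svg-xml-zip
print(canonical_subtype("svg-xml-zip"))   # svg-xml-zip
print(canonical_subtype("vnd.ms-excel"))  # vnd.ms-excel (no registered suffix)
```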

> > > a concrete example of something that this breaks would be helpful
> > > in getting me to understand your concerns.
> >
> > It breaks so many things in so many ways... Some examples:
> >
> > (1) Silly state problems. Consider the possible effect of image/jpeg;
> >     $superclass=text/xml on a handler only prepared to accept XML text.
> >     (And compare it with the effect of image/jpeg; charset=us-ascii on
> >     any existing handler.)

> we need to compare apples to apples.  if the handler is only prepared
> to accept XML text and the image/jpeg arrives with a $superclass=text/xml
> type then it gets handed to the XML layer and that layer says
> "invalid XML content" (and ideally the recipient gets a chance to save it).
> if it sees image/jpeg; charset=us-ascii then it gets treated as
> application/octet-stream and the recipient gets a chance to save it.

I'm sorry, but I _am_ comparing apples with apples. I've seen way too much
software that simply crashes under such circumstances. (And so, I suspect, have
you.) Whereas the latter case causes no harm at all.

> > (2) Problems with sending agents not including the tag. Suppose an application
> >     is deployed that depends on the superclass tag. (This is inevitable once
> >     the tag is defined, you can call it advisory until you are blue in the
> >     face but if it is used at all it won't be taken as such.)

> let me see if I understand you correctly: what you are saying is that
> people will expect the new convention (whatever it is) to work and
> control the recipient's MIME reader's fallback behavior even in
> the presence of a vast installed base that neither understands this
> convention nor XML?

No, that is not what I'm saying. What I'm saying is that a new agent, one that
depends on the parameter being present, only works if it is present. And this
then implies that a substantial number of agents need to send the tag in order
for it to be useful. This won't happen soon if at all, so my agent that I wrote
which depends on the tag we've called for ends up not working.

> or that this would be like the user agents that could recognize
> filename suffixes but refused to look at the content-type?
> people would expect their content to be read by the recipient
> even if they could not label it correctly?

People do expect it. We have argued long and hard that a filename isn't
sufficient, and we've mostly won. Now that we've won we want to say, "Surprise,
we've changed our minds, you now need to generate these parameters for
things to work". This is unacceptable.

> (the latter strikes me as an argument against any sort of XML
> convention at all - since you seem to be saying that if the convention
> does exist that it will be (mis)used in preference to the primary
> content-type )

The nice thing about the tag being part of the media type is that it leverages
off of our years of insisting that proper media type labels be used. We define
the convention and the rest of the problem takes care of itself.

> and how many user agents does this affect?  e.g., what percentage
> of UAs cannot send out a text/plain attachment with a charset label?

A pretty large number, actually.

> > (3) Problems with places where parameters aren't expected/allowed. Once
> >     the tag is required there will be pressure to generate it. This in turn
> >     will lead to sending agents upgrading to produce it and thereby cranking
> >     out parameters for the first time. Some of these agents are now used to
> >     generate values for fields that don't allow parameters. The upgrade will
> >     cause these fields to become syntactically invalid.

> such as in accept headers?

Strawman again. There's nothing syntactically invalid about putting funny
wildcards in an accept header.
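The third problem listed above can be made concrete with a field whose syntax allows a bare type/subtype but no parameters. The field and its grammar below are hypothetical. The sketch shows only that a generator which starts appending `$superclass` produces values the older grammar rejects.

```python
import re

# Hypothetical field grammar: a bare type/subtype, no parameters allowed.
BARE_TYPE = re.compile(r"^[A-Za-z0-9.+-]+/[A-Za-z0-9.+-]+$")


def old_generator(media_type):
    """Pre-upgrade behaviour: emit the media type alone."""
    return media_type


def upgraded_generator(media_type, superclass=None):
    """Hypothetical upgrade that now appends the advisory parameter."""
    if superclass:
        return '%s; $superclass="%s"' % (media_type, superclass)
    return media_type


for value in (old_generator("image/svg"),
              upgraded_generator("image/svg", "text/xml")):
    print(value, "->", "valid" if BARE_TYPE.match(value) else "INVALID")
# image/svg -> valid
# image/svg; $superclass="text/xml" -> INVALID
```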

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA10042 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:38:25 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA10029; Mon, 20 Mar 2000 15:38:18 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id SAA27934; Mon, 20 Mar 2000 18:38:21 -0500 (EST)
Message-Id: <200003202338.SAA27934@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 15:18:05 PST." <01JN9E2T6ZYU0000MM@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 18:38:21 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Whereas I believe your proposal leapfrogs shortsighted into the realm
> of lunacy.

hey, I'm quite willing to entertain criticism and feedback, but
a label like "lunacy" seems completely unjustified.  nor does
it tell me much about the problems you see with it.  (and I
am trying to understand these)  the most I've been able to glean 
thus far is that people will misuse parameters in various ways.  
even granted that this will happen, I still don't see that it's 
"lunacy" to consider that path.

and I'm also willing to consider alternative ways of doing this, 
including the -xml frob.   but I think it's worth looking at
the implications of these alternatives with respect to content 
negotiation.  and I completely don't buy the arguments that people 
aren't going to want to recognize these frobs in content negotiation 
mechanisms, or that these considerations don't matter because they're 
not part of the current proposal.

Keith 


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA09855 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:28:28 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA09844; Mon, 20 Mar 2000 15:28:25 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id SAA27887; Mon, 20 Mar 2000 18:28:29 -0500 (EST)
Message-Id: <200003202328.SAA27887@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 17:59:13 EST." <200003202258.RAA27431@hesketh.net> 
Date: Mon, 20 Mar 2000 18:28:29 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> It's a plausible risk only if HTTP 1.1 or other protocols are explicitly
> reopened and modified.  Otherwise, it's pretty easy to point to the spec
> and say "No.  That's forbidden." 

that's not the risk I'm referring to.  The risk I'm talking about is the
risk that people will want to reopen and modify those protocols for such
purposes, but they will be left without a clean way of doing so.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA09699 for ietf-xml-mime-bks; Mon, 20 Mar 2000 15:19:17 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA09688; Mon, 20 Mar 2000 15:19:14 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN9BU8XYCW0000MM@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 15:20:41 -0800 (PST)
Date: Mon, 20 Mar 2000 15:18:05 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 15:30:53 -0500" <200003202030.PAA27040@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN9E2T6ZYU0000MM@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <200003201959.OAA17209@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > Yes, and I also said quite explicitly that user agents requesting */*xml is
> > not a feature of the proposal.  I don't believe it's useful to defend use
> > cases that we are not proposing.

> the thing you are proposing creates a very obvious and plausible risk.

I disagree; I think there is no risk here at all. This is in sharp contrast to
your MIME parameter proposal, where damage isn't just obvious, it is
guaranteed.

> you aren't addressing that risk.  to me that seems shortsighted.

Whereas I believe your proposal leapfrogs shortsighted into the realm
of lunacy.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id OAA09289 for ietf-xml-mime-bks; Mon, 20 Mar 2000 14:56:51 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id OAA09266; Mon, 20 Mar 2000 14:56:40 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id RAA27431; Mon, 20 Mar 2000 17:58:20 -0500
Message-Id: <200003202258.RAA27431@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 17:59:13 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003202030.PAA27040@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 15:00:27 EST."             <200003201959.OAA17209@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:30 PM 3/20/00 -0500, Keith Moore wrote:
>the thing you are proposing creates a very obvious and plausible risk.  
>you aren't addressing that risk.  to me that seems shortsighted.

It's a plausible risk only if HTTP 1.1 or other protocols are explicitly
reopened and modified.  Otherwise, it's pretty easy to point to the spec
and say "No.  That's forbidden."  It might be a good idea to present these
issues in the I-D, and make clearer that this document does not permit such
extensions, but this 'risk' is not inherent in the proposal itself.

However, I repeat:
KM>>or to put it another way, if we're going to invent a new syntactical
KM>>convention for describing content-type characteristics, let's 
KM>>pick one that actually works with content negotiation 
KM>>rather than one which requires drastic changes to content 
KM>>negotiation frameworks.
>
SSL>I have yet to hear what those 'drastic changes' involve.  More details
SSL>would be greatly appreciated.

I would still appreciate details describing the consequences of this risk.

Thanks!

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id MAA07051 for ietf-xml-mime-bks; Mon, 20 Mar 2000 12:30:51 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id MAA07041; Mon, 20 Mar 2000 12:30:48 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id PAA27040; Mon, 20 Mar 2000 15:30:53 -0500 (EST)
Message-Id: <200003202030.PAA27040@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 15:00:27 EST." <200003201959.OAA17209@hesketh.net> 
Date: Mon, 20 Mar 2000 15:30:53 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Yes, and I also said quite explicitly that user agents requesting */*xml is
> not a feature of the proposal.  I don't believe it's useful to defend use
> cases that we are not proposing.

the thing you are proposing creates a very obvious and plausible risk.  
you aren't addressing that risk.  to me that seems shortsighted.

Keith


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id MAA06997 for ietf-xml-mime-bks; Mon, 20 Mar 2000 12:29:01 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id MAA06987; Mon, 20 Mar 2000 12:28:53 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id PAA27011; Mon, 20 Mar 2000 15:28:51 -0500 (EST)
Message-Id: <200003202028.PAA27011@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 10:25:24 PST." <01JN94K31VY200004D@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 15:28:51 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > Second, the definition of a global parameter namespace is far from the worst
> > > thing about this. The biggest problem this approach has is that now you have
> > > what amounts to a mandatory parameter, one which has to appear with every
> > > appearance of a given media type and must have a specific value in those
> > > cases.
> 
> > actually, I don't view the parameter as mandatory; I view it as advisory.
> > The right way to handle this is to treat it as an image/svg.  Depending
> > on how you look at it, the information in the $superclass parameter is
> > either a fallback way to handle the type if you don't know how to handle
> > image/svg, or additional information that can be used by generic "index
> > everything that is XML" processors.
> 
> An advisory parameter of this sort is a worthless parameter. If you cannot
> depend on it appearing for every instance of a given content type you
> cannot use it for anything. 

that's not true.   if the parameter is missing then some opportunities
to evaluate the object (in the absence of knowledge of the complete type)
will be lost.  but that's not the same thing as "you cannot use it for
anything".
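Below is a minimal Python sketch of the advisory reading of `$superclass` described above. The handler table and its entries are hypothetical, `$superclass` is the parameter name from this proposal (not a registered MIME parameter), and the parser is deliberately simplified (no quoted-string escapes, no RFC 2231 continuations). The primary content-type is always tried first. The parameter only supplies a fallback, and when it is missing the reader ends up where it would be today.

```python
from __future__ import annotations


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into a lowercased type/subtype and parameters.

    Simplified: no quoted-string escapes, no RFC 2231 continuations.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Hypothetical handlers of an XML-aware reader that knows nothing about SVG.
HANDLERS = {
    "text/plain": "plain-text viewer",
    "text/xml": "generic XML processor",
}


def choose_handler(content_type: str) -> str:
    media_type, params = parse_content_type(content_type)
    # 1. A recognized primary content-type always wins.
    if media_type in HANDLERS:
        return HANDLERS[media_type]
    # 2. Advisory fallback: use $superclass only if present and understood.
    superclass = params.get("$superclass", "").lower()
    if superclass in HANDLERS:
        return HANDLERS[superclass]
    # 3. Otherwise behave exactly as an unaware reader would.
    return "save as application/octet-stream"
```

Ned's objection concerns the second step: once readers that rely on it are deployed, a sender that cannot emit the parameter silently falls through to the third.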

> (Or worse, you will use it and fail when it isn't present.)

how is this kind of "failure" any different than the kind of failure
that exists when a reader doesn't understand image/svg-xml and
fails to recognize that the -xml suffix means that it can be
treated as generic xml?

> > > This is completely contrary to the entire intent of the parameter
> > > space: Parameters were intended to convey information that isn't
> > > invariant with the media type.
> 
> > it's true that parameters were intended to convey type-specific information,
> > but I don't see the harm in defining another set of parameters (disjoint
> > from the type-specific parameters), which convey other information about
> > the media type.
> 
> Well, all I can say is that I do see the potential for tremendous harm in 
> doing it, many times more harm than I see in the suffix that you seem to 
> think is so bad.

again, a concrete example would help.  
 
> > > Third, this introduces the possibility of silly states in a major way.
> 
> > As far as I can tell, parameters already have tremendous potential for
> > silly states - nothing stops you from adding any parameter to any media
> > type (so you can for example have a charset parameter on an image).
> 
> While this may be a silly state in some sense, it is an entirely harmless one.
> The silly states this superclass nonsense introduces are anything but.
> 
> > But in practice, and despite widespread misuse of certain parameter
> > names with content-types that don't define those parameters, it doesn't
> > seem to cause big problems.  Unrecognized parameters are generally
> > ignored.
> 
> Exactly. We were careful to define parameters in this way so that bad
> behavior as a result of silly states was minimized. But this isn't possible
> with the sort of parameter you're proposing.

I don't buy it.  all of the examples of silly states I can think of 
are at worst equivalent to just mislabelling the content-type, which
already happens.  if you mislabel an object's content-type you should
expect to lose, and I don't see how having the ability to mislabel
parameters changes this.

now I will admit that having the extra parameter creates *additional*
states - for instance, if you as a sender don't want to encourage 
default processing of your xml document, you can omit the extra parameter.  
but while I'm not sure how much value there is in this, those states 
don't quite seem "silly" to me. 
 
> > > Fourth, the infrastructure upgrades to handle this are far nastier
> > > than you seem to think, and have to happen at both the sending and receiving
> > > end.
> 
> > I guess the question in my mind is whether those upgrades affect every
> > MIME handling agent or whether (for the time being) they only affect
> > things that care specifically about XML.
> 
> In order for this to be useful they have to affect every agent.

again, how is this different than the -xml convention?
in either case, an agent that doesn't understand the convention
will ignore it and not be able to benefit from the extra information.
and an agent that does understand the convention will use it.
in either case, the convention is "useful" for those agents
that implement it and is benign for those that don't.

in neither case can a sender expect generic XML processing as a fallback
because (unlike text/plain processing) XML processing is not required
of MIME user agents.

> > The latter set seems a lot
> > smaller than the former set, and I'm less worried about the impact of
> > this on XML-aware agents (which after all are in an emerging space anyway)
> > than I am worried about the impact of -xml frobs on things that need
> > to do content-negotiation (which includes every HTTP client and server).
> 
> This is a complete red herring. Nobody is proposing that the suffix be
> used for negotiation purposes. Negotiation is a different problem than
> labelling.

yes, they are different problems, but if we establish this syntactic
convention, people will want to use that convention in content 
negotiation.  or to put it another way, a content-feature
labelling convention that works hand in hand with content negotiation
is vastly more useful than one which does not.

> 
> > > Fifth, as I said before, this doesn't work with places where only media
> > > types and not parameters are carried. Such places do exist and fixing
> > > them all is going to be nearly impossible.
> 
> > Since I see this parameter as advisory, to me it doesn't matter much.
> > Applications for which the advisory parameter is of sufficient utility
> > will get fixed to interpret or generate that parameter; those that don't
> > benefit sufficiently will not get fixed.
> 
> Again, an advisory parameter of this sort accomplishes nothing.

"nothing" seems to me like a gross exaggeration.  

> > To put it another way, if it's absolutely important that every instance
> > of image/svg be externally labelled as XML then I'd agree with you.
> > But I don't see this as absolutely important.
> 
> I do.

okay, why?

> > > In summary, what this amounts to is nothing less than a complete
> > > redesign of how MIME works.
> 
> > "complete redesign" seems a bit overstated.  It is a slightly different
> > way of using parameters than originally envisioned.  But it doesn't
> > interfere with the existing parameter space.
> 
> Actually, I think I seriously understated it. It isn't just a complete
> redesign, it is a revamping of the underlying principles and agreements 
> MIME is based on.

I don't see how reserving a space of parameter names across the board
amounts to a revamping of these underlying principles. 

(to clarify: I did not intend that such parameters would automagically
apply to all content-types, but rather, that such names would be
reserved, much as "charset" is reserved for text/* types.
so that if a parameter with a reserved name were defined for 
a content-type its use would have to be consistent with the use
already established for the reserved parameter name).

> > And while I understand that many implementations don't have the ability
> > to generate or the ability to interpret content-type parameters, I don't
> > see how this breaks anything that works now.  by contrast, the -xml frob
> > coupled with the desire for content-negotiation languages (http accept
> > and conneg) to recognize anything that is XML does break things.
> > I'm just trying to minimize breakage.
> 
> Please cite a single, solitary example of where it breaks things. I'd like
> to hear of one. 

okay, how many HTTP servers can deal with 

Accept: */*-xml
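For reference, RFC 2616 (section 14.1) defines only three forms of media range: `*/*`, `type/*`, and `type/subtype`. Below is a simplified Python matcher written to that rule; q-values and parameters are ignored, and the function names are illustrative. Because `*` is an ordinary token character, `*/*-xml` is a syntactically valid range, but a server that implements only those three forms treats it as a literal type/subtype pair that matches nothing.

```python
def range_matches(media_range: str, media_type: str) -> bool:
    """Match one Accept media-range against a concrete type/subtype.

    Implements only the three forms in RFC 2616 section 14.1:
    "*/*", "type/*" and "type/subtype". Parameters and q-values are dropped.
    """
    r_type, _, r_sub = media_range.split(";")[0].strip().lower().partition("/")
    m_type, _, m_sub = media_type.strip().lower().partition("/")
    if r_type == "*" and r_sub == "*":
        return True                               # */* matches everything
    if r_sub == "*":
        return r_type == m_type                   # type/* matches one top-level type
    return (r_type, r_sub) == (m_type, m_sub)     # anything else is an exact match


def acceptable(accept_header: str, media_type: str) -> bool:
    """True if any comma-separated range in the Accept value matches."""
    return any(range_matches(r, media_type) for r in accept_header.split(","))
```

Here `acceptable("image/*", "image/svg-xml")` is true while `acceptable("*/*-xml", "image/svg-xml")` is false. Giving `*/*-xml` the meaning Keith has in mind would mean adding a fourth, suffix-aware rule to the matcher, not just sending a new header value.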

> Thus far all you have cited are easily surmountable
> problems, like the ordering of future additional suffixes (assuming there
> ever are any).

yes, if we decide to use frobs on the content-type name it's not too 
difficult to define a canonical ordering such that there's one unique
way of spelling any content-type name.  but if we do go in that direction
then I'd like us to go ahead and define that syntactic convention,
including the ordering.
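Keith's point about a canonical ordering can be made concrete with a small Python sketch. Everything in it is hypothetical: the registry of frobs, the second frob `gzip` (only `-xml` was actually proposed), and the choice of alphabetical order. With any fixed order, `svg-xml-gzip` and `svg-gzip-xml` collapse to one spelling, so plain string comparison of content-type names keeps working.

```python
# Hypothetical registry of suffix "frobs"; only "xml" was under discussion,
# "gzip" is an invented second entry for illustration.
KNOWN_FROBS = {"xml", "gzip"}


def canonical_subtype(subtype: str) -> str:
    """Rewrite trailing known frobs into a fixed (alphabetical) order.

    Only hyphen-separated components at the end of the subtype that are
    in KNOWN_FROBS count as frobs, so a base name that itself contains
    hyphens (e.g. "x-foo") is left alone.
    """
    parts = subtype.lower().split("-")
    frobs = []
    while len(parts) > 1 and parts[-1] in KNOWN_FROBS:
        frobs.append(parts.pop())
    return "-".join(parts + sorted(frobs))
```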

> 
> > a concrete example of something that this breaks would be helpful
> > in getting me to understand your concerns.
> 
> It breaks so many things in so many ways... Some examples:
> 
> (1) Silly state problems. Consider the possible effect of image/jpeg;
>     $superclass=text/xml on a handler only prepared to accept XML text.
>     (And compare it with the effect of image/jpeg; charset=us-ascii on
>     any existing handler.)

we need to compare apples to apples.  if the handler is only prepared
to accept XML text and the image/jpeg arrives with a $superclass=text/xml
type then it gets handed to the XML layer and that layer says
"invalid XML content" (and ideally the recipient gets a chance to save it).  
if it sees image/jpeg; charset=us-ascii then it gets treated as 
application/octet-stream and the recipient gets a chance to save it.
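The XML-only handler in the first case can be sketched in Python with the standard library's `xml.etree.ElementTree`; the function name and return strings are illustrative. Bytes that are not XML, such as a JPEG mislabelled with `$superclass=text/xml`, fail the well-formedness check and are reported rather than misinterpreted.

```python
import xml.etree.ElementTree as ET


def handle_as_xml(body: bytes) -> str:
    """What an XML-only handler might do with whatever it is handed.

    Non-XML bytes fail the parser's well-formedness check, so the
    handler reports the problem instead of misreading the content.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "invalid XML content; offer to save"
    return f"parsed XML, root element <{root.tag}>"
```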

if on the other hand, we take both cases and feed them to 
"any existing handler" (say, a typical MIME mail reader)
then in the former case the $superclass is ignored and in the latter
case the charset is ignored.  in either case the content is either
fed to a jpeg reader, or to a generic image/* reader, or it is
treated as application/octet-stream.  

but since we're really trying to compare putting the xml frob into
a separate parameter vs. putting it into the content-type name, 
perhaps that should be one axis of the case analysis.  the other
obvious axis is existing MIME readers vs. MIME readers taught to
understand the new syntax.  
 
> (2) Problems with sending agents not including the tag. Suppose an application
>     is deployed that depends on the superclass tag. (This is inevitable once
>     the tag is defined, you can call it advisory until you are blue in the
>     face but if it is used at all it won't be taken as such.) 

let me see if I understand you correctly: what you are saying is that 
people will expect the new convention (whatever it is) to work and
control the recipient's MIME reader's fallback behavior even in
the presence of a vast installed base that neither understands this
convention nor XML?

or that this would be like the user agents that could recognize
filename suffixes but refused  to look at the content-type?
people would expect their content to be read by the recipient
even if they could not label it correctly?

(the latter strikes me as an argument against any sort of XML
convention at all - since you seem to be saying that if the convention
does exist it will be (mis)used in preference to the primary
content-type)

and how many user agents does this affect?  e.g., what percentage
of UAs cannot send out a text/plain attachment with a charset label?

> Now consider
>     what happens when something that's incapable of generating the tag sends
>     to the agent that requires it.

the agent either knows how to handle that specific content-type, or it
treats it per the fallback rule for the top-level content-type, or it 
treats it as application/octet-stream.
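The fallback described here is the one in RFC 2046: an unrecognized `text/*` subtype is shown as `text/plain` when its charset is understood, an unrecognized `multipart/*` subtype as `multipart/mixed`, and anything else as `application/octet-stream`. A Python sketch follows, with a hypothetical table of what one reader supports; it omits the optional generic image viewer that RFC 2046 allows for unknown `image/*` subtypes.

```python
from typing import Optional

# Hypothetical capabilities of one MIME reader.
KNOWN_TYPES = {"text/plain", "image/jpeg", "multipart/mixed"}
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}


def effective_type(media_type: str, charset: Optional[str] = None) -> str:
    """Return the type a reader actually processes, per the RFC 2046 fallbacks."""
    media_type = media_type.lower()
    if media_type in KNOWN_TYPES:
        return media_type
    top_level = media_type.split("/", 1)[0]
    if top_level == "text":
        # A missing charset means us-ascii; an unknown one means opaque data.
        if (charset or "us-ascii").lower() in KNOWN_CHARSETS:
            return "text/plain"
        return "application/octet-stream"
    if top_level == "multipart":
        return "multipart/mixed"
    # image/*, audio/*, video/*, application/* and anything unrecognized.
    return "application/octet-stream"
```

For such a reader, `image/svg-xml` and `image/svg` (whatever its parameters) both come out as `application/octet-stream`, which is the sense in which an unaware agent simply ignores either convention.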

but I fail to see why a well-behaved application would "require" 
the extra parameter.  if it knows what image/svg is, why would
it insist that the $superclass parameter be present?

> (3) Problems with places where parameters aren't expected/allowed. Once
>     the tag is required there will be pressure to generate it. This in turn
>     will lead to sending agents being upgraded to produce it, thereby cranking
>     out parameters for the first time. Some of these agents are now used to
>     generate values for fields that don't allow parameters. The upgrade will
>     cause these fields to become syntactically invalid.

such as in accept headers? 

> 
> I could go on to list many more, but hopefully this gives some flavor
> of how bad I think this idea is.
> 
> 				Ned

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA06489 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:57:58 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA06479; Mon, 20 Mar 2000 11:57:55 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id OAA17209; Mon, 20 Mar 2000 14:59:36 -0500
Message-Id: <200003201959.OAA17209@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 15:00:27 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003201950.OAA26918@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 13:51:42 EST."             <200003201850.NAA13040@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 02:50 PM 3/20/00 -0500, Keith Moore wrote:
>or to put it another way, if we're going to invent a new syntactical
>convention for describing content-type characteristics, let's 
>pick one that actually works with content negotiation 
>rather than one which requires drastic changes to content 
>negotiation frameworks.

I have yet to hear what those 'drastic changes' involve.  More details
would be greatly appreciated.

>and yet you just admitted that it would cause a problem if user agents 
>started requesting */*xml.

Yes, and I also said quite explicitly that user agents requesting */*xml is
not a feature of the proposal.  I don't believe it's useful to defend use
cases that we are not proposing.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA06430 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:52:22 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA06420; Mon, 20 Mar 2000 11:52:16 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id OAA26928; Mon, 20 Mar 2000 14:52:18 -0500 (EST)
Message-Id: <200003201952.OAA26928@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: "Simon St.Laurent" <simonstl@simonstl.com>, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 10:48:40 PST." <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 14:52:18 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I note in passing that we've already registered some XML-based types
> with an "XML" suffix, and no problems have been reported with this
> usage.

one or two -xml types don't hurt much.  but if we're going to have a
general naming for XML types, this might have a more profound effect.


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA06390 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:51:07 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA06376; Mon, 20 Mar 2000 11:50:55 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id OAA26918; Mon, 20 Mar 2000 14:50:56 -0500 (EST)
Message-Id: <200003201950.OAA26918@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ned.freed@INNOSOFT.COM, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 13:51:42 EST." <200003201850.NAA13040@hesketh.net> 
Date: Mon, 20 Mar 2000 14:50:56 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> At 01:33 PM 3/20/00 -0500, Keith Moore wrote:
> >if you want an expression that matches anything ending in -xml, 
> >as far as I can tell, existing http accept header semantics,
> >and existing conneg expressions, are not sufficient to recognize that.
> >
> >if you want it to be more general and anticipate the possibility
> >of multiple frobs (not just -xml) then you need something like
> >a regular expression matching capability 
> >(so you can look for things like "-xml(-|$)" )
> 
> I think we may have encountered a problem in our underlying viewpoints.  You're
> talking about sending */*xml over the wire in headers, while I'm only
> talking about using */*xml to recognize XML documents when I get them.  

I am talking about both, but I think they're both problems.

or to put it another way, if we're going to invent a new syntactical
convention for describing content-type characteristics, let's 
pick one that actually works with content negotiation 
rather than one which requires drastic changes to content 
negotiation frameworks.
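As Keith notes, neither HTTP Accept nor conneg expressions appear to offer a way to match such a frob, so a recognizer needs something like a regular expression. A Python sketch using his pattern; `svg-xml-gzip` and `x-xmlrpc` are made-up subtypes used only to show why the `(-|$)` anchor matters (it stops `-xml` from matching the start of a longer component).

```python
import re

# Keith's pattern: "-xml" followed by another frob or by the end of the subtype.
XML_FROB = re.compile(r"-xml(-|$)")


def has_xml_frob(media_type: str) -> bool:
    """True if the subtype carries an -xml frob anywhere in its frob chain."""
    _, _, subtype = media_type.strip().lower().partition("/")
    return XML_FROB.search(subtype) is not None
```

Note that `text/xml` itself does not match, since the pattern looks for a suffix rather than the bare subtype `xml`; a real matcher would have to decide whether that is wanted.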

> I don't think the proposal as written breaks things any more than
> introducing a new MIME type would.

in a sense, it doesn't break anything.  in another sense, it 
creates a new feature of content-type names that can be exploited,
and people will demand that it be exploited.  so it requires 
changes to existing implementations.  and if the -xml feature catches 
on then it will creep into content-type negotiation schemes.  this is 
not an intended consequence, but it is a likely one.  

> Compared to the parameter option you keep proposing, I'd say this causes
> almost zero problems as far as interoperability and compatibility are
> concerned.

interoperability and compatibility aren't as much my concern as
feature creep and its effect on content-negotiation structures.

> If user agents start requesting */*xml, we do have a problem.  

why do you think they won't want to do that?

> However,
> that isn't what this I-D proposes - if users find that useful, they'll have
> to push for changes in the protocol infrastructure.  

yes, but if such changes are to be made, then perhaps it would be a good
idea to pick a syntax for declaring such features that doesn't require
changes to the protocol infrastructure - or at least, to consider what
those infrastructure changes would need to look like and evaluate the 
new syntax in light of those changes.

> All this suffix does is provide additional information identifying content
> as having an XML foundation.  I don't believe it has any other
> implications, nor do I believe it will interfere with other developing
> standards for content negotiation.

and yet you just admitted that it would cause a problem if user agents 
started requesting */*xml.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA05931 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:24:21 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA05921; Mon, 20 Mar 2000 11:24:18 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 11:25:39 -0800 (PST)
Date: Mon, 20 Mar 2000 11:20:01 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 11:10:03 -0800" <003801bf929f$e7887b80$6d61fea9@dbc.mtview.ca.us>
To: Marshall Rose <mrose+mtr.netnews@dbc.mtview.ca.us>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org, Marshall Rose <mrose@dbc.mtview.ca.us>
Message-id: <01JN95UFD62S00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=iso-8859-1
References: <38D61D2B.A71DC71C@w3.org> <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM> <01JN94T4BSEM00004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > suppose we were just to have something like:
> >
> > >     Content-type: image/svg; representation="xml"
> >
> > > where "representation" is just a plain old MIME parameter.
> >
> > > would you object to something along these lines?
> >
> > Yes. Most of the problems I have with global parameter still apply to this
> > usage.
> >
> > If we absolutely have to do this with a separate piece of information, I
> > would opt for a content-feature tag. That way there's a clear delineation
> > between when feature information is or is not present, and we don't mess
> > up MIME parameter space. And we need the feature tag anyway for negotiation
> > purposes.

> hmmm.  my view of the example above is that XML is being used as the syntax
> but the semantics of the blob being passed are still SVG semantics.

I agree, but I fail to see how this changes things. I was talking about
having a media type other than application/xml _and_ a content-feature
field.

> at the risk of seeming insensitive with the exception of "text/xml", i would
> never expect to see a subtype of "xml" for any media type.

Again I agree. People are going to register separate media types for
the XML-based things they come up with. (This is already happening.)

> of course, taking
> that line, i suppose that the example above should simply be

>     image/svg

> and that the processing element for that application should already know the
> possible syntaxes that it could encounter.

This is the crux of the -xml proposal. All it says is that if SVG always  has
XML syntax the name image/svg-xml may be used to indicate that in the media
type name and that the -xml suffix won't ever be used on anything that doesn't
have XML syntax. Nothing more and nothing less. It doesn't require that this
naming convention be used for all things XML, it says nothing about matching
such names as part of content negotiation, and it doesn't affect whether
or not a given XML variant gets its own media type name or not.

It seems that a lot of people are reading vastly more into this than is
there.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA05838 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:17:37 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA05827; Mon, 20 Mar 2000 11:17:34 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id LAA11895; Mon, 20 Mar 2000 11:16:50 -0800
Message-Id: <3.0.32.20000320111851.02474670@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Mon, 20 Mar 2000 11:18:54 -0800
To: "Marshall Rose" <mrose+mtr.netnews@dbc.mtview.ca.us>, <ned.freed@INNOSOFT.COM>
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: <ned.freed@INNOSOFT.COM>, "Keith Moore" <moore@cs.utk.edu>, "Chris Lilley" <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, <Valdis.Kletnieks@vt.edu>, <ietf-xml-mime@imc.org>, <ietf-822@imc.org>, "Marshall Rose" <mrose@dbc.mtview.ca.us>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:10 AM 3/20/00 -0800, Marshall Rose wrote:
>hmmm.  my view of the example above is that XML is being used as the syntax
>but the semantics of the blob being passed are still SVG semantics.

I think that's more or less everyone's view.  Hence the reason why a 
top-level xml/ type turns out to be a bad idea.

>at the risk of seeming insensitive with the exception of "text/xml", i would
>never expect to see a subtype of "xml" for any media type. of course, taking
>that line, i suppose that the example above should simply be
>    image/svg

Once again, no-one seems to disagree. The proposal is 

 image/svg-xml

and I believe this is what SVG plans to file for.

>and that the processing element for that application should already know the
>possible syntaxes that it could encounter.

The idea is that it is in the general case useful to know whether something's
in XML syntax, regardless of the application & its semantic.  -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA05788 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:14:01 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA05778; Mon, 20 Mar 2000 11:13:59 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 11:15:25 -0800 (PST)
Date: Mon, 20 Mar 2000 11:09:21 -0800 (PST)
Subject: RE: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 18:35:37 +0000" <AA4C152BA2F9D211B9DD0008C79F760A95D751@odin.cromwellmedia.co.uk>
To: Miles Sabin <msabin@cromwellmedia.co.uk>
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN95HQMKOU00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Simon St.Laurent wrote,
> > I've yet to hear of a meaningful 'breaking', and I'd like a
> > concrete example.
> >
> > As far as I can tell, -xml doesn't interfere with existing
> > systems at all.

> Depends what you mean by break, I guess. If an HTTP user agent
> sends a request with Accept: application/*-xml, then as things
> stand, it's as likely as not to get back a 400 Bad Request
> response.

Well, that would technically be a violation of the HTTP protocol. * isn't
a tspecial, and hence is allowed in a type or subtype token.

But of course the chances of this actually working aren't good, since
the HTTP specification says that only an asterisk by itself is a wildcard.
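To make the wildcard rule concrete, here is a minimal sketch in Python of HTTP/1.1 media-range matching as Ned describes it: "*" acts as a wildcard only when it is the entire type or the entire subtype, so a range like application/*-xml is just an ordinary token compared literally. The function name is hypothetical, and parameter matching is deliberately left out.

```python
def media_range_matches(media_range: str, media_type: str) -> bool:
    """Return True if an Accept media-range covers a concrete media type.

    Hypothetical helper. Parameters (including q=) are stripped and ignored.
    """
    r_type, _, r_sub = media_range.split(";", 1)[0].strip().lower().partition("/")
    t_type, _, t_sub = media_type.split(";", 1)[0].strip().lower().partition("/")
    if r_type == "*" and r_sub == "*":
        return True  # "*/*" covers everything
    if r_type != t_type:
        return False
    # "*" is a wildcard only when it is the whole subtype; "*-xml" is an
    # ordinary token and is compared literally, not treated as a pattern.
    return r_sub == "*" or r_sub == t_sub
```

Under this reading a conforming server finds nothing matching application/*-xml for an application/svg-xml resource, rather than treating the range as a pattern.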

OTOH, the use of parameter matching, while defined, is called "rare"
in the specification and AFAIK is never used in practice (the one
parameter where it is obviously useful is charset, but that parameter
has a separate accept- field of its own). So I wonder how well trying to
match a parameter would work in practice.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA05706 for ietf-xml-mime-bks; Mon, 20 Mar 2000 11:08:57 -0800 (PST)
Received: from wodc7mr3.ffx.ops.us.uu.net (wodc7mr3.ffx.ops.us.uu.net [192.48.96.19]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA05696; Mon, 20 Mar 2000 11:08:53 -0800 (PST)
Received: from shaylashayla by wodc7mr3.ffx.ops.us.uu.net with SMTP  (peer crosschecked as: 1Cust27.tnt15.alameda.ca.da.uu.net [63.14.1.27]) id QQihjg27359; Mon, 20 Mar 2000 19:10:09 GMT
Message-ID: <003801bf929f$e7887b80$6d61fea9@dbc.mtview.ca.us>
From: "Marshall Rose" <mrose+mtr.netnews@dbc.mtview.ca.us>
To: <ned.freed@INNOSOFT.COM>
Cc: <ned.freed@INNOSOFT.COM>, "Keith Moore" <moore@cs.utk.edu>, "Chris Lilley" <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, <Valdis.Kletnieks@vt.edu>, <ietf-xml-mime@imc.org>, <ietf-822@imc.org>, "Marshall Rose" <mrose@dbc.mtview.ca.us>
References: <38D61D2B.A71DC71C@w3.org> <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM> <01JN94T4BSEM00004D@MAUVE.INNOSOFT.COM>
Subject: Re: Finishing the XML-tagging discussion
Date: Mon, 20 Mar 2000 11:10:03 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > suppose we were just to have something like:
>
> >     Content-type: image/svg; representation="xml"
>
> > where "representation" is just a plain old MIME parameter.
>
> > would you object to something along these lines?
>
> Yes. Most of the problems I have with global parameter still apply to this
> usage.
>
> If we absolutely have to do this with a separate piece of information, I would
> opt for a content-feature tag. That way there's a clear delineation between
> when feature information is or is not present, and we don't mess up MIME
> parameter space. And we need the feature tag anyway for negotiation purposes.

hmmm.  my view of the example above is that XML is being used as the syntax
but the semantics of the blob being passed are still SVG semantics.

at the risk of seeming insensitive with the exception of "text/xml", i would
never expect to see a subtype of "xml" for any media type. of course, taking
that line, i suppose that the example above should simply be

    image/svg

and that the processing element for that application should already know the
possible syntaxes that it could encounter.

/mtr


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA05388 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:54:16 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA05377; Mon, 20 Mar 2000 10:54:14 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 10:55:35 -0800 (PST)
Date: Mon, 20 Mar 2000 10:52:22 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 10:34:55 -0800" <01a201bf929b$05b81700$6d61fea9@dbc.mtview.ca.us>
To: Marshall Rose <mrose+mtr.netnews@dbc.mtview.ca.us>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN94T4BSEM00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=iso-8859-1
References: <38D61D2B.A71DC71C@w3.org> <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > > One alternative - not necessarily one I am advocating, but just listing for
> > > > completeness - is to use a similar approach but swapped round:
> > > >
> > > > image/svg;ContentLanguage=xml
> >
> > > I favor something like this.  it seems like it will interact more
> > > cleanly with content negotiation mechanisms (as opposed to those
> > > mechanisms having to acquire new pattern-matching facilities)
> >
> > > it's true that as MIME is architected the parameter space is on a
> > > per-content-type basis.  but I don't see any great harm if we define
> > > some global parameter name space (say, parameter names that begin
> > > with "$" are global).  Then we could do something like:
> >
> > > content-type: image/svg; $superclass="application/xml"

> ned - i agree with your take that having global parameters is a major change
> to MIME.

> suppose we were just to have something like:

>     Content-type: image/svg; representation="xml"

> where "representation" is just a plain old MIME parameter.

> would you object to something along these lines?

Yes. Most of the problems I have with global parameter still apply to this
usage.

If we absolutely have to do this with a separate piece of information, I would
opt for a content-feature tag. That way there's a clear delineation between
when feature information is or is not present, and we don't mess up MIME
parameter space. And we need the feature tag anyway for negotiation purposes.

I still strongly prefer the suffix, however.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA05130 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:49:22 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA05120; Mon, 20 Mar 2000 10:49:16 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id NAA13040; Mon, 20 Mar 2000 13:50:52 -0500
Message-Id: <200003201850.NAA13040@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 13:51:42 -0500
To: Keith Moore <moore@cs.utk.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: Keith Moore <moore@cs.utk.edu>, ned.freed@INNOSOFT.COM, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003201833.NAA26474@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 13:19:58 EST."             <200003201819.NAA11392@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 01:33 PM 3/20/00 -0500, Keith Moore wrote:
>if you want an expression that matches anything ending in -xml, 
>as far as I can tell, existing http accept header semantics,
>and existing conneg expressions, are not sufficient to recognize that.
>
>if you want it to be more general and anticipate the possibility
>of multiple frobs (not just -xml) then you need something like
>a regular expression matching capability 
>(so you can look for things like "-xml(-|$)" )

I think we may have encountered a problem in our underlying viewpoints.  You're
talking about sending */*xml over the wire in headers, while I'm only
talking about using */*xml to recognize XML documents when I get them.  

I don't think the proposal as written breaks things any more than
introducing a new MIME type would.

Perhaps you're reading too much into this, from section 6:
I-D>Applications may match for types that
I-D>represent XML entities by comparing the subtype to the pattern
I-D>*/*-xml.

Yes, older tools wouldn't do anything in particular with the suffix.
Lacking any such regular expression capacities, they'd just see it as
another MIME type and ignore any concept of dispatching it to generic XML
processing.
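The receiver-side use Simon describes (recognizing XML on arrival, not requesting it) might look roughly like the Python sketch below. It applies the draft's */*-xml pattern from section 6 only after an exact-type lookup fails. The handler names and registry are hypothetical, and whether plain text/xml or application/xml should also count is outside what the pattern itself says.

```python
def is_xml_labelled(content_type: str) -> bool:
    """Check a received Content-Type against the draft's */*-xml pattern."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    _, slash, subtype = media_type.partition("/")
    if not slash:
        return False
    # Any top-level type; the subtype must end in the "-xml" suffix.
    return subtype.endswith("-xml")


def dispatch(content_type: str) -> str:
    """Hypothetical dispatcher: specific handler first, generic XML second."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    handlers = {"image/svg-xml": "svg-renderer"}  # hypothetical registry
    if media_type in handlers:
        return handlers[media_type]
    if is_xml_labelled(content_type):
        return "generic-xml-processor"
    # An older tool without the suffix check simply sees an unknown type.
    return "unknown"
```

A tool lacking the `is_xml_labelled` step behaves exactly as before for unregistered types, which is the compatibility point Simon is making.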

This is the case with any innovation, not a compatibility breakdown.

Compared to the parameter option you keep proposing, I'd say this causes
almost zero problems as far as interoperability and compatibility are
concerned.

If user agents start requesting */*xml, we do have a problem.  However,
that isn't what this I-D proposes - if users find that useful, they'll have
to push for changes in the protocol infrastructure.  I think it would be
useful, and is a possibility opened by this proposal, but it is not part of
this proposal.

All this suffix does is provide additional information identifying content
as having an XML foundation.  I don't believe it has any other
implications, nor do I believe it will interfere with other developing
standards for content negotiation.

I'll be in San Jose at Software Development for the next few days, so won't
be online with my usual frequency.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA05106 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:48:47 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA05095; Mon, 20 Mar 2000 10:48:45 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 10:50:07 -0800 (PST)
Date: Mon, 20 Mar 2000 10:48:40 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 13:19:58 -0500" <200003201819.NAA11392@hesketh.net>
To: "Simon St.Laurent" <simonstl@simonstl.com>
Cc: Keith Moore <moore@cs.utk.edu>, ned.freed@INNOSOFT.COM, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN94MCPL8U00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <200003201814.NAA26399@astro.cs.utk.edu>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> At 01:14 PM 3/20/00 -0500, Keith Moore wrote:
> >by contrast, the -xml frob
> >coupled with the desire for content-negotiation languages (http accept
> >and conneg) to recognize anything that is XML does break things.
> >I'm just trying to minimize breakage.

> Could you please explain in detail what gets broken here?

> I've yet to hear of a meaningful 'breaking', and I'd like a concrete example.

> As far as I can tell, -xml doesn't interfere with existing systems at all.

I note in passing that we've already registered some XML-based types
with an "XML" suffix, and no problems have been reported with this
usage.

				Ned


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id KAA05053 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:47:01 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA05042; Mon, 20 Mar 2000 10:46:57 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 10:48:18 -0800 (PST)
Date: Mon, 20 Mar 2000 10:25:24 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 13:14:56 -0500" <200003201814.NAA26399@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN94K31VY200004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > Second, the definition of a global parameter namespace is far from the worst
> > thing about this. The biggest problem this approach has is that now you have
> > what amounts to a mandatory parameter, one which has to appear with every
> > appearance of a given media type and must have a specific value in those
> > cases.

> actually, I don't view the parameter as mandatory; I view it as advisory.
> The right way to handle this is to treat it as an image/svg.  Depending
> on how you look at it, the information in the $superclass parameter is
> either a fallback way to handle the type if you don't know how to handle
> image/svg, or additional information that can be used by generic "index
> everything that is XML" processors.

An advisory parameter of this sort is a worthless parameter. If you cannot
depend on it appearing for every instance of a given content type you
cannot use it for anything. (Or worse, you will use it and fail when it
isn't present.)

> > This is completely contrary to the entire intent of the parameter
> > space: Parameters were intended to convey information that isn't
> > invariant with the media type.

> it's true that parameters were intended to convey type-specific information,
> but I don't see the harm in defining another set of parameters (disjoint
> from the type-specific parameters), which convey other information about
> the media type.

Well, all I can say is that I do see the potential for tremendous harm in doing
it, many times more harm than I see in the suffix that you seem to think is so
bad.

> > Third, this introduces the possibility of silly states in a major way.

> As far as I can tell, parameters already have tremendous potential for
> silly states - nothing stops you from adding any parameter to any media
> type (so you can for example have a charset parameter on an image).

While this may be a silly state in some sense, it is an entirely harmless one.
The silly states this superclass nonsense introduces are anything but.

> But in practice, and despite widespread misuse of certain parameter
> names with content-types that don't define those parameters, it doesn't
> seem to cause big problems.  Unrecognized parameters are generally
> ignored.

Exactly. We were careful to define parameters in this way so that bad
behavior as a result of silly states was minimized. But this isn't possible
with the sort of parameter you're proposing.

> > Fourth, the infrastructure upgrades to handle this are far nastier
> > than you seem to think, and have to happen at both the sending and receiving
> > end.

> I guess the question in my mind is whether those upgrades affect every
> MIME handling agent or whether (for the time being) they only affect
> things that care specifically about XML.

In order for this to be useful they have to affect every agent.

> The latter set seems a lot
> smaller than the former set, and I'm less worried about the impact of
> this on XML-aware agents (which after all are in an emerging space anyway)
> than I am worried about the impact of -xml frobs on things that need
> to do content-negotiation (which includes every HTTP client and server).

This is a complete red herring. Nobody is proposing that the suffix be
used for negotiation purposes. Negotiation is a different problem than
labelling.

> > Fifth, as I said before, this doesn't work with places where only media
> > types and not parameters are carried. Such places do exist and fixing
> > them all is going to be nearly impossible.

> Since I see this parameter as advisory, to me it doesn't matter much.
> Applications for which the advisory parameter is of sufficient utility
> will get fixed to interpret or generate that parameter; those that don't
> benefit sufficiently will not get fixed.

Again, an advisory parameter of this sort accomplishes nothing.

> To put it another way, if it's absolutely important that every instance
> of image/svg be externally labelled as XML then I'd agree with you.
> But I don't see this as absolutely important.

I do.

> > In summary, what this amounts to is nothing less than a complete redesign of
> > how MIME works.

> "complete redesign" seems a bit overstated.  It is a slightly different
> way of using parameters than originally envisioned.  But it doesn't
> interfere with the existing parameter space.

Actually, I think I seriously understated it. It isn't just a complete
redesign, it is a revamping of the underlying principles and agreements MIME is
based on.

> And while I understand that many implementations don't have the ability
> to generate or the ability to interpret content-type parameters, I don't
> see how this breaks anything that works now.  by contrast, the -xml frob
> coupled with the desire for content-negotiation languages (http accept
> and conneg) to recognize anything that is XML does break things.
> I'm just trying to minimize breakage.

Please cite a single, solitary example of where it breaks things. I'd like
to hear of one. Thus far all you have cited are easily surmountable
problems, like the ordering of future additional suffixes (assuming there
ever are any).

> a concrete example of something that this breaks would be helpful
> in getting me to understand your concerns.

It breaks so many things in so many ways... Some examples:

(1) Silly state problems. Consider the possible effect of image/jpeg;
    $superclass=text/xml on a handler only prepared to accept XML text.
    (And compare it with the effect of image/jpeg; charset=us-ascii on
    any existing handler.)

(2) Problems with sending agents not including the tag. Suppose an application
    is deployed that depends on the superclass tag. (This is inevitable once
    the tag is defined; you can call it advisory until you are blue in the
    face, but if it is used at all it won't be taken as such.) Now consider
    what happens when something that's incapable of generating the tag sends
    to the agent that requires it.

(3) Problems with places where parameters aren't expected/allowed. Once
    the tag is required there will be pressure to generate it. This in turn
    will lead to sending agents being upgraded to produce it, thereby cranking
    out parameters for the first time. Some of these agents are now used to
    generate values for fields that don't allow parameters. The upgrade will
    cause these fields to become syntactically invalid.

I could go on to list many more, but hopefully this gives some flavor
of how bad I think this idea is.
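Example (1) can be sketched in Python as follows. An existing image handler keys only on the media type and ignores unknown parameters, so a stray charset is harmless. A hypothetical "index everything that is XML" tool that trusts a $superclass parameter, by contrast, will pick up a JPEG body. The $superclass parameter is only the proposal under discussion, all function names are hypothetical, and the parser is simplified (it does not handle ";" inside quoted values).

```python
XML_TYPES = {"text/xml", "application/xml"}


def parse_content_type(value):
    """Split 'type/subtype; name=value; ...' into (media_type, params)."""
    media_type, *params = value.split(";")
    parsed = {}
    for param in params:
        name, sep, val = param.partition("=")
        if sep:
            parsed[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), parsed


def image_handler_accepts(value):
    """Existing image viewer: keys on the media type, ignores parameters."""
    media_type, _params = parse_content_type(value)
    return media_type == "image/jpeg"


def xml_indexer_accepts(value):
    """Hypothetical XML indexer that also trusts a $superclass parameter."""
    media_type, params = parse_content_type(value)
    return media_type in XML_TYPES or params.get("$superclass") in XML_TYPES
```

With these definitions "image/jpeg; charset=us-ascii" still reaches the image viewer unharmed, while "image/jpeg; $superclass=text/xml" is accepted by the indexer, which would then feed JPEG bytes to an XML parser.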

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA04763 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:34:27 -0800 (PST)
Received: from odin.cromwellmedia.co.uk ([212.2.15.25]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA04759; Mon, 20 Mar 2000 10:34:26 -0800 (PST)
Received: by odin.cromwellmedia.co.uk with Internet Mail Service (5.5.2448.0) id <GXMJRYTW>; Mon, 20 Mar 2000 18:35:38 -0000
Message-ID: <AA4C152BA2F9D211B9DD0008C79F760A95D751@odin.cromwellmedia.co.uk>
From: Miles Sabin <msabin@cromwellmedia.co.uk>
To: ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: RE: Finishing the XML-tagging discussion 
Date: Mon, 20 Mar 2000 18:35:37 -0000
X-Mailer: Internet Mail Service (5.5.2448.0)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Simon St.Laurent wrote,
> I've yet to hear of a meaningful 'breaking', and I'd like a 
> concrete example.
> 
> As far as I can tell, -xml doesn't interfere with existing
> systems at all.

Depends what you mean by break, I guess. If an HTTP user agent 
sends a request with Accept: application/*-xml, then as things
stand, it's as likely as not to get back a 400 Bad Request
response.

That might not be breakage, but it's far from obvious (to me at 
least) that it helps us much.

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@cromwellmedia.com          http://www.cromwellmedia.com/


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA04751 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:34:12 -0800 (PST)
Received: from wodc7mr3.ffx.ops.us.uu.net (wodc7mr3.ffx.ops.us.uu.net [192.48.96.19]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA04737; Mon, 20 Mar 2000 10:34:09 -0800 (PST)
Received: from shaylashayla by wodc7mr3.ffx.ops.us.uu.net with SMTP  (peer crosschecked as: 1Cust203.tnt10.alameda.ca.da.uu.net [63.10.112.203]) id QQihje18713; Mon, 20 Mar 2000 18:35:11 GMT
Message-ID: <01a201bf929b$05b81700$6d61fea9@dbc.mtview.ca.us>
From: "Marshall Rose" <mrose+mtr.netnews@dbc.mtview.ca.us>
To: <ned.freed@INNOSOFT.COM>, "Keith Moore" <moore@cs.utk.edu>
Cc: "Chris Lilley" <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, <Valdis.Kletnieks@vt.edu>, <ietf-xml-mime@imc.org>, <ietf-822@imc.org>
References: <38D61D2B.A71DC71C@w3.org> <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM>
Subject: Re: Finishing the XML-tagging discussion
Date: Mon, 20 Mar 2000 10:34:55 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > One alternative - not necessarily one I am advocating, but just listing for
> > > completeness - is to use a similar approach but swapped round:
> > >
> > > image/svg;ContentLanguage=xml
>
> > I favor something like this.  it seems like it will interact more
> > cleanly with content negotiation mechanisms (as opposed to those
> > mechanisms having to acquire new pattern-matching facilities)
>
> > it's true that as MIME is architected the parameter space is on a
> > per-content-type basis.  but I don't see any great harm if we define
> > some global parameter name space (say, parameter names that begin
> > with "$" are global).  Then we could do something like:
>
> > content-type: image/svg; $superclass="application/xml"

ned - i agree with your take that having global parameters is a major change
to MIME.

suppose we were just to have something like:

    Content-type: image/svg; representation="xml"

where "representation" is just a plain old MIME parameter.

would you object to something along these lines?

/mtr




Received: by ns.secondary.com (8.9.3/8.9.3) id KAA04729 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:33:32 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA04717; Mon, 20 Mar 2000 10:33:10 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id NAA26474; Mon, 20 Mar 2000 13:33:06 -0500 (EST)
Message-Id: <200003201833.NAA26474@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: Keith Moore <moore@cs.utk.edu>, ned.freed@INNOSOFT.COM, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 13:19:58 EST." <200003201819.NAA11392@hesketh.net> 
Date: Mon, 20 Mar 2000 13:33:06 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Could you please explain in detail what gets broken here?

if you want an expression that matches anything ending in -xml, 
as far as I can tell, existing http accept header semantics,
and existing conneg expressions, are not sufficient to recognize that.

if you want it to be more general and anticipate the possibility
of multiple frobs (not just -xml) then you need something like
a regular expression matching capability 
(so you can look for things like "-xml(-|$)" )
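Keith's pattern can be tried out with a short Python sketch using the standard `re` module. It treats "-xml" as a suffix component that is followed either by another "-" or by the end of the subtype, so it would still match if further frobs were stacked after it. The stacked subtype in the usage note (svg-xml-gz) is purely hypothetical, as is the function name.

```python
import re

# Keith's example: "-xml" followed by another "-" or the end of the subtype.
XML_FROB = re.compile(r"-xml(-|$)")


def has_xml_frob(media_type: str) -> bool:
    """Return True if the subtype carries an -xml component."""
    subtype = media_type.split(";", 1)[0].strip().lower().partition("/")[2]
    return XML_FROB.search(subtype) is not None
```

Here "image/svg-xml" and the hypothetical "image/svg-xml-gz" both match, while "application/foo-xmlish" and "application/xml" do not. A plain ends-with check would miss the stacked form, which is why Keith reaches for regular-expression matching.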

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA04375 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:17:59 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA04365; Mon, 20 Mar 2000 10:17:51 -0800 (PST)
Received: from thinkpad (ith1-190.twcny.rr.com [24.92.236.144]) by hesketh.net (8.9.3/8.9.3) with SMTP id NAA11392; Mon, 20 Mar 2000 13:19:07 -0500
Message-Id: <200003201819.NAA11392@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Mon, 20 Mar 2000 13:19:58 -0500
To: Keith Moore <moore@cs.utk.edu>, ned.freed@INNOSOFT.COM
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003201814.NAA26399@astro.cs.utk.edu>
References: <Your message of "Mon, 20 Mar 2000 09:39:08 PST."             <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 01:14 PM 3/20/00 -0500, Keith Moore wrote:
>by contrast, the -xml frob
>coupled with the desire for content-negotiation languages (http accept
>and conneg) to recognize anything that is XML does break things.
>I'm just trying to minimize breakage.

Could you please explain in detail what gets broken here?

I've yet to hear of a meaningful 'breaking', and I'd like a concrete example.

As far as I can tell, -xml doesn't interfere with existing systems at all.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id KAA04274 for ietf-xml-mime-bks; Mon, 20 Mar 2000 10:15:06 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA04259; Mon, 20 Mar 2000 10:14:55 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id NAA26399; Mon, 20 Mar 2000 13:14:56 -0500 (EST)
Message-Id: <200003201814.NAA26399@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: ned.freed@INNOSOFT.COM
cc: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 09:39:08 PST." <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM> 
Date: Mon, 20 Mar 2000 13:14:56 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > content-type: image/svg; $superclass="application/xml"
> 
> First of all, this approach was discussed when MIME was first designed and 
> was roundly rejected at that time. At the time I was actually in favor of 
> doing this, but I've since decided that the people who were opposed to 
> it were right.

I don't recall this part of those discussions.

> Second, the definition of a global parameter namespace is far from the worst
> thing about this. The biggest problem this approach has is that now you have
> what amounts to a mandatory parameter, one which has to appear with every
> appearance of a given media type and must have a specific value in those
> cases. 

actually, I don't view the parameter as mandatory; I view it as advisory.
The right way to handle this is to treat it as an image/svg.  Depending
on how you look at it, the information in the $superclass parameter is 
either a fallback way to handle the type if you don't know how to handle
image/svg, or additional information that can be used by generic "index 
everything that is XML" processors.

> This is completely contrary to the entire intent of the parameter
> space: Parameters were intended to convey information that isn't 
> invariant with the media type.

it's true that parameters were intended to convey type-specific information,
but I don't see the harm in defining another set of parameters (disjoint
from the type-specific parameters), which convey other information about
the media type.

> Third, this introduces the possibility of silly states in a major way.

As far as I can tell, parameters already have tremendous potential for
silly states - nothing stops you from adding any parameter to any media 
type (so you can for example have a charset parameter on an image).
But in practice, and despite widespread misuse of certain parameter
names with content-types that don't define those parameters, it doesn't 
seem to cause big problems.  Unrecognized parameters are generally 
ignored.
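The claim that unrecognized parameters are generally ignored can be illustrated with a minimal Python sketch of a dispatcher. It keys on type/subtype alone and hands each handler only the parameters that handler declares. The handler names and table are hypothetical, and the parser deliberately skips RFC 2045 quoted-string subtleties (for example, a ';' inside quotes).

```python
# Minimal sketch: dispatch on type/subtype and drop parameters the
# chosen handler does not understand. Handler names are hypothetical.

def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; name=value; ...' into a lowercased media type
    and a dict of parameters (names lowercased, surrounding quotes removed).
    Quoted strings containing ';' are NOT handled in this sketch."""
    parts = value.split(";")
    media_type = parts[0].strip().lower()
    params: dict[str, str] = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # tolerate junk rather than reject the whole label
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Each entry: media type -> (handler, parameters that handler understands).
HANDLERS = {
    "image/png": ("png-viewer", set()),
    "text/plain": ("text-viewer", {"charset", "format"}),
}


def dispatch(value: str) -> tuple[str, dict[str, str]]:
    """Pick a handler by media type; pass along only known parameters."""
    media_type, params = parse_content_type(value)
    handler, known = HANDLERS.get(media_type, ("save-to-disk", set()))
    used = {k: v for k, v in params.items() if k in known}
    return handler, used
```

With this design, dispatch("image/png; charset=us-ascii") still selects the PNG viewer and the stray charset is simply dropped, which is the kind of "silly state" described above as harmless in practice.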

> Fourth, the infrastructure upgrades to handle this are far nastier
> than you seem to think, and have to happen at both the sending and receiving
> end.

I guess the question in my mind is whether those upgrades affect every
MIME handling agent or whether (for the time being) they only affect 
things that care specifically about XML.  The latter set seems a lot
smaller than the former set, and I'm less worried about the impact of
this on XML-aware agents (which after all are in an emerging space anyway)
than I am worried about the impact of -xml frobs on things that need
to do content-negotiation (which includes every HTTP client and server).

> Fifth, as I said before, this doesn't work with places where only media
> types and not parameters are carried. Such places do exist and fixing
> them all is going to be nearly impossible.

Since I see this parameter as advisory, to me it doesn't matter much.
Applications for which the advisory parameter is of sufficient utility
will get fixed to interpret or generate that parameter; those that don't 
benefit sufficiently will not get fixed.

To put it another way, if it's absolutely important that every instance 
of image/svg be externally labelled as XML then I'd agree with you.
But I don't see this as absolutely important.  

> In summary, what this amounts to is nothing less than a complete redesign of
> how MIME works.

"complete redesign" seems a bit overstated.  It is a slightly different
way of using parameters than originally envisioned.  But it doesn't
interfere with the existing parameter space.

And while I understand that many implementations don't have the ability 
to generate or the ability to interpret content-type parameters, I don't 
see how this breaks anything that works now.  by contrast, the -xml frob
coupled with the desire for content-negotiation languages (http accept
and conneg) to recognize anything that is XML does break things.
I'm just trying to minimize breakage.

a concrete example of something that this breaks would be helpful
in getting me to understand your concerns.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA03696 for ietf-xml-mime-bks; Mon, 20 Mar 2000 09:50:57 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA03686; Mon, 20 Mar 2000 09:50:53 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 09:52:17 -0800 (PST)
Date: Mon, 20 Mar 2000 09:39:08 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 11:03:29 -0500" <200003201603.LAA25963@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: Chris Lilley <chris@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN92LO01ME00004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <38D61D2B.A71DC71C@w3.org>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > One alternative - not necessarily one I am advocating, but just listing for
> > completeness - is to use a similar approach but swapped round:
> >
> > image/svg;ContentLanguage=xml

> I favor something like this.  it seems like it will interact more
> cleanly with content negotiation mechanisms (as opposed to those
> mechanisms having to acquire new pattern-matching facilities)

> it's true that as MIME is architected the parameter space is on a
> per-content-type basis.  but I don't see any great harm if we define
> some global parameter name space (say, parameter names that begin
> with "$" are global).  Then we could do something like:

> content-type: image/svg; $superclass="application/xml"

First of all, this approach was discussed when MIME was first designed and was
roundly rejected at that time. At the time I was actually in favor of doing
this, but I've since decided that the people who were opposed to it were right.

Second, the definition of a global parameter namespace is far from the worst
thing about this. The biggest problem this approach has is that now you have
what amounts to a mandatory parameter, one which has to appear with every
appearance of a given media type and must have a specific value in those
cases. This is completely contrary to the entire intent of the parameter
space: Parameters were intended to convey information that isn't invariant
with the media type.

Third, this introduces the possibility of silly states in a major way.

Fourth, the infrastructure upgrades to handle this are far nastier
than you seem to think, and have to happen at both the sending and receiving
end.

Fifth, as I said before, this doesn't work with places where only media
types and not parameters are carried. Such places do exist and fixing
them all is going to be nearly impossible.

In summary, what this amounts to is nothing less than a complete redesign of
how MIME works. And in case my position on this wasn't clear before, this is a
100% ABSOLUTE TOTAL SHOWSTOPPER for me. I will fight this in every way I
possibly can -- I really do think it is that bad of an idea. Frankly, I haven't
seen anything I believe is so destructive and confusing proposed since the use
of line or character counts was considered as an alternative to
boundary markers back in 1991.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA03326 for ietf-xml-mime-bks; Mon, 20 Mar 2000 09:25:20 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA03302; Mon, 20 Mar 2000 09:24:24 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN8A01MCOW00004D@MAUVE.INNOSOFT.COM>; Mon, 20 Mar 2000 09:25:39 -0800 (PST)
Date: Mon, 20 Mar 2000 09:24:01 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Mon, 20 Mar 2000 09:41:51 +0000" <4.2.2.20000320093611.00b3c840@pop.dial.pipex.com>
To: Graham Klyne <GK@dial.pipex.com>
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN91NNH17000004D@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii; format=flowed
References: <200003191316.IAA31649@hesketh.net> <200003191406.JAA00561@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> At 09:07 AM 3/19/00 -0500, Simon St.Laurent wrote:
> >The other problem I have with Content-Feature is that it isn't clear to me
> >- based on my RFC dredgings, which may not be complete - whether or not
> >there is any kind of registration process or even official status for
> >Content-Feature.

> FYI:  RFC 2506;  also RFC 2533.

> (Content-feature is currently an I-D proposal that employs this framework
> to label MIME content.)

> And to try and dispel some misunderstandings:  I have suggested use of the
> CONNEG framework for _content negotiation_.  To me, the use of "-xml" as an
> aid to detecting generic XML seems a reasonable and useful idea;  I just
> think that using a naming convention as a basis for content negotiation
> would be pushing it too far.

I agree that the suffix isn't appropriate for negotiation; this should
be handled by some sort of feature tag.

				Ned



Received: by ns.secondary.com (8.9.3/8.9.3) id IAA01818 for ietf-xml-mime-bks; Mon, 20 Mar 2000 08:03:53 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA01800; Mon, 20 Mar 2000 08:03:32 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id LAA25963; Mon, 20 Mar 2000 11:03:29 -0500 (EST)
Message-Id: <200003201603.LAA25963@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Chris Lilley <chris@w3.org>
cc: "Simon St.Laurent" <simonstl@simonstl.com>, Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion 
In-reply-to: Your message of "Mon, 20 Mar 2000 13:44:27 +0100." <38D61D2B.A71DC71C@w3.org> 
Date: Mon, 20 Mar 2000 11:03:29 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> One alternative - not necessarily one I am advocating, but just listing for
> completeness - is to use a similar approach but swapped round:
> 
> image/svg;ContentLanguage=xml

I favor something like this.  it seems like it will interact more
cleanly with content negotiation mechanisms (as opposed to those
mechanisms having to acquire new pattern-matching facilities)
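The pattern-matching concern can be made concrete. HTTP/1.1 Accept media ranges (RFC 2616, section 14.1) match only an exact type/subtype, type/*, or */*, so a client sending "Accept: application/xml" does not thereby accept image/svg-xml. The sketch below implements that rule while ignoring q-values and parameters, then adds a hypothetical suffix-aware variant to show the kind of new matching a "-xml" convention would demand. The variant is not part of HTTP.

```python
# Sketch of HTTP/1.1-style media-range matching (q-values and
# parameters ignored), plus a HYPOTHETICAL '-xml' suffix extension.

def media_range_matches(media_range: str, media_type: str) -> bool:
    """True if media_range is */*, type/*, or exactly type/subtype."""
    rtype, _, rsub = media_range.split(";")[0].strip().lower().partition("/")
    mtype, _, msub = media_type.split(";")[0].strip().lower().partition("/")
    if rtype == "*" and rsub == "*":
        return True
    if rtype != mtype:
        return False
    return rsub == "*" or rsub == msub


def accepts(accept_header: str, media_type: str) -> bool:
    """True if any comma-separated range in the Accept value matches."""
    return any(media_range_matches(r, media_type)
               for r in accept_header.split(","))


def accepts_with_xml_suffix(accept_header: str, media_type: str) -> bool:
    """Hypothetical extension (not HTTP): treat any '*-xml' subtype as
    acceptable wherever application/xml is accepted."""
    if accepts(accept_header, media_type):
        return True
    subtype = media_type.split(";")[0].strip().lower().partition("/")[2]
    return subtype.endswith("-xml") and accepts(accept_header, "application/xml")
```

Under the standard rule, accepts("application/xml, text/html", "image/svg-xml") is false; only image/* or */* would admit it. Every negotiating client and server would need something like the extra function to make the suffix meaningful for negotiation.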

it's true that as MIME is architected the parameter space is on a 
per-content-type basis.  but I don't see any great harm if we define
some global parameter name space (say, parameter names that begin
with "$" are global).  Then we could do something like:

content-type: image/svg; $superclass="application/xml"

it's true that a lot of existing content-type dispatchers wouldn't
know how to deal with the parameter, but that doesn't seem like
such a big deal because it's easily added to the few MIME handlers
that know about XML.  similarly there are content-type labellers
that don't know about the concept of parameters but this provides
an opportunity for them to get fixed.  and since this is only a 
hint and the complete type is still specified in the foo/bar
portion of the content-type, the system can still "work" even 
in the absence of the $superclass label.
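A minimal sketch of the advisory reading of $superclass follows. It assumes the "$"-prefixed global parameter namespace proposed here, which is not part of any MIME standard, and a hypothetical handler table. It uses Python's standard email.message.Message to parse the header, dispatches on the full type first, and consults $superclass only when the full type is unknown.

```python
from email.message import Message

# Hypothetical table of media types this agent has handlers for.
HANDLERS = {
    "application/xml": "generic-xml-viewer",
    "text/html": "html-viewer",
}


def choose_handler(content_type: str) -> str:
    """Full type wins; the proposed (non-standard) $superclass parameter
    is only a fallback hint; otherwise treat the body as opaque octets."""
    msg = Message()
    msg["Content-Type"] = content_type
    full = msg.get_content_type()  # e.g. 'image/svg'
    if full in HANDLERS:
        return HANDLERS[full]
    superclass = msg.get_param("$superclass")  # None if absent
    if isinstance(superclass, str) and superclass.lower() in HANDLERS:
        return HANDLERS[superclass.lower()]
    return "save-as-octet-stream"
```

Because the full type is always checked first, an agent that never looks at $superclass behaves as it does today; in this sketch the parameter changes behaviour only for agents that look for it.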

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA26853 for ietf-xml-mime-bks; Mon, 20 Mar 2000 04:43:02 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA26842; Mon, 20 Mar 2000 04:42:53 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id HAA23508; Mon, 20 Mar 2000 07:44:30 -0500
Message-ID: <38D61D2B.A71DC71C@w3.org>
Date: Mon, 20 Mar 2000 13:44:27 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Simon St.Laurent" <simonstl@simonstl.com>
CC: Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org, ietf-822@imc.org
Subject: Re: Finishing the XML-tagging discussion
References: <200003191406.JAA00561@hesketh.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Simon St.Laurent" wrote:

> 2) People have grown accustomed to the basic top-level types carrying
> meaningful information.  The Content-Feature approach would mean that
> Scalable Vector Graphics (SVG) would be application/xml;
> Content-Feature=svg  -- not image/svg-xml.  The contents of those two
> descriptions are significantly different, though they purport to describe
> the same thing.  This doesn't apply in every case, since a lot of XML
> formats belong in application/, but it applies in a significant number of
> cases.

Good summary, I agree. the existing top-level types are a useful feature.

One alternative - not necessarily one I am advocating, but just listing for
completeness - is to use a similar approach but swapped round:

image/svg;ContentLanguage=xml

(which basically amounts to foo/bar;xml=yes)

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id CAA22214 for ietf-xml-mime-bks; Mon, 20 Mar 2000 02:37:32 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id CAA22204; Mon, 20 Mar 2000 02:37:29 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4b105a0004@msw.mimesweeper.com>; Mon, 20 Mar 2000 10:41:20 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id HDT54BQ1; Mon, 20 Mar 2000 10:40:18 -0000
Message-Id: <4.2.2.20000320093611.00b3c840@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Mon, 20 Mar 2000 09:41:51 +0000
To: "Simon St.Laurent" <simonstl@simonstl.com>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <200003191406.JAA00561@hesketh.net>
References: <200003191316.IAA31649@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 09:07 AM 3/19/00 -0500, Simon St.Laurent wrote:
>The other problem I have with Content-Feature is that it isn't clear to me
>- based on my RFC dredgings, which may not be complete - whether or not
>there is any kind of registration process or even official status for
>Content-Feature.

FYI:  RFC 2506;  also RFC 2533.

(Content-feature is currently an I-D proposal that employs this framework 
to label MIME content.)

And to try and dispel some misunderstandings:  I have suggested use of the 
CONNEG framework for _content negotiation_.  To me, the use of "-xml" as an 
aid to detecting generic XML seems a reasonable and useful idea;  I just 
think that using a naming convention as a basis for content negotiation 
would be pushing it too far.
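Graham's distinction between detection and negotiation can be illustrated with a detection-only heuristic. It treats a label as XML if it is one of the RFC 2376 base types (text/xml, application/xml) or if its subtype ends in "-xml". This is a sketch of the naming convention under discussion, not a registered rule, and it is deliberately not used for negotiation.

```python
# Detection-only heuristic for the '-xml' naming convention under
# discussion; not a registered rule and not a negotiation mechanism.

XML_BASE_TYPES = {"text/xml", "application/xml"}  # from RFC 2376


def looks_like_xml(media_type: str) -> bool:
    """True for the RFC 2376 base types or any subtype ending in '-xml'."""
    mtype = media_type.split(";")[0].strip().lower()
    if mtype in XML_BASE_TYPES:
        return True
    _, _, subtype = mtype.partition("/")
    return subtype.endswith("-xml")
```

A generic XML indexer could use a check like this to decide what to parse, while negotiation would still rely on the full media type or on a feature tag.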

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id CAA21900 for ietf-xml-mime-bks; Mon, 20 Mar 2000 02:11:56 -0800 (PST)
Received: from odin.cromwellmedia.co.uk ([212.2.15.25]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id CAA21896; Mon, 20 Mar 2000 02:11:54 -0800 (PST)
Received: by odin.cromwellmedia.co.uk with Internet Mail Service (5.5.2448.0) id <GXMJRXXD>; Mon, 20 Mar 2000 10:12:54 -0000
Message-ID: <AA4C152BA2F9D211B9DD0008C79F760A95D745@odin.cromwellmedia.co.uk>
From: Miles Sabin <msabin@cromwellmedia.co.uk>
To: ietf-xml-mime@imc.org
Cc: ietf-822@imc.org
Subject: Cross classification (WAS: Finishing the XML-tagging discussion)
Date: Mon, 20 Mar 2000 10:12:53 -0000
X-Mailer: Internet Mail Service (5.5.2448.0)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Simon St.Laurent wrote,
> 2) People have grown accustomed to the basic top-level 
> types carrying meaningful information.  The Content-Feature 
> approach would mean that Scalable Vector Graphics (SVG) 
> would be application/xml; Content-Feature=svg  -- not
> image/svg-xml.  The contents of those two descriptions are 
> significantly different, though they purport to describe
> the same thing.

To my mind this is the core of the problem. What we've got is 
*two* hierarchical classifications of resource types: the MIME 
type/subtype one; and the xml/xml-application one.

Unfortunately these two hierarchies are orthogonal, as the svg 
example illustrates quite neatly. This means we won't be able 
to encode one in the other without either information loss or 
special processing of some sort.

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@cromwellmedia.com          http://www.cromwellmedia.com/



Received: by ns.secondary.com (8.9.3/8.9.3) id JAA27554 for ietf-xml-mime-bks; Sun, 19 Mar 2000 09:14:48 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA27544; Sun, 19 Mar 2000 09:14:44 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id MAA13368; Sun, 19 Mar 2000 12:16:08 -0500
Message-Id: <200003191716.MAA13368@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sun, 19 Mar 2000 12:17:17 -0500
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
In-Reply-To: <v04210101b4faaea8f2e9@[192.168.1.254]>
References: <200003181531.KAA19936@hesketh.net> <200003172347.SAA24551@hesketh.net> <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003181531.KAA19936@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:25 AM 3/19/00 -0500, you wrote:
>I haven't been following this discussion too closely, but has a 
>top-level xml type been positively ruled out for any particular 
>reason? e.g. xml/rdf, xml/rss, xml/xhtml, etc.  It seems to me that 
>if we're going to so much trouble to make sure that generic XML 
>processors will recognize this with suffixes and so forth, maybe what 
>we really have is a different top-level type? The argument here would
>be that not all XML (e.g. SVG) is really text.

I've written a brief summary of how we got here, archived at:
http://www.imc.org/ietf-xml-mime/mail-archive/msg00345.html

For the top-level media types discussion, you might want to see the threads
listed below.  The top-level XML media type discussion sort of winds around
and through them, without clear resolution in a single thread,
unfortunately, but these are probably at least a useful set of threads to
explore.

Parameters for top-level XML media types?
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00075

Top-level media types desirable?
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00100

Perhaps we need an XML registration tree
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00101

Application-specific media types
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00111

Negotiated Content Delivery: Maximizing Information 
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00121

Using CONNEG instead of MIME types for compound types & references
http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00133


Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA27264 for ietf-xml-mime-bks; Sun, 19 Mar 2000 08:42:59 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA27254; Sun, 19 Mar 2000 08:42:56 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN71Y1NRLC00027Z@MAUVE.INNOSOFT.COM>; Sun, 19 Mar 2000 08:44:15 -0800 (PST)
Date: Sun, 19 Mar 2000 08:42:54 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Sun, 19 Mar 2000 11:25:10 -0500" <v04210101b4faaea8f2e9@[192.168.1.254]>
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN7LWZFQLS00027Z@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii; format=flowed
References: <200003172347.SAA24551@hesketh.net> <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003181531.KAA19936@hesketh.net> <200003181531.KAA19936@hesketh.net>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I haven't been following this discussion too closely, but has a
> top-level xml type been positively ruled out for any particular
> reason?

Please review the list archives. This has been discussed extensively already,
and it is a waste of everyone else's time to repeat that discussion for
a third (at least) time.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA26937 for ietf-xml-mime-bks; Sun, 19 Mar 2000 08:30:27 -0800 (PST)
Received: from russian-caravan.cloud9.net (russian-caravan.cloud9.net [168.100.1.4]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA26921; Sun, 19 Mar 2000 08:29:35 -0800 (PST)
Received: from [168.100.203.234] (macfaq.dialup.cloud9.net [168.100.203.234]) by russian-caravan.cloud9.net (Postfix) with ESMTP id BAA7B76469; Sun, 19 Mar 2000 11:30:39 -0500 (EST)
Mime-Version: 1.0
X-Sender: elharo@luna.oit.unc.edu
Message-Id: <v04210101b4faaea8f2e9@[192.168.1.254]>
In-Reply-To: <200003181531.KAA19936@hesketh.net>
References: <200003172347.SAA24551@hesketh.net> <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003181531.KAA19936@hesketh.net>
Date: Sun, 19 Mar 2000 11:25:10 -0500
To: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: Re: Finishing the XML-tagging discussion
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I haven't been following this discussion too closely, but has a 
top-level xml type been positively ruled out for any particular 
reason? e.g. xml/rdf, xml/rss, xml/xhtml, etc.  It seems to me that 
if we're going to so much trouble to make sure that generic XML 
processors will recognize this with suffixes and so forth, maybe what 
we really have is a different top-level type? The argument here would
be that not all XML (e.g. SVG) is really text.

And from even further out in left field: why doesn't MIME allow 
sub-sub types? e.g. text/xml/rdf?

I can believe there are really good reasons this hasn't been 
proposed. However it does seem to me that we're struggling because we 
really need to attach four labels to a typical XML document:

1. That it's text
2. That it's XML
3. That it's some specific XML application
4. That it has a particular encoding

And MIME only gives us three places to work with.
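The squeeze can be shown directly. MIME offers a top-level type, a subtype, and parameters, so one of the four labels has to share a slot. The sketch below uses hypothetical names. It maps label 1 to the top-level type, folds labels 2 and 3 into the subtype using the "-xml" suffix convention under discussion, and carries label 4 in the charset parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XmlDocumentLabel:
    top_level: str          # label 1: e.g. "text", "image", "application"
    is_xml: bool            # label 2: whether the body is XML
    vocabulary: str         # label 3: e.g. "svg", "rdf"
    charset: Optional[str]  # label 4: e.g. "utf-8"


def to_content_type(label: XmlDocumentLabel) -> str:
    """Pack four labels into three slots; labels 2 and 3 share the subtype."""
    subtype = f"{label.vocabulary}-xml" if label.is_xml else label.vocabulary
    value = f"{label.top_level}/{subtype}"
    if label.charset:
        value += f"; charset={label.charset}"
    return value


def from_content_type(value: str) -> XmlDocumentLabel:
    """Unpack; recovering labels 2 and 3 depends on the suffix convention."""
    parts = [p.strip() for p in value.split(";")]
    top_level, _, subtype = parts[0].lower().partition("/")
    charset = None
    for p in parts[1:]:
        name, _, val = p.partition("=")
        if name.strip().lower() == "charset":
            charset = val.strip().strip('"')
    is_xml = subtype.endswith("-xml")
    vocabulary = subtype[: -len("-xml")] if is_xml else subtype
    return XmlDocumentLabel(top_level, is_xml, vocabulary, charset)
```

For example, XmlDocumentLabel("image", True, "svg", None) becomes "image/svg-xml", and the round trip only works because both sides agree on the suffix rule.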

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id HAA26591 for ietf-xml-mime-bks; Sun, 19 Mar 2000 07:59:42 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA26587 for <ietf-xml-mime@imc.org>; Sun, 19 Mar 2000 07:59:40 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN71Y1NRLC00027Z@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Sun, 19 Mar 2000 08:00:54 -0800 (PST)
Date: Sun, 19 Mar 2000 07:40:41 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Sun, 19 Mar 2000 08:17:35 -0500" <200003191316.IAA31649@hesketh.net>
To: Valdis.Kletnieks@vt.edu
Cc: ietf-xml-mime@imc.org
Message-id: <01JN7KF82U2U00027Z@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> To summarize:

> Most of us are of the opinion that indiscriminate sniffing of objects
> is a Bad Idea unless you're of the canine persuasion, and that some
> form of tagging is a Good Idea.  The question we seem to be hung up on
> is whether we should:

> a) Use application/xml with Content-Feature tagging.  This would
> require more work for some MIME dispatchers to implement up front, but
> buys us automatic drop-back to generic XML if added support isn't
> available.

It isn't just the dispatchers that have to be upgraded, but also the MIME
object label generators. That is, you have to upgrade both the sending and
receiving system for this to work, since a lot of existing labelling systems
don't have the ability to generate content parameters or a content-feature
field. And unless you upgrade the sender, you're stuck with either generic XML
handling or possibly even octet-stream handling. 

This also doesn't make it clear how hard the dispatcher upgrade is in this
case. This isn't some simple tweak to the underlying code -- in many cases a
redesign of the user interface will be necessary to add the ability to specify
control-by-parameter or control-by-content-feature.

And there's also the issue of how you make this work in contexts where MIME
types are used but content parameters or content-feature expressions are not
allowed. (Such uses exist in various popular HTML hacks, for example.) While
you can argue that such usage is broken, it nevertheless exists and is widely
used. So now you are faced with having to redesign a bunch of protocol, much of
which we have no say over, and where the way to do it in some cases isn't all that
obvious, and fix these implementations as well. Frankly, I don't think this is
ever going to happen. What will happen is that XML format developers will
continue to insist on registering specific media types, and there's absolutely
nothing in our rules that says they can't.

> Non-upgraded clients would require (possibly) *one*
> intervention to add 'application/xml' to their "how to dispatch THIS
> one" tables to point at a generic XML application, but there's no
> guarantee that added features would be usable, even if installed, if
> their MIME handler doesn't do Content-Feature.

Experience with similar issues surrounding multipart/signed says it
won't be usable. (Unfortunately with multipart/signed there was no
useful alternative.)

> b) Use application/foobar-xml.  This apparently requires less work for
> some MIME dispatchers to support, at the expense of creating a "rule"
> that *-xml is all xml if you don't want to drop back to octet-stream.
> In addition, *non-upgraded* clients get to force hand-adding of this
> week's foobar-xml to a table of "how to dispatch THIS one" (even if
> it's Yet Another 'hand it to our XML application') if they want "better
> than octet-stream" handling.

And the latter can be (and often is in practice) done with the assistance
of a "configuration wizard", making the process relatively painless even
for inexperienced users.

> In *both* cases, client software that *has* been upgraded (for either
> the *-xml rule or to correctly dispatch based on a content-feature tag)
> will be able to hand it off to either a specific featureful XML application
> if available, or to a generic XML application if the bells-and-whistles
> aren't installed.  However, we're not sure yet which track involves less
> total aggravation for upgrading.

We're talking about a process that requires fairly complex sender and receiver
upgrade, as opposed to one that requires a relatively simple receiver upgrade.
The comparison leaves me with little doubt as to which is easier.
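For comparison, the receiver-side behaviour described above for upgraded clients can be sketched as one function covering both tracks. The handler names are hypothetical. How a real agent obtains the Content-Feature value (header field or parameter) is left out; it is simply passed in. Because the sketch covers only the receiver, it says nothing about the sender-side upgrades raised in the reply.

```python
from typing import Optional

# Hypothetical installed handlers, keyed by XML vocabulary.
SPECIFIC_XML_APPS = {"svg": "svg-editor"}
GENERIC_XML_APP = "generic-xml-viewer"


def upgraded_dispatch(media_type: str,
                      content_feature: Optional[str] = None) -> str:
    """Upgraded receiver under either proposal:
    (a) application/xml plus a Content-Feature value, or
    (b) a '*-xml' subtype.
    Prefer a specific application, else the generic XML application."""
    mtype = media_type.split(";")[0].strip().lower()
    subtype = mtype.partition("/")[2]
    if mtype == "application/xml":      # track (a)
        vocabulary = content_feature
    elif subtype.endswith("-xml"):      # track (b)
        vocabulary = subtype[: -len("-xml")]
    else:
        return "not-xml"  # left to the ordinary MIME dispatch table
    if vocabulary and vocabulary.lower() in SPECIFIC_XML_APPS:
        return SPECIFIC_XML_APPS[vocabulary.lower()]
    return GENERIC_XML_APP
```

Both tracks converge on the same fallback order once the receiver is upgraded. The difference in the sketch is where the vocabulary comes from: a separate value that the sender must generate, or the subtype the sender already supplies.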

> Have I mis-stated any major points of the two camps?

Mis-stated, no, but stated incompletely, yes.

				Ned



Received: by ns.secondary.com (8.9.3/8.9.3) id GAA25840 for ietf-xml-mime-bks; Sun, 19 Mar 2000 06:24:37 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA25830; Sun, 19 Mar 2000 06:24:30 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id JAA00925; Sun, 19 Mar 2000 09:26:07 -0500
Message-Id: <200003191426.JAA00925@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sun, 19 Mar 2000 09:27:16 -0500
To: Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-822@imc.org
In-Reply-To: <200003191316.IAA31649@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:17 AM 3/19/00 -0500, Valdis.Kletnieks@vt.edu wrote:
>If you go the "application/xml; Content-Feature=foobar" route, you're
>DONE.  You don't have the added support handy, it just drops to
>wherever you configured Mozilla to pass the stuff.  The grounds that
>"it's possible" here is that it's tagged as 'application/xml', which
>should be sufficient for tagging that it's possible.  This is actually
>*less* work to make work under Mozilla, at the expense of *not* passing
>along a hint to Notepad/XMLSpy/whatever that it's a *FOOBAR* xml, not
>just a generic xml.

Rereading this paragraph raises further questions about the value of
Content-Feature in my mind.  Less work under Mozilla, but not working with
applications outside of a particular transaction framework (HTTP in this
case) seems like an enormous overall loss to me.

Am I reading this right?

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id GAA25635 for ietf-xml-mime-bks; Sun, 19 Mar 2000 06:04:49 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA25622; Sun, 19 Mar 2000 06:04:26 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id JAA00561; Sun, 19 Mar 2000 09:06:02 -0500
Message-Id: <200003191406.JAA00561@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sun, 19 Mar 2000 09:07:11 -0500
To: Valdis.Kletnieks@vt.edu, ietf-xml-mime@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion 
Cc: ietf-822@imc.org
In-Reply-To: <200003191316.IAA31649@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:17 AM 3/19/00 -0500, Valdis.Kletnieks@vt.edu wrote:
>Most of us are of the opinion that indiscriminate sniffing of objects
>is a Bad Idea unless you're of the canine persuasion, and that some
>form of tagging is a Good Idea.  The question we seem to be hung up on
>is whether we should:
>
>a) Use application/xml with Content-Feature tagging.  This would
>require more work for some MIME dispatchers to implement up front, but
>buys us automatic drop-back to generic XML if added support isn't
>available.  Non-upgraded clients would require (possibly) *one*
>intervention to add 'application/xml' to their "how to dispatch THIS
>one" tables to point at a generic XML application, but there's no
>guarantee that added features would be usable, even if installed, if
>their MIME handler doesn't do Content-Feature.
>
>b) Use application/foobar-xml.  This apparently requires less work for
>some MIME dispatchers to support, at the expense of creating a "rule"
>that *-xml is all xml if you don't want to drop back to octet-stream.
>In addition, *non-upgraded* clients get to force hand-adding of this
>week's foobar-xml to a table of "how to dispatch THIS one" (even if
>it's Yet Another 'hand it to our XML application') if they want "better
>than octet-stream" handling.
>
>In *both* cases, client software that *has* been upgraded (for either
>the *-xml rule or to correctly dispatch based on a content-feature tag)
>will be able to hand it off to either a specific featureful XML application
>if available, or to a generic XML application if the bells-and-whistles
>aren't installed.  However, we're not sure yet which track involves less
>total aggravation for upgrading.

I think that's a reasonable summary of what's going on, and I didn't repeat
the details above it because I think the summary cuts to the heart of the
problem.  (Once sniffing is discarded, anyway.)  We've kicked it around a few
times, and mostly kicked it out.  My largest problems with Content-Feature
have to do with human interactions.  There are two aspects to this:

1) Many users have a hard enough time figuring out what MIME Content-Types
are, and adding another layer that only gets used with XML (today, anyway)
is going to be a hard sell to these folks.  Effectively, it's putting a
third layer on to the content type, and doing so in a way that doesn't look
like every other content type.  I'm just not convinced that people will
look past Content-Types.

2) People have grown accustomed to the basic top-level types carrying
meaningful information.  The Content-Feature approach would mean that
Scalable Vector Graphics (SVG) would be application/xml;
Content-Feature=svg, not image/svg-xml.  The contents of those two
descriptions are significantly different, though they purport to describe
the same thing.  This doesn't apply in every case, since a lot of XML
formats belong in application/, but it applies in a significant number of
cases.

The other problem I have with Content-Feature is that it isn't clear to me
- based on my RFC dredgings, which may not be complete - whether or not
there is any kind of registration process or even official status for
Content-Feature.  We've already had several 'acronym collisions' in the XML
world (Steel Markup Language vs. Simple Markup Language, for instance), and
I'd like to make certain these collisions don't all follow us into
MIME-based processing.  (I'm aware that x- is the wild west, but I'd like
to ensure that there's a safer option.)

There have been other objections to parameters along the way, and perhaps
they'll resurface now.

Simon St.Laurent


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id FAA25226 for ietf-xml-mime-bks; Sun, 19 Mar 2000 05:14:51 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id FAA25222 for <ietf-xml-mime@imc.org>; Sun, 19 Mar 2000 05:14:50 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id IAA31649 for <ietf-xml-mime@imc.org>; Sun, 19 Mar 2000 08:16:26 -0500
Message-Id: <200003191316.IAA31649@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: <ietf-xml-mime@imc.org>
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sun, 19 Mar 2000 08:17:35 -0500
To: ietf-xml-mime@imc.org
From: Valdis.Kletnieks@vt.edu (by way of "Simon St.Laurent" <simonstl@simonstl.com>)
Subject: Re: Finishing the XML-tagging discussion 
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

[Redirected - This came to me via ietf-822@imc.org, but it belongs here as
well.  I'm not sure the author is subscribed to ietf-xml-mime. - Simon]

On Sat, 18 Mar 2000 10:58:48 EST, "Simon St.Laurent"
<simonstl@simonstl.com>  said:
> I meant this within a framework of XML processing that goes beyond generic
> document+style sheet->presentation.  There are many document-oriented
> applications out there where application/xml or text/xml is perfectly
> appropriate and does what their users need them to do.

OK.. If I'm interpreting what you said here, it's that many document-oriented
XML objects can be tagged as {application,text}/xml and sent out the door,
and that will suffice.
 
> Some of these may eventually develop the need for more specific MIME types,
> of course, but lots of them will likely stay in application/xml or
> text/xml.  Having the -xml suffix would also ease the integration of these
> new MIME types with other XML documents.
> 
> The main argument continues to hold for XML formats that receive more
> specific automated processing.

Now, based on another message that preceded this, I think we agree
that "some XML objects could utilize further tagging because sniffing
around the inside of objects becomes ugly".  What's unclear (at least
to me, at the moment I write this) is whether you are specifically
advocating tagging such things with 'application/foobar-xml' or whether
just reiterating that some tagging (possibly via Content-Feature: or
whatever) is needed.  Here, you seem to be saying -xml suffixes are
a Good Idea (and making a good case for it).  However...

On Sat, 18 Mar 2000 10:38:18 EST, "Simon St.Laurent"
<simonstl@simonstl.com>  said:
> I've been dreaming about that for a while.  Connect it to an editor, and I
> know a lot of people who'd love to buy the product.  Even if I didn't have
> an SVG editor handy, or some kind of generic image tool, I could tell the
> browser to pass it to my preferred XML editor - Notepad, XMLSpy, XMetaL,
> whatever - on grounds that the suffix suggests it's possible.

> I wonder what it would take to build that into Mozilla?  Hmmmm....

If you go the "application/xml; Content-Feature=foobar" route, you're
DONE.  You don't have the added support handy, it just drops to
wherever you configured Mozilla to pass the stuff.  The grounds that
"it's possible" here is that it's tagged as 'application/xml', which
should be sufficient for tagging that it's possible.  This is actually
*less* work to make work under Mozilla, at the expense of *not* passing
along a hint to Notepad/XMLSpy/whatever that it's a *FOOBAR* xml, not
just a generic xml.

<taking a deep breath>

To summarize:

Most of us are of the opinion that indiscriminate sniffing of objects
is a Bad Idea unless you're of the canine persuasion, and that some
form of tagging is a Good Idea.  The question we seem to be hung up on
is whether we should:

a) Use application/xml with Content-Feature tagging.  This would
require more work for some MIME dispatchers to implement up front, but
buys us automatic drop-back to generic XML if added support isn't
available.  Non-upgraded clients would require (possibly) *one*
intervention to add 'application/xml' to their "how to dispatch THIS
one" tables to point at a generic XML application, but there's no
guarantee that added features would be usable, even if installed, if
their MIME handler doesn't do Content-Feature.

b) Use application/foobar-xml.  This apparently requires less work for
some MIME dispatchers to support, at the expense of creating a "rule"
that *-xml is all xml if you don't want to drop back to octet-stream.
In addition, *non-upgraded* clients get to force hand-adding of this
week's foobar-xml to a table of "how to dispatch THIS one" (even if
it's Yet Another 'hand it to our XML application') if they want "better
than octet-stream" handling.

In *both* cases, client software that *has* been upgraded (for either
the *-xml rule or to correctly dispatch based on a content-feature tag)
will be able to hand it off to either a specific featureful XML application
if available, or to a generic XML application if the bells-and-whistles
aren't installed.  However, we're not sure yet which track involves less
total aggravation for upgrading.
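
A minimal sketch of how an upgraded client might dispatch under each of the two options above. Assumptions: the handler names and tables are hypothetical, "Content-Feature" is treated as a media type parameter exactly as the thread uses it, and the parameter parsing is simplified (no quoted semicolons, no RFC 2231 continuations). Nothing here is specified by any RFC; it only illustrates the fallback behavior each option gives.

```python
# Hypothetical handler table for an upgraded client (names are illustrative only).
HANDLERS = {
    "image/svg-xml": "svg-viewer",           # specific handler under option (b)
    "application/xml": "generic-xml-viewer",
    "text/xml": "generic-xml-viewer",
}
# Hypothetical specific handlers keyed by Content-Feature value, option (a).
FEATURE_HANDLERS = {"svg": "svg-viewer"}
FALLBACK = "save-as-octet-stream"


def parse_media_type(value):
    """Split 'type/subtype; name=value; ...' into a lowercased type and a param dict."""
    parts = [p.strip() for p in value.split(";")]
    mtype = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, val = part.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')
    return mtype, params


def dispatch_option_a(value):
    """Option (a): application/xml plus a Content-Feature parameter."""
    mtype, params = parse_media_type(value)
    feature = params.get("content-feature")
    if feature in FEATURE_HANDLERS:
        return FEATURE_HANDLERS[feature]
    # Feature unknown or absent: drop back to whatever handles the base type.
    return HANDLERS.get(mtype, FALLBACK)


def dispatch_option_b(value):
    """Option (b): a distinct subtype, with an '*-xml' suffix rule as fallback."""
    mtype, _ = parse_media_type(value)
    if mtype in HANDLERS:
        return HANDLERS[mtype]
    if mtype.endswith("-xml"):
        # Unknown subtype, but the suffix says it is XML: use the generic handler.
        return HANDLERS["application/xml"]
    return FALLBACK
```

With these tables, 'application/xml; Content-Feature=foobar' and 'application/foobar-xml' both land on the generic XML viewer. The difference is where the fallback comes from: in option (a) the base type never changes, so any client that knows application/xml gets it for free; in option (b) it exists only if the client implements the suffix rule.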

Have I mis-stated any major points of the two camps?

				Valdis Kletnieks
				Operating Systems Analyst
				Virginia Tech



Received: by ns.secondary.com (8.9.3/8.9.3) id HAA15792 for ietf-xml-mime-bks; Sat, 18 Mar 2000 07:56:15 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA15780; Sat, 18 Mar 2000 07:56:11 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id KAA20618; Sat, 18 Mar 2000 10:57:48 -0500
Message-Id: <200003181557.KAA20618@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sat, 18 Mar 2000 10:58:48 -0500
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <200003181531.KAA19936@hesketh.net>
References: <4.3.2.20000317160736.00cd4540@mail.imc.org> <200003172347.SAA24551@hesketh.net> <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Just catching myself on an overstatement.

At 10:32 AM 3/18/00 -0500, I wrote:
>I don't think text/xml and application/xml are going to have much of a life
>except as a holding tank for XML formats that are either in development,
>and thus not ready for a MIME content identifier, or which aren't meant for
>any kind of large scale interchange.

I meant this within a framework of XML processing that goes beyond generic
document+style sheet->presentation.  There are many document-oriented
applications out there where application/xml or text/xml is perfectly
appropriate and does what their users need them to do.

Some of these may eventually develop the need for more specific MIME types,
of course, but lots of them will likely stay in application/xml or
text/xml.  Having the -xml suffix would also ease the integration of these
new MIME types with other XML documents.

The main argument continues to hold for XML formats that receive more
specific automated processing.

Simon St.Laurent


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id HAA15546 for ietf-xml-mime-bks; Sat, 18 Mar 2000 07:35:46 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA15535; Sat, 18 Mar 2000 07:35:43 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id KAA20067; Sat, 18 Mar 2000 10:37:18 -0500
Message-Id: <200003181537.KAA20067@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sat, 18 Mar 2000 10:38:18 -0500
To: Tim Bray <tbray@textuality.com>, Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <3.0.32.20000317172905.0135ab20@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 05:29 PM 3/17/00 -0800, Tim Bray wrote:
>I repeat: whether the inventors of the media type, today, design it
>for dispatch to generic XML processing machinery (and most won't) is
>irrelevant to the point here.  The core notion is that the knowledge
>that some media type happens to use XML syntax opens the door to 
>unanticipated kinds of processing beyond those envisioned by its 
>inventors, and on this basis identifying such encodings is a good
>and useful thing.

Exactly.  These 'unanticipated kinds of processing' suggest that general
guidelines for how to label XML documents are a better idea than leaving it
up to groups of developers tightly focused on a particular type of
processing that only meets current needs.

>Up till a year ago, I might have agreed.  As of now, there is a lot of
>stuff floating around the net that claims to be XML.  For a lot of it,
>I don't have the appropriate receiving software on hand; in these cases,
>I've found it tremendously useful to throw it into IE5, which first of 
>all, tells me quickly whether it really is XML or not, and secondly, allows
>me to look at it in not-too-unreadable form.  I would *love* it if I 
>could configure my user-agent to, whenever it sees a media type it
>doesn't grok but happens to look like foo/bar-xml, to turn it over to the 
>generic XML-capable browser.  [Or even better, based on some other mime-level
>mechanism more elegant than the "-xml" suffix.]

I've been dreaming about that for a while.  Connect it to an editor, and I
know a lot of people who'd love to buy the product.  Even if I didn't have
an SVG editor handy, or some kind of generic image tool, I could tell the
browser to pass it to my preferred XML editor - Notepad, XMLSpy, XMetaL,
whatever - on grounds that the suffix suggests it's possible.

I wonder what it would take to build that into Mozilla?  Hmmmm....


Simon St.Laurent


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id HAA15475 for ietf-xml-mime-bks; Sat, 18 Mar 2000 07:31:12 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA15447; Sat, 18 Mar 2000 07:29:51 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id KAA19936; Sat, 18 Mar 2000 10:31:27 -0500
Message-Id: <200003181531.KAA19936@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Sat, 18 Mar 2000 10:32:26 -0500
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <4.3.2.20000317160736.00cd4540@mail.imc.org>
References: <200003172347.SAA24551@hesketh.net> <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 04:22 PM 3/17/00 -0800, Paul Hoffman / IMC wrote:
>Nope, we agree that there are very many. Where we disagree is in how they 
>will be labelled when they are moved in MIME-based systems.
>
>Your assumption appears to be that they will all follow the lead of IOTP 
>and have their own sub-type tags. A different assumption is that they will 
>mostly use text/xml and application/xml. Doing so makes them quite friendly 
>to the random MIME parsers you envision. Each will have a unique DTD 
>identifier, so there is no need to have their own subtype.

I don't think text/xml and application/xml are going to have much of a life
except as a holding tank for XML formats that are either in development,
and thus not ready for a MIME content identifier, or which aren't meant for
any kind of large scale interchange.

If I'm going to the trouble of organizing a group of developers to create a
format we all agree on, I'm definitely going to take the tiny extra step of
creating a label for it that we can agree to use to simplify interchange.
If I have a million documents arriving at a gateway every minute, for three
different sets of purposes involving different DTDs, I'm not going to want
to sniff every document to find out what it contains.  I'd much rather look
at a content-type label and THEN pass it to a processor that can combine
the sniffing/validation with parsing that actually brings the information
into my application structures.

XML's ability to ride commodity HTTP infrastructures makes this possible,
cheap, and very likely.  I don't think the creators of most XML formats are
going to go to the length of creating their own transport protocols, but
they will want to be able to take maximum advantage of existing protocols.
The scenario described above takes maximum advantage of an HTTP server
(most likely) and can perform extensive and efficient processing without
having to resort to sniffing.

Sniffing XML can be fairly ugly - DOCTYPE declarations aren't required, and
they can appear fairly deep into a document under certain unpreventable
circumstances. (The XML declaration, comments, and processing instructions
can occupy space before the DOCTYPE declaration.)  Even sniffing the
DOCTYPE isn't completely reliable, thanks to a variety of issues involving
default values for namespaces within external DTDs and overrides inside the
internal DTD...
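
To make the point concrete, here is a rough sniffing heuristic, not a conforming XML parser. It assumes already-decoded text and skips the XML declaration, processing instructions, comments, and whitespace that may legally precede a DOCTYPE. Even so, it returns None for the many documents with no DOCTYPE, and it cannot see namespace defaults supplied by an external DTD.

```python
import re

# Prolog items that may legally appear before the DOCTYPE declaration.
_PROLOG_ITEM = re.compile(
    r"\s+"             # whitespace
    r"|<\?.*?\?>"      # XML declaration or processing instruction
    r"|<!--.*?-->",    # comment
    re.DOTALL,
)
# Captures the root element name given in the DOCTYPE.
_DOCTYPE = re.compile(r"<!DOCTYPE\s+([^\s\[>]+)")


def sniff_doctype(text):
    """Return the DOCTYPE root name, or None if no DOCTYPE precedes the first element.

    Heuristic only: ignores encodings and byte order marks, does not
    check well-formedness, and cannot see defaults from external DTDs.
    """
    pos = 0
    while True:
        m = _PROLOG_ITEM.match(text, pos)
        if not m:
            break
        pos = m.end()  # every alternative consumes at least one character
    m = _DOCTYPE.match(text, pos)
    return m.group(1) if m else None
```

The DOCTYPE only names the root element type, so even a successful sniff is a weak signal compared with a Content-Type label.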

Sniffing is not an acceptable solution in a large-scale environment, in any
case, and the widespread availability of commodity components will lead
developers to find the easiest way to connect those components.  Sniffing's
a mess.

While I can always slap the wrong MIME content type identifier on a
document and cause problems that resemble a failed sniff, that's a risk
with any MIME identifier and so far as I know the Internet is still running.

>For the record, I don't know why the authors of IOTP chose to use a 
>different sub-tag; they may have a very good reason. But my guess is that 
>most XML-based applications that want to be found by generic XML parsers 
>SHOULD use text/xml and application/xml.

'want to be found' is making assumptions that don't hold very well with
XML.  Suppose that the gateway described above is also making copies of all
messages for regulatory reasons, passing them to different data
repositories based on their content.  Everything except the XML gets passed
to a traditional file storage system, while the XML gets fed into a
hierarchical store that provides random access to the information.

Having an -xml suffix on the information would make it extremely easy to
sort out the XML from the non-XML without relying on fallible and
inefficient sniffing techniques.  If messages started arriving in XML
without the -xml (say in an x- type), they couldn't be analyzed with the
same set of tools.  Human intervention might be worthwhile at that point,
but the overall costs are a lot lower.

>Like Ned, I have nothing against -xml as a concept. I'm just convinced that 
>the systems that use it will also reflexively resort to sniffing everything 
>anyway, so why give the false impression that all subtags that go over XML 
>should end in -xml? Let them sniff away.

I don't know where your assumptions about programmers come from - the ones
I know tend to prefer the easiest way of doing things.  And why the 'false
impression' rhetoric?  Should I talk about the 'false impression' that
image/png provides?

Put simply, sniffing isn't done with off-the-shelf parts and it doesn't
scale very well.  I think most programmers would rather read a clear label
than sniff the box to find out if it contains perfume, bills, or manure.

Simon St.Laurent


Received: by ns.secondary.com (8.9.3/8.9.3) id FAA11820 for ietf-xml-mime-bks; Sat, 18 Mar 2000 05:05:27 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id FAA11815 for <ietf-xml-mime@imc.org>; Sat, 18 Mar 2000 05:05:26 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id IAA09660; Sat, 18 Mar 2000 08:06:44 -0500
Message-ID: <38D37F63.DD1887CB@w3.org>
Date: Sat, 18 Mar 2000 14:06:43 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: Miles Sabin <msabin@cromwellmedia.co.uk>
CC: ietf-xml-mime@imc.org
Subject: Re: MIME types and content negotiation
References: <AA4C152BA2F9D211B9DD0008C79F760A95D743@odin.cromwellmedia.co.uk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Miles Sabin wrote:

> The parameter approach would have text/xml and application/xml
> match generic xml, and, eg.,
> 
>  application/xml;app="svg"
> 
> match svg in particular.

Hmm but that means that things which are image, model, video etc are not in
the correct trees, just because they happen to be encoded in xml.

I still want to be able to say image/* for example, and for that to include
SVG images as well.
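
Chris's objection can be illustrated with a simplified matcher for Accept-style media ranges such as image/*. This sketch ignores q-values and ignores parameters on both sides for simplicity. The app="svg" parameter comes from Miles's example and is not a registered parameter.

```python
def matches(media_range, media_type):
    """Return True if media_type falls within media_range.

    Simplified: handles '*/*', 'type/*' and exact 'type/subtype' ranges,
    and ignores parameters on both sides.
    """
    rng = media_range.split(";")[0].strip().lower()
    typ = media_type.split(";")[0].strip().lower()
    if rng == "*/*":
        return True
    rtype, _, rsub = rng.partition("/")
    ttype, _, tsub = typ.partition("/")
    if rsub == "*":
        return rtype == ttype
    return rtype == ttype and rsub == tsub
```

Here matches("image/*", "image/svg-xml") is true, while matches("image/*", 'application/xml;app="svg"') is false. Under the parameter approach, a request for image/* would never pick up SVG.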

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA27162 for ietf-xml-mime-bks; Fri, 17 Mar 2000 17:26:26 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA27147; Fri, 17 Mar 2000 17:26:13 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id RAA07557; Fri, 17 Mar 2000 17:27:42 -0800
Message-Id: <3.0.32.20000317172905.0135ab20@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Fri, 17 Mar 2000 17:29:13 -0800
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 04:22 PM 3/17/00 -0800, Paul Hoffman / IMC wrote:

>For the record, I don't know why the authors of IOTP chose to use a 
>different sub-tag; they may have a very good reason. But my guess is that 
>most XML-based applications that want to be found by generic XML parsers 
>SHOULD use text/xml and application/xml.

I repeat: whether the inventors of the media type, today, design it
for dispatch to generic XML processing machinery (and most won't) is
irrelevant to the point here.  The core notion is that the knowledge
that some media type happens to use XML syntax opens the door to 
unanticipated kinds of processing beyond those envisioned by its 
inventors, and on this basis identifying such encodings is a good
and useful thing.

>Simon St. Laurent wrote
>>I think we have another fundamental disagreement here.  Such 'specialized
>>systems' are already distributed by the millions, in the forms of IE 5 and
>>Mozilla.  Right now, they have pretty limited capabilities, but they're a
>>start.
>
>Yes, we disagree here. There is no reason to automatically dispatch 
>application/foo-xml in these XML systems. 

Up till a year ago, I might have agreed.  As of now, there is a lot of
stuff floating around the net that claims to be XML.  For a lot of it,
I don't have the appropriate receiving software on hand; in these cases,
I've found it tremendously useful to throw it into IE5, which first of 
all, tells me quickly whether it really is XML or not, and secondly, allows
me to look at it in not-too-unreadable form.  I would *love* it if I 
could configure my user-agent to, whenever it sees a media type it
doesn't grok but happens to look like foo/bar-xml, to turn it over to the 
generic XML-capable browser.  [Or even better, based on some other mime-level
mechanism more elegant than the "-xml" suffix.]

>Tim Bray gave a few far-fetched 
>but possibly valid ones, but all of them would need to use content 
>sniffing even if we had -xml because each one really, really wanted to be 
>sure to get all XML. 

Far-fetched is a matter of opinion, but the conclusion doesn't follow.  One
of the virtues of XML is that if data which purports to be XML isn't, you
find out quickly and with deterministic and easy-to-handle results.
This doesn't mean that it's a good idea to *always* check opaque media
types for XML-ness; XML processors are not as lightweight as they should
be (sigh, mea culpa) and I don't think you want to be poking around at
the beginning of everything that's going through.  -Tim
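
A small sketch of the "find out quickly and deterministically" property, using Python's standard xml.etree.ElementTree. This checks well-formedness only; ElementTree does not validate or fetch external DTDs. The cost is still a full parse, which is the price Tim warns about when applied to everything passing through.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if data parses as a well-formed XML document.

    ElementTree raises ParseError at the first well-formedness error,
    so mislabeled content is rejected rather than half-processed.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```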



Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id QAA25857 for ietf-xml-mime-bks; Fri, 17 Mar 2000 16:20:51 -0800 (PST)
Received: from laptop.imc.org (ip12.proper.com [165.227.249.12]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA25852; Fri, 17 Mar 2000 16:20:50 -0800 (PST)
Message-Id: <4.3.2.20000317160736.00cd4540@mail.imc.org>
X-Sender: phoffman@mail.imc.org
X-Mailer: QUALCOMM Windows Eudora Version 4.3
Date: Fri, 17 Mar 2000 16:22:23 -0800
To: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <200003172347.SAA24551@hesketh.net>
References: <4.3.2.20000317145834.04f45100@mail.imc.org> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 06:48 PM 3/17/00 -0500, Simon St.Laurent wrote:
>At 03:03 PM 3/17/00 -0800, Paul Hoffman / IMC wrote:
> >"Growing" here means probably about one new type
> >every week or so. This greatly reduces the power of the word "burden".
>
>I think we have fundamental disagreements here about how many XML
>vocabularies are in development today,

Nope, we agree that there are very many. Where we disagree is in how they 
will be labelled when they are moved in MIME-based systems.

Your assumption appears to be that they will all follow the lead of IOTP 
and have their own sub-type tags. A different assumption is that they will 
mostly use text/xml and application/xml. Doing so makes them quite friendly 
to the random MIME parsers you envision. Each will have a unique DTD 
identifier, so there is no need to have their own subtype.
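To make the DTD-identifier alternative concrete, here is a minimal Python sketch of how a receiver might dispatch a text/xml or application/xml body using the public identifier in its DOCTYPE declaration. It assumes an ASCII-compatible prolog and is not a full prolog parser. The handler names and the identifier "-//Example//DTD Purchase Order 1.0//EN" are hypothetical.

```python
import re

# Matches <!DOCTYPE root PUBLIC "pubid" "sysid"> or <!DOCTYPE root SYSTEM "sysid">.
# Simplified: assumes an ASCII-compatible encoding and does not handle
# every legal prolog (e.g. UTF-16 bodies).
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>\[]+)\s+'
    r'(?:PUBLIC\s+(?P<q1>["\'])(?P<pubid>.*?)(?P=q1)\s+|SYSTEM\s+)'
    r'(?P<q2>["\'])(?P<sysid>.*?)(?P=q2)'
)

# Hypothetical dispatch table keyed by DTD public identifier.
HANDLERS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xhtml-viewer",
    "-//Example//DTD Purchase Order 1.0//EN": "purchase-order-app",
}


def doctype_ids(body: bytes):
    """Return (root, public_id, system_id) for the first DOCTYPE, or None."""
    head = body[:4096].decode("ascii", errors="replace")
    m = DOCTYPE_RE.search(head)
    if m is None:
        return None
    return m.group("root"), m.group("pubid"), m.group("sysid")


def dispatch(media_type: str, body: bytes) -> str:
    """Pick a handler using only the generic XML types plus the DTD identifier."""
    base = media_type.split(";", 1)[0].strip().lower()
    if base not in ("text/xml", "application/xml"):
        return "not-xml"
    ids = doctype_ids(body)
    if ids is None or ids[1] is None:
        return "generic-xml"  # no DOCTYPE, or a SYSTEM identifier only
    return HANDLERS.get(ids[1], "generic-xml")
```

Note that this only works after reading into the body, and a well-formed XML document is not required to have a DOCTYPE at all; such documents fall back to generic handling.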

For the record, I don't know why the authors of IOTP chose to use a 
different sub-tag; they may have a very good reason. But my guess is that 
most XML-based applications that want to be found by generic XML parsers 
SHOULD use text/xml and application/xml.

>I think we have another fundamental disagreement here.  Such 'specialized
>systems' are already distributed by the millions, in the forms of IE 5 and
>Mozilla.  Right now, they have pretty limited capabilities, but they're a
>start.

Yes, we disagree here. There is no reason to automatically dispatch 
application/foo-xml in these XML systems. It may be valuable to manually 
dispatch those documents to your browser's XML parser, but rarely or never 
automatically. We are talking about trying to help automatic processing, 
not any processing.

>Agents, that collapsed dream of a few years ago, are another likely
>returning possibility that will also have the needs of these 'specialized
>systems'.

I still await some good examples of these. Tim Bray gave a few far-fetched 
but possibly valid ones, but all of them would need to use content
sniffing even if we had -xml because each one really, really wanted to be 
sure to get all XML. The problem with the -xml system is that it has to be 
used 100% of the time in order to be useful in the given examples. 
Unfortunately, we're 100% sure that some content creators are going to end 
up sticking XML in text/plain and application/octet-stream bodies.

>Add to that a number of folks out there implementing general architectures
>for XML interchange and processing, some of them with plans for every
>desktop a processor, and I think there are more than a few search engines
>in the room.

If the search engine is XML-only, and only wanted to recognize things it
had been told were XML (either by full type/subtype or subtypes that end in
-xml), then, yes, you have a point. What's the chance of that? I think it's
much more likely that a search engine is going to read every damn text
object it sees and, if it looks like XML, do some smarter indexing on it.
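The "read every text object and index it if it looks like XML" strategy might look roughly like the Python sketch below: a cheap check of the first bytes, with a full parse (the costly step) only when the prefix is ambiguous. The 512-byte window and the helper names are assumptions for illustration, and an XML declaration is treated as sufficient evidence without parsing.

```python
import codecs
import xml.etree.ElementTree as ET


def _decode_head(body: bytes, n: int = 512) -> str:
    """Decode just enough of the body to look at its first characters."""
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return body[:n].decode("utf-16", errors="replace")
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    # latin-1 never fails, which is all a prefix check needs.
    return body[:n].decode("latin-1")


def looks_like_xml(body: bytes) -> bool:
    head = _decode_head(body).lstrip()
    if head.startswith("<?xml"):
        return True   # XML declaration: treated as strong evidence, no parse
    if not head.startswith("<"):
        return False  # plain text, binary, etc.
    # Starts with markup but has no declaration (could be HTML or SGML):
    # fall back to a full parse, the expensive step.
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False
```

For untrusted input, bear in mind that the Python documentation warns its XML modules are not secure against maliciously constructed data.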

Like Ned, I have nothing against -xml as a concept. I'm just convinced that 
the systems that use it will also reflexively resort to sniffing everything 
anyway, so why give the false impression that all subtags that go over XML 
should end in -xml? Let them sniff away.

--Paul Hoffman, Director
--Internet Mail Consortium



Received: by ns.secondary.com (8.9.3/8.9.3) id QAA25726 for ietf-xml-mime-bks; Fri, 17 Mar 2000 16:15:04 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA25710; Fri, 17 Mar 2000 16:14:56 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN4YKQYOQ80001TI@MAUVE.INNOSOFT.COM>; Fri, 17 Mar 2000 16:16:08 -0800 (PST)
Date: Fri, 17 Mar 2000 16:10:10 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Fri, 17 Mar 2000 15:03:33 -0800" <4.3.2.20000317145834.04f45100@mail.imc.org>
To: Paul Hoffman / IMC <phoffman@imc.org>
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN594JKL7Y0001TI@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii; format=flowed
References: <4.3.2.20000316201037.00bc5860@not-real.proper.com> <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > > If this growing and distributed burden is more attractive than a 4-byte
> > > naming convention that doesn't interfere with existing processing, then I
> > > suppose we should drop the suffix, and find out how popular this
> > > non-approach proves to be in a couple of years.  At that point, it will be
> > > very difficult to fix things.  I continue to argue that the suffix is a
> > > remarkably low-cost solution with significant benefits.

> >Well, you've just convinced me. I hereby retract my assertion that
> >content sniffing is a "mostly harmless" but partial solution to this
> >problem. In light of this it doesn't look like a solution at all.

> Ned, I remain unconvinced. "Growing" here means probably about one new type
> every week or so.

Speaking as the media type reviewer, the run rate has proved to be quite
a bit higher than this. And I expect it to grow a lot in the future.

> This greatly reduces the power of the word "burden".
> Remember, this is only of interest to specialized systems, that is, the
> ones who want to pass random (that is, unknown format) XML to a generic XML
> parser. Yes, that's "distributed"... over a very small number of systems
> that want to do this. Even if it is many systems, the cost of sniffing one
> new document type a week is incredibly low, and is only incurred in systems
> where there is not a human who updates the type-to-translation table.

But the corresponding costs of the naming convention are incredibly low.
So, even if the utility is limited (and I'm growing convinced that it isn't
as limited as I first thought), the cost-benefit conclusion seems clear.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA25186 for ietf-xml-mime-bks; Fri, 17 Mar 2000 15:45:50 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA25174; Fri, 17 Mar 2000 15:45:46 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id SAA24551; Fri, 17 Mar 2000 18:47:16 -0500
Message-Id: <200003172347.SAA24551@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 18:48:15 -0500
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <4.3.2.20000317145834.04f45100@mail.imc.org>
References: <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:03 PM 3/17/00 -0800, Paul Hoffman / IMC wrote:
>"Growing" here means probably about one new type 
>every week or so. This greatly reduces the power of the word "burden".

I think we have fundamental disagreements here about how many XML
vocabularies are in development today, and how many will be in development
in the future.  Maybe I should do a survey on Robin Cover's site for
vocabulary emergence frequency, keeping in mind that this incredibly
comprehensive site represents a subset.

(http://www.oasis-open.org/cover/ - BizTalk's probably worth exploring as
well, and there are more..)
 
>Remember, this is only of interest to specialized systems, that is, the 
>ones who want to pass random (that is, unknown format) XML to a generic XML 
>parser. 

I think we have another fundamental disagreement here.  Such 'specialized
systems' are already distributed by the millions, in the forms of IE 5 and
Mozilla.  Right now, they have pretty limited capabilities, but they're a
start.

Agents, that collapsed dream of a few years ago, are another likely
returning possibility that will also have the needs of these 'specialized
systems'.

Add to that a number of folks out there implementing general architectures
for XML interchange and processing, some of them with plans for every
desktop a processor, and I think there are more than a few search engines
in the room.

>Yes, that's "distributed"... over a very small number of systems 
>that want to do this. Even if it is many systems, the cost of sniffing one
>new document type a week is incredibly low, and is only incurred in systems 
>where there is not a human who updates the type-to-translation table.

I think we're at odds on the fundamentals, but I think the cost your vision
is imposing on my vision is rather dramatically greater than the cost my
vision imposes upon your vision.

Should be a fun weekend of discussion!


Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA24457 for ietf-xml-mime-bks; Fri, 17 Mar 2000 15:02:02 -0800 (PST)
Received: from laptop.imc.org (ip12.proper.com [165.227.249.12]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA24453; Fri, 17 Mar 2000 15:02:01 -0800 (PST)
Message-Id: <4.3.2.20000317145834.04f45100@mail.imc.org>
X-Sender: phoffman@mail.imc.org
X-Mailer: QUALCOMM Windows Eudora Version 4.3
Date: Fri, 17 Mar 2000 15:03:33 -0800
To: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM>
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 06:40 AM 3/17/00 -0800, ned.freed@INNOSOFT.COM wrote:
> > If this growing and distributed burden is more attractive than a 4-byte
> > naming convention that doesn't interfere with existing processing, then I
> > suppose we should drop the suffix, and find out how popular this
> > non-approach proves to be in a couple of years.  At that point, it will be
> > very difficult to fix things.  I continue to argue that the suffix is a
> > remarkably low-cost solution with significant benefits.
>
>Well, you've just convinced me. I hereby retract my assertion that
>content sniffing is a "mostly harmless" but partial solution to this
>problem. In light of this it doesn't look like a solution at all.

Ned, I remain unconvinced. "Growing" here means probably about one new type 
every week or so. This greatly reduces the power of the word "burden". 
Remember, this is only of interest to specialized systems, that is, the 
ones who want to pass random (that is, unknown format) XML to a generic XML 
parser. Yes, that's "distributed"... over a very small number of systems 
that want to do this. Even if it is many systems, the cost of sniffing one
new document type a week is incredibly low, and is only incurred in systems 
where there is not a human who updates the type-to-translation table.

--Paul Hoffman, Director
--Internet Mail Consortium



Received: by ns.secondary.com (8.9.3/8.9.3) id MAA21356 for ietf-xml-mime-bks; Fri, 17 Mar 2000 12:01:26 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id MAA21352 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 12:01:24 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN4YKQYOQ80001TI@MAUVE.INNOSOFT.COM> for ietf-xml-mime@imc.org; Fri, 17 Mar 2000 12:02:19 -0800 (PST)
Date: Fri, 17 Mar 2000 11:59:12 -0800 (PST)
Subject: Re: MIME types and content negotiation
In-reply-to: "Your message dated Fri, 17 Mar 2000 14:20:13 -0500" <200003171919.OAA11639@hesketh.net>
To: "Simon St.Laurent" <simonstl@simonstl.com>
Cc: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>, ietf-xml-mime@imc.org
Message-id: <01JN509U1QE20001TI@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <200003171906.OAA23461@astro.cs.utk.edu>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I'm getting a strong feeling that resistance on this issue isn't the suffix
> itself, it's resistance to any change at all in the MIME approach as
> defined in 1996.

Perhaps, although speaking as one of the coauthors of the MIME specification,
I have no problem with the proposal.

> I don't see the 'complexity' that certain other folks here are complaining
> about - in fact, I think these 4 bytes promise considerable simplification
> over any multi-parameter approach, for both humans and machines.

I agree. I'm not wild about the suffix, but I have become convinced that every
other approach (separate media type parameter, separate content-disposition
parameter, conneg tag, new top level media type, sniffing the data) loses in
some fairly major way.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA21105 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:50:01 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA21099 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:49:53 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id LAA06677; Fri, 17 Mar 2000 11:50:10 -0800
Message-Id: <3.0.32.20000317115137.01b31680@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Fri, 17 Mar 2000 11:51:39 -0800
To: Graham Klyne <GK@dial.pipex.com>, "Simon St.Laurent" <simonstl@simonstl.com>
From: Tim Bray <tbray@textuality.com>
Subject: Re: MIME types and content negotiation 
Cc: ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 07:28 PM 3/17/00 +0000, Graham Klyne wrote:
>At 02:20 PM 3/17/00 -0500, Simon St.Laurent wrote:
>>I don't see the 'complexity' that certain other folks here are complaining
>>about - in fact, I think these 4 bytes promise considerable simplification
>>over any multi-parameter approach, for both humans and machines.
>
>This argument might hold if the only requirement is to (a) recognize a 
>specific application of XML, and (b) (usually) recognize a generic 
>XML-based file format, even when the specific application is not recognized.

(b) is the only interesting requirement at this time.  Some people are 
convinced (admittedly, others aren't) that there are useful things you can 
and will want to do based only on the information that some data object 
claims to use XML syntax.  Thus the main goal here is to get something in 
at the MIME level to signal this case.

>I have read a lot of stuff recently that indicates the forward vision of 
>much W3C activity is that a single XML file may contain data corresponding 
>to many different XML applications.  Once we get into this kind of 
>territory, I think the idea of a MIME subtype suffix becomes hopelessly
>lost, and some kind of multiparameter approach becomes the simplest way to 
>deal with the recognition problem.

I agree entirely.  There is lots of work to do to figure out how to
package up, transmit, and process multi-vocabulary XML documents, and I'm
far from convinced that the MIME layer is going to be the right place to
do it (although it's an option that needs to be examined closely).  But
that's not the problem that Murata/St. Laurent are trying to solve. -Tim



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20887 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:40:07 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA20883 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:40:06 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id OAA12711; Fri, 17 Mar 2000 14:41:32 -0500
Message-Id: <200003171941.OAA12711@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 14:41:30 -0500
To: Graham Klyne <GK@dial.pipex.com>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: MIME types and content negotiation 
Cc: ietf-xml-mime@imc.org
In-Reply-To: <4.2.2.20000317192323.00b338e0@pop.dial.pipex.com>
References: <200003171919.OAA11639@hesketh.net> <200003171906.OAA23461@astro.cs.utk.edu> <Your message of "Fri, 17 Mar 2000 18:55:58 +0100."             <38D271AE.1433D70E@w3.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 07:28 PM 3/17/00 +0000, Graham Klyne wrote:
>At 02:20 PM 3/17/00 -0500, Simon St.Laurent wrote:
>
>>I don't see the 'complexity' that certain other folks here are complaining
>>about - in fact, I think these 4 bytes promise considerable simplification
>>over any multi-parameter approach, for both humans and machines.
>
>This argument might hold if the only requirement is to (a) recognize a 
>specific application of XML, and (b) (usually) recognize a generic 
>XML-based file format, even when the specific application is not recognized.

a and b describe the applications we're working on here, and are also
useful even when the problem described below - a very separate issue, in my
opinion - emerges.

>I have read a lot of stuff recently that indicates the forward vision of 
>much W3C activity is that a single XML file may contain data corresponding 
>to many different XML applications.  Once we get into this kind of 
>territory, I think the idea of a MIME subtype suffix becomes hopelessly
>lost, and some kind of multiparameter approach becomes the simplest way to 
>deal with the recognition problem.

This is a separate problem.  The MIME type discussion in draft-murata-xml
is about identifiers for complete documents, not for the admittedly more
difficult problem of providing a manifest of all types within a document.

Put a different way, the MIME types are only describing the container, not
the content.  While an XHTML document may contain SVG, SMIL, and twelve
other vocabularies, the container is still XHTML.  From my perspective,
it's still worthwhile to know that the XHTML document is also XML, and
similarly it would be worthwhile to know that a 'polluted' SVG, SMIL, or
other document is XML.

In fact, having the suffix might give applications a heads-up that more
complex negotiations may be in order, whereas not having a suffix means
keeping track of these possibilities for every single type.  It might in
fact be worthwhile for a sophisticated generic processor to start by
recognizing the content as XML and then dispatching the subcomponents based
on its reading of the document.  At the same time, less generic processors
might want to know something more about the document than "it's XML."
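A rough Python sketch of the "recognize it as XML first, then dispatch the subcomponents" idea: after a generic parse, report the root element's namespace (the container) and the other namespaces used inside it. Using namespace URIs as the dispatch key is an assumption of this sketch; the message does not say how subcomponents would be identified.

```python
import xml.etree.ElementTree as ET


def _namespace(tag: str) -> str:
    # ElementTree writes namespaced names as "{uri}local".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def container_and_embedded(body: bytes):
    """Return (root namespace, set of other namespaces used in the document)."""
    root = ET.fromstring(body)  # the default parser drops comments and PIs
    container = _namespace(root.tag)
    embedded = {_namespace(e.tag) for e in root.iter()} - {container}
    return container, embedded
```

The media type only has to say "this container is XML"; the per-vocabulary decisions happen after parsing.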

I don't see this argument as making a case against the suffix.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20669 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:29:05 -0800 (PST)
Received: from dervish.mail.pipex.net (dervish.mail.pipex.net [158.43.192.70]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id LAA20664 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:29:03 -0800 (PST)
Received: (qmail 5180 invoked from network); 17 Mar 2000 19:30:30 -0000
Received: from useraj04.uk.uudial.com (HELO GK-VAIO) (62.188.133.133) by smtp.dial.pipex.com with SMTP; 17 Mar 2000 19:30:30 -0000
Message-Id: <4.2.2.20000317192323.00b338e0@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Fri, 17 Mar 2000 19:28:15 +0000
To: "Simon St.Laurent" <simonstl@simonstl.com>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: MIME types and content negotiation 
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003171919.OAA11639@hesketh.net>
References: <200003171906.OAA23461@astro.cs.utk.edu> <Your message of "Fri, 17 Mar 2000 18:55:58 +0100."             <38D271AE.1433D70E@w3.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 02:20 PM 3/17/00 -0500, Simon St.Laurent wrote:

>I don't see the 'complexity' that certain other folks here are complaining
>about - in fact, I think these 4 bytes promise considerable simplification
>over any multi-parameter approach, for both humans and machines.

This argument might hold if the only requirement is to (a) recognize a 
specific application of XML, and (b) (usually) recognize a generic 
XML-based file format, even when the specific application is not recognized.

I have read a lot of stuff recently that indicates the forward vision of 
much W3C activity is that a single XML file may contain data corresponding 
to many different XML applications.  Once we get into this kind of 
territory, I think the idea of a MIME subtype suffix becomes hopelessly
lost, and some kind of multiparameter approach becomes the simplest way to 
deal with the recognition problem.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20444 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:17:47 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA20440 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:17:46 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id OAA11639; Fri, 17 Mar 2000 14:19:16 -0500
Message-Id: <200003171919.OAA11639@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 14:20:13 -0500
To: Keith Moore <moore@cs.utk.edu>, Chris Lilley <chris@w3.org>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: MIME types and content negotiation 
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003171906.OAA23461@astro.cs.utk.edu>
References: <Your message of "Fri, 17 Mar 2000 18:55:58 +0100."             <38D271AE.1433D70E@w3.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 02:06 PM 3/17/00 -0500, Keith Moore wrote:
>> What do other folks think about being able to say "all the stuff with -xml"
>
>yeech.  this makes me think that putting the xml-ish attribute in
>a separate parameter is the way to go.

Could you provide a more detailed explanation of 'yecch'?

I'm getting a strong feeling that resistance on this issue isn't the suffix
itself, it's resistance to any change at all in the MIME approach as
defined in 1996.

I don't see the 'complexity' that certain other folks here are complaining
about - in fact, I think these 4 bytes promise considerable simplification
over any multi-parameter approach, for both humans and machines.
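One way to read the "4 bytes" argument is that, under the proposed convention, recognizing generic XML from the label alone is a string test with no lookup table. A minimal Python sketch of such a test, assuming the convention as proposed in this thread (text/xml, application/xml, or any subtype ending in "-xml"); application/foo-xml is a made-up example type.

```python
def is_xml_media_type(media_type: str) -> bool:
    """True if the label alone says 'this is XML' under the -xml proposal."""
    # Drop parameters such as "; charset=utf-8"; type and subtype names
    # are case-insensitive, so compare in lower case.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in ("text/xml", "application/xml"):
        return True
    _, slash, subtype = base.partition("/")
    # The proposed convention: the subtype's last four characters are "-xml".
    return bool(slash) and subtype.endswith("-xml")
```

The flip side, raised elsewhere in the thread, is that XML sent as text/plain or application/octet-stream is invisible to this test.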

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20181 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:07:15 -0800 (PST)
Received: from dervish.mail.pipex.net (dervish.mail.pipex.net [158.43.192.70]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id LAA20172 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:07:13 -0800 (PST)
Received: (qmail 3010 invoked from network); 17 Mar 2000 19:08:40 -0000
Received: from userex19.uk.uudial.com (HELO GK-VAIO) (62.188.18.146) by smtp.dial.pipex.com with SMTP; 17 Mar 2000 19:08:40 -0000
Message-Id: <4.2.2.20000317184627.00a40720@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Fri, 17 Mar 2000 18:53:26 +0000
To: Miles Sabin <msabin@cromwellmedia.co.uk>
From: Graham Klyne <GK@dial.pipex.com>
Subject: RE: MIME types and content negotiation
Cc: ietf-xml-mime@imc.org
In-Reply-To: <AA4C152BA2F9D211B9DD0008C79F760A95D742@odin.cromwellmedia.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 05:59 PM 3/17/00 +0000, Miles Sabin wrote:
>Actually, looking over 14.1 suggests that, for http at least,
>the best way to deal with the sub-sub-type problem would be via
>an additional parameter.
>
>I know that this option has been raised before and rejected,
>but on the face of it it seems like being the best way of
>integrating with conneg.

Another approach to integrate would be to pick up the basic framework of 
TCN (RFC 2295), namely an "Accept-Features" header, but use the conneg 
syntax in place of the TCN-style feature expression.  The same conneg 
expressions could also be used by the server to describe variants.

A slightly different approach, in which the data source makes the final 
determination, might be modelled on a proposal being considered for 
Internet fax <draft-fax-content-negotiation-01.txt> (this updated version 
due to hit the ID directories any time now).

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20171 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:07:13 -0800 (PST)
Received: from dervish.mail.pipex.net (dervish.mail.pipex.net [158.43.192.70]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id LAA20162 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:07:12 -0800 (PST)
Received: (qmail 3004 invoked from network); 17 Mar 2000 19:08:39 -0000
Received: from userex19.uk.uudial.com (HELO GK-VAIO) (62.188.18.146) by smtp.dial.pipex.com with SMTP; 17 Mar 2000 19:08:39 -0000
Message-Id: <4.2.2.20000317184125.00ae2ca0@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Fri, 17 Mar 2000 18:44:48 +0000
To: Chris Lilley <chris@w3.org>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: MIME types and content negotiation
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org
In-Reply-To: <38D269A1.14FC4C4E@w3.org>
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003171713.MAA04599@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 06:21 PM 3/17/00 +0100, Chris Lilley wrote:
>Accept: */*-xml;q=0.7, application/xml;q=0.5, text/xml;q=0.001

That is not a valid 'Accept' header, per RFC 2616 (see section 14.1).
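RFC 2616 section 14.1 defines only two wildcard forms for a media range, "*/*" and "type/*". The Python sketch below classifies each range in an Accept header on that basis, so the first entry of the header quoted above comes out as invalid. Strictly, "*" is a legal token character, so "*/*-xml" is rejected here for using an undefined wildcard form rather than for bad token syntax. Splitting on commas is a simplification that breaks on quoted parameter values containing commas.

```python
import re

# RFC 2616 token: any CHAR except CTLs or separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# type "/" subtype, then optional parameters (not validated in this sketch).
MEDIA_RANGE_RE = re.compile(rf"\s*({TOKEN})/({TOKEN})\s*(?:;.*)?$")


def classify_media_range(rng: str) -> str:
    m = MEDIA_RANGE_RE.match(rng)
    if not m:
        return "invalid"
    typ, sub = m.group(1).lower(), m.group(2).lower()
    if typ == "*" and sub == "*":
        return "any"            # */*
    if typ != "*" and sub == "*":
        return "type-wildcard"  # e.g. application/*
    if "*" in typ or "*" in sub:
        return "invalid"        # partial wildcard such as */*-xml: not defined in 14.1
    return "exact"


def check_accept(header: str):
    """Classify each comma-separated range (naive about quoted commas)."""
    return [(r.strip(), classify_media_range(r)) for r in header.split(",")]
```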

I also think it's a horrible thing to do.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20163 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:07:12 -0800 (PST)
Received: from dervish.mail.pipex.net (dervish.mail.pipex.net [158.43.192.70]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id LAA20157 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:07:11 -0800 (PST)
Received: (qmail 2983 invoked from network); 17 Mar 2000 19:08:37 -0000
Received: from userex19.uk.uudial.com (HELO GK-VAIO) (62.188.18.146) by smtp.dial.pipex.com with SMTP; 17 Mar 2000 19:08:37 -0000
Message-Id: <4.2.2.20000317182655.00ad9760@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Fri, 17 Mar 2000 18:40:16 +0000
To: "Simon St.Laurent" <simonstl@simonstl.com>
From: Graham Klyne <GK@dial.pipex.com>
Subject: Re: MIME types and content negotiation
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003171713.MAA04599@hesketh.net>
References: <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM> <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 12:14 PM 3/17/00 -0500, Simon St.Laurent wrote:
>I'm wondering how this kind of content negotiation might or might not
>integrate with the use of the -xml suffix, and how it might or might not
>work if that suffix isn't used.

In my opinion, trying to base content negotiation on this kind of suffix 
takes us over the threshold from a "mostly harmless" but often useful way 
to recognize XML for generic handling into a realm of potential complexity 
and problems of the kind that Pete and Keith have been concerned about.

IMO, the current proposal (-xml suffix) will never be 100% perfect:  there 
may be content types containing XML that don't use the -xml extension.  For 
the purposes of recognition for generic XML processing this doesn't seem 
like too much of a problem.  And environments that don't have any concept 
of generic XML processing can simply ignore the naming convention.

To incorporate such a mechanism into protocol developments that really should
try to achieve 100% accuracy, and to enshrine the naming conventions in
future protocol developments, is not, in my view, the way to go.

Once we start looking to content negotiation protocols, I'd suggest looking 
at indicating generic XML-ness through a CONNEG style media feature expression.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id LAA20134 for ietf-xml-mime-bks; Fri, 17 Mar 2000 11:06:14 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA20130 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 11:06:12 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id OAA23461; Fri, 17 Mar 2000 14:06:04 -0500 (EST)
Message-Id: <200003171906.OAA23461@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Chris Lilley <chris@w3.org>
cc: "Simon St.Laurent" <simonstl@simonstl.com>, ietf-xml-mime@imc.org
Subject: Re: MIME types and content negotiation 
In-reply-to: Your message of "Fri, 17 Mar 2000 18:55:58 +0100." <38D271AE.1433D70E@w3.org> 
Date: Fri, 17 Mar 2000 14:06:03 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> What do other folks think about being able to say "all the stuff with -xml"

yeech.  this makes me think that putting the xml-ish attribute in
a separate parameter is the way to go.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id KAA18980 for ietf-xml-mime-bks; Fri, 17 Mar 2000 10:08:59 -0800 (PST)
Received: from odin.cromwellmedia.co.uk ([212.2.15.25]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id KAA18976 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 10:08:57 -0800 (PST)
Received: by odin.cromwellmedia.co.uk with Internet Mail Service (5.5.2448.0) id <GXMJR4ZX>; Fri, 17 Mar 2000 18:09:50 -0000
Message-ID: <AA4C152BA2F9D211B9DD0008C79F760A95D743@odin.cromwellmedia.co.uk>
From: Miles Sabin <msabin@cromwellmedia.co.uk>
To: ietf-xml-mime@imc.org
Subject: RE: MIME types and content negotiation
Date: Fri, 17 Mar 2000 18:09:41 -0000
X-Mailer: Internet Mail Service (5.5.2448.0)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Chris Lilley wrote,
> For example it also means that I can't say
>
> Accept: image/cgm;Version=4;ProfileId="webcgm";q=0.8,
> image/cgm;Version=1;ProfileId=*;q=0.3

The only thing that looks wrong here is the 'ProfileId=*' bit.
But http allows for selection based on parameters.

> What do other folks think about being able to say "all the 
> stuff with -xml"

The parameter approach would have text/xml and application/xml
match generic xml, and, eg.,

 application/xml;app="svg"

match svg in particular.
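A rough sketch of how the parameter approach could be matched, assuming the `app` parameter Miles proposes (it is not a registered parameter). The rule assumed here is that a range matches when type and subtype agree and every parameter on the range appears with the same value on the content type, so bare `application/xml` matches any XML while `application/xml;app="svg"` matches only SVG. It assumes `q` has already been stripped and that quoted values contain no `;`:

```python
def parse_media_type(value):
    """Split 'type/subtype; k=v; ...' into (type, subtype, params).
    Simplified: assumes well-formed input and no ';' inside quotes."""
    parts = [p.strip() for p in value.split(";")]
    type_, subtype = parts[0].lower().split("/", 1)
    params = {}
    for p in parts[1:]:
        if "=" in p:
            key, val = p.split("=", 1)
            params[key.strip().lower()] = val.strip().strip('"')
    return type_, subtype, params


def range_matches(media_range, content_type):
    """True if type/subtype agree and each parameter on the range is
    present, with the same value, on the content type."""
    r_type, r_sub, r_params = parse_media_type(media_range)
    c_type, c_sub, c_params = parse_media_type(content_type)
    if (r_type, r_sub) != (c_type, c_sub):
        return False
    return all(c_params.get(k) == v for k, v in r_params.items())
```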

Cheers


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@cromwellmedia.com          http://www.cromwellmedia.com/



Received: by ns.secondary.com (8.9.3/8.9.3) id JAA18767 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:58:54 -0800 (PST)
Received: from odin.cromwellmedia.co.uk ([212.2.15.25]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA18763 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:58:53 -0800 (PST)
Received: by odin.cromwellmedia.co.uk with Internet Mail Service (5.5.2448.0) id <GXMJR4ZH>; Fri, 17 Mar 2000 17:59:50 -0000
Message-ID: <AA4C152BA2F9D211B9DD0008C79F760A95D742@odin.cromwellmedia.co.uk>
From: Miles Sabin <msabin@cromwellmedia.co.uk>
To: ietf-xml-mime@imc.org
Subject: RE: MIME types and content negotiation
Date: Fri, 17 Mar 2000 17:59:42 -0000
X-Mailer: Internet Mail Service (5.5.2448.0)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I wrote,
> 14.1 Accept
> 
>    Accept = "Accept" ":" #( media-range [ accept-params ] )
>    media-range = ("*/*" |
>                   (type "/" "*") |
>                   (type "/" subtype)) *( ";" parameter)

Actually, looking over 14.1 suggests that, for http at least, 
the best way to deal with the sub-sub-type problem would be via 
an additional parameter.

I know that this option has been raised before and rejected,
but on the face of it it seems like being the best way of
integrating with conneg.

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@cromwellmedia.com          http://www.cromwellmedia.com/


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA18676 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:54:57 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA18665 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:54:51 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA18603; Fri, 17 Mar 2000 12:56:04 -0500
Message-ID: <38D271AE.1433D70E@w3.org>
Date: Fri, 17 Mar 2000 18:55:58 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Simon St.Laurent" <simonstl@simonstl.com>
CC: ietf-xml-mime@imc.org
Subject: Re: MIME types and content negotiation
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003171713.MAA04599@hesketh.net> <200003171731.MAA05321@hesketh.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Simon St.Laurent" wrote:
> 
> At 06:21 PM 3/17/00 +0100, Chris Lilley wrote:
> >> To me, it seems like there should be a way for a client to say "You can
> >> send me your XML, and I'll give it my best shot", rather than "these are
> >> the only twenty types I know of".
> >
> >Accept: */*-xml;q=0.7, application/xml;q=0.5, text/xml;q=0.001
> 
> That sounds very good to me - it seems like another place where the suffix
> is useful.
> 
> However, it isn't clear from HTTP 1.1 that */*-xml is acceptable in this
> use case - the BNF for media range looks like it accepts */* and type/*,
> where * can only be used for an entire subtype name.

Ah, you are right. So either this can't be done, or HTTP/1.1 would need a
revision as a result of the -xml convention (and perhaps other faceted
types, and parameters.)

For example it also means that I can't say

Accept: image/cgm;Version=4;ProfileId="webcgm";q=0.8,
image/cgm;Version=1;ProfileId=*;q=0.3

> That's why I'm not so positive that this intriguing possibility actually
> helps my argument, at least pending a revision of HTTP 1.1.
> On the other hand, I don't think it hurts it.

What do other folks think about being able to say "all the stuff with -xml"

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA18422 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:42:55 -0800 (PST)
Received: from odin.cromwellmedia.co.uk ([212.2.15.25]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA18418 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:42:54 -0800 (PST)
Received: by odin.cromwellmedia.co.uk with Internet Mail Service (5.5.2448.0) id <GXMJR4YQ>; Fri, 17 Mar 2000 17:43:40 -0000
Message-ID: <AA4C152BA2F9D211B9DD0008C79F760A95D741@odin.cromwellmedia.co.uk>
From: Miles Sabin <msabin@cromwellmedia.co.uk>
To: ietf-xml-mime@imc.org
Subject: RE: MIME types and content negotiation
Date: Fri, 17 Mar 2000 17:43:33 -0000
X-Mailer: Internet Mail Service (5.5.2448.0)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Chris Lilley wrote,
> Accept: */*-xml;q=0.7, application/xml;q=0.5, text/xml;q=0.001

Some servers might support */*-xml, but if they do it's because
they implement a non-standard extension to RFC2616. From,

 14.1 Accept

   Accept = "Accept" ":" #( media-range [ accept-params ] )
   media-range = ("*/*" |
                  (type "/" "*") |
                  (type "/" subtype)) *( ";" parameter)

ie. you're allowed to wildcard the *whole* type or subtype, but
you can't expect a general regular expression match.
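To make the point concrete, here is a small sketch of media-range matching using only the three forms in the 14.1 grammar, ignoring parameters. Because `*` is a legal token character in RFC 2616, `*/*-xml` may still parse as type `*` with subtype `*-xml`; the wildcard meaning just isn't defined for it, so a matcher following the grammar compares it literally and it matches nothing real:

```python
def media_range_matches(media_range: str, media_type: str) -> bool:
    """Match per the RFC 2616 sec. 14.1 forms: "*/*", "type/*", or an
    exact "type/subtype". Parameters are ignored in this sketch."""
    r_type, r_sub = media_range.split(";", 1)[0].strip().lower().split("/", 1)
    t_type, t_sub = media_type.split(";", 1)[0].strip().lower().split("/", 1)
    if r_type == "*" and r_sub == "*":
        return True
    if r_sub == "*":
        return r_type == t_type
    # Anything else, including a subtype like "*-xml", is literal.
    return (r_type, r_sub) == (t_type, t_sub)
```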

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@cromwellmedia.com          http://www.cromwellmedia.com/



Received: by ns.secondary.com (8.9.3/8.9.3) id JAA18118 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:30:05 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA18114 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:30:04 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id MAA05321; Fri, 17 Mar 2000 12:31:34 -0500
Message-Id: <200003171731.MAA05321@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 12:32:31 -0500
To: Chris Lilley <chris@w3.org>
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: MIME types and content negotiation
Cc: ietf-xml-mime@imc.org
In-Reply-To: <38D269A1.14FC4C4E@w3.org>
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003171713.MAA04599@hesketh.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 06:21 PM 3/17/00 +0100, Chris Lilley wrote:
>> To me, it seems like there should be a way for a client to say "You can
>> send me your XML, and I'll give it my best shot", rather than "these are
>> the only twenty types I know of".
>
>Accept: */*-xml;q=0.7, application/xml;q=0.5, text/xml;q=0.001

That sounds very good to me - it seems like another place where the suffix
is useful.  

However, it isn't clear from HTTP 1.1 that */*-xml is acceptable in this
use case - the BNF for media range looks like it accepts */* and type/*,
where * can only be used for an entire subtype name.

That's why I'm not so positive that this intriguing possibility actually
helps my argument, at least pending a revision of HTTP 1.1.  

On the other hand, I don't think it hurts it.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA17890 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:20:35 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA17886 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:20:30 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA14007; Fri, 17 Mar 2000 12:21:44 -0500
Message-ID: <38D269A1.14FC4C4E@w3.org>
Date: Fri, 17 Mar 2000 18:21:37 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Simon St.Laurent" <simonstl@simonstl.com>
CC: ietf-xml-mime@imc.org
Subject: Re: MIME types and content negotiation
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com> <200003171713.MAA04599@hesketh.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Simon St.Laurent" wrote:
> 
> I'm not sure if this helps or hinders the argument over the -xml suffix,
> but I found something interesting at the W3C yesterday, and I'm still
> pondering it.
> 
> http://www.w3.org/TR/photo-rdf/#Jigsaw1
> 
> It describes a simple extension to the Jigsaw server that bases the
> behavior of the server on the MIME content-types that the client says it
> will accept.  It's only a W3C Note describing staff activity, not a working
> draft on the main track.

The note is describing staff activity about a picture database sample
application. It is that work which is not on the main track.

The actual sending of different resources depending on the advertised
capabilities of the client is your basic HTTP content negotiation, as used
in Apache and other servers already, and that was in the CERN server since the
year dot.
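For readers who haven't looked at it, a simplified sketch of that basic server-driven negotiation: parse the client's Accept header, give each available variant the q-value of the most specific matching range, and send the best one. The function names are illustrative, and media-range parameters such as `level=1` are ignored here, which a real server would not do:

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    pairs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if "/" not in media_range:
            continue  # skip empty or malformed entries
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        pairs.append((media_range, q))
    return pairs


def client_quality(accept, variant):
    """q-value for a variant: the most specific matching range wins."""
    v_type, v_sub = variant.lower().split("/", 1)
    best_specificity, best_q = -1, 0.0
    for media_range, q in accept:
        r_type, r_sub = media_range.split("/", 1)
        if r_type == "*" and r_sub == "*":
            specificity = 0
        elif r_sub == "*" and r_type == v_type:
            specificity = 1
        elif (r_type, r_sub) == (v_type, v_sub):
            specificity = 2
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_variant(accept_header, variants):
    """Pick the variant the client rates highest; None if none is acceptable."""
    accept = parse_accept(accept_header)
    best_q, best = 0.0, None
    for variant in variants:
        q = client_quality(accept, variant)
        if q > best_q:
            best_q, best = q, variant
    return best
```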

> I'm wondering how this kind of content negotiation might or might not
> integrate with the use of the -xml suffix, and how it might or might not
> work if that suffix isn't used.

> To me, it seems like there should be a way for a client to say "You can
> send me your XML, and I'll give it my best shot", rather than "these are
> the only twenty types I know of".

Accept: */*-xml;q=0.7, application/xml;q=0.5, text/xml;q=0.001
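What a server would need in order to honour that header is a non-standard extension to the RFC 2616 matching rules. A hypothetical sketch: a part written as `*-xml` is taken to mean "any name ending in -xml". Since RFC 2616 defines no precedence for such patterns, the highest matching q is used here as an assumption:

```python
def _part_matches(pattern, value):
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        # Non-standard: "*-xml" means "any name ending in -xml".
        return value.endswith(pattern[1:])
    return pattern == value


def extended_range_matches(media_range, media_type):
    """Hypothetical suffix-wildcard matching; not part of RFC 2616."""
    r_type, r_sub = media_range.split(";", 1)[0].strip().lower().split("/", 1)
    t_type, t_sub = media_type.split(";", 1)[0].strip().lower().split("/", 1)
    return _part_matches(r_type, t_type) and _part_matches(r_sub, t_sub)


def q_for(accept_pairs, media_type):
    """Highest q among matching ranges (an assumption; RFC 2616 prefers
    the most specific range, which is undefined for suffix patterns)."""
    matches = [q for media_range, q in accept_pairs
               if extended_range_matches(media_range, media_type)]
    return max(matches, default=0.0)


# The Accept header above, already split into (range, q) pairs.
ACCEPT = [("*/*-xml", 0.7), ("application/xml", 0.5), ("text/xml", 0.001)]
```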

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA17713 for ietf-xml-mime-bks; Fri, 17 Mar 2000 09:12:06 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA17708 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 09:12:04 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id MAA04599 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 12:13:36 -0500
Message-Id: <200003171713.MAA04599@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: <ietf-xml-mime@imc.org>
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 12:14:33 -0500
To: ietf-xml-mime@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: MIME types and content negotiation
In-Reply-To: <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM>
References: <"Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net> <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I'm not sure if this helps or hinders the argument over the -xml suffix,
but I found something interesting at the W3C yesterday, and I'm still
pondering it.

http://www.w3.org/TR/photo-rdf/#Jigsaw1

It describes a simple extension to the Jigsaw server that bases the
behavior of the server on the MIME content-types that the client says it
will accept.  It's only a W3C Note describing staff activity, not a working
draft on the main track.

I'm wondering how this kind of content negotiation might or might not
integrate with the use of the -xml suffix, and how it might or might not
work if that suffix isn't used.  

To me, it seems like there should be a way for a client to say "You can
send me your XML, and I'll give it my best shot", rather than "these are
the only twenty types I know of".

Anyway, it's an interesting issue, and I'd love to hear what people on this
list think.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id GAA13616 for ietf-xml-mime-bks; Fri, 17 Mar 2000 06:46:34 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA13597; Fri, 17 Mar 2000 06:46:24 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.1-1 #35243) id <01JN3LIS4GZ400004E@MAUVE.INNOSOFT.COM>; Fri, 17 Mar 2000 06:47:34 -0800 (PST)
Date: Fri, 17 Mar 2000 06:40:04 -0800 (PST)
Subject: Re: Finishing the XML-tagging discussion
In-reply-to: "Your message dated Fri, 17 Mar 2000 09:29:09 -0500" <200003171428.JAA27631@hesketh.net>
To: "Simon St.Laurent" <simonstl@simonstl.com>
Cc: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
Message-id: <01JN4P9M11HA00004E@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> If this growing and distributed burden is more attractive than a 4-byte
> naming convention that doesn't interfere with existing processing, then I
> suppose we should drop the suffix, and find out how popular this
> non-approach proves to be in a couple of years.  At that point, it will be
> very difficult to fix things.  I continue to argue that the suffix is a
> remarkably low-cost solution with significant benefits.

Well, you've just convinced me. I hereby retract my assertion that 
content sniffing is a "mostly harmless" but partial solution to this 
problem. In light of this it doesn't look like a solution at all.

> Past performance is no guarantee of future returns.

Well, actually, I didn't think about the lesson to be learned from past
attempts to use "magic number" approaches. These have proved to work
spectacularly badly sometimes. Again, PostScript provides one of the better
examples of how things can go wrong: I routinely see problem reports whose
underlying cause is either a false positive or a false negative "PostScript"
result from a content sniffer.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id GAA13525 for ietf-xml-mime-bks; Fri, 17 Mar 2000 06:42:42 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA13521 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 06:42:40 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id JAA28170; Fri, 17 Mar 2000 09:44:11 -0500
Message-Id: <200003171444.JAA28170@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 09:45:07 -0500
To: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003170736.AA01945@t3knz.attglobal.net>
References: <Pine.GSO.4.21.0003171329280.28387-100000@gate> <Pine.GSO.4.21.0003171329280.28387-100000@gate>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 04:36 PM 3/17/00 +0900, MURATA Makoto wrote:
[Rick Jelliffe]
> >(and, in any case,
> >there is no mechanism currently for an XML parser to feed information
> >about which encodings it accepts to the HTTP system to set up the 
> >preferences in the first place.)
>
>HTTP already has the accept-charset field.  I do not understand your claim.

I think Rick may just be saying that the couplings between XML parsers and
HTTP handlers are pretty loose right now, and it's up to programmers to
tighten those connections.  While it's easy to switch out, say Xerces, in
favor of Aelfred, the HTTP end of the connection isn't going to change its
accept-charset settings automatically.  It probably won't even notice the
change.
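As a sketch of the coupling Simon describes, the HTTP side could build its Accept-Charset value from whatever encodings the parser reports. Since, as Rick says, parsers don't expose such a list today, `parser_encodings` is a hypothetical input, and the q-weights are an arbitrary choice for illustration:

```python
def accept_charset_header(parser_encodings, preferred=("utf-8",), other_q=0.5):
    """Build an Accept-Charset value from the encodings an XML parser
    says it supports (hypothetical input; no standard API provides it)."""
    seen = []
    for name in parser_encodings:
        n = name.strip().lower()  # charset names are case-insensitive
        if n and n not in seen:
            seen.append(n)
    parts = []
    for n in seen:
        # Preferred encodings get the default q=1; others a lower weight.
        parts.append(n if n in preferred else f"{n};q={other_q}")
    return ", ".join(parts)


print(accept_charset_header(["UTF-8", "UTF-16", "ISO-8859-1", "US-ASCII"]))
```

Swapping parsers would then mean re-running this with the new parser's list, rather than hoping someone remembers to reconfigure the HTTP layer.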

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id GAA12830 for ietf-xml-mime-bks; Fri, 17 Mar 2000 06:26:49 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA12814; Fri, 17 Mar 2000 06:26:42 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id JAA27631; Fri, 17 Mar 2000 09:28:13 -0500
Message-Id: <200003171428.JAA27631@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-822@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Fri, 17 Mar 2000 09:29:09 -0500
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:29 PM 3/16/00 -0800, Paul Hoffman / IMC wrote:
>The original problem is: how do I have my MIME handler automatically know 
>that it should hand a body part that has an unknown tag on it to the 
>generic XML processor?

You seem to be assuming in this case that the handler already has the
document, and that the only use of MIME types is for convenience in
processing documents you've already retrieved and know something about.

Also, 'unknown tag'?  Do you mean 'unknown content-type' here?  I don't
think we're talking about using MIME to dispatch handling of document
fragments.  (At least I hope not!)

>We have explored many, many alternatives in detail. They all have 
>drawbacks, some of them severe. However, there is a simple solution that 
>involves no changes to any protocol *and* will stop this discussion so we 
>can move on with more interesting aspects of XML.

This non-solution makes many of the more interesting aspects of XML
difficult to use.  No changes to any protocol in return for a hobbled
infrastructure is not a solution I will accept happily.

>Proposed solution: every time the MIME handler comes across an unknown 
>media type, it looks in the body part and sees if it is XML. If it is XML, 
>add this media type to the "hand to the generic XML processor" list. If it 
>is not XML, add this media type to the "don't hand to the generic XML 
>processor" list. If you are really paranoid about missing something, clear 
>the latter list every so often.

This is the fallback case which applications are going to have to use in
the event that the -xml suffix is rejected, but I strongly doubt that it is
either the best way to do it or the way that has the lowest cost.
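For comparison, a minimal sketch of the fallback Paul proposes: sniff the first bytes of a body part with an unknown type, remember the verdict per media type, and clear the "not XML" list now and then. The sniffing rule here (optional BOM, then `<?xml`) is an assumption; it misses XML without a declaration and can be fooled in exactly the ways Ned and Harald describe. The type names in the usage lines are hypothetical:

```python
import codecs


class XmlTypeCache:
    """Learn which unknown media types carry XML by content sniffing."""

    def __init__(self):
        self.xml_types = set()
        self.non_xml_types = set()

    @staticmethod
    def looks_like_xml(body: bytes) -> bool:
        head = body[:512]  # only the start of the entity is examined
        for bom, enc in ((codecs.BOM_UTF8, "utf-8"),
                         (codecs.BOM_UTF16_LE, "utf-16-le"),
                         (codecs.BOM_UTF16_BE, "utf-16-be")):
            if head.startswith(bom):
                text = head[len(bom):].decode(enc, errors="replace")
                break
        else:
            text = head.decode("latin-1")
        # Heuristic only: documents without an XML declaration are missed.
        return text.lstrip().startswith("<?xml")

    def is_xml(self, media_type: str, body: bytes) -> bool:
        key = media_type.split(";", 1)[0].strip().lower()
        if key in self.xml_types:
            return True
        if key in self.non_xml_types:
            return False
        verdict = self.looks_like_xml(body)
        (self.xml_types if verdict else self.non_xml_types).add(key)
        return verdict

    def clear_negatives(self):
        """The 'clear the latter list every so often' step."""
        self.non_xml_types.clear()


cache = XmlTypeCache()
cache.is_xml("application/vnd.example", b'<?xml version="1.0"?><doc/>')
```

Note that once a type is cached, later bodies are not examined at all, so one misjudged sample is remembered until the list is cleared.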

If we assume that there will only be a relatively small number of XML media
types created, the costs of trying document types to find out if they
contain XML, in order to have some idea whether to try them again, is
minimal.  You load a document here and there and figure out that it's
useless junk or potentially valuable, and you keep track of the name.

As the number of document types grows, however, this processing and the
associated list become an ever growing burden.  In situations where human
users are managing MIME types across more than one program, keeping track
of this list (which of these are XML?) will grow more difficult quite
rapidly.  Programs and networks will both have to spend significant
resources, especially in cases (like search engines) where a generic XML
processor is exploring millions of documents.  Retrieving documents to test
if they're in a usable format becomes a very bad idea as the number of
document types grows.

Also, there are lots of cases where both application-specific and
XML-generic may be appropriate.  Crazy people like me who hand-code their
documents may want to be able to choose SVG viewing in a browser and
editing in an XML-based environment rather than an SVG drawing program.
It'll be much easier to set up those mappings if the description of the
documents - typically the MIME content-type - describes both possibilities.
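A sketch of that kind of mapping, under the assumption that a client keeps a registry of application-specific handlers per type and appends a generic XML handler whenever the type advertises XML-ness through the suffix. The handler names and `image/svg-xml` are hypothetical:

```python
GENERIC_XML_HANDLER = "generic XML editor"  # hypothetical handler name


def handlers_for(media_type, registry):
    """Application-specific handlers first, then the generic XML handler
    if the type is recognisably XML under the -xml convention."""
    base = media_type.split(";", 1)[0].strip().lower()
    handlers = list(registry.get(base, []))
    subtype = base.split("/", 1)[1] if "/" in base else ""
    if base in ("text/xml", "application/xml") or subtype.endswith("-xml"):
        if GENERIC_XML_HANDLER not in handlers:
            handlers.append(GENERIC_XML_HANDLER)
    return handlers


REGISTRY = {
    "image/svg-xml": ["SVG viewer (browser)", "SVG drawing program"],
    "image/png": ["image viewer"],
}
```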

If this growing and distributed burden is more attractive than a 4-byte
naming convention that doesn't interfere with existing processing, then I
suppose we should drop the suffix, and find out how popular this
non-approach proves to be in a couple of years.  At that point, it will be
very difficult to fix things.  I continue to argue that the suffix is a
remarkably low-cost solution with significant benefits.

>Done. No changes to the naming schemes needed, no hoping that if the naming 
>scheme changes that all future media types follow it, no worrying about 
>'x-' names getting it wrong. Making these lists won't be that hard; there 
>are only 330 types in the IANA registry now, and that includes all of the 
>'vnd.' names.

There's never been a syntax for creating your own document types (apart
from SGML, which never caught fire) without going through a complex
process, whether it was a standards- or vendor- based process. I suspect
there are at least 330 XML vocabularies out there right now, in various
stages of development, each of which could probably benefit from a
registered MIME type.  As for the x- names, I think they might take the
hint as registered names go out and the convention becomes part of XML best
practices.

Past performance is no guarantee of future returns.  

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id CAA27163 for ietf-xml-mime-bks; Fri, 17 Mar 2000 02:19:38 -0800 (PST)
Received: from dokka.maxware.no (dokka.maxware.no [195.139.236.69]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id CAA27144; Fri, 17 Mar 2000 02:19:06 -0800 (PST)
Received: from alden (alden.maxware.no [10.128.1.246]) by dokka.maxware.no (8.9.3/8.9.3) with ESMTP id LAA23561; Fri, 17 Mar 2000 11:20:22 +0100
Message-Id: <4.2.0.58.20000317112248.028625a0@dokka.maxware.no>
X-Sender: hta@dokka.maxware.no
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58 
Date: Fri, 17 Mar 2000 11:24:37 +0100
To: Tim Bray <tbray@textuality.com>, Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
Subject: Re: Finishing the XML-tagging discussion
In-Reply-To: <3.0.32.20000316213930.01b3d270@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 21:39 16.03.00 -0800, Tim Bray wrote:
>I'm pretty sure the answer to #1 is "no".  However, my intuition is that
>the answer to #2 is "yes".  I don't have an opinion on #3.  The code isn't
>that hard to write.  But should a MIME handler have to care?

One nice thing about this solution is that it *requires no standard action*.
Anyone can implement this any time they want to, or not, as they choose.
If they don't do it, they are no worse off than before; if they do it, it's 
their headache.

(btw, the failure mode most likely is a document that says

<XML i-am-a-document-metadata-dtd>
lots of gory header-like stuff
</XML>
real content goes here)

              Harald



--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no



Received: by ns.secondary.com (8.9.3/8.9.3) id XAA18242 for ietf-xml-mime-bks; Thu, 16 Mar 2000 23:36:03 -0800 (PST)
Received: from prserv.net (out4.prserv.net [32.97.166.34]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id XAA18237 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 23:36:02 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.217]) by prserv.net (out4) with SMTP id <2000031707372623902gu2vhe>; Fri, 17 Mar 2000 07:37:26 +0000
Message-Id: <200003170736.AA01945@t3knz.attglobal.net>
Date: Fri, 17 Mar 2000 16:36:18 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <Pine.GSO.4.21.0003171329280.28387-100000@gate>
References: <Pine.GSO.4.21.0003171329280.28387-100000@gate>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...

 >That is not the way I remember it. Application/xml was taken out at
 >one stage, and we had to lobby hard to get it put in. The reason
 >users want it is not to prevent xml from being spuriously displayed:
 >it is to ensure end-to-end integrity because the out-of-band approach
 >has so far failed to provide that integrity. 

I remember that you have claimed this several times, but I have never 
agreed.  

XML people thought that application/xml is required since text/* does 
not allow UTF-16.  After we started this mailing list, we have learned 
more from Ned.

 >And what should the type be if it has parts that are readable and
 >parts that are unreadable?  If the document is encoded with a lot
 >of numeric character references, it is unreadable as text/plain
 >but readable as text/xml: should we send documents that use 
 >many numeric character entities as application/xml?
 >
 >We need a way to ensure end-to-end integrity. 

I do not agree.  Why?

 >Out-of-band signalling of the encoding of a file to some extent a hack to
 >cope with formats that are not adequately self-describing.  I would have
 >no problem with removing text/xml entirely: we don't need to negotiate
 >encoding since everything can be resolved into Unicode 

Although XML is based on Unicode, we certainly require negotiation and 
transcoding.  Some XML processors can only handle US-ASCII, 8859-1, UTF-8, 
and UTF-16.  There are so many legacy encodings in the world.  Negotiation 
and on-the-fly transcoding is certainly a good thing for I18N.  If we hide 
encoding information from the protocol, such transcoding becomes hard or 
even impossible.
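A sketch of the negotiation-plus-transcoding step, assuming the server knows the document's charset out of band (for instance from a charset parameter) and the client sends Accept-Charset. RFC 2616's special default of q=1 for an unmentioned ISO-8859-1 is ignored here, and so is the encoding declaration inside the document, which a real transcoder would also have to rewrite:

```python
def parse_accept_charset(header):
    """Map charset name -> q from an Accept-Charset value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[name] = q
    return prefs


def negotiate_and_transcode(body: bytes, source_charset: str, accept_charset: str):
    """Return (bytes, charset) acceptable to the client, or None.
    Does not touch any encoding declaration inside the document."""
    prefs = parse_accept_charset(accept_charset)
    if prefs.get(source_charset.lower(), prefs.get("*", 0.0)) > 0:
        return body, source_charset  # client takes it as-is
    text = body.decode(source_charset)
    candidates = sorted(((q, c) for c, q in prefs.items() if c != "*" and q > 0),
                        reverse=True)
    for _, charset in candidates:
        try:
            return text.encode(charset), charset
        except (UnicodeEncodeError, LookupError):
            continue  # charset can't represent the text, or is unknown
    return None
```

Hiding the charset inside the document would leave the server unable to make the first check without parsing, which is Murata's point.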

 >(and, in any case,
 >there is no mechanism currently for an XML parser to feed information
 >about which encodings it accepts to the HTTP system to set up the 
 >preferences in the first place.)

HTTP already has the accept-charset field.  I do not understand your claim.

Cheers,

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA13242 for ietf-xml-mime-bks; Thu, 16 Mar 2000 21:50:42 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA13237 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 21:50:40 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id NAA11078 for <ietf-xml-mime@imc.org>; Fri, 17 Mar 2000 13:51:57 +0800 (CST)
Date: Fri, 17 Mar 2000 13:51:56 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003161446.AA01936@t3knz.attglobal.net>
Message-ID: <Pine.GSO.4.21.0003171329280.28387-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Thu, 16 Mar 2000, MURATA Makoto wrote:

> No.  Application/xml was introduced since some XML is not 
> readable for casual users.  As Ned Freed pointed out a long time ago, 
> text/* is inappropriate for something unreadable for casual users.

That is not the way I remember it. Application/xml was taken out at
one stage, and we had to lobby hard to get it put in. The reason
users want it is not to prevent XML from being spuriously displayed:
it is to ensure end-to-end integrity because the out-of-band approach
has so far failed to provide that integrity. 

And what should the type be if it has parts that are readable and
parts that are unreadable?  If the document is encoded with a lot
of numeric character references, it is unreadable as text/plain
but readable as text/xml: should we send documents that use 
many numeric character entities as application/xml?
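The numeric-character-reference case can be made concrete with a short Python sketch using the standard library's ElementTree. The sample document is an assumption for illustration: it is pure US-ASCII on the wire, so a text/plain viewer shows the raw references, while an XML processor resolves them to the characters U+65E5 U+672C.

```python
import xml.etree.ElementTree as ET

# Pure US-ASCII bytes: the two Japanese characters are written as
# numeric character references (U+65E5 and U+672C).
doc = b'<?xml version="1.0" encoding="us-ascii"?><p>&#26085;&#26412;</p>'

# What a text/plain viewer would display: the references themselves.
as_plain_text = doc.decode("us-ascii")

# What an XML processor delivers: the referenced characters.
as_xml = ET.fromstring(doc).text
```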

We need a way to ensure end-to-end integrity. If application/*
cannot provide it, what does? 

Out-of-band signalling of the encoding of a file is to some extent a hack to
cope with formats that are not adequately self-describing.  I would have
no problem with removing text/xml entirely: we don't need to negotiate
encoding since everything can be resolved into Unicode (and, in any case,
there is no mechanism currently for an XML parser to feed information
about which encodings it accepts to the HTTP system to set up the 
preferences in the first place.)

Rick Jelliffe



Received: by ns.secondary.com (8.9.3/8.9.3) id VAA12942 for ietf-xml-mime-bks; Thu, 16 Mar 2000 21:37:20 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA12928; Thu, 16 Mar 2000 21:37:04 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id VAA05421; Thu, 16 Mar 2000 21:38:10 -0800
Message-Id: <3.0.32.20000316213930.01b3d270@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 16 Mar 2000 21:39:32 -0800
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Tim Bray <tbray@textuality.com>
Subject: Re: Finishing the XML-tagging discussion
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:29 PM 3/16/00 -0800, Paul Hoffman / IMC wrote:
>Proposed solution: every time the MIME handler comes across an unknown 
>media type, it looks in the body part and sees if it is XML. If it is XML, 

This would indeed finesse the problem.  Hmm, three questions on the notion
of detecting whether content is XML:

1. is it possible in principle in the general case?
2. if not, is it possible often enough to be useful?
3. is it an appropriate kind of thing to require a general-purpose
   MIME handler to do?

I'm pretty sure the answer to #1 is "no".  However, my intuition is that
the answer to #2 is "yes".  I don't have an opinion on #3.  The code isn't
that hard to write.  But should a MIME handler have to care?
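One possible answer to question 2 is a cheap heuristic, sketched below in Python as an illustration rather than a recommendation. It treats a body as XML if it begins with a byte-order mark or "<" and then parses as well-formed XML with the standard library; looks_like_xml is a hypothetical name. As question 1 anticipates, it cannot be exact: an HTML fragment such as <p>hi</p> passes even if nobody meant it as generic XML. The Python documentation also warns that its XML modules are not secure against erroneous or maliciously constructed data, which matters for a MIME handler probing untrusted mail.

```python
import xml.etree.ElementTree as ET

# Byte-order marks for UTF-8, UTF-16BE and UTF-16LE.
_BOMS = (b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")


def looks_like_xml(body: bytes) -> bool:
    """Heuristically decide whether a body part carries XML."""
    # Cheap filter first, so binary formats are rejected without parsing.
    if not (body.startswith(_BOMS) or body.lstrip(b" \t\r\n").startswith(b"<")):
        return False
    try:
        # Well-formedness check only; no DTD validation is attempted.
        ET.fromstring(body)
    except (ET.ParseError, LookupError, ValueError):
        return False
    return True
```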

I'm getting backed into a corner where, like it or not, the -xml convention
is looking substantially less bad than all the alternatives.  As for
charset, if the people who are asserting that we have to support fallback to
text/plain are right, then it's gotta be compulsory.  -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA12626 for ietf-xml-mime-bks; Thu, 16 Mar 2000 21:24:06 -0800 (PST)
Received: from prserv.net (out1.prserv.net [32.97.166.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA12620 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 21:24:05 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.80]) by prserv.net (out1) with SMTP id <2000031705251625200bfsq7e>; Fri, 17 Mar 2000 05:25:17 +0000
Message-Id: <200003161446.AA01936@t3knz.attglobal.net>
From: MURATA Makoto <muraw3c@attglobal.net>
Date: Thu, 16 Mar 2000 23:46:05 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <Pine.GSO.4.21.0003152319200.19582-100000@gate>
References: <Pine.GSO.4.21.0003152319200.19582-100000@gate>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...
 >Please let us stick with the rationale for having text/xml and
 >application/xml in the first place: the former would be useful
 >in all the cases where transcoding/newline-fiddling/defaulting-to-
 >text-viewer was useful and the latter was useful when we want to
 >prevent transcoding/newline-fiddling/defaulting-to-text-viewer.

No.  Application/xml was introduced since some XML is not 
readable for casual users.  As Ned Freed pointed out a long time ago, 
text/* is inappropriate for something unreadable for casual users.

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA12157 for ietf-xml-mime-bks; Thu, 16 Mar 2000 21:14:29 -0800 (PST)
Received: from rhino.harvard.edu (rhino.harvard.edu [140.247.92.217]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA12144 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 21:14:25 -0800 (PST)
Received: from [10.0.1.2] (63.36.208.58) by rhino.harvard.edu with ESMTP (Eudora Internet Mail Server 1.2); Fri, 17 Mar 2000 01:08:09 -0400
User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.0.2012
Date: Thu, 16 Mar 2000 21:16:31 -0800
Subject: Re: Finishing the XML-tagging discussion
From: Dan Crevier <Dan.Crevier@pobox.com>
To: Paul Hoffman / IMC <phoffman@imc.org>, <ietf-types@iana.org>, <ietf-xml-mime@imc.org>, <ietf-822@imc.org>
Message-ID: <B4F6FFAE.1189B%Dan.Crevier@pobox.com>
In-Reply-To: <4.3.2.20000316201037.00bc5860@not-real.proper.com>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On 3/16/2000 8:29 PM, Paul Hoffman / IMC wrote:

> Folks, this discussion has gotten so far away from its origin that we've
> lost track of why we are having it.
> 
> The original problem is: how do I have my MIME handler automatically know
> that it should hand a body part that has an unknown tag on it to the
> generic XML processor?
> 
> We have explored many, many alternatives in detail. They all have
> drawbacks, some of them severe. However, there is a simple solution that
> involves no changes to any protocol *and* will stop this discussion so we
> can move on with more interesting aspects of XML.
> 
> Proposed solution: every time the MIME handler comes across an unknown
> media type, it looks in the body part and sees if it is XML. If it is XML,
> add this media type to the "hand to the generic XML processor" list. If it
> is not XML, add this media type to the "don't hand to the generic XML
> processor" list. If you are really paranoid about missing something, clear
> the latter list every so often.
> 
> Done. No changes to the naming schemes needed, no hoping that if the naming
> scheme changes that all future media types follow it, no worrying about
> 'x-' names getting it wrong. Making these lists won't be that hard; there
> are only 330 types in the IANA registry now, and that includes all of the
> 'vnd.' names.
> 
> Look inside and see if this media type seems to carry XML. This is what
> automated systems are good at. As for us humans, we seem to be pretty good
> at finding rat holes to go down...

This sounds like a good solution.  Another solution to the general problem
of "How do I specify that a given MIME type can be handled as another
alternate MIME type?" might be to do something like:

Content-Type: application/iotp; alternate-type=text/xml

If the client doesn't have something that can handle application/iotp, it
starts going through the alternate types to see if there's another way to
handle the data.  Current clients would just ignore the alternate-type.
This seems like a really general solution that would solve this particular
problem.
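The alternate-type idea can be sketched with Python's standard email package. The parameter name and the comma-separated list of alternates are assumptions taken from the proposal, not a registered parameter. Strictly, RFC 2045 treats "/" as a tspecial, so a value such as text/xml should be sent as a quoted-string (alternate-type="text/xml"); Python's parser happens to accept both forms.

```python
from typing import Optional

from email.message import Message


def candidate_types(content_type: str) -> list[str]:
    """Return the declared media type followed by any alternates, in order."""
    msg = Message()
    msg["Content-Type"] = content_type
    candidates = [msg.get_content_type()]  # lower-cased type/subtype
    alternates = msg.get_param("alternate-type")
    # get_param returns None if absent, or a tuple for RFC 2231-encoded
    # values; this sketch only handles the plain string case.
    if isinstance(alternates, str):
        candidates += [t.strip().lower() for t in alternates.split(",") if t.strip()]
    return candidates


def pick_handler(content_type: str, handlers: dict[str, str]) -> Optional[str]:
    """Walk the candidate types and return the first handler available."""
    for media_type in candidate_types(content_type):
        if media_type in handlers:
            return handlers[media_type]
    return None
```

A client with no application/iotp handler would then fall through to its text/xml handler, while a client that ignores the parameter behaves exactly as it does today.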

Dan



Received: by ns.secondary.com (8.9.3/8.9.3) id UAA08974 for ietf-xml-mime-bks; Thu, 16 Mar 2000 20:27:53 -0800 (PST)
Received: from laptop.imc.org (ip12.proper.com [165.227.249.12]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id UAA08970; Thu, 16 Mar 2000 20:27:51 -0800 (PST)
Message-Id: <4.3.2.20000316201037.00bc5860@not-real.proper.com>
X-Sender: phoffman@mail.imc.org
X-Mailer: QUALCOMM Windows Eudora Version 4.3
Date: Thu, 16 Mar 2000 20:29:24 -0800
To: ietf-types@iana.org, ietf-xml-mime@imc.org, ietf-822@imc.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Finishing the XML-tagging discussion
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Folks, this discussion has gotten so far away from its origin that we've 
lost track of why we are having it.

The original problem is: how do I have my MIME handler automatically know 
that it should hand a body part that has an unknown tag on it to the 
generic XML processor?

We have explored many, many alternatives in detail. They all have 
drawbacks, some of them severe. However, there is a simple solution that 
involves no changes to any protocol *and* will stop this discussion so we 
can move on with more interesting aspects of XML.

Proposed solution: every time the MIME handler comes across an unknown 
media type, it looks in the body part and sees if it is XML. If it is XML, 
add this media type to the "hand to the generic XML processor" list. If it 
is not XML, add this media type to the "don't hand to the generic XML 
processor" list. If you are really paranoid about missing something, clear 
the latter list every so often.
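The two-list procedure above is easy to sketch in Python. The class and method names are hypothetical, the sniff argument stands for whatever "is it XML?" test the handler uses, and "every so often" is modelled, as an assumption, as a fixed age after which the negative list is forgotten. One consequence the sketch makes visible: the first body seen for a media type decides how that type is treated from then on.

```python
import time
from typing import Callable


class XmlTypeCache:
    """Remember which unknown media types turned out to carry XML."""

    def __init__(self, sniff: Callable[[bytes], bool], forget_after: float = 86400.0):
        self._sniff = sniff                # e.g. a well-formedness check
        self._forget_after = forget_after  # seconds before "not XML" is forgotten
        self.xml_types: set[str] = set()
        self.non_xml_types: set[str] = set()
        self._last_cleared = time.monotonic()

    def should_hand_to_xml_processor(self, media_type: str, body: bytes) -> bool:
        media_type = media_type.lower()
        now = time.monotonic()
        # "If you are really paranoid about missing something, clear the
        # latter list every so often."
        if now - self._last_cleared > self._forget_after:
            self.non_xml_types.clear()
            self._last_cleared = now
        if media_type in self.xml_types:
            return True
        if media_type in self.non_xml_types:
            return False
        # First sighting of this type: look inside and remember the answer.
        if self._sniff(body):
            self.xml_types.add(media_type)
            return True
        self.non_xml_types.add(media_type)
        return False
```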

Done. No changes to the naming schemes needed, no hoping that if the naming 
scheme changes that all future media types follow it, no worrying about 
'x-' names getting it wrong. Making these lists won't be that hard; there 
are only 330 types in the IANA registry now, and that includes all of the 
'vnd.' names.

Look inside and see if this media type seems to carry XML. This is what 
automated systems are good at. As for us humans, we seem to be pretty good 
at finding rat holes to go down...

--Paul Hoffman, Director
--Internet Mail Consortium



Received: by ns.secondary.com (8.9.3/8.9.3) id EAA03442 for ietf-xml-mime-bks; Thu, 16 Mar 2000 04:23:53 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA03438 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 04:23:52 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id HAA12833; Thu, 16 Mar 2000 07:25:08 -0500
Message-ID: <38D0D29D.2D8ED1A4@w3.org>
Date: Thu, 16 Mar 2000 13:25:01 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Martin J. Duerst" <duerst@w3.org>
CC: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CD4D1E.847976B1@w3.org> <38CD4D1E.847976B1@w3.org> <200003150548.OAA01789@sh.w3.mag.keio.ac.jp> <200003160238.LAA07061@sh.w3.mag.keio.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Martin J. Duerst" wrote:
> 
> At 15:29 00/03/15 +0100, Chris Lilley wrote:
> 
> > "Martin J. Duerst" wrote:
> > > I'm not sure it makes sense to disallow it, but discouraging
> > > it might be fine.
> >
> > No. Discourage means wool; it means sometimes-this, sometimes-that. It
> > means admitting the possibility of conflicting sources of information.
> 
> RFC 2376 currently even strongly recommends it, so changing that
> to disallowing it doesn't work so easily.

Oh it works very easily. I don't see how changing "strongly encourage" to
"discourage" is easier than changing it to "must not".

But "must not" certainly makes life simpler, easier and more predictable
for those who value data integrity.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA03299 for ietf-xml-mime-bks; Thu, 16 Mar 2000 04:21:35 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA03295 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 04:21:34 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id HAA12670; Thu, 16 Mar 2000 07:22:47 -0500
Message-ID: <38D0D211.E0AAA9B6@w3.org>
Date: Thu, 16 Mar 2000 13:22:41 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: Keith Moore <moore@cs.utk.edu>
CC: Tim Bray <tbray@textuality.com>, Rick Jelliffe <ricko@gate.sinica.edu.tw>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <200003151936.OAA06679@astro.cs.utk.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Keith Moore wrote:
> 
> > So: are we really *really* sure that we have to default to US-ASCII,
> > or <important>that we have to default at all</important>?
> 
> if you have a charset parameter, it should be consistent with the
> charset parameter for text/plain.  this is so that mail user agents
> that don't understand text/xml can fallback to handling it as
> text/plain, including their treatment of the charset parameter.

And thus the corollary of this: if you are not happy for text/plain
fallback to be playing fast and loose with your XML instance, then use
application/xml instead.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA03149 for ietf-xml-mime-bks; Thu, 16 Mar 2000 04:14:48 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA03144 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 04:14:46 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id HAA11879; Thu, 16 Mar 2000 07:16:02 -0500
Message-ID: <38D0D07C.C9464048@w3.org>
Date: Thu, 16 Mar 2000 13:15:56 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: Rick Jelliffe <ricko@gate.sinica.edu.tw>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <Pine.GSO.4.21.0003152319200.19582-100000@gate>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Rick Jelliffe wrote:
> 
> On Wed, 15 Mar 2000, MURATA Makoto wrote:
> 
> > Are you saying that each format should invent their own rules for
> > indicating the charset?  My understanding was (and still is) that
> > you as an I18n guy at W3C are promoting a single generalized solution
> > for all textual formats.
> 
> Please let us stick with the rationale for having text/xml and
> application/xml in the first place: the former would be useful
> in all the cases where transcoding/newline-fiddling/defaulting-to-
> text-viewer was useful and the latter was useful when we want to
> prevent transcoding/newline-fiddling/defaulting-to-text-viewer.

Yes. Though it is not clear what a non-XML-aware transcoder would do when
swizzling an XML document between encodings. Since it can't use entities or
NCRs, what does it use for characters that do not fall within the
repertoire of the encoding it is converting to? Question marks?
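The question-marks question can be shown with Python's codec error handlers, as a rough sketch. A transcoder that knows nothing about XML typically has something like the "replace" handler, which writes question marks; the "xmlcharrefreplace" handler writes numeric character references instead. Neither handler rewrites the encoding declaration, and references are only meaningful where XML recognises them (content and attribute values), so even the second result is only a partial answer.

```python
# A fragment containing one Latin-1 character and two that are not.
fragment = "<p>caf\u00e9 \u65e5\u672c</p>"

# A text-only transcoder has nothing better than a substitute character.
naive = fragment.encode("iso-8859-1", errors="replace")

# Numeric character references keep the characters recoverable.
with_ncrs = fragment.encode("iso-8859-1", errors="xmlcharrefreplace")
```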

> It should always be an error of some kind if the charset parameter
> does not agree with the encoding attribute (or other Appendix F
> mechanism).  Only given that constraint is it useful to make the
> charset parameter significant.

I agree, but then given that constraint the charset parameter is
superfluous since it adds no new information. However, I might be prepared
to concede that it is not too harmful as long as it is constrained to say
what the XML encoding says, and for it to be an error for these to differ.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id UAA08450 for ietf-xml-mime-bks; Wed, 15 Mar 2000 20:31:59 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id UAA08445 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 20:31:57 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id MAA06861 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 12:33:11 +0800 (CST)
Date: Thu, 16 Mar 2000 12:33:11 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003160238.LAA07061@sh.w3.mag.keio.ac.jp>
Message-ID: <Pine.GSO.4.21.0003161226490.27383-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Thu, 16 Mar 2000, Martin J. Duerst wrote:

> RFC 2376 currently even strongly recommends it, so changing that
> to disallowing it doesn't work so easily.

I personally don't like having a charset parameter on application/xml.

But if people want it, then the issue becomes "how can we have 
a charset parameter and still maintain the end-to-end, inband
encoding signalling which is the reason for application/xml to
exist?"

That is why I think if the charset parameter and the encoding
attribute disagree in an application/xml document, it is an
error.  And, if we accept that step, it means there is less
reason for the end-to-end/inbanders to be concerned if the
charset parameter is allowed or even made mandatory.
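The rule that a charset parameter disagreeing with the encoding declaration is an error might be checked as in the following Python sketch. It is illustrative only: it reads the declaration with a byte-level regular expression, so it covers only ASCII-compatible encodings, and it uses Python's codec registry as a stand-in for proper IANA alias matching. The function and exception names are hypothetical.

```python
import codecs
import re
from typing import Optional

# Encoding declaration at the very start of an ASCII-compatible document.
_ENCODING_DECL = re.compile(rb'^<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


class CharsetMismatch(ValueError):
    """Raised when the charset parameter contradicts the in-band label."""


def _same_charset(a: str, b: str) -> bool:
    try:
        # Resolve aliases such as "latin1" and "ISO-8859-1" via Python's codecs.
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def effective_encoding(body: bytes, charset: Optional[str]) -> Optional[str]:
    """Return the encoding to use, treating any disagreement as an error."""
    match = _ENCODING_DECL.match(body)
    declared = match.group(1).decode("ascii") if match else None
    if charset and declared and not _same_charset(charset, declared):
        raise CharsetMismatch(f"charset={charset!r} but encoding={declared!r}")
    # None means neither source says anything: fall back to XML's own
    # autodetection (UTF-8 or UTF-16).
    return declared or charset
```

With this check in place the charset parameter can never override the in-band label; at most it repeats it.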

The worst thing would be to make the charset parameter required
and superior to the encoding attribute. I agree with Chris
there.

Rick Jelliffe



Received: by ns.secondary.com (8.9.3/8.9.3) id UAA07875 for ietf-xml-mime-bks; Wed, 15 Mar 2000 20:25:12 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id UAA07869 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 20:25:10 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id MAA03649 for <ietf-xml-mime@imc.org>; Thu, 16 Mar 2000 12:26:23 +0800 (CST)
Date: Thu, 16 Mar 2000 12:26:22 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003160238.LAA07064@sh.w3.mag.keio.ac.jp>
Message-ID: <Pine.GSO.4.21.0003161206400.27383-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Thu, 16 Mar 2000, Martin J. Duerst wrote:

> Please be careful. There are no multiple flavors of Unicode.
> There are multiple flavors of conversion tables between
> Japanese legacy encodings and Unicode.

I take your point, but it sidesteps the issue of
what someone is supposed to do when they type in a Yen sign and
their "UTF-*"-labelled data comes out with the code point
for "\" (backslash) being used. In that case, strictly they have
used the wrong mapping table and they have corrupted their
data; but it would help if we could give them a way to escape
into the bliss of standard Unicode by labelling the variant
encoding they have effectively used.

The proposed Japanese Profile for XML, for which Murata-san 
has been the leading light, says that there need to be
extra IANA-registered charsets to cover this problem. 

I don't think anyone is proposing the variant encoding
as a thing to be recommended. But, in the context of
XML, an IANA-registered name gives a way to uncorrupt
the data: rather than legitimizing the variant encoding
I think it makes it explicit that the data has a particular
problem that requires a particular remedy (i.e. transcoding
the couple of characters that are at issue.)
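The remedy described above, transcoding the couple of characters at issue, might look like the Python sketch below. The charset label is invented for illustration; the Japanese Profile discussion only proposes that such names be registered. The sketch also assumes the variant is the JIS X 0201 Roman one, where 0x5C is YEN SIGN and 0x7E is OVERLINE rather than ASCII's REVERSE SOLIDUS and TILDE; which characters a real variant affects depends on the mapping table actually used.

```python
# Hypothetical label for data produced with a JIS-Roman-flavoured mapping
# table.  This name is invented for the example and is not registered.
VARIANT_LABEL = "x-example-jis-roman-variant"

# JIS X 0201 Roman puts YEN SIGN at 0x5C and OVERLINE at 0x7E, where ASCII
# has REVERSE SOLIDUS and TILDE.
_JIS_ROMAN_FIXUPS = str.maketrans({"\\": "\u00a5", "~": "\u203e"})


def uncorrupt(text: str, charset_label: str) -> str:
    """Undo the variant mapping when the data is labelled as using it."""
    if charset_label.lower() != VARIANT_LABEL:
        return text  # standard label: leave the data alone
    return text.translate(_JIS_ROMAN_FIXUPS)
```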

The other alternative is for the Unicode Consortium
to make the code position occupied by "\" also be
occupied by Yen, as a strange kind of variant
which only applies in Japanese-sourced data, I suppose.
Not very attractive!

Rick Jelliffe




Received: by ns.secondary.com (8.9.3/8.9.3) id SAA28407 for ietf-xml-mime-bks; Wed, 15 Mar 2000 18:41:18 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA28401 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 18:41:17 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id VAA08781; Wed, 15 Mar 2000 21:41:24 -0500 (EST)
Message-Id: <200003160241.VAA08781@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Martin J. Duerst" <duerst@w3.org>
cc: Keith Moore <moore@cs.utk.edu>, MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376 
In-reply-to: Your message of "Thu, 16 Mar 2000 11:20:03 +0900." <200003160238.LAA07058@sh.w3.mag.keio.ac.jp> 
Date: Wed, 15 Mar 2000 21:41:24 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I don't think the 'charset' parameter works that way.
> Either you use it, the way it's defined, or you don't.

charset is defined for text.  it's not defined for other types.
application/* types can define a charset parameter any way they want.
(so, for that matter, could audio/*, video/*, or model/*, but 
that seems less likely)

> Trying it for *every* data format is a bad idea.
> But trying it for most formats, or a class of formats, usually works.

well, we've had some trouble doing it even for text/* because
folks wanted to use character encoding schemes that didn't fit
with the MIME text/* model.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id SAA27989 for ietf-xml-mime-bks; Wed, 15 Mar 2000 18:38:20 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA27982 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 18:38:18 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id LAA07064; Thu, 16 Mar 2000 11:38:55 +0900 (JST)
Message-Id: <200003160238.LAA07064@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Thu, 16 Mar 2000 11:35:31 +0900
To: Tim Bray <tbray@textuality.com>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Some text that may be useful for the update of RFC 2376
Cc: Rick Jelliffe <ricko@gate.sinica.edu.tw>, ietf-xml-mime@imc.org
In-Reply-To: <3.0.32.20000315080901.01a9fc40@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:09 00/03/15 -0800, Tim Bray wrote:
> At 10:39 PM 3/15/00 +0800, Rick Jelliffe wrote:

> >	3) In all cases, if the document starts with a BOM,
> >	the charset parameter must indicate which flavour
> >	of UTF-16 is being used. There is no default.
> >	Failure is an unrecoverable error, for general
> >	applications. Detection is not mandatory, but should
> >	be made so at some future date.
> ... 
> >The reason for 3) is that, as Murata-san's proposed
> >Japanese Profile of XML makes clear, there are Japanese flavours
> >of Unicode floating about. So just relying on the BOM is not
> >satisfactory. 

Please be careful. There are no multiple flavors of Unicode.
There are multiple flavors of conversion tables between
Japanese legacy encodings and Unicode.

So this does not affect UTF-16 at all.
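The point that the byte-order mark is enough for UTF-16 matches the autodetection described in Appendix F of XML 1.0, which the specification labels non-normative. A partial Python sketch of that table follows. It returns Python codec names where the first bytes settle the question, a placeholder string where the encoding declaration must still be read, and None otherwise; the unusual UCS-4 byte orders and EBCDIC are left out, and detect_xml_encoding is a hypothetical name.

```python
from typing import Optional


def detect_xml_encoding(head: bytes) -> Optional[str]:
    """Guess an XML entity's encoding from its first four bytes."""
    # With a byte-order mark.  The four-byte UCS-4 marks are tested first
    # because FF FE 00 00 would otherwise look like a UTF-16LE mark.
    if head.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"
    if head.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # Without a byte-order mark: recognise the "<?" that starts a declaration.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if head.startswith(b"<?xm"):
        # UTF-8, US-ASCII, ISO-8859-x, Shift_JIS, EUC, ...: the encoding
        # declaration has to be read to tell them apart.
        return "ascii-compatible"
    return None
```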

Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id SAA27932 for ietf-xml-mime-bks; Wed, 15 Mar 2000 18:37:57 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA27925 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 18:37:56 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id LAA07058; Thu, 16 Mar 2000 11:38:54 +0900 (JST)
Message-Id: <200003160238.LAA07058@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Thu, 16 Mar 2000 11:20:03 +0900
To: Keith Moore <moore@cs.utk.edu>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Some text that may be useful for the update of RFC 2376 
Cc: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
In-Reply-To: <200003151542.KAA05553@astro.cs.utk.edu>
References: <Your message of "Wed, 15 Mar 2000 21:46:46 +0900."             <200003151246.AA01923@t3knz.attglobal.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Keith - I'm confused.

At 10:42 00/03/15 -0500, Keith Moore wrote:
> > Are you saying that each format should invent their own rules for 
> > indicating the charset? 
> 
> that's how MIME parameters work.  

I don't think the 'charset' parameter works that way.
Either you use it, the way it's defined, or you don't.


> trying to coerce every data format into a common set of rules is
> much more difficult, and highly unlikely to succeed.

Trying it for *every* data format is a bad idea.
But trying it for most formats, or a class of formats, usually works.

Regards,    Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id SAA27894 for ietf-xml-mime-bks; Wed, 15 Mar 2000 18:37:43 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA27883 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 18:37:41 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id LAA07061; Thu, 16 Mar 2000 11:38:54 +0900 (JST)
Message-Id: <200003160238.LAA07061@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Thu, 16 Mar 2000 11:29:59 +0900
To: Chris Lilley <chris@w3.org>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Some text that may be useful for the update of RFC 2376
Cc: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
In-Reply-To: <38CF9E4D.D7EC72B1@w3.org>
References: <38CD4D1E.847976B1@w3.org> <38CD4D1E.847976B1@w3.org> <200003150548.OAA01789@sh.w3.mag.keio.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 15:29 00/03/15 +0100, Chris Lilley wrote:

> "Martin J. Duerst" wrote:
> > 
> > At 00:21 00/03/15 +0900, MURATA Makoto wrote:
> > 
> > >  >> - XML sent as application/xml (or equivalent):
> > >  >>   - Charset parameter is strongly recommended, and if present,
> > >  >>     it takes precedence.
> > >  >
> > >  >Charset parameter is *disallowed*.
> > 
> > I'm not sure it makes sense to disallow it, but discouraging
> > it might be fine.
> 
> No. Discourage means wool; it means sometimes-this, sometimes-that. It
> means admitting the possibility of conflicting sources of information.

RFC 2376 currently even strongly recommends it, so changing that
to disallowing it doesn't work so easily.


Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA22948 for ietf-xml-mime-bks; Wed, 15 Mar 2000 17:46:06 -0800 (PST)
Received: from meer.meer.net (meer.meer.net [140.174.164.2]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA22937 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 17:46:01 -0800 (PST)
Received: from [209.157.137.86] (pm3b-86.mv.meer.net [209.157.137.86]) by meer.meer.net (8.9.3/8.9.3/meer) with SMTP id RAA875798 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 17:46:49 -0800 (PST)
X-Sender: crism@mail.exemplary.net
Message-Id: <v01530501b4f5e20bcd63@[209.157.137.93]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 15 Mar 2000 17:51:46 -0800
To: ietf-xml-mime@imc.org
From: crism@exemplary.net (Christopher R. Maden)
Subject: Re: Some text that may be useful for the update of RFC 2376
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

[Tim Bray]
><surprised>Hold on a second!  I just went and read RFC 2046, section
>4.1.2, and it seems to me that the US-ASCII default is only compulsory
>for text/plain!  I quote the text:

Yes, but: with the two-level hierarchy of MIME media types, it's expected
that a processor that doesn't understand text/foo can fall back to
processing an entity as text/plain.  So while text/xml can have any or no
default, we have to consider the case of a non-XML-aware processor
(possibly including a transcoder) handling it as text/plain, which includes
a default encoding of US-ASCII.
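A minimal Python sketch of that fallback, assuming a hypothetical agent that understands only text/plain. `view_as_fallback_agent` is an illustrative name. The header is read with the standard-library `email.message.Message`, whose charset lookup is case-insensitive, as RFC 2046 requires.

```python
from email.message import Message


def view_as_fallback_agent(content_type, body, known_text_subtypes=("plain",)):
    """Hypothetical non-XML-aware agent: any text/* subtype it does not know
    is handled as text/plain, as RFC 2046 allows."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() != "text":
        raise ValueError("this sketch only models text/* entities")
    subtype = msg.get_content_subtype()
    if subtype not in known_text_subtypes:
        subtype = "plain"  # fall back to text/plain semantics
    # text/plain rules: charset is case-insensitive and defaults to US-ASCII.
    charset = msg.get_content_charset() or "us-ascii"
    # Undecodable octets become U+FFFD so the damage is visible.
    return "text/" + subtype, charset, body.decode(charset, errors="replace")
```

Without a charset parameter, a UTF-8 text/xml body containing "café" comes out of such an agent as text/plain decoded as US-ASCII, and the two octets of "é" are replaced.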

-Chris

--
Christopher R. Maden, Solutions Architect
Yomu (formerly Exemplary Technologies)
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111




Received: by ns.secondary.com (8.9.3/8.9.3) id LAA14143 for ietf-xml-mime-bks; Wed, 15 Mar 2000 11:37:38 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id LAA14139 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 11:37:36 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id OAA06679; Wed, 15 Mar 2000 14:36:47 -0500 (EST)
Message-Id: <200003151936.OAA06679@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Tim Bray <tbray@textuality.com>
cc: Rick Jelliffe <ricko@gate.sinica.edu.tw>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376 
In-reply-to: Your message of "Wed, 15 Mar 2000 08:09:05 PST." <3.0.32.20000315080901.01a9fc40@pop.intergate.ca> 
Date: Wed, 15 Mar 2000 14:36:47 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> So: are we really *really* sure that we have to default to US-ASCII,
> or <important>that we have to default at all</important>?  

if you have a charset parameter, it should be consistent with the
charset parameter for text/plain.  this is so that mail user agents
that don't understand text/xml can fallback to handling it as 
text/plain, including their treatment of the charset parameter.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id IAA07912 for ietf-xml-mime-bks; Wed, 15 Mar 2000 08:40:08 -0800 (PST)
Received: from prserv.net (out5.prserv.net [32.97.166.35]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA07908 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 08:40:07 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.40]) by prserv.net (out5) with SMTP id <20000315164115243031no2ae>; Wed, 15 Mar 2000 16:41:15 +0000
Message-Id: <200003151641.AA01930@t3knz.attglobal.net>
From: MURATA Makoto <muraw3c@attglobal.net>
Date: Thu, 16 Mar 2000 01:41:43 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <38CE65A3.1BFE9ECF@w3.org>
References: <38CE65A3.1BFE9ECF@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >
 >I have come to believe that taking an out of band solution which sort of
 >works for text/* over HTTP and email, and trying to extend it to
 >application/* and image/* and video/* and ftp and file and other sorts of
 >protocol - to make it fit all cases - has some very clear problems in
 >extremely common use cases. Like the fact that 99.999% of content providers
 >have no control over the configuration of the web server they use, but do
 >have control of the content that they place there. 

Apache allows content providers to take control by writing .htaccess files.
IIS 4.0 and later allows users to provide the charset parameter 
by using the file type menu.  Do you have any evidence for your claim 
"99.999%"?  In my understanding, WWW servers are making progress 
in supporting the charset parameter.

----
MURATA Makoto  muraw3c@attglobal.net


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA06849 for ietf-xml-mime-bks; Wed, 15 Mar 2000 08:07:26 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA06845 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 08:07:25 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id IAA31696; Wed, 15 Mar 2000 08:08:00 -0800
Message-Id: <3.0.32.20000315080901.01a9fc40@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Wed, 15 Mar 2000 08:09:05 -0800
To: Rick Jelliffe <ricko@gate.sinica.edu.tw>, ietf-xml-mime@imc.org
From: Tim Bray <tbray@textuality.com>
Subject: Re: Some text that may be useful for the update of RFC 2376
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 10:39 PM 3/15/00 +0800, Rick Jelliffe wrote:
>What about this?
>
>	1) In all cases, charset parameter is required.
>	There is no default. Failure is an unrecoverable
>	error, for general applications. Detection is
>	mandatory.

I hate this but it's hard to disagree with.  text/xml is the one format
where you can, in a high proportion of cases, send it without worrying
too much about the encoding and The Right Thing Will Happen.  But I gather
that doing so amounts to an assertion that the encoding is US-ASCII 
[unless we can figure out how to dodge RFC2046].  Clearly this is 
intolerable.  

<surprised>Hold on a second!  I just went and read RFC 2046, section
4.1.2, and it seems to me that the US-ASCII default is only compulsory
for text/plain!  I quote the text:

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set.  This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive.  The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well.  For other subtypes of "text"
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.

Followed by many pages of discussion of the meaning of ASCII and life.
</surprised>

So: are we really *really* sure that we have to default to US-ASCII,
or <important>that we have to default at all</important>?  

>	2) In all cases, all code sequences in
>	the document must match code sequences allowed
>	by the encoding specified by the charset parameter.
>	Failure is an unrecoverable error, for general
>	applications. Detection is not mandatory.
>	
>	3) In all cases, if the document starts with a BOM,
>	the charset parameter must indicate which flavour
>	of UTF-16 is being used. There is no default.
>	Failure is an unrecoverable error, for general
>	applications. Detection is not mandatory, but should
>	be made so at some future date.
... 
>The reason for 3) is that, as Murata-san's proposed
>Japanese Profile of XML makes clear, there are Japanese flavours
>of Unicode floating about. So just relying on the BOM is not
>satisfactory. 

This is going too far.  If there are bogus things in Japan claiming
to be UTF-16 when they're really not, we should not visibly strain the
architecture to accommodate them.  The BOM is in practice a highly robust
and efficient mechanism; telling server applications that they must
distinguish BE and LE flavors is an order that is unlikely to be followed,
and if followed, unlikely to be implemented correctly.  This is all the
more so since it serves no useful purpose whatsoever.
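For illustration, here is a minimal Python sketch of the BOM check Tim describes. The first two or three octets select UTF-8 or the UTF-16 byte order without any help from a charset parameter. `sniff_bom` and `decode_with_bom` are hypothetical helper names. The sketch ignores UCS-4/UTF-32, whose little-endian BOM begins with the same two octets as the UTF-16LE one.

```python
import codecs


def sniff_bom(data):
    """Return (codec name, BOM length) for a leading byte order mark,
    or (None, 0) when there is none."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    return None, 0


def decode_with_bom(data, fallback="utf-8"):
    """Decode using the BOM if present, otherwise the given fallback."""
    codec, skip = sniff_bom(data)
    return data[skip:].decode(codec or fallback)
```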

>	4) If the document is sent text/xml, the encoding
>	parameter of the XML header is not checked. However,
>	well-behaved systems should rewrite the encoding
>	attribute of the XML header to agree with charset 
>	parameter. 

Er, "encoding parameter of the XML header", you mean the encoding
declaration?  You need to say "by the transfer agent"; end-user software
should certainly feel free to check this, to catch breakage at the
transfer level.

>	5) If the data is sent application/xml then
>	the charset parameter must agree with the
>	encoding attribute of the XML header. Failure is
>	an unrecoverable error, for general applications.
>	Detection is not mandatory.

Same points as above.  Why must this be supplied?  -Tim



Received: by ns.secondary.com (8.9.3/8.9.3) id HAA06601 for ietf-xml-mime-bks; Wed, 15 Mar 2000 07:55:25 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA06596 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 07:55:23 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id HAA31673; Wed, 15 Mar 2000 07:55:19 -0800
Message-Id: <3.0.32.20000315074308.01353710@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Wed, 15 Mar 2000 07:56:24 -0800
To: Chris Lilley <chris@w3.org>, MURATA Makoto <muraw3c@attglobal.net>
From: Tim Bray <tbray@textuality.com>
Subject: Re: Some text that may be useful for the update of RFC 2376
Cc: "Martin J. Duerst" <duerst@w3.org>, ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:26 PM 3/15/00 +0100, Chris Lilley wrote:
>But that is, after all, the point of the -xml suffix? To flag that the
>content is encoded in XML, so that any processor which feels like fiddling
>with the bytes therein can judge whether it is competent to do so?

C'mon Chris, catch up on the back correspondence.  Several credible
use-cases for the -xml thing have been introduced into the dialogue; this
doesn't mean that everyone is happy with it as a syntactic device.  -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA06319 for ietf-xml-mime-bks; Wed, 15 Mar 2000 07:42:33 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA06315 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 07:42:32 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id KAA05553; Wed, 15 Mar 2000 10:42:39 -0500 (EST)
Message-Id: <200003151542.KAA05553@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: MURATA Makoto <muraw3c@attglobal.net>
cc: "Martin J. Duerst" <duerst@w3.org>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376 
In-reply-to: Your message of "Wed, 15 Mar 2000 21:46:46 +0900." <200003151246.AA01923@t3knz.attglobal.net> 
Date: Wed, 15 Mar 2000 10:42:39 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Are you saying that each format should invent their own rules for 
> indicating the charset? 

that's how MIME parameters work.  

trying to coerce every data format into a common set of rules is
much more difficult, and highly unlikely to succeed.

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA06123 for ietf-xml-mime-bks; Wed, 15 Mar 2000 07:35:53 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA06117 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 07:35:51 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id XAA06121 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 23:37:01 +0800 (CST)
Date: Wed, 15 Mar 2000 23:37:01 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Resend! Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <Pine.GSO.4.21.0003152319200.19582-100000@gate>
Message-ID: <Pine.GSO.4.21.0003152334070.19582-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Sorry...last send was garbled

On Wed, 15 Mar 2000, Rick Jelliffe wrote:

> On Wed, 15 Mar 2000, MURATA Makoto wrote:
> 
> > Are you saying that each format should invent their own rules for 
> > indicating the charset?  My understanding was (and still is) that 
> > you as an I18n guy at W3C are promoting a single generalized solution 
> > for all textual formats.

Please let us stick with the rationale for having text/xml and
application/xml in the first place: that the former is useful
in all the cases where transcoding/newline-fiddling/defaulting-to-
text-viewer was useful and the latter is useful when we want to
prevent transcoding/newline-fiddling/defaulting-to-text-viewer.
 
It should always be an error of some kind if the charset parameter
does not agree with the encoding attribute (or other Appendix F
mechanism), for application/xml documents. 

If that constraint is in place, then there is no harm in having
a charset parameter or treating it as significant: it is just being
treated as co-equal with the Appendix F algorithm.
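The following rough Python sketch shows one way a receiver could test that agreement constraint. It uses a simplified form of the XML 1.0 Appendix F autodetection (UCS-4 and EBCDIC are omitted) and then compares the result with the charset parameter. `internal_encoding` and `charset_agrees` are hypothetical names. `codecs.lookup` is used only to fold case and common aliases such as "UTF8" and "utf-8".

```python
import codecs
import re

# encoding="..." inside an XML declaration at the very start of the text.
_ENC_DECL = re.compile(
    r'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def internal_encoding(data):
    """Simplified Appendix F: choose a codec that can read the declaration,
    then return the declared encoding, or the XML default if none."""
    if data.startswith(codecs.BOM_UTF8):
        data, reader, default = data[3:], "latin-1", "utf-8"
    elif data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        reader, default = "utf-16", "utf-16"  # the codec consumes the BOM
    elif data[:4] == b"\x00<\x00?":
        reader, default = "utf-16-be", "utf-16"  # '<?' in UTF-16BE, no BOM
    elif data[:4] == b"<\x00?\x00":
        reader, default = "utf-16-le", "utf-16"  # '<?' in UTF-16LE, no BOM
    else:
        # ASCII-compatible family: latin-1 never fails and is enough
        # to read the ASCII-only declaration.
        reader, default = "latin-1", "utf-8"
    head = data[:256].decode(reader, errors="ignore")
    match = _ENC_DECL.match(head)
    return match.group(1) if match else default


def charset_agrees(charset_param, data):
    """True if the charset parameter names the same encoding as the
    document's own encoding information."""
    internal = internal_encoding(data)
    try:
        # codecs.lookup() folds case and aliases, e.g. "UTF8" -> "utf-8".
        return codecs.lookup(charset_param).name == codecs.lookup(internal).name
    except LookupError:
        return charset_param.lower() == internal.lower()
```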

Rick Jelliffe




Received: by ns.secondary.com (8.9.3/8.9.3) id HAA05607 for ietf-xml-mime-bks; Wed, 15 Mar 2000 07:23:27 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA05601 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 07:23:24 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id XAA01415 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 23:24:34 +0800 (CST)
Date: Wed, 15 Mar 2000 23:24:34 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <200003151246.AA01923@t3knz.attglobal.net>
Message-ID: <Pine.GSO.4.21.0003152319200.19582-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Wed, 15 Mar 2000, MURATA Makoto wrote:

> Are you saying that each format should invent their own rules for 
> indicating the charset?  My understanding was (and still is) that 
> you as an I18n guy at W3C are promoting a single generalized solution 
> for all textual formats.

Please let us stick with the rationale for having text/xml and
application/xml in the first place: the former would be useful
in all the cases where transcoding/newline-fiddling/defaulting-to-
text-viewer was useful and the latter was useful when we want to
prevent transcoding/newline-fiddling/defaulting-to-text-viewer.

It should always be an error of some kind if the charset parameter
does not agree with the encoding attribute (or other Appendix F
mechanism).  Only given that constraint is it useful to make the
charset parameter significant.

Rick Jelliffe




Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id GAA04256 for ietf-xml-mime-bks; Wed, 15 Mar 2000 06:37:56 -0800 (PST)
Received: from gate.sinica.edu.tw (gate.sinica.edu.tw [140.109.4.130]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA04252 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 06:37:53 -0800 (PST)
Received: from localhost by gate.sinica.edu.tw (8.9.3/8.9.3) with ESMTP id WAA17730 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 22:39:02 +0800 (CST)
Date: Wed, 15 Mar 2000 22:39:02 +0800 (CST)
From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
X-Sender: ricko@gate
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
In-Reply-To: <38CD4D1E.847976B1@w3.org>
Message-ID: <Pine.GSO.4.21.0003152145500.28051-100000@gate>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

What about this?

	1) In all cases, charset parameter is required.
	There is no default. Failure is an unrecoverable
	error, for general applications. Detection is
	mandatory.

	2) In all cases, all code sequences in
	the document must match code sequences allowed
	by the encoding specified by the charset parameter.
	Failure is an unrecoverable error, for general
	applications. Detection is not mandatory.
	
	3) In all cases, if the document starts with a BOM,
	the charset parameter must indicate which flavour
	of UTF-16 is being used. There is no default.
	Failure is an unrecoverable error, for general
	applications. Detection is not mandatory, but should
	be made so at some future date.

	4) If the document is sent text/xml, the encoding
	parameter of the XML header is not checked. However,
	well-behaved systems should rewrite the encoding
	attribute of the XML header to agree with charset 
	parameter. 

	5) If the data is sent application/xml then
	the charset parameter must agree with the
	encoding attribute of the XML header. Failure is
	an unrecoverable error, for general applications.
	Detection is not mandatory.

	6) The rules above can be bent or strengthened for
	specialist applications, by specific agreement between
	the recipient and sending parties. The main 
	alteration envisaged would be to allow, as an 
	obvious error-recovery strategy, that if the 
	charset parameter is missing, the encoding attribute
	of the XML header can be used. Another alteration
	envisaged is for some defaulting to be used.
	However, specialist applications which require this
	behaviour should not, in general, be using text/xml*
	or application/xml*.
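Here is a rough Python sketch of rules 1, 2, 4 and 5 as a receiver might apply them. It assumes the media type and charset have already been taken from the Content-Type header. `check_proposal` and `XmlMimeError` are hypothetical names. Rule 3 and the specialist relaxations of rule 6 are left out. The sketch always performs the detection that the proposal makes optional.

```python
import codecs
import re

_ENC_DECL = re.compile(
    r'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


class XmlMimeError(Exception):
    """Hypothetical exception for an 'unrecoverable error' in the proposal."""


def _same_encoding(a, b):
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def check_proposal(media_type, charset, body):
    """Apply rules 1, 2, 4 and 5 of the proposal; return the decoded text."""
    # Rule 1: the charset parameter is required; there is no default.
    if not charset:
        raise XmlMimeError("missing charset parameter")
    # Rule 2: every octet sequence must be valid in the named charset.
    try:
        text = body.decode(charset)
    except (LookupError, UnicodeDecodeError) as exc:
        raise XmlMimeError(f"body is not valid {charset}: {exc}") from exc
    if media_type.lower() == "application/xml":
        # Rule 5: the charset parameter must agree with the encoding declaration.
        match = _ENC_DECL.match(text.lstrip("\ufeff"))
        if match and not _same_encoding(match.group(1), charset):
            raise XmlMimeError(
                f"charset={charset} but the declaration says {match.group(1)}")
    # Rule 4: for text/xml the declaration is not checked at all.
    return text
```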

Discussion:

The reason for 1) is that we have a clash between user expectations
(iso8859-1), RFCs (US-ASCII) and XML defaults (UTF-8). There is
no winnable solution to defaults. 
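To make the clash concrete, the following small Python sketch decodes the same octets under each of the three candidate defaults listed above. The helper name is illustrative.

```python
def decode_under_each_default(body):
    """Decode the same octets under each candidate default; None marks
    octets that are not valid in that charset."""
    results = {}
    for charset in ("iso-8859-1", "us-ascii", "utf-8"):
        try:
            results[charset] = body.decode(charset)
        except UnicodeDecodeError:
            results[charset] = None
    return results


body = "<p>café</p>".encode("utf-8")
for charset, text in decode_under_each_default(body).items():
    print(charset, "->", text)
```

For a UTF-8 "café", the three defaults give "cafÃ©", a decoding failure, and "café" respectively. Each default is right for someone and wrong for someone else.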

The reason for 2) is simply to state clearly that error-recovery
from corrupted data is not the norm.

The reason for 3) is that, as Murata-san's proposed
Japanese Profile of XML makes clear, there are Japanese flavours
of Unicode floating about. So just relying on the BOM is not
satisfactory. (Also, I see no reason why it may not be useful
to distinguish in the charset parameter whether Unicode 2 or 
Unicode 3 is being used; but that is another issue.)  This 
issue also impacts 1): if UTF-8 is the default, it is easier
to be lazy, which in turn makes it easier for Japanese data
to be mislabelled as standard UTF-8.

The reason for 4) is that traditionally the text/* types allow
point-to-point transcoding: DOS to Mac to UNIX newline conversion,
character encoding conversion, perhaps even trailing white-space
truncation are the kinds of changes permitted.

The reason for 5) is that we have application/xml as well as text/xml
precisely to prevent point-to-point manipulation of the data. It should
be treated like a binary file, allowing end-to-end data integrity.
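A small Python sketch of why this matters. A hypothetical text/* gateway that rewrites CRLF line ends leaves the XML parser's view unchanged, because XML normalizes line ends. It still changes the octets, so any end-to-end checksum or signature over the payload breaks. The gateway function is an assumption for illustration, not any particular product's behaviour.

```python
import hashlib
import xml.etree.ElementTree as ET


def hypothetical_gateway(body):
    """Point-to-point text/* fix-up: convert CRLF line ends to LF."""
    return body.replace(b"\r\n", b"\n")


def sha256_hex(body):
    return hashlib.sha256(body).hexdigest()


original = b'<?xml version="1.0"?>\r\n<doc>\r\n  <p>text</p>\r\n</doc>\r\n'
relayed = hypothetical_gateway(original)

# The octets, and so any end-to-end digest, no longer match ...
bytes_changed = sha256_hex(original) != sha256_hex(relayed)
# ... although an XML parser, which normalizes line ends, sees the same tree.
same_tree = ET.tostring(ET.fromstring(original)) == ET.tostring(ET.fromstring(relayed))
```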

(There is a fundamental weak point in point-to-point charset 
parameter transmission: there is no standard mechanism for 
registering the character set of individual files which a 
webserver can pick up: furthermore, some programming languages 
such as C do not have a character type but operate on storage types, 
so the encoding data is not available automatically anyway; 
also, on UNIX systems using pipes, there is no parallel channel 
available for out-of-band information between the processes on 
either side of the pipe, so encoding information may be
difficult to propagate automatically.  However, the 
point-to-point mechanism of text/xml is clearly generally 
useful and usable for single-locale sites and important to 
support.)

7) These measures are perhaps more extreme than many people 
would wish. That is why the detection requirements are so lax,
and the provisions for bending the rules are spelled out.


Rick Jelliffe





Received: by ns.secondary.com (8.9.3/8.9.3) id GAA04034 for ietf-xml-mime-bks; Wed, 15 Mar 2000 06:28:25 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA04030 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 06:28:23 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id JAA16662; Wed, 15 Mar 2000 09:29:35 -0500
Message-ID: <38CF9E4D.D7EC72B1@w3.org>
Date: Wed, 15 Mar 2000 15:29:33 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Martin J. Duerst" <duerst@w3.org>
CC: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CD4D1E.847976B1@w3.org> <38CD4D1E.847976B1@w3.org> <200003150548.OAA01789@sh.w3.mag.keio.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Martin J. Duerst" wrote:
> 
> At 00:21 00/03/15 +0900, MURATA Makoto wrote:
> 
> >  >> - XML sent as application/xml (or equivalent):
> >  >>   - Charset parameter is strongly recommended, and if present,
> >  >>     it takes precedence.
> >  >
> >  >Charset parameter is *disallowed*.
> 
> I'm not sure it makes sense to disallow it, but discouraging
> it might be fine.

No. Discourage means wool; it means sometimes-this, sometimes-that. It
means admitting the possibility of conflicting sources of information.

> > You might think that we can avoid bad WWW servers by this change.  But
> > we cannot.  We have to handle a collection of XML, XSL, CSS, VBScript,
> > JavaScript, etc.  We need a solution that works for every format.
> > Otherwise, data will corrupt.
> 
> CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
> served as application/... If they don't have a 'charset' parameter,
> and they don't have any internal way to indicate the encoding,
> that's the problem of these registrations, not our problem.

Yes. Thanks for the reminder about the scope and purpose of this list. 

And more to the point, data will not corrupt if it is left alone. However,
software which blindly fiddles with data it does not understand is fairly
obviously a source of corruption.

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id GAA03953 for ietf-xml-mime-bks; Wed, 15 Mar 2000 06:25:01 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id GAA03948 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 06:25:00 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id JAA16109; Wed, 15 Mar 2000 09:26:12 -0500
Message-ID: <38CF9D82.F4420ED4@w3.org>
Date: Wed, 15 Mar 2000 15:26:10 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: "Martin J. Duerst" <duerst@w3.org>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <200003150548.OAA01789@sh.w3.mag.keio.ac.jp> <200003151246.AA01923@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Martin J. Duerst wrote...
>  >CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
>  >served as application/... If they don't have a 'charset' parameter,
>  >and they don't have any internal way to indicate the encoding,
>  >that's the problem of these registrations, not our problem.
> 
> Are you saying that each format should invent their own rules for
> indicating the charset?  My understanding was (and still is) that
> you as an I18n guy at W3C are promoting a single generalized solution
> for all textual formats.

Are you saying that each transport protocol (which formally includes direct
filesystem access) should have its own, sometimes contradictory,
overrides and defaults and assumptions? Or that we should take the current,
lowest-common-denominator, fails-far-more-often-than-it-works charset
parameter of two particular protocols (each of which has a different
default, and neither of which is implemented consistently) and attempt to
stretch it so as to make loose and wooly the current, fairly good state of
encoding declaration of XML files?

There is an English colloquial expression: "throwing out the baby with the
bathwater".

Several people have pointed out that I am focussing on XML here. I would
refer them to the name and scope of the mailing list.

Incidentally, XML is probably not best described as a textual format. It is
a data format, which can among other things be used to describe
international text. I am aware that the text/* media types have some
historical requirements regarding 'character set'; that is sufficient
reason, in my opinion, for text/* not to be used for XML in general.
Application/xml has no such problems (though it seems that people propose
to propagate these problems there).

Several people have described dumb, content-unaware charset transcoders as
the most important thing that they are concerned with, and expressed
surprise that such converters should be required to know something about
the payload whose bytes they are merrily altering. In reply, I say - of
course. Otherwise, data corruption will clearly occur.

It is possible for example to take a payload of image/svg-xml and alter it
from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
declaration and insertion of NCRs for any characters outside the repertoire
of 8859-15). I would be most upset, as would every decoder on the planet,
if the same conversion was performed on image/png.
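The following minimal Python sketch shows the XML-aware conversion described above, assuming the whole payload fits in memory. `transcode_xml` is a hypothetical name. It rewrites or adds the encoding declaration. It then uses Python's `xmlcharrefreplace` error handler to turn characters outside the target repertoire into numeric character references. One caveat applies: NCRs are only recognized in character data and attribute values. A real converter would therefore have to refuse documents whose element names, comments, processing instructions or CDATA sections contain such characters. The sketch does not check for that.

```python
import re

# An existing encoding declaration, and an XML declaration's version part.
_ENC = re.compile(r'^(<\?xml[^>]*?\sencoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2')
_VER = re.compile(r'^(<\?xml\s+version\s*=\s*(["\'])[^"\']*\2)')


def transcode_xml(body, source, target):
    """Re-encode an XML entity from `source` to `target`, keeping the
    encoding declaration truthful and escaping unrepresentable characters."""
    text = body.decode(source).lstrip("\ufeff")
    if _ENC.match(text):
        # Replace the declared name, keeping the original quote character.
        text = _ENC.sub(lambda m: m.group(1) + m.group(2) + target + m.group(2),
                        text, count=1)
    elif _VER.match(text):
        # XML declaration without an encoding: add one after the version.
        text = _VER.sub(lambda m: m.group(1) + ' encoding="' + target + '"',
                        text, count=1)
    else:
        # No XML declaration: only UTF-8 and UTF-16 may omit it, so add one.
        text = '<?xml version="1.0" encoding="' + target + '"?>' + text
    # Characters outside the target repertoire become &#NNNN; references.
    return text.encode(target, errors="xmlcharrefreplace")
```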

But that is, after all, the point of the -xml suffix? To flag that the
content is encoded in XML, so that any processor which feels like fiddling
with the bytes therein can judge whether it is competent to do so?



--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA29264 for ietf-xml-mime-bks; Wed, 15 Mar 2000 04:59:04 -0800 (PST)
Received: from server1.software-ag.de (server1.software-ag.de [193.26.194.2]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id EAA29259 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 04:59:00 -0800 (PST)
Message-ID: <2339B88D6AA6D31187A80008C7E6F6722D910B@daemsg01.software-ag.de>
From: "Langer, Paul" <Paul.Langer@softwareag.com>
To: ietf-xml-mime@imc.org
Subject: RE: Some text that may be useful for the update of RFC 2376
Date: Wed, 15 Mar 2000 14:00:12 +0100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

>-----Original Message-----
>From: Martin J. Duerst [mailto:duerst@w3.org]
>Sent: Wednesday, March 15, 2000 6:17 AM
>Subject: Re: Some text that may be useful for the update of RFC 2376
>
> [snip]
>I completely disagree. A transcoder transcodes. A transcoder may know
>about a few (or a lot of) encodings. It is absolutely unreasonable
>to ask for a transcoder to know all kinds of data formats, and
>where in that data format some encoding hints are hidden (if they are).

I agree that transcoders
- are a good thing
- have to be able to do their job without interpreting the data

But there is an open issue with XML (media type "text/xml") via HTTP:

There are systems out there now (e.g. IE5, Netscape 4.7) that send
XML documents with correct encoding declaration as media type "text/xml"
without charset parameter.
If the document arrives without a charset parameter in the Content-Type
header at the XML processor's site, the processor does not know whether
there was a transcoder involved or not and has to use encoding "us-ascii"
for this document.

The XML spec (chapter 4.3.3 Character Encoding in Entities,
(http://www.w3.org/TR/REC-xml.html#charencoding), says:
   "In the absence of information provided by an external transport protocol
    (e.g. HTTP or MIME), it is an  error for an entity including an encoding
    declaration to be presented to the XML processor in an encoding other
    than that named in the declaration, ..." 

Unfortunately this "absence of information provided by an external transport
protocol" can never happen with the current definition of media type
"text/xml", since RFC 2376 requires the fallback to the default "us-ascii".

I think the charset parameter should stay "STRONGLY RECOMMENDED" and
authoritative, but if there is no charset parameter given, the encoding 
declaration of the XML document should be used.
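Here is a minimal Python sketch of the precedence proposed above. It does not describe RFC 2376 as written, which would default to us-ascii. The order is: the charset parameter first, then the encoding declaration, then XML's own UTF-8/UTF-16 rules. `effective_encoding` is a hypothetical name. The declaration is only read from ASCII-compatible documents.

```python
import codecs
import re
from email.message import Message

# Encoding declaration in an ASCII-compatible document, optionally after a UTF-8 BOM.
_ENC_DECL = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def effective_encoding(content_type, body):
    """Charset parameter if given, else the encoding declaration,
    else the XML defaults."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None when absent
    if charset:
        return charset  # the charset parameter stays authoritative
    match = _ENC_DECL.match(body)
    if match:
        return match.group(1).decode("ascii")
    # No declaration either: XML allows only UTF-16 (with a BOM) or UTF-8.
    if body[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        return "utf-16"
    return "utf-8"
```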


All the best,
Paul

-------------------------------------------------------------
Paul Langer               E-mail   Paul.Langer@softwareag.com
Software AG               Tel.     +49-6151-92-1912
Uhlandstr. 12             Fax      +49-6151-92-1613
64297 Darmstadt





 


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA29039 for ietf-xml-mime-bks; Wed, 15 Mar 2000 04:51:27 -0800 (PST)
Received: from prserv.net (out2.prserv.net [32.97.166.32]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA29035 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 04:51:26 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.156]) by prserv.net (out2) with SMTP id <2000031512523422900nda7ee>; Wed, 15 Mar 2000 12:52:35 +0000
Message-Id: <200003151246.AA01923@t3knz.attglobal.net>
Date: Wed, 15 Mar 2000 21:46:46 +0900
To: "Martin J. Duerst" <duerst@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <200003150548.OAA01789@sh.w3.mag.keio.ac.jp>
References: <200003150548.OAA01789@sh.w3.mag.keio.ac.jp>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Martin J. Duerst wrote...
 >CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
 >served as application/... If they don't have a 'charset' parameter,
 >and they don't have any internal way to indicate the encoding,
 >that's the problem of these registrations, not our problem.

Are you saying that each format should invent its own rules for 
indicating the charset?  My understanding was (and still is) that 
you as an I18n guy at W3C are promoting a single generalized solution 
for all textual formats.

Cheers,

----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id EAA27535 for ietf-xml-mime-bks; Wed, 15 Mar 2000 04:01:57 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id EAA27531 for <ietf-xml-mime@imc.org>; Wed, 15 Mar 2000 04:01:55 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id HAA02674; Wed, 15 Mar 2000 07:03:06 -0500
Message-ID: <38CF7BFA.400446AA@w3.org>
Date: Wed, 15 Mar 2000 13:03:06 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Martin J. Duerst" <duerst@w3.org>
CC: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de> <200003141520.AA01914@t3knz.attglobal.net> <200003150548.OAA01786@sh.w3.mag.keio.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Martin J. Duerst" wrote:
> 
> At 17:15 00/03/14 +0100, Chris Lilley wrote:
> 
> > MURATA Makoto wrote:
> 
> > > We are all aware of this problem.  We are also aware of transcoders
> > > which change the charset parameter but do not rewrite encoding
> > > declarations.
> >
> > Yes - such behaviour is clearly broken. Since a transcoder is changing many
> > or all the other bytes in the file, expecting it to also correctly update
> > the encoding declaration rather than leaving it broken is not asking too
> > much.
> 
> I completely disagree. A transcoder transcodes. A transcoder may know
> about a few (or a lot of) encodings. It is absolutely unreasonable
> to ask for a transcoder to know all kinds of data formats, and
> where in that data format some encoding hints are hidden (if they are).

So, rare and unusual formats like XML are probably not worth the hassle,
then?

Making this case slightly simpler to code, at the expense of throwing away
the excellent advantage that XML has in terms of only parsing when the
encoding is correct and making it wooly and ill defined in common cases
like *transferring XML between systems* seems utterly crazy.

> If we had only XML on the internet, that would be different,
> but that's not at all the case (yet).

Exactly. Not only is it on the internet, but it is also everywhere else,
frequently accessed by a local file system. It's not the only format, but
equally is a very important one. 

--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA11560 for ietf-xml-mime-bks; Tue, 14 Mar 2000 21:47:05 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA11541 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 21:46:55 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id OAA01786; Wed, 15 Mar 2000 14:48:04 +0900 (JST)
Message-Id: <200003150548.OAA01786@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Wed, 15 Mar 2000 14:17:22 +0900
To: Chris Lilley <chris@w3.org>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Some text that may be useful for the update of RFC 2376
Cc: MURATA Makoto <muraw3c@attglobal.net>, ietf-xml-mime@imc.org
In-Reply-To: <38CE65A3.1BFE9ECF@w3.org>
References: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de> <200003141520.AA01914@t3knz.attglobal.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 17:15 00/03/14 +0100, Chris Lilley wrote:

> MURATA Makoto wrote:

> > We are all aware of this problem.  We are also aware of transcoders
> > which change the charset parameter but do not rewrite encoding
> > declarations.
> 
> Yes - such behaviour is clearly broken. Since a transcoder is changing many
> or all the other bytes in the file, expecting it to also correctly update
> the encoding declaration rather than leaving it broken is not asking too
> much.

I completely disagree. A transcoder transcodes. A transcoder may know
about a few (or a lot of) encodings. It is absolutely unreasonable
to ask for a transcoder to know all kinds of data formats, and
where in that data format some encoding hints are hidden (if they are).

If we had only XML on the internet, that would be different,
but that's not at all the case (yet).


Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA11549 for ietf-xml-mime-bks; Tue, 14 Mar 2000 21:46:57 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA11540 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 21:46:55 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id OAA01789; Wed, 15 Mar 2000 14:48:05 +0900 (JST)
Message-Id: <200003150548.OAA01789@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Wed, 15 Mar 2000 14:23:14 +0900
To: MURATA Makoto <muraw3c@attglobal.net>
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Re: Some text that may be useful for the update of RFC 2376
Cc: ietf-xml-mime@imc.org
In-Reply-To: <200003141521.AA01915@t3knz.attglobal.net>
References: <38CD4D1E.847976B1@w3.org> <38CD4D1E.847976B1@w3.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 00:21 00/03/15 +0900, MURATA Makoto wrote:

>  >> - XML sent as application/xml (or equivalent):
>  >>   - Charset parameter is strongly recommended, and if present,
>  >>     it takes precedence.
>  >
>  >Charset parameter is *disallowed*.

I'm not sure it makes sense to disallow it, but discouraging
it might be fine.


> You might think that we can avoid bad WWW servers by this change.  But 
> we cannot.  We have to handle a collection of XML, XSL, CSS, VBScript,
> JavaScript, etc.  We need a solution that works for every format.
> Otherwise, data will be corrupted.

CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
served as application/... If they don't have a 'charset' parameter,
and they don't have any internal way to indicate the encoding,
that's the problem of these registrations, not our problem.



Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA13235 for ietf-xml-mime-bks; Tue, 14 Mar 2000 09:15:50 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA13230 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 09:15:47 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id MAA29673; Tue, 14 Mar 2000 12:16:54 -0500
Message-ID: <38CE7404.316E69DB@w3.org>
Date: Tue, 14 Mar 2000 18:16:52 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <38CD4D1E.847976B1@w3.org> <200003141521.AA01915@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
> 
>  >> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):
>  >
>  >as text/"anything" in other words
> 
> I think that RFC 2046 covers text/* in general.  RFC 2376 cannot
> change the default of HTTP (i.e., 8859-1).  The IAB allowed
> RFC 2376 to change the default for text/xml only.

That was what I was suggesting, changing it for text/xml only.

> 
>  >>   - Charset parameter is strongly recommended
>  >
>  >Charset parameter is required if the charset is not UTF-8 or UTF-16
> 
> Even when the charset is UTF-8 or UTF-16, the parameter is required.
> Otherwise, we will be inconsistent with RFC 2046.
> 
>  >>   - If no charset parameter, default is ASCII. The default of iso-8859-1 in
>  >>     HTTP is explicitly overridden in the specification of the charset
>  >>     parameter in section 3.1 "Text/xml Registration" of RFC 2376
>  >>     (http://www.ietf.org/rfc/rfc2376.txt)
>  >
>  >The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
>  >no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
>  >explicitly overridden ...
> 
> This conflicts with RFC 2046.

Only if adopted as a general rule for all text/*, which I was not
suggesting. I was suggesting, instead, that it be for text/xml and
text/*-xml.


> 
>  >> - XML sent as application/xml (or equivalent):
>  >>   - Charset parameter is strongly recommended, and if present,
>  >>     it takes precedence.
>  >
>  >Charset parameter is *disallowed*.
> 
> I do not agree.
> 
> You might think that we can avoid bad WWW servers by this change.  But
> we cannot.  We have to handle a collection of XML, XSL, CSS, VBScript,
> JavaScript, etc. 

Perhaps, but in that case ietf-*XML*-mime would not be the place to propose
such a change.

> We need a solution that works for every format.
> Otherwise, data will be corrupted.

I would add: we need a solution that works for every transport, including
'file'. Otherwise, data will become corrupt.

--
Chris


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id IAA12002 for ietf-xml-mime-bks; Tue, 14 Mar 2000 08:14:28 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA11998 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 08:14:25 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id LAA22631; Tue, 14 Mar 2000 11:15:32 -0500
Message-ID: <38CE65A3.1BFE9ECF@w3.org>
Date: Tue, 14 Mar 2000 17:15:31 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: MURATA Makoto <muraw3c@attglobal.net>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de> <200003141520.AA01914@t3knz.attglobal.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> 
> In message "RE: Some text that may be useful for the update of RFC 2376",
> Langer, Paul wrote...
> 
>  >We are developing an XML-database that gets input via HTTP.
>  >In a previous release we implemented RFC 2376 correctly (for
>  >media type text/xml we used the value of the charset parameter to
>  >determine the encoding of input documents; if this parameter was
>  >omitted we used the default "us-ascii").
> 
> We are all aware of this problem.  We are also aware of transcoders
> which change the charset parameter but do not rewrite encoding
> declarations.

Yes - such behaviour is clearly broken. Since a transcoder is changing many
or all the other bytes in the file, expecting it to also correctly update
the encoding declaration rather than leaving it broken is not asking too
much.

> In Japan, we have a very interesting problem.  We have XML, XSL,
> Javascript, VBScript, CSS, and HTML, which reference each other.  Some
> formats provide inline declarations.  Other formats do not.  IE 5.0
> appears to assume that if an HTML document is in UTF-16, anything
> referenced from this HTML is also in UTF-16. 

This assumption of IE5 is not correct; and will get them into severe
trouble if carried forward to XML - different entities can use different
encodings, as the XML spec clearly says.

> Unfortunately, even
> when XML, XSL, and CSS are all in Shift_JIS, an internally generated
> HTML is in UTF-16.  Thus, we have data corruption.

It is entirely legal for the CSS stylesheet and for the HTML document to
use different encodings.  There is absolutely no problem with the style
sheet being in (one of the many) Shift-JISes and the HTML or XML document
being in UTF-16. If IE5 or any other browser does not deal with this use
case correctly, it should be fixed.
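The point about independent encodings can be illustrated with a short Python sketch: each fetched resource is decoded with its own label instead of inheriting the encoding of the document that references it (the behaviour attributed to IE 5.0 above). The resource names, contents, and labels are hypothetical, and Python's built-in `utf-16` and `shift_jis` codecs stand in for a browser's decoders.

```python
# Hypothetical resources, each carrying its own encoding label
# (from a charset parameter or an in-band declaration).
RESOURCES = {
    "page.html": ("utf-16", "<p>日本語</p>".encode("utf-16")),
    "style.css": ("shift_jis", "p { font-family: 日本語 }".encode("shift_jis")),
}

def decode_with_own_label(name: str) -> str:
    """Correct: every resource is decoded with its own label."""
    label, data = RESOURCES[name]
    return data.decode(label)

def decode_with_parent_label(name: str, parent: str) -> str:
    """Faulty: reuse the encoding of the referring document."""
    parent_label, _ = RESOURCES[parent]
    _, data = RESOURCES[name]
    return data.decode(parent_label, errors="replace")
```

Decoding `style.css` with its own label recovers the original text; decoding it with the label of `page.html` yields garbage, which is the data corruption described in the quoted message.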

> I have come to believe that we need a single solution for every format.

I have come to believe that taking an out of band solution which sort of
works for text/* over HTTP and email, and trying to extend it to
application/* and image/* and video/* and ftp and file and other sorts of
protocol - to make it fit all cases - has some very clear problems in
extremely common use cases. Like the fact that 99.999% of content providers
have no control over the configuration of the web server they use, but do
have control of the content that they place there. And the fact that
file-based processing (on servers, on clients) is extremely common. Making
these common cases not work, to save a few lines of code in a transcoder
which knows it is converting from encoding A to encoding B and thus knows
what encoding declaration to write out if it could be bothered to do so,
seems highly curious.
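To make the argument concrete, here is a minimal Python sketch of a transcoder that, knowing it converts from encoding A to encoding B, also rewrites the encoding declaration. It assumes the whole document fits in memory and that any XML declaration sits at the very start; the function name `transcode_xml` and its regular expression are illustrative, not a reference implementation.

```python
import re

# Matches an XML declaration up to and including the quoted encoding value;
# group 1 is everything before the opening quote, group 2 is the quote.
_DECL_ENC = re.compile(r'^(<\?xml[^>]*?\sencoding\s*=\s*)(["\'])[^"\']*\2')

def transcode_xml(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode an XML document from src to dst, updating its declaration."""
    text = data.decode(src)
    # Drop any BOM that survived decoding; Python's "utf-16" codec writes
    # a fresh one on encode, "utf-8" does not.
    text = text.lstrip("\ufeff")
    # Point the encoding declaration at the new encoding (first match only).
    # A document without an encoding declaration is left unchanged, which is
    # only correct when dst is UTF-8 or UTF-16.
    text = _DECL_ENC.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{dst}{m.group(2)}", text, count=1)
    return text.encode(dst)
```

The extra work over a plain byte-level transcoder is a single substitution on the first line of the document.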

> The charset parameter is such a solution. 

It is one such solution. There are better ones, and indeed a much better
one in the XML specification. Wisely, XML instances which are read with the
wrong encoding give well-formedness errors and halt. This is excellent.
Unfortunately, complicating the issue with sometimes-there Content-type
headers with sometimes-there encoding ("charset") declarations reverts the
state of play to transport-dependent defaults and "some sort of error
correction", a world we are trying to move away from, a world of silent
data corruption.
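The halt-on-error behaviour can be checked with Python's standard `xml.etree.ElementTree`, whose `XMLParser(encoding=...)` argument overrides the document's own declaration, much as an out-of-band label would. This is a sketch under that assumption; the helper `parse_as` is hypothetical. Note that detection depends on the bytes involved: Latin-1 bytes read as UTF-8 are rejected, but UTF-8 bytes read as Latin-1 parse cleanly into the wrong text.

```python
import xml.etree.ElementTree as ET

def parse_as(data: bytes, encoding: str) -> str:
    """Parse with an externally imposed encoding and return the root's text."""
    # XMLParser(encoding=...) overrides the document's encoding declaration.
    parser = ET.XMLParser(encoding=encoding)
    return ET.fromstring(data, parser=parser).text

latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
utf8_doc = "<p>caf\u00e9</p>".encode("utf-8")

correct = parse_as(latin1_doc, "iso-8859-1")     # 'café'

# Latin-1 bytes read as UTF-8: byte 0xE9 followed by '<' is not valid
# UTF-8, so the parser stops with a well-formedness error.
try:
    parse_as(latin1_doc, "utf-8")
    rejected = False
except ET.ParseError:
    rejected = True

# UTF-8 bytes read as Latin-1: every byte is valid Latin-1, so parsing
# succeeds and the text is silently wrong.
mislabelled = parse_as(utf8_doc, "iso-8859-1")   # 'cafÃ©'
```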


> We should not try to bend
> specifications only to invent an ad-hoc solution for a particular format.

I can only agree with that sentence by replacing "format" with "protocol".

On the contrary, I believe that the excellent basis of XML should not be
bent to cope with historical details in text/* media types and their
divergent defaults depending on transport protocol.
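The divergent defaults can be summarized in a small lookup. This Python sketch encodes only the defaults named in this thread: ISO-8859-1 for text/* over HTTP, US-ASCII for text/* in mail, and RFC 2376's US-ASCII override for text/xml. The function `default_charset` and its table are illustrative assumptions; a local file has no out-of-band default at all.

```python
from typing import Optional

# Defaults for a text/* entity received without a charset parameter,
# as described in this thread. Keys are (transport, media type pattern).
_TEXT_DEFAULTS = {
    ("http", "text/*"): "iso-8859-1",
    ("mail", "text/*"): "us-ascii",
}

def default_charset(transport: str, media_type: str) -> Optional[str]:
    media_type = media_type.lower()
    # RFC 2376 overrides the transport default for text/xml only.
    if media_type == "text/xml":
        return "us-ascii"
    if media_type.startswith("text/"):
        return _TEXT_DEFAULTS.get((transport, "text/*"))
    # application/* and local files: no out-of-band default; the XML
    # document's own BOM or encoding declaration decides.
    return None
```

The same text/plain bytes therefore get different defaults depending only on how they arrived.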

> Let us strongly request internationalized WWW browsers & servers from
> Microsoft and Netscape.

No one could fail to endorse such a general message (and to add the many
other suppliers of technology to the list) but it does not follow that your
preferred complete reliance on the charset parameter will achieve such an
end - in fact, I would say rather the reverse.

Earlier, I suggested revised rules for encoding determination which are
completely rigorous, deterministic, do not rely on any hand waving or
ill-specified error correction, and allow automated content creation tools
to do the right thing simply and easily and for XML files to work correctly
in all cases and to not ever have multiple inconsistent sources of encoding
declaration. I would consider such rules to be an essential step towards
the worthy (and indeed, readily achievable) goal in your message above.


--
Chris


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA11145 for ietf-xml-mime-bks; Tue, 14 Mar 2000 07:39:31 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA11141 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 07:39:30 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id KAA17218; Tue, 14 Mar 2000 10:40:31 -0500
Message-ID: <38CD4D1E.847976B1@w3.org>
Date: Mon, 13 Mar 2000 21:18:38 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Martin J. Duerst" <duerst@w3.org>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <200003130251.LAA20677@sh.w3.mag.keio.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Martin J. Duerst" wrote:
> 
> I came up with this for a different purpose, but Dan Connolly
> suggested it might be added to an update of RFC 2376, as a
> quick overview:

Here is my suggested amendment, which removes dubious wiggle room and
weasel words and makes the result purely deterministic:

> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):

as text/"anything" in other words

>   - Charset parameter is strongly recommended

Charset parameter is required if the charset is not UTF-8 or UTF-16

>   - If no charset parameter, default is ASCII. The default of iso-8859-1 in
>     HTTP is explicitly overridden in the specification of the charset
>     parameter in section 3.1 "Text/xml Registration" of RFC 2376
>     (http://www.ietf.org/rfc/rfc2376.txt)

The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
explicitly overridden ...
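Side by side, the RFC 2376 rule and this amendment differ only when the charset parameter is absent. A minimal Python sketch, assuming the parameter has already been extracted (None when absent); both function names are hypothetical.

```python
from typing import Optional

_UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")  # big- and little-endian BOMs

def rfc2376_text_xml(charset: Optional[str], body: bytes) -> str:
    # RFC 2376: no charset parameter means us-ascii, whatever the body says.
    return charset.lower() if charset else "us-ascii"

def amended_text_xml(charset: Optional[str], body: bytes) -> str:
    # Amendment: the parameter is required unless the document is UTF-8 or
    # UTF-16, so its absence means UTF-16 with a BOM, otherwise UTF-8.
    if charset:
        return charset.lower()
    return "utf-16" if body.startswith(_UTF16_BOMS) else "utf-8"
```

For a UTF-8 body sent without a parameter, the first function returns us-ascii (so any non-ASCII byte becomes an error), while the second returns utf-8.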

>   - No error handling provisions
>   - An encoding declaration, if present, is irrelevant, but when saving a
>     received resource as a file, the correct encoding declaration should
>     be inserted.

shall be inserted. 

[if the application claims to save as XML rather than saving as a bunch of
stuff with pointy brackets. If it fails to do so, then the rules for static
storage explain what happens when the file is next parsed - WF error. ]

> - XML sent as application/xml (or equivalent):
>   - Charset parameter is strongly recommended, and if present,
>     it takes precedence.

Charset parameter is *disallowed*.

>   - If the charset parameter is omitted, the rules for XML in static storage
>     are followed (see below).

The rules for XML in static storage are followed. Such files may be freely
saved to static storage without modification in all cases.

> - XML in static storage without external metainformation (e.g. file):
>   - Default is UTF-8, or UTF-16 if there is a BOM

For files without an explicit encoding declaration, the file is in UTF-16
if there is a BOM and UTF-8 if there is not.
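Under these amended rules, application/xml defers entirely to the static-storage rules. A rough Python sketch, assuming the Content-Type header is parsed with the standard library's `email.message.Message`; raising `ValueError` for a charset parameter is an illustrative choice, the function names are hypothetical, and the declaration regex is simplified (it assumes an ASCII-compatible encoding when there is no UTF-16 BOM).

```python
import re
from email.message import Message

_ENC_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def static_storage_encoding(body: bytes) -> str:
    """Rules for XML with no external metadata, e.g. a local file."""
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"                       # BOM: UTF-16
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]                       # skip a UTF-8 BOM
    match = _ENC_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()  # explicit declaration
    return "utf-8"                            # no BOM, no declaration

def application_xml_encoding(content_type: str, body: bytes) -> str:
    """application/xml under the amendment: no charset, static-storage rules."""
    header = Message()
    header["Content-Type"] = content_type
    if header.get_param("charset") is not None:
        raise ValueError("charset parameter is disallowed for application/xml")
    return static_storage_encoding(body)
```

Because the wire rules and the file rules are identical, such a body can be saved to disk byte for byte and will still be read with the same encoding.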

>   - For other things, there has to be an encoding declaration
>   - There is some provision for 'error recovery'. What exactly this
>     means is currently under discussion in the XML Core WG, so that
>     it can  be clarified.

"Some provision"????

There is no provision for error recovery, and if a file does not parse for
whatever reason then it shall be a well-formedness error.

--
Chris




Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id HAA10935 for ietf-xml-mime-bks; Tue, 14 Mar 2000 07:27:26 -0800 (PST)
Received: from hesketh.net (wasabi-eth0-1.hesketh.net [216.27.10.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA10931; Tue, 14 Mar 2000 07:27:25 -0800 (PST)
Received: from thinkpad (ith1-1ac.twcny.rr.com [24.92.236.172]) by hesketh.net (8.9.3/8.9.3) with SMTP id KAA25347; Tue, 14 Mar 2000 10:28:34 -0500
Message-Id: <200003141528.KAA25347@hesketh.net>
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Sender: simonstl@216.27.10.33
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1 
Date: Tue, 14 Mar 2000 10:29:32 -0500
To: ietf-822@imc.org, ietf-xml-mime@imc.org, ietf-types@iana.org
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Compromises on XML Media Types  (was Re: reason for application/iotp-xml)
Cc: Keith Moore <moore@cs.utk.edu>, "Dan Kohn" <dan@teledesic.com>, "MURATA Makoto" <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM
In-Reply-To: <200003131459.JAA19690@astro.cs.utk.edu>
References: <Your message of "Mon, 13 Mar 2000 11:07:17 GMT."             <4.2.2.20000313110355.00cb0290@pop.dial.pipex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

First, apologies for spreading this message across multiple discussion
groups.  The headers for this round of discussion seem to keep growing, so
it would appear that there is interest in multiple circles.

The current draft of XML Media Types
(http://www.ietf.org/internet-drafts/draft-murata-xml-02.txt) includes a
section (6, with some examples in 7) representing a compromise position
arrived at through discussion on the ietf-xml-mime list.  I'll attempt to
summarize that discussion with reference to particular posts, and describe
at the end why I think the compromise is worth supporting.  Having
participated actively in the discussion, my viewpoint is biased, however,
so readers may wish to return to the archives themselves.
(http://www.imc.org/ietf-xml-mime/mail-archive/)

RFC 2376 provided the foundation for continuing work on XML Media Types,
describing the text/xml and application/xml types.  RFC 2376 served a
critical need at the time, providing a basic set of rules for transferring
XML over MIME-aware systems.  As XML development has spread to a wider
group, however, many developers began finding that they needed more
specific identifiers for particular XML vocabularies. In the process of
updating RFC 2376, we discussed how to handle these emerging vocabularies
as well as other critical issues, like identifying documents which are used
by XML without being XML per se, and encoding.

I arrived a bit late in the discussion, turning back claims
(http://www.imc.org/ietf-xml-mime/mail-archive/msg00060.html, #3) that
generic XML processing is not in fact useful as a fallback position.  I
think at this point there is significant if not complete consensus on
ietf-xml-mime that generic XML processing can be useful in a wide variety
of situations, not all of which can be predicted at design-time by the
creators of a particular specification.

Given that generic processing can be useful, and becomes far easier if
content is explicitly labeled as XML - I'll be happy to defend that
position in greater detail if necessary - the question becomes how best to
enable such processing within the MIME framework.  (See
http://www.imc.org/ietf-xml-mime/mail-archive/msg00105.html for one
viewpoint opposing the provision of additional information within MIME
identifiers.)

I pushed hard at one point for a top-level xml/.  In many ways, this would
let the two-part MIME identifiers continue with the least modification, and
would take advantage of the hierarchical structure already present.  This
did not prove acceptable, however, for a number of reasons that go well
beyond sheer reluctance to add another top-level type.  Most notably, XML
can be used to represent content in any of the top-level areas described by
content-types.  It is entirely plausible and in fact likely that a generic
tool for playing multimedia content (like the RealPlayer) would support
XML-based formats like SMIL or SVG.

As a result of this, discussion turned to other options for identifying XML
content.  The use of additional parameters (beyond content-type) had been
discussed earlier, and the creation of an XML registration tree was also
proposed by Rick Jelliffe.

I made a 'modest proposal'
(http://www.imc.org/ietf-xml-mime/mail-archive/msg00149.html, more formally
at http://www.imc.org/ietf-xml-mime/mail-archive/msg00149.html) for the
suffix, hoping to strike a balance between the existing structures and
newer needs.  The thread that followed
(http://www.imc.org/ietf-xml-mime/mail-archive/threads.html#00200) was not
exactly a love-fest, but seemed to be acceptable.

Most discussion on the subject since the initial proposals has had to do
with the particulars of the Internet Drafts that subsequently included the
proposals, not a general attack on the principle behind the suffix.  In my
other endeavors, outside of the IETF process, I haven't yet found
resistance to the suffix, though admittedly that's only about five cases,
all of which are in their early stages.

Yes, it's a compromise, and yes, it does mean additional work (suffix
examination) for applications that want to work with XML generically, but
it makes the task of generic XML tools much easier while imposing minimal
costs on the rest of the MIME infrastructure.
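The cost of suffix examination is small. A minimal Python sketch, assuming the "+xml" spelling of the suffix (the form later registered in RFC 3023; the draft discussed here may have spelled it differently) and treating text/xml and application/xml as generic XML as well. The function name is hypothetical.

```python
def is_generic_xml(content_type: str) -> bool:
    """True if a generic XML processor can handle this media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/xml", "application/xml"):
        return True
    # Otherwise look only at the subtype's suffix.
    _, _, subtype = media_type.partition("/")
    return subtype.endswith("+xml")
```

For example, `image/svg+xml` is recognized as XML by a generic tool without any knowledge of SVG itself.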

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA10761 for ietf-xml-mime-bks; Tue, 14 Mar 2000 07:20:22 -0800 (PST)
Received: from prserv.net (out5.prserv.net [32.97.166.35]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA10756 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 07:20:21 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.171]) by prserv.net (out5) with SMTP id <2000031415212024301p4el1e>; Tue, 14 Mar 2000 15:21:21 +0000
Message-Id: <200003141520.AA01914@t3knz.attglobal.net>
Date: Wed, 15 Mar 2000 00:20:52 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de>
References: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "RE: Some text that may be useful for the update of RFC 2376",
Langer, Paul wrote...

 >We are developing an XML-database that gets input via HTTP.
 >In a previous release we implemented RFC 2376 correctly (for
 >media type text/xml we used the value of the charset parameter to
 >determine the encoding of input documents; if this parameter was
 >omitted we used the default "us-ascii").

We are all aware of this problem.  We are also aware of transcoders 
which change the charset parameter but do not rewrite encoding
declarations.

In Japan, we have a very interesting problem.  We have XML, XSL, 
Javascript, VBScript, CSS, and HTML, which reference each other.  Some
formats provide inline declarations.  Other formats do not.  IE 5.0
appears to assume that if an HTML document is in UTF-16, anything
referenced from this HTML is also in UTF-16.  Unfortunately, even 
when XML, XSL, and CSS are all in Shift_JIS, an internally generated 
HTML is in UTF-16.  Thus, we have data corruption.

I have come to believe that we need a single solution for every format.  
The charset parameter is such a solution.  We should not try to bend 
specifications only to invent an ad-hoc solution for a particular format.  
Let us strongly request internationalized WWW browsers & servers from
Microsoft and Netscape.

Cheers,



----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id HAA10754 for ietf-xml-mime-bks; Tue, 14 Mar 2000 07:20:21 -0800 (PST)
Received: from prserv.net (out5.prserv.net [32.97.166.35]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA10750 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 07:20:20 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.171]) by prserv.net (out5) with SMTP id <2000031415211824301p4el0e>; Tue, 14 Mar 2000 15:21:19 +0000
Message-Id: <200003141521.AA01915@t3knz.attglobal.net>
Date: Wed, 15 Mar 2000 00:21:50 +0900
To: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <38CD4D1E.847976B1@w3.org>
References: <38CD4D1E.847976B1@w3.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...

 >> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):
 >
 >as text/"anything" in other words

I think that RFC 2046 covers text/* in general.  RFC 2376 cannot 
change the default of HTTP (i.e., 8859-1).  The IAB allowed 
RFC 2376 to change the default for text/xml only.

 >>   - Charset parameter is strongly recommended
 >
 >Charset parameter is required if the charset is not UTF-8 or UTF-16

Even when the charset is UTF-8 or UTF-16, the parameter is required.  
Otherwise, we will be inconsistent with RFC 2046.

 >>   - If no charset parameter, default is ASCII. The default of iso-8859-1 in
 >>     HTTP is explicitly overridden in the specification of the charset
 >>     parameter in section 3.1 "Text/xml Registration" of RFC 2376
 >>     (http://www.ietf.org/rfc/rfc2376.txt)
 >
 >The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
 >no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
 >explicitly overridden ...

This conflicts with RFC 2046.


 >> - XML sent as application/xml (or equivalent):
 >>   - Charset parameter is strongly recommended, and if present,
 >>     it takes precedence.
 >
 >Charset parameter is *disallowed*.

I do not agree.  

You might think that we can avoid bad WWW servers by this change.  But 
we cannot.  We have to handle a collection of XML, XSL, CSS, VBScript,
JavaScript, etc.  We need a solution that works for every format.
Otherwise, data will be corrupted.

Cheers,


----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id GAA09686 for ietf-xml-mime-bks; Tue, 14 Mar 2000 06:20:21 -0800 (PST)
Received: from server1.software-ag.de (server1.software-ag.de [193.26.194.2]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id GAA09682 for <ietf-xml-mime@imc.org>; Tue, 14 Mar 2000 06:20:17 -0800 (PST)
Message-ID: <2339B88D6AA6D31187A80008C7E6F6722D9102@daemsg01.software-ag.de>
From: "Langer, Paul" <Paul.Langer@softwareag.com>
To: ietf-xml-mime@imc.org
Subject: RE: Some text that may be useful for the update of RFC 2376
Date: Tue, 14 Mar 2000 15:21:04 +0100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At Monday, March 13, 2000 9:19 PM, Chris Lilley wrote:

> [snip]
> Charset parameter is required if the charset is not UTF-8 or UTF-16
> [snip]
> The charset (not default, but THE charset) is UTF-16 (if BOM) or
> UTF-8 (if no BOM) and the "default" of iso-8859-1 in HTTP and
> US-ASCII in mail is explicitly overridden ...

I think this proposed change of the definition of media type
"text/xml" is a good idea, because it reflects the expectations of
users (at least the users I know).

We are developing an XML-database that gets input via HTTP.
In a previous release we implemented RFC 2376 correctly (for
media type text/xml we used the value of the charset parameter to
determine the encoding of input documents; if this parameter was
omitted we used the default "us-ascii").
It was not possible to "sell" this behavior to users. They
insisted that it is wrong to use "us-ascii", if the charset parameter
is omitted and the encoding declaration in the XML declaration 
specifies the correct encoding.
Unfortunately, neither Microsoft IE5 nor Netscape Navigator sends
charset parameters at all when data is uploaded from HTML
forms. 
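The two receiver behaviours described above can be contrasted in a short sketch. It assumes the body is in an ASCII-compatible encoding (the regular expression cannot see a declaration in UTF-16), and both function names are hypothetical. The strict RFC 2376 reading decodes an unlabelled text/xml body as us-ascii and ignores the encoding declaration; the behaviour users expected falls back to the declaration instead.

```python
import re
from typing import Optional

# Matches an encoding declaration in an ASCII-compatible byte stream only.
ENCODING_DECL = re.compile(
    rb'^<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def decode_strict_rfc2376(body: bytes, charset: Optional[str]) -> str:
    """RFC 2376 text/xml: no charset parameter means us-ascii.

    The encoding declaration is ignored, so any non-ASCII byte fails.
    """
    return body.decode(charset or "us-ascii")


def decode_by_declaration(body: bytes, charset: Optional[str]) -> str:
    """What the users described above expected: trust the XML declaration."""
    if charset:
        return body.decode(charset)
    match = ENCODING_DECL.match(body)
    declared = match.group(1).decode("ascii") if match else "utf-8"
    return body.decode(declared)
```

With a body such as `<?xml version="1.0" encoding="iso-8859-1"?><name>Müller</name>` encoded in ISO-8859-1 and no charset parameter, the strict reading raises `UnicodeDecodeError` while the declaration-based reading returns the original text.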

All the best,
Paul


-------------------------------------------------------------
Paul Langer               e-mail   Paul.Langer@softwareag.com
Software AG               Tel.     +49-6151-92-1912
Uhlandstr. 12             Fax      +49-6151-92-1613
64297 Darmstadt



Received: by ns.secondary.com (8.9.3/8.9.3) id SAA03521 for ietf-xml-mime-bks; Mon, 13 Mar 2000 18:49:00 -0800 (PST)
Received: from episteme-software.com (resnick1.qualcomm.com [63.250.90.98]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA03507; Mon, 13 Mar 2000 18:48:56 -0800 (PST)
Received: from resnick2.qualcomm.com (63.250.90.99) by  episteme-software.com with ESMTP (Eudora Internet Mail Server 3.0b7); Mon, 13 Mar 2000 20:49:34 -0600
Mime-Version: 1.0
X-Sender: resnick (Unverified)
Message-Id: <a04311413b4f3567ec9a2@resnick2.qualcomm.com>
In-Reply-To: <25D0C66E6D25D311B2AC0008C7913EE0517D19@tdmail2.teledesic.com>
References: <25D0C66E6D25D311B2AC0008C7913EE0517D19@tdmail2.teledesic.com>
X-Mailer: Eudora [Macintosh version 4.3a?]
Date: Mon, 13 Mar 2000 20:49:32 -0600
To: Dan Kohn <dan@teledesic.com>
From: Pete Resnick <presnick@qualcomm.com>
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP)
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, MURATA Makoto	 <muraw3c@attglobal.net>, "Donald E. Eastlake 3rd"	 <dee3@torque.pothole.com>, ietf-822@imc.org, "'ietf-xml-mime@imc.org'"	 <ietf-xml-mime@imc.org>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Just to answer an old one:

On 3/11/00 at 12:34 PM -0800, Dan Kohn wrote:

>I appreciate Pete's concern here, and agree that in an ideal world, 
>this could be handled under Content-Disposition or some new sort of 
>Content-Structure header.   But my (perhaps mistaken) belief is that 
>most dispatching tools out there work off of MIME type, and do not 
>necessarily have access to arbitrary Content- headers.

This is true for legacy dispatching tools which don't look at other 
MIME headers, but in those cases, they have an easy solution:

1. Map application/iotp to an IOTP application.
2. Map application/iotp to an XML application.

If you are going to *modify* a dispatching tool to parse subtypes, I 
don't see why you can't modify it to grab a different Content-* field 
or a parameter on the Content-Type line.
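To illustrate the point that parsing a subtype suffix is no easier than reading a parameter, here is a simplified Content-Type splitter. It ignores quoted strings that contain ';', and the "syntax" parameter is purely hypothetical, standing in for whatever parameter might have been chosen instead of the "-xml" suffix.

```python
from typing import Dict, Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, str, Optional[str], Dict[str, str]]:
    """Split a Content-Type value into (type, subtype, suffix, params).

    Simplified: does not handle quoted strings that contain ';'.
    """
    media, *raw_params = [part.strip() for part in value.split(";")]
    maintype, _, subtype = media.lower().partition("/")
    # "-xml" subtype suffix, the convention debated in this thread
    suffix = "xml" if subtype.endswith("-xml") else None
    params: Dict[str, str] = {}
    for item in raw_params:
        key, sep, val = item.partition("=")
        if sep:
            params[key.strip().lower()] = val.strip().strip('"')
    return maintype, subtype, suffix, params


def is_xml_based(value: str) -> bool:
    """True if the suffix or a hypothetical "syntax" parameter says XML."""
    _, _, suffix, params = parse_content_type(value)
    return suffix == "xml" or params.get("syntax", "").lower() == "xml"
```

Both `application/iotp-xml` and `application/iotp; syntax="xml"` are recognised by the same few lines, which is the substance of the argument above.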

>The point is that "-xml" has demonstrable benefit to a significant subset
>of the community, while Pete and Keith have complained about the lack of
>elegance of the solution.

It's not just elegance; I believe both Keith and I are worried about 
mucking up the works down the road.

pr
-- 
Pete Resnick <mailto:presnick@qualcomm.com>
Eudora Engineering - QUALCOMM Incorporated
Ph: (217)337-6377 or (858)651-4478, Fax: (858)651-1102


Received: by ns.secondary.com (8.9.3/8.9.3) id QAA28013 for ietf-xml-mime-bks; Mon, 13 Mar 2000 16:25:31 -0800 (PST)
Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id QAA28009 for <ietf-xml-mime@imc.org>; Mon, 13 Mar 2000 16:25:23 -0800 (PST)
Received: from w3.org (IDENT:root@localhost [127.0.0.1]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id TAA06635; Mon, 13 Mar 2000 19:26:09 -0500
Message-ID: <38CD4D1E.847976B1@w3.org>
Date: Mon, 13 Mar 2000 21:18:38 +0100
From: Chris Lilley <chris@w3.org>
Organization: W3C
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en,fr
MIME-Version: 1.0
To: "Martin J. Duerst" <duerst@w3.org>
CC: ietf-xml-mime@imc.org
Subject: Re: Some text that may be useful for the update of RFC 2376
References: <200003130251.LAA20677@sh.w3.mag.keio.ac.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

"Martin J. Duerst" wrote:
> 
> I came up with this for a different purpose, but Dan Connolly
> suggested it might be added to an update of RFC 2376, as a
> quick overview:

Here is my suggested amendment, which removes dubious wiggle room and
weasel words and makes the result purely deterministic:

> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):

as text/"anything" in other words

>   - Charset parameter is strongly recommended

Charset parameter is required if the charset is not UTF-8 or UTF-16

>   - If no charset parameter, default is ASCII. The default of iso-8859-1 in
>     HTTP is explicitly overridden in the specification of the charset
>     parameter in section 3.1 "Text/xml Registration" of RFC 2376
>     (http://www.ietf.org/rfc/rfc2376.txt)

The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
explicitly overridden ...

>   - No error handling provisions
>   - An encoding declaration, if present, is irrelevant, but when saving a
>     received resource as a file, the correct encoding declaration should
>     be inserted.

shall be inserted. 

[if the application claims to save as XML rather than saving as a bunch of
stuff with pointy brackets. If it fails to do so, then the rules for static
storage explain what happens when the file is next parsed - WF error. ]
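A minimal sketch of inserting the correct encoding declaration when a received, already decoded document is saved as a file. It assumes the version and standalone pseudo-attributes should be kept and any old encoding value dropped; the function names are hypothetical.

```python
import re

# An XML declaration at the very start of the (already decoded) text.
XML_DECL = re.compile(r'^<\?xml\s[^?]*\?>')
VERSION = re.compile(r'version\s*=\s*["\']([^"\']+)["\']')
STANDALONE = re.compile(r'standalone\s*=\s*["\'](yes|no)["\']')


def with_encoding_declaration(text: str, charset: str) -> str:
    """Return text whose XML declaration names `charset`.

    Version and standalone are preserved; any old encoding is dropped.
    """
    match = XML_DECL.match(text)
    version, standalone = "1.0", None
    if match:
        decl = match.group(0)
        v = VERSION.search(decl)
        s = STANDALONE.search(decl)
        version = v.group(1) if v else "1.0"
        standalone = s.group(1) if s else None
        text = text[match.end():]
    new_decl = f'<?xml version="{version}" encoding="{charset}"'
    if standalone:
        new_decl += f' standalone="{standalone}"'
    new_decl += "?>"
    return new_decl + ("" if match else "\n") + text


def save_received_xml(text: str, path: str, charset: str = "utf-8") -> None:
    """Write a decoded resource to a file with a matching declaration."""
    with open(path, "wb") as f:
        f.write(with_encoding_declaration(text, charset).encode(charset))
```

Because the declaration is regenerated from the charset actually used for writing, the file parses correctly under the static-storage rules when it is next read.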

> - XML sent as application/xml (or equivalent):
>   - Charset parameter is strongly recommended, and if present,
>     it takes precedence.

Charset parameter is *disallowed*.

>   - If the charset parameter is omitted, the rules for XML in static storage
>     are followed (see below).

The rules for XML in static storage are followed. Such files may be freely
saved to static storage without modification in all cases.

> - XML in static storage without external metainformation (e.g. file):
>   - Default is UTF-8, or UTF-16 if there is a BOM

For files without an explicit encoding declaration, the file is in UTF-16
if there is a BOM and UTF-8 if there is not.

>   - For other things, there has to be an encoding declaration
>   - There is some provision for 'error recovery'. What exactly this
>     means is currently under discussion in the XML Core WG, so that
>     it can be clarified.

"Some provision"????

There is no provision for error recovery, and if a file does not parse for
whatever reason then it shall be a well formedness error.
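The amended rules above are deterministic enough to express directly. This is a sketch under stated assumptions: the caller already knows the body is XML, `WellFormednessError` is a hypothetical exception type, and the declaration regex only works for ASCII-compatible encodings (a UTF-16 body falls through to the BOM rule).

```python
import codecs
import re
from typing import Optional


class WellFormednessError(Exception):
    """Hypothetical error type: decoding failures are fatal, never recovered."""


_DECL = re.compile(rb'^<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def _bom_charset(body: bytes) -> str:
    # UTF-16 if there is a BOM, UTF-8 if there is not.
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    return "utf-8"


def amended_charset(media_type: str, charset_param: Optional[str], body: bytes) -> str:
    top = media_type.lower().partition("/")[0]
    if top == "text":
        # Parameter required unless UTF-8/UTF-16; if absent, the BOM decides.
        return charset_param.lower() if charset_param else _bom_charset(body)
    if charset_param:
        raise ValueError("charset parameter is disallowed for application/xml here")
    # application/xml: static-storage rules (declaration, else BOM/UTF-8).
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else _bom_charset(body)


def amended_decode(media_type: str, charset_param: Optional[str], body: bytes) -> str:
    charset = amended_charset(media_type, charset_param, body)
    try:
        return body.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        # No error recovery: anything undecodable is a well-formedness error.
        raise WellFormednessError(f"not decodable as {charset}") from exc
```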

--
Chris




Received: by ns.secondary.com (8.9.3/8.9.3) id HAA15564 for ietf-xml-mime-bks; Mon, 13 Mar 2000 07:00:22 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id HAA15560; Mon, 13 Mar 2000 07:00:20 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id JAA19690; Mon, 13 Mar 2000 09:59:40 -0500 (EST)
Message-Id: <200003131459.JAA19690@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Graham Klyne <GK@dial.pipex.com>
cc: "Scott Lawrence" <lawrence@agranat.com>, "Dan Kohn" <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, ietf-822@imc.org, ietf-xml-mime@imc.org, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, "MURATA Makoto" <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Subject: Re: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP) 
In-reply-to: Your message of "Mon, 13 Mar 2000 11:07:17 GMT." <4.2.2.20000313110355.00cb0290@pop.dial.pipex.com> 
Date: Mon, 13 Mar 2000 09:59:40 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I could be wrong, but my take is that the top-level MIME type relates to 
> the media presentation/handling capabilities of the receiving system

it's actually both this and default handling.  we went through a lot
of this when agonizing over whether to accept model/ as a new top-level.

but IMHO media presentation takes precedence over default handling.
so it's more important for an xml-based audio type to be audio/foo
than xml/foo.


Received: by ns.secondary.com (8.9.3/8.9.3) id DAA08108 for ietf-xml-mime-bks; Mon, 13 Mar 2000 03:11:59 -0800 (PST)
Received: from msw.mimesweeper.com (msw.mimesweeper.com [194.168.90.18]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id DAA08088; Mon, 13 Mar 2000 03:11:34 -0800 (PST)
Received: from bell.mimesweeper.com (unverified) by msw.mimesweeper.com (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tc2a85a12de4aec6c81d0@msw.mimesweeper.com>; Mon, 13 Mar 2000 11:15:14 +0000
Received: from GK-VAIO (gk-vaio.mimesweeper.com [194.168.90.137]) by bell.mimesweeper.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2448.0) id DKP3JRLR; Mon, 13 Mar 2000 11:13:52 -0000
Message-Id: <4.2.2.20000313110355.00cb0290@pop.dial.pipex.com>
X-Sender: maiw03@pop.dial.pipex.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Mon, 13 Mar 2000 11:07:17 +0000
To: "Scott Lawrence" <lawrence@agranat.com>
From: Graham Klyne <GK@dial.pipex.com>
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP) 
Cc: "Dan Kohn" <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, <ietf-822@imc.org>, <ietf-xml-mime@imc.org>, <ietf-types@iana.org>, "Martin J. Duerst" <duerst@w3.org>, "MURATA Makoto" <muraw3c@attglobal.net>, <ned.freed@INNOSOFT.COM>, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
In-Reply-To: <000101bf8c47$c608dee0$954768c0@oyster.agranat.com>
References: <25D0C66E6D25D311B2AC0008C7913EE0517D06@tdmail2.teledesic.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 12:24 PM 3/12/00 -0500, Scott Lawrence wrote:
> > >why?  is there really much value in a default treatment of text/* xml
> > >documents as plain text?  (the default doesn't seem to work well in
> > >practice for text/html) or is xml really likely to be used for
> > >image/*, audio/*, video/*, or model/* content?
>
>I think that this actually argues in favor of 'xml/' as a top level.

I disagree.

I could be wrong, but my take is that the top-level MIME type relates to 
the media presentation/handling capabilities of the receiving system, and 
the sub-type relates to the content encoding/decoding details of a 
particular message.  I think XML falls into the second category for most 
purposes.

#g

------------
Graham Klyne
(GK@ACM.ORG)



Received: by ns.secondary.com (8.9.3/8.9.3) id SAA23712 for ietf-xml-mime-bks; Sun, 12 Mar 2000 18:51:09 -0800 (PST)
Received: from sh.w3.mag.keio.ac.jp (sh.w3.mag.keio.ac.jp [133.27.194.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id SAA23697 for <ietf-xml-mime@imc.org>; Sun, 12 Mar 2000 18:51:03 -0800 (PST)
Received: from enoshima (dhcp-100-224.mag.keio.ac.jp [133.27.195.224]) by sh.w3.mag.keio.ac.jp (8.9.3/3.7W) with SMTP id LAA20677 for <ietf-xml-mime@imc.org>; Mon, 13 Mar 2000 11:51:40 +0900 (JST)
Message-Id: <200003130251.LAA20677@sh.w3.mag.keio.ac.jp>
X-Sender: duerst@sh.w3.mag.keio.ac.jp
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3-J (32)
Date: Mon, 13 Mar 2000 11:45:17 +0900
To: ietf-xml-mime@imc.org
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Some text that may be useful for the update of RFC 2376
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I came up with this for a different purpose, but Dan Connolly
suggested it might be added to an update of RFC 2376, as a
quick overview:


-----
There are three basic situations:

- XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):
  - Charset parameter is strongly recommended
  - If no charset parameter, default is ASCII. The default of iso-8859-1 in
    HTTP is explicitly overridden in the specification of the charset
    parameter in section 3.1 "Text/xml Registration" of RFC 2376
    (http://www.ietf.org/rfc/rfc2376.txt)
  - No error handling provisions
  - An encoding declaration, if present, is irrelevant, but when saving a
    received resource as a file, the correct encoding declaration should
    be inserted.

- XML sent as application/xml (or equivalent):
  - Charset parameter is strongly recommended, and if present,
    it takes precedence.
  - If the charset parameter is omitted, the rules for XML in static storage
    are followed (see below).

- XML in static storage without external metainformation (e.g. file):
  - Default is UTF-8, or UTF-16 if there is a BOM
  - For other things, there has to be an encoding declaration
  - There is some provision for 'error recovery'. What exactly this
    means is currently under discussion in the XML Core WG, so that
    it can be clarified.
-----
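The three situations in the overview can be read as a single charset-resolution function. This is a minimal sketch, assuming the caller has already established that the body is XML (so the "text" branch covers text/xml and equivalents such as text/vnd.wap.wml) and that any encoding declaration is in an ASCII-compatible encoding; error recovery is left out, since its meaning is still open.

```python
import codecs
import re
from typing import Optional

_DECL = re.compile(rb'^<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def static_storage_charset(body: bytes) -> str:
    """XML in static storage: BOM => UTF-16, else declaration, else UTF-8."""
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def resolve_charset(media_type: str, charset_param: Optional[str], body: bytes) -> str:
    top = media_type.lower().partition("/")[0]
    if charset_param:
        # text/xml and application/xml alike: the parameter takes precedence.
        return charset_param.lower()
    if top == "text":
        # RFC 2376 sec. 3.1: default is US-ASCII, overriding HTTP's
        # iso-8859-1; an encoding declaration, if present, is irrelevant.
        return "us-ascii"
    # application/xml and equivalents: rules for XML in static storage.
    return static_storage_charset(body)
```

Note how the same unlabelled bytes resolve differently depending only on the top-level type: `us-ascii` for text/xml, but the declared or BOM-derived encoding for application/xml.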


Regards,   Martin.


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst


Received: by ns.secondary.com (8.9.3/8.9.3) id RAA18099 for ietf-xml-mime-bks; Sun, 12 Mar 2000 17:44:01 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA18089; Sun, 12 Mar 2000 17:43:44 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id UAA00320; Sun, 12 Mar 2000 20:43:50 -0500 (EST)
Message-Id: <200003130143.UAA00320@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "D. J. Bernstein" <djb@cr.yp.to>
cc: ietf-xml-mime@imc.org, ietf-822@imc.org, ietf-types@iana.org
Subject: Re: reason for application/iotp-xml 
In-reply-to: Your message of "13 Mar 2000 00:08:50 GMT." <20000313000850.24107.qmail@cr.yp.to> 
Date: Sun, 12 Mar 2000 20:43:50 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> When you have a universal video tool, and a universal XML browser, what
> do you do with video/blah-xml? (Or should that be xml/blah-video?)

good question.  offhand, I'd say video/ takes precedence for presentation 
purposes.  it's difficult to imagine any kind of generic XML tool
doing a good job at presentation for anything other than text objects...
and after all one reason that universal video, audio, and image presentation
tools exist is that it's nice to have the same user interface regardless
of subtype.

OTOH, an indexer for a search engine might give more precedence to the -xml 
suffix.
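The split Keith describes, where presentation follows the top-level type but an indexer favours the -xml suffix, can be sketched as a purpose-dependent handler choice. All handler names here are hypothetical placeholders; only the precedence logic is meant to be illustrative.

```python
# Hypothetical handler names; only the precedence logic is the point.
PRESENTERS = {
    "video": "universal video tool",
    "audio": "universal audio tool",
    "image": "image viewer",
}


def pick_handler(media_type: str, purpose: str) -> str:
    """Choose a handler for `purpose` ("present" or "index")."""
    top, _, subtype = media_type.lower().partition("/")
    is_xml = subtype.endswith("-xml")
    if purpose == "index":
        # An indexer gives precedence to the -xml suffix.
        return "generic XML indexer" if is_xml else "not indexed"
    if top in PRESENTERS:
        # For presentation, the top-level type takes precedence.
        return PRESENTERS[top]
    return "generic XML browser" if is_xml else "save to disk"
```

So `video/blah-xml` goes to the video tool for display but to the XML indexer for a search engine.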

Keith


Received: by ns.secondary.com (8.9.3/8.9.3) id QAA16647 for ietf-xml-mime-bks; Sun, 12 Mar 2000 16:07:25 -0800 (PST)
Received: from muncher.math.uic.edu (muncher.math.uic.edu [131.193.178.181]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id QAA16640 for <ietf-xml-mime@imc.org>; Sun, 12 Mar 2000 16:07:24 -0800 (PST)
Received: (qmail 18837 invoked by uid 1001); 13 Mar 2000 00:08:50 -0000
Date: 13 Mar 2000 00:08:50 -0000
Message-ID: <20000313000850.24107.qmail@cr.yp.to>
Mail-Followup-To: ietf-xml-mime@imc.org, ietf-822@imc.org, ietf-types@iana.org
From: "D. J. Bernstein" <djb@cr.yp.to>
To: ietf-xml-mime@imc.org, ietf-822@imc.org, ietf-types@iana.org
Subject: Re: reason for application/iotp-xml
References: <000101bf8c47$c608dee0$954768c0@oyster.agranat.com> <200003121716.MAA28458@astro.cs.utk.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Keith Moore writes:
> yes, but it does seem fairly common to have a "universal video tool" that
> understands most video formats

When you have a universal video tool, and a universal XML browser, what
do you do with video/blah-xml? (Or should that be xml/blah-video?)

---Dan


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA16105 for ietf-xml-mime-bks; Sun, 12 Mar 2000 15:15:02 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA16092; Sun, 12 Mar 2000 15:14:59 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.0-20 #35243) id <01JMY4IW5SLC00004F@MAUVE.INNOSOFT.COM>; Sun, 12 Mar 2000 15:15:41 -0800 (PST)
Date: Sun, 12 Mar 2000 15:14:36 -0800 (PST)
Subject: Re: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP)
In-reply-to: "Your message dated Sun, 12 Mar 2000 12:16:38 -0500" <200003121716.MAA28458@astro.cs.utk.edu>
To: Keith Moore <moore@cs.utk.edu>
Cc: Scott Lawrence <lawrence@agranat.com>, Dan Kohn <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, ietf-822@imc.org, ietf-xml-mime@imc.org, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, MURATA Makoto <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Message-id: <01JMY7JTQBV000004F@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=us-ascii
References: <000101bf8c47$c608dee0$954768c0@oyster.agranat.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > the notion of something useful being possible with an arbitrary
> > 'video/foo' type that my application doesn't understand just doesn't
> > fly.

> yes, but it does seem fairly common to have a "universal video tool" that
> understands most video formats - perhaps even formats that the MIME
> reader doesn't know about.  having such a tool as the default handler
> for video/* allows your MIME reader to play video clips even when the
> specific video/ type isn't registered with it.  similarly for audio/*.

Exactly right. I'd also add "image" to the list.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id PAA15997 for ietf-xml-mime-bks; Sun, 12 Mar 2000 15:13:52 -0800 (PST)
Received: from mauve.innosoft.com (mauve.innosoft.com [192.160.253.247]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA15985; Sun, 12 Mar 2000 15:13:35 -0800 (PST)
From: ned.freed@INNOSOFT.COM
Received: from MAUVE.INNOSOFT.COM by MAUVE.INNOSOFT.COM (PMDF V6.0-20 #35243) id <01JMY4IW5SLC00004F@MAUVE.INNOSOFT.COM>; Sun, 12 Mar 2000 15:14:16 -0800 (PST)
Date: Sun, 12 Mar 2000 15:11:20 -0800 (PST)
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP)
In-reply-to: "Your message dated Sun, 12 Mar 2000 12:24:08 -0500" <000101bf8c47$c608dee0$954768c0@oyster.agranat.com>
To: Scott Lawrence <lawrence@agranat.com>
Cc: Dan Kohn <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, ietf-822@imc.org, ietf-xml-mime@imc.org, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, MURATA Makoto <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Message-id: <01JMY7I2E80400004F@MAUVE.INNOSOFT.COM>
MIME-version: 1.0
Content-type: text/plain; charset=iso-8859-1
References: <25D0C66E6D25D311B2AC0008C7913EE0517D06@tdmail2.teledesic.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > >why?  is there really much value in a default treatment of text/* xml
> > >documents as plain text?  (the default doesn't seem to work well in
> > >practice for text/html) or is xml really likely to be used for
> > >image/*, audio/*, video/*, or model/* content?

> I think that this actually argues in favor of 'xml/' as a top level.
> It is certainly true that at least some level of processing can be
> done on any XML document by generic XML processors whether or not
> they understand anything at all about the rest of the type - that's
> one of the primary design goals of XML.  The 'text/' top level type
> is probably the _only_ other MIME type for which that is true - the
> notion of something useful being possible with an arbitrary
> 'video/foo' type that my application doesn't understand just doesn't
> fly.

Flies fine for me on a regular basis. I tend to use MIME stuff on two
systems, one of which supports default top-level handling while the other
does not. On the latter system, scarcely a month goes by
in which I don't have to go through a tedious process of adding entries. In the
case of the former, it has been literally *years* since I've touched my
MIME settings.

				Ned


Received: by ns.secondary.com (8.9.3/8.9.3) id JAA11224 for ietf-xml-mime-bks; Sun, 12 Mar 2000 09:16:40 -0800 (PST)
Received: from astro.cs.utk.edu (ASTRO.CS.UTK.EDU [128.169.93.168]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id JAA11220; Sun, 12 Mar 2000 09:16:39 -0800 (PST)
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id MAA28458; Sun, 12 Mar 2000 12:16:39 -0500 (EST)
Message-Id: <200003121716.MAA28458@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: "Scott Lawrence" <lawrence@agranat.com>
cc: "Dan Kohn" <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, ietf-822@imc.org, ietf-xml-mime@imc.org, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, "MURATA Makoto" <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Subject: Re: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP) 
In-reply-to: Your message of "Sun, 12 Mar 2000 12:24:08 EST." <000101bf8c47$c608dee0$954768c0@oyster.agranat.com> 
Date: Sun, 12 Mar 2000 12:16:38 -0500
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> the notion of something useful being possible with an arbitrary
> 'video/foo' type that my application doesn't understand just doesn't
> fly.

yes, but it does seem fairly common to have a "universal video tool" that
understands most video formats - perhaps even formats that the MIME
reader doesn't know about.  having such a tool as the default handler
for video/* allows your MIME reader to play video clips even when the
specific video/ type isn't registered with it.  similarly for audio/*.

presumably such tools would work with XML-based video and audio formats 
also, if such ever became popular.
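Default top-level handling of the kind described above amounts to a two-step lookup: exact type first, then the top-level wildcard. This is a minimal mailcap-style sketch, assuming a hypothetical table with lowercase keys; real MIME readers have their own configuration formats.

```python
from typing import Dict, Optional


def lookup_handler(table: Dict[str, str], media_type: str) -> Optional[str]:
    """Exact match first, then the top-level wildcard (e.g. "video/*")."""
    media_type = media_type.lower()
    if media_type in table:
        return table[media_type]
    top = media_type.partition("/")[0]
    return table.get(f"{top}/*")
```

With a table such as `{"video/mpeg": "mpeg_player", "video/*": "universal_video_tool"}`, an unregistered `video/x-new-format` (or an XML-based `video/blah-xml`) still reaches the universal video tool without any new configuration entry.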



Received: by ns.secondary.com (8.9.3/8.9.3) id IAA10686 for ietf-xml-mime-bks; Sun, 12 Mar 2000 08:25:50 -0800 (PST)
Received: from firewall.agranat.com (agranat.com [198.113.147.2]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id IAA10672; Sun, 12 Mar 2000 08:25:27 -0800 (PST)
Received: from agranat.com (alice.agranat.com [192.104.71.130]) by firewall.agranat.com (8.9.0/8.9.0) with ESMTP id LAA00874; Sun, 12 Mar 2000 11:25:04 -0500
Received: from oyster (oyster-ppp.agranat.com [192.104.71.180]) by agranat.com (8.8.5/8.8.5) with SMTP id LAA20324; Sun, 12 Mar 2000 11:24:59 -0500
From: "Scott Lawrence" <lawrence@agranat.com>
To: "Dan Kohn" <dan@teledesic.com>, "'Keith Moore'" <moore@cs.utk.edu>, <ietf-822@imc.org>, <ietf-xml-mime@imc.org>
Cc: <ietf-types@iana.org>, "Martin J. Duerst" <duerst@w3.org>, "MURATA Makoto" <muraw3c@attglobal.net>, <ned.freed@INNOSOFT.COM>, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP) 
Date: Sun, 12 Mar 2000 12:24:08 -0500
Message-ID: <000101bf8c47$c608dee0$954768c0@oyster.agranat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
In-Reply-To: <25D0C66E6D25D311B2AC0008C7913EE0517D06@tdmail2.teledesic.com>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> >why?  is there really much value in a default treatment of text/* xml
> >documents as plain text?  (the default doesn't seem to work well in
> >practice for text/html) or is xml really likely to be used for
> >image/*, audio/*, video/*, or model/* content?

I think that this actually argues in favor of 'xml/' as a top level.
It is certainly true that at least some level of processing can be
done on any XML document by generic XML processors whether or not
they understand anything at all about the rest of the type - that's
one of the primary design goals of XML.  The 'text/' top level type
is probably the _only_ other MIME type for which that is true - the
notion of something useful being possible with an arbitrary
'video/foo' type that my application doesn't understand just doesn't
fly.

I don't like the idea of the '-xml' suffix - it introduces a new piece of
structure that we don't need.  The top level type is there for
exactly this purpose, and XML fits it better than most other top
level types.

--
Scott Lawrence      Director of R & D        <lawrence@agranat.com>
Agranat Systems   Embedded Web Technology
http://www.agranat.com/



Received: by ns.secondary.com (8.9.3/8.9.3) id BAA26519 for ietf-xml-mime-bks; Sun, 12 Mar 2000 01:39:05 -0800 (PST)
Received: from prserv.net (out1.prserv.net [32.97.166.31]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id BAA26459; Sun, 12 Mar 2000 01:38:00 -0800 (PST)
Received: from t3knz.attglobal.net ([210.88.161.55]) by prserv.net (out1) with SMTP id <2000031209385125202deo87e>; Sun, 12 Mar 2000 09:38:52 +0000
Message-Id: <200003120939.AA01864@t3knz.attglobal.net>
Date: Sun, 12 Mar 2000 18:39:31 +0900
To: ietf-822@imc.org, ietf-types@iana.org, ietf-xml-mime@imc.org
Subject: Re: reason for application/iotp-xml (was RE: Registration of  MIME med ia type APPLICATION/IOTP)
From: MURATA Makoto <muraw3c@attglobal.net>
In-Reply-To: <4.3.2.20000311172437.00b1edc0@mail.imc.org>
References: <4.3.2.20000311172437.00b1edc0@mail.imc.org>
MIME-Version: 1.0
X-Mailer: AL-Mail32 Version 1.10
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In message "Re: reason for application/iotp-xml (was RE: Registration of 
  MIME med ia type APPLICATION/IOTP)",
Tim Bray wrote...
 >There's also an argument from principle.  One of the core ideas of XML is to 
 >move past the notion that data should be encoded in a format proprietary 
 >to a product or application.  The idea is that the door should always be 
 >left open for future processing of data in ways that are unpredictable at 
 >the time of creation.  The generalized -xml convention certainly is 
 >consistent with that spirit.

Agreed.

XML is a meta language: XML-based formats can be defined by providing a 
tag-and-attribute inventory on top of XML.  XML is a self-descriptive 
language: any data in any XML-based format can be parsed and further 
processed without any knowledge of a particular tag-and-attribute inventory.
  
The whole point of XML is to make XML-generic processing possible and 
powerful so that development of each XML-based format is easy.  A large 
number of XML software tools (e.g., XML editors or browsers) are designed 
so that they can be used for any tag-and-attribute inventory.  Besides 
IE 5.0, there are XMLSpy, XML Notepad, and so forth.  Generic XML 
processing is the whole point of XML!
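Generic processing in this sense needs no knowledge of the vocabulary. A minimal sketch using Python's standard `xml.etree.ElementTree`: it counts element names and extracts text for a full-text index from any well-formed document. The sample element names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_inventory(xml_text: str) -> Counter:
    """Count element names in any well-formed XML, vocabulary unknown."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def full_text(xml_text: str) -> str:
    """Concatenate character data, e.g. for a generic full-text indexer."""
    root = ET.fromstring(xml_text)
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```

Neither function is told anything about the tag-and-attribute inventory; both work unchanged on any XML-based format.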

In message "Re: reason for application/iotp-xml (was RE: Registration of 
  MIME med ia type APPLICATION/IOTP)",
Paul Hoffman / IMC wrote...

 >And, just to get back to specifics, the author of the IOTP draft said the 
 >other day:
 >
 >>I'm polling the TRADE WG but it is my impression that there is enough
 >>implementation that people would prefer not to change.
 >
 >Not only can we wait for "the next one" on this topic, we have another 
 >example of a group that doesn't feel much inclined towards the utility of 
 >the application/foo-xml solution for their protocol.

Since this issue requires balancing so many things, I do not think that 
we have a thoroughly thought-out example yet.

In message "Re: reason for application/iotp-xml (was RE: Registration of 
  MIME med ia type APPLICATION/IOTP)",
Tim Bray wrote...
 >Having said that, I still have the feeling that if we do all agree that
 >this is a good effect to achieve, there has to be a better way than
 >this -xml convention.  Media types have been working well in a 2-part
 >structure for a long time, going to a 2-and-a-half parts smells funny.
 >But I haven't thought of anything better. -Tim

Nobody really likes the -xml convention.  But we have already investigated 
other proposals, and none of them appears to have been accepted.

Cheers,


----
MURATA Makoto  muraw3c@attglobal.net


Received: by ns.secondary.com (8.9.3/8.9.3) id VAA18526 for ietf-xml-mime-bks; Sat, 11 Mar 2000 21:05:10 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id VAA18519; Sat, 11 Mar 2000 21:05:06 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id VAA22784; Sat, 11 Mar 2000 21:06:05 -0800
Message-Id: <3.0.32.20000311210727.01501100@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sat, 11 Mar 2000 21:07:30 -0800
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-822@imc.org
From: Tim Bray <tbray@textuality.com>
Subject: Re: reason for application/iotp-xml (was RE: Registration of   MIME med ia type APPLICATION/IOTP)
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 05:40 PM 3/11/00 -0800, Paul Hoffman / IMC wrote:
>>1. A web crawler building a full-text index
>>2. An XLink processor building a topological map a la Google
>>3. A rules-driven firewall filter deciding whether to let something
>>    through
>>4. A schema-driven datatype-checker
>
>Of these four, only (3) makes sense for IOTP. 

Well, I'm claiming the thesis behind XML is that you can't prejudge what
might or might not make sense tomorrow for the data objects you create
today.  Having said that, I'm prepared to believe that you're right
as regards IOTP.

>(3) is pretty psychedelic. Firewalls that look at MIME types but not the 
>content are so horribly insecure as to not need further discussion.

That's not it; one can envision a firewall that checks those data
objects known to be XML-encoded, whatever their primary media types,
for certain patterns of XML markup that are regarded as blessed or
unblessed.  In particular, given that there is likely to be XML
markup that has cross-application inclusion semantics, one can
envision firewalls disallowing such inclusions whatever the application.
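A rough Python sketch of the kind of filter described above, assuming the filter sees both the media type and the body, and that "unblessed" markup is defined by a local policy list. The namespace URI and element name in that list are hypothetical placeholders, and the function names are illustrative only. Note that the Python documentation warns its standard XML parsers are not secure against maliciously constructed data, so a real filter would need a hardened parser.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL policy: (namespace URI, local name) pairs whose presence
# blocks a message, whatever its application-specific media type.
BLOCKED_ELEMENTS = {
    ("http://example.org/hypothetical-include", "include"),
}

def is_xml_based(media_type: str) -> bool:
    """True for the plain XML types and for */*-xml subtypes (assumed convention)."""
    base = media_type.split(";", 1)[0].strip().lower()
    if base in ("text/xml", "application/xml"):
        return True
    _, _, subtype = base.partition("/")
    return subtype.endswith("-xml")

def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag

def allow(media_type: str, body: bytes) -> bool:
    """Decide whether the (hypothetical) firewall lets this body through."""
    if not is_xml_based(media_type):
        return True  # not labelled as XML; other firewall rules would apply
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # labelled as XML but not well-formed
    # iter() walks the root element and all of its descendants.
    for elem in root.iter():
        if split_tag(elem.tag) in BLOCKED_ELEMENTS:
            return False
    return True
```

The point of the -xml label here is only that the filter knows which bodies it may parse as XML without understanding each application's vocabulary.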

>Not only can we wait for "the next one" on this topic, we have another 
>example of a group that doesn't feel much inclined towards the utility of 
>the application/foo-xml solution for their protocol.

I'm not trying to be difficult, but once again, one of the central
theses in the use of XML is explicitly to minimize the damage due to
the fact that very few groups, in designing a data format, bother to take 
the trouble to ensure that it might be reusable for unforeseen 
purposes at a later date.  Thus, the fact that this particular group
says they don't need this particular flag for their particular application
is not very relevant to the argument as to whether the -xml suffix is
a good idea.  -Tim



Received: by ns.secondary.com (8.9.3/8.9.3) id RAA07128 for ietf-xml-mime-bks; Sat, 11 Mar 2000 17:39:06 -0800 (PST)
Received: from laptop.imc.org (ip12.proper.com [165.227.249.12]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id RAA07116; Sat, 11 Mar 2000 17:38:43 -0800 (PST)
Message-Id: <4.3.2.20000311172437.00b1edc0@mail.imc.org>
X-Sender: phoffman@mail.imc.org
X-Mailer: QUALCOMM Windows Eudora Version 4.3
Date: Sat, 11 Mar 2000 17:40:02 -0800
To: ietf-822@imc.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: reason for application/iotp-xml (was RE: Registration of  MIME med ia type APPLICATION/IOTP)
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org
In-Reply-To: <3.0.32.20000311153722.014fe540@pop.intergate.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:37 PM 3/11/00 -0800, Tim Bray wrote:
>1. A web crawler building a full-text index
>2. An XLink processor building a topological map a la Google
>3. A rules-driven firewall filter deciding whether to let something
>    through
>4. A schema-driven datatype-checker

Of these four, only (3) makes sense for IOTP. And, even then, a sender or 
receiver of an IOTP object behind a firewall would have already talked to 
the net admin for the firewall to allow application/iotp through. Also, see 
below.

But, to stretch into the theoretical again, let's also look at calendar 
objects in XML. In the case of (1) and (2) and (4), the process doing the 
XML crawling and/or indexing *should* know about application/iotp. Even if 
it doesn't, and it is making the assumption that "there's XML out there 
that I don't know the MIME type of but I want to find it", it will likely 
pull down or look at all objects with MIME types that it doesn't know about 
(thereby skipping all the image/gif files) and then look in the file to see 
"is this XML"? The first time that the process comes across application/foo 
and discovers that it has XML in it, you can bet that the process would 
start to specifically look for application/foo.

(3) is pretty psychedelic. Firewalls that look at MIME types but not the 
content are so horribly insecure as to not need further discussion.

And, just to get back to specifics, the author of the IOTP draft said the 
other day:

>I'm polling the TRADE WG but it is my impression that there is enough
>implementation that people would prefer not to change.

Not only can we wait for "the next one" on this topic, we have another 
example of a group that doesn't feel much inclined towards the utility of 
the application/foo-xml solution for their protocol.

--Paul Hoffman, Director
--Internet Mail Consortium



Received: by ns.secondary.com (8.9.3/8.9.3) id PAA06128 for ietf-xml-mime-bks; Sat, 11 Mar 2000 15:35:29 -0800 (PST)
Received: from smtp.gatewaymail.net (IDENT:root@[207.34.179.250]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id PAA06123; Sat, 11 Mar 2000 15:35:27 -0800 (PST)
Received: from FRITZ (00-10-4b-22-27-db.bconnected.net [209.53.11.246]) by smtp.gatewaymail.net (8.9.3/8.9.3) with SMTP id PAA22511; Sat, 11 Mar 2000 15:36:03 -0800
Message-Id: <3.0.32.20000311153722.014fe540@pop.intergate.ca>
X-Sender: tbray@pop.intergate.ca
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sat, 11 Mar 2000 15:37:26 -0800
To: Paul Hoffman / IMC <phoffman@imc.org>, ietf-822@imc.org
From: Tim Bray <tbray@textuality.com>
Subject: Re: reason for application/iotp-xml (was RE: Registration of  MIME med ia type APPLICATION/IOTP)
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 01:55 PM 3/11/00 -0800, Paul Hoffman / IMC wrote:
>IOTP is a trading protocol. It involves talking about money and 
...
>To date, we have not seen anything formatted in XML that would be useful to 
>automatically hand to an XML parser.

I also am troubled by the ad-hoc-ness of the -xml suffix; but unlike Paul,
I do see a potential benefit.  First of all, is there an existence proof
of an application that does generic XML processing (whatever that means)
of an XML vocabulary it doesn't understand, the example here being IOTP?
I have some:

1. A web crawler building a full-text index
2. An XLink processor building a topological map a la Google
3. A rules-driven firewall filter deciding whether to let something
   through
4. A schema-driven datatype-checker

There's also an argument from principle.  One of the core ideas of XML is to 
move past the notion that data should be encoded in a format proprietary 
to a product or application.  The idea is that the door should always be 
left open for future processing of data in ways that are unpredictable at 
the time of creation.  The generalized -xml convention certainly is 
consistent with that spirit.

Having said that, I still have the feeling that if we do all agree that
this is a good effect to achieve, there has to be a better way than
this -xml convention.  Media types have been working well in a 2-part
structure for a long time, going to a 2-and-a-half parts smells funny.
But I haven't thought of anything better. -Tim


Received: by ns.secondary.com (8.9.3/8.9.3) id OAA05803 for ietf-xml-mime-bks; Sat, 11 Mar 2000 14:46:14 -0800 (PST)
Received: from muncher.math.uic.edu (koobera.math.uic.edu [131.193.178.181]) by ns.secondary.com (8.9.3/8.9.3) with SMTP id OAA05795 for <ietf-xml-mime@imc.org>; Sat, 11 Mar 2000 14:46:12 -0800 (PST)
Received: (qmail 12596 invoked by uid 1001); 11 Mar 2000 22:47:23 -0000
Date: 11 Mar 2000 22:47:23 -0000
Message-ID: <20000311224723.28653.qmail@cr.yp.to>
Mail-Followup-To: ietf-xml-mime@imc.org, ietf-822@imc.org, ietf-types@iana.org
From: "D. J. Bernstein" <djb@cr.yp.to>
To: ietf-xml-mime@imc.org, ietf-822@imc.org, ietf-types@iana.org
Subject: Re: reason for application/iotp-xml
References: <25D0C66E6D25D311B2AC0008C7913EE0517D06@tdmail2.teledesic.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Dan Kohn writes:
> These subtypes need to be under
> existing top-level types in order to be dispatched correctly.

You're saying that application/iotp-xml can be dispatched correctly
while xml/iotp can't? Please explain.

---Dan


Received: by ns.secondary.com (8.9.3/8.9.3) id NAA05289 for ietf-xml-mime-bks; Sat, 11 Mar 2000 13:54:34 -0800 (PST)
Received: from laptop.imc.org (ip12.proper.com [165.227.249.12]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id NAA05275; Sat, 11 Mar 2000 13:54:21 -0800 (PST)
Message-Id: <4.3.2.20000311134222.00b27410@mail.imc.org>
X-Sender: phoffman@mail.imc.org
X-Mailer: QUALCOMM Windows Eudora Version 4.3
Date: Sat, 11 Mar 2000 13:55:39 -0800
To: ietf-822@imc.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP)
Cc: ietf-types@iana.org, ietf-xml-mime@imc.org
In-Reply-To: <01JMWOONJT0G000N0L@MAUVE.INNOSOFT.COM>
References: <"Your message dated Sat, 11 Mar 2000 14:01:22 -0600" <a0431140bb4f0551706d1@resnick2.qualcomm.com> <200003110057.TAA16028@astro.cs.utk.edu> <200003110057.TAA16028@astro.cs.utk.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

(It's a bad sign when it takes more than 15 seconds to trim off the Cc: 
list...)

At 01:00 PM 3/11/00 -0800, ned.freed@innosoft.com wrote:
>>There are mechanisms in MIME to call out properties like this. Why
>>are we trying to embed this particular one in the name instead of
>>using the mechanisms that MIME provides?
>
>Because upon examination none of them seem even remotely correct for the task
>at hand. My conclusion is that if you want to do this, this is right way.

That's a big "if", particularly if you look at this application. Thank 
goodness we are finally talking about a real example of something that is 
formatted with XML, and are no longer postulating about "potential" 
applications.

IOTP is a trading protocol. It involves talking about money and 
descriptions and roles and promises. What possible value would there be to 
a recipient using a non-IOTP-aware program to have that program 
automatically hand the IOTP package to an XML parser? It's not like that 
automatic handoff will magically cause trade to happen when it wouldn't 
have before.

In the specific case of IOTP, the right thing for the receiving agent to do 
when it doesn't recognize application/iotp is to write the object out to 
disk and say "there is an object whose file name is <foo> waiting for you". 
The sender might even give a filename hint of "foo.xml". The recipient can 
then open that object up in whatever generic file opener they have.

To date, we have not seen anything formatted in XML that would be useful to 
automatically hand to an XML parser. If the parser knows about the type, it 
would have snagged it on receipt. All of these things would be just as 
useful being handed to a text viewer where the user can see "oh, this is 
XML and I have a tool that will format this nicely". (I imagine the same 
thing could be done for binary blobs to make a guess if you want to pass 
them through an ASN.1 dumper...)

I propose to leave the type application/iotp and to drop this discussion 
until there is a request for a real-world application where there would be 
some advantage to the end user for application/foo-xml to be handed to an 
XML viewer directly instead of being written to disk.

--Paul Hoffman, Director
--Internet Mail Consortium



Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id MAA04686 for ietf-xml-mime-bks; Sat, 11 Mar 2000 12:43:57 -0800 (PST)
Received: from mgate-01.teledesic.com (mgate-01.teledesic.com [216.190.22.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id MAA04676; Sat, 11 Mar 2000 12:43:54 -0800 (PST)
Received: by mgate-01.teledesic.com with Internet Mail Service (5.5.2448.0) id <GV9RN19L>; Sat, 11 Mar 2000 12:41:00 -0800
Message-ID: <25D0C66E6D25D311B2AC0008C7913EE0517D19@tdmail2.teledesic.com>
From: Dan Kohn <dan@teledesic.com>
To: "'Pete Resnick'" <presnick@qualcomm.com>
Cc: ned.freed@INNOSOFT.COM, Keith Moore <moore@cs.utk.edu>, ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, MURATA Makoto <muraw3c@attglobal.net>, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>, ietf-822@imc.org, "'ietf-xml-mime@imc.org'" <ietf-xml-mime@imc.org>
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP)
Date: Sat, 11 Mar 2000 12:34:03 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="windows-1252"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I appreciate Pete's concern here, and agree that in an ideal world, this
could be handled under Content-Disposition or some new sort of
Content-Structure header.   But my (perhaps mistaken) belief is that most
dispatching tools out there work off of MIME type, and do not necessarily
have access to arbitrary Content- headers.

To take just one example of the utility of putting the information in the
content type, think of web crawlers associated with search engines.  Most
web crawlers look only for "text/*" documents for archiving.  With the
"-xml" suffix rule, all existing web crawlers can be easily modified to
search for "*/*-xml" as well, opening up to search a huge quantity of data
that would otherwise be unavailable.  (The obvious caveat applies: if you
don't want your data to be searchable, use robot tags.
<http://www.fast.no/fast.php3?d=support_faqs&c=crawler&h=3>)  These crawlers
would require significant changes to parse the rest of the HTTP headers as
well, and more important, most web authors have no way of adding arbitrary
MIME headers to their sites.
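A minimal sketch of the crawler change described above, assuming the crawler already sees each response's Content-Type value. Parameters such as charset are ignored, and the function name is hypothetical.

```python
def should_index(content_type: str) -> bool:
    """Return True if a crawler should fetch and index this media type.

    Keeps the existing rule (accept text/*) and adds the proposed
    '*/*-xml' suffix rule; everything else is skipped.
    """
    base = content_type.split(";", 1)[0].strip().lower()
    top, sep, subtype = base.partition("/")
    if not sep or not top or not subtype:
        return False  # malformed media type
    return top == "text" or subtype.endswith("-xml")
```

With this rule, image/svg-xml and application/iotp-xml are picked up while image/gif and an unsuffixed application/iotp are still skipped.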

The point is that "-xml" has demonstrable benefit to a significant subset
of the community, while Pete and Keith have complained about the lack of
elegance of the solution.

Ned Freed has shown that a simple rule that "-xml" comes last will allow
extensibility.  However, we haven't needed any such extensibility in the last 10
years, and the popularity and extensibility of XML itself will hopefully mean
that we won't need to have this debate again for a long time.

In lieu of a proposal for a more elegant solution that also provides
complete backward (and easy upward) compatibility, I think we should move
forward with Murata-san's draft.

		- dan
--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-602-6222  fax:+1-425-602-6223
http://www.dankohn.com 

-----Original Message-----
From: Pete Resnick [mailto:presnick@qualcomm.com]
Sent: Saturday, 2000-03-11 12:01
To: Keith Moore
Cc: ned.freed@innosoft.com; Keith Moore; Dan Kohn; ietf-types@iana.org;
Martin J. Duerst; MURATA Makoto; Donald E. Eastlake 3rd;
ietf-822@imc.org
Subject: Re: reason for application/iotp-xml (was RE: Registration of
MIME media type APPLICATION/IOTP)


On 3/10/00 at 7:57 PM -0500, Keith Moore wrote:

>> Again, I'm not especially supportive of the frob as I fail to see very
>> much utility in it. But I also don't oppose it. I see it as "mostly
>> harmless".
>
>here's the question: say someone else wants to define another frob,
>and it's orthogonal to xml.  does a type that uses both the new
>frob and the xml frob then become application/foo-xml-newfrob
>or application/foo-newfrob-xml?

Oy. The mind reels at what the MIME parsing code for this looks like.

I've really got to agree with Keith here; this is a mess and I oppose 
the direction it is taking. If XML-ishness needs to be called out as 
a property of a body part, it should be separated out as a parameter 
of some sort, or made a new field, not embedded as part of the name 
of a content-type. I know where in our code I can dispatch off of a 
parameter or a new field; that's easy. Dispatching off part of a name 
would be grotesque.
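To make the contrast concrete, here is a Python sketch of the three ways of flagging XML-ness that come up in this thread: the -xml suffix, a Content-Type parameter, and a new header field. It uses the standard-library email.message.Message class. The parameter name "structure" and the field name "Content-Structure" (which Dan Kohn mentions only as a possibility) are hypothetical and not registered anywhere. The suffix test only works if every registration agrees that -xml comes last, which is exactly Keith's ordering question.

```python
from email.message import Message

def xml_by_suffix(msg: Message) -> bool:
    """The '-xml' naming convention: inspect the end of the subtype."""
    return msg.get_content_subtype().endswith("-xml")

def xml_by_parameter(msg: Message) -> bool:
    """Alternative: a Content-Type parameter ('structure' is HYPOTHETICAL)."""
    value = msg.get_param("structure")
    return isinstance(value, str) and value.lower() == "xml"

def xml_by_field(msg: Message) -> bool:
    """Alternative: a new header field ('Content-Structure' is HYPOTHETICAL)."""
    return (msg.get("Content-Structure") or "").strip().lower() == "xml"
```

The parameter and field versions reuse parsing the MIME library already does, while the suffix version depends on a naming rule that the library knows nothing about.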

There are mechanisms in MIME to call out properties like this. Why 
are we trying to embed this particular one in the name instead of 
using the mechanisms that MIME provides?

pr
-- 
Pete Resnick <mailto:presnick@qualcomm.com>
Eudora Engineering - QUALCOMM Incorporated
Ph: (217)337-6377 or (858)651-4478, Fax: (858)651-1102


Received: (from majordomo@localhost) by ns.secondary.com (8.9.3/8.9.3) id OAA28191 for ietf-xml-mime-bks; Fri, 10 Mar 2000 14:19:45 -0800 (PST)
Received: from mgate-01.teledesic.com (mgate-01.teledesic.com [216.190.22.41]) by ns.secondary.com (8.9.3/8.9.3) with ESMTP id OAA28166; Fri, 10 Mar 2000 14:18:50 -0800 (PST)
Received: by mgate-01.teledesic.com with Internet Mail Service (5.5.2448.0) id <GV9RNDQ0>; Fri, 10 Mar 2000 14:15:45 -0800
Message-ID: <25D0C66E6D25D311B2AC0008C7913EE0517D06@tdmail2.teledesic.com>
From: Dan Kohn <dan@teledesic.com>
To: "'Keith Moore'" <moore@cs.utk.edu>, "'ietf-822@imc.org'" <ietf-822@imc.org>, "'ietf-xml-mime@imc.org'" <ietf-xml-mime@imc.org>
Cc: ietf-types@iana.org, "Martin J. Duerst" <duerst@w3.org>, MURATA Makoto <muraw3c@attglobal.net>, ned.freed@INNOSOFT.COM, "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Subject: RE: reason for application/iotp-xml (was RE: Registration of MIME med ia type APPLICATION/IOTP) 
Date: Fri, 10 Mar 2000 14:08:57 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I am cross-posting to ietf-822@imc.org because Keith correctly points out
that <http://www.normos.org/ietf/draft/draft-murata-xml-02.txt> raises
fundamental architectural questions regarding the right way to integrate XML
and MIME.  Those issues have been discussed on ietf-xml-mime@imc.org and I
believe a rough consensus was achieved, but the purpose of this thread is to
try to confirm that. 

>why?  is there really much value in a default treatment of text/* xml 
>documents as plain text?  (the default doesn't seem to work well in
>practice for text/html) or is xml really likely to be used for
>image/*, audio/*, video/*, or model/* content?

>how do you figure that?  such categorization is the primary purpose of 
>top-level types.  and while I grant that XML could be used to represent
>images, audio, or video...it does not seem well-suited as a presentation
>layer for these.

Keith, I think this may be the crux of the disagreement.  For the "-xml"
suffix to make sense, I think it must be likely that XML-structured data will
show up in multiple MIME top-level types AND that that data could be
generically processed in a useful way.  Those two constraints would imply
that XML is acting fundamentally as the presentation layer in the network
stack (apologies for the obligatory OSI network stack reference), and that that
presentation information should be available to dispatchers that can make
use of it.

The fact that almost all subtypes based on XML can be generically processed
is addressed in the I-D and excerpted in my message, which I attach below.

As to whether subtypes based on XML belong in different top-level types, I
believe the following examples demonstrate that XML-based data types will
most likely be registered in ALL top-level categories:

Application/*:
IOTP
<http://www.normos.org/ietf/draft/draft-ietf-trade-iotp-v1.0-protocol-07.txt>
RDF <http://www.w3.org/RDF/>
MathML <http://www.w3.org/Math/>

Audio/*:
Nothing yet, but don't rule out VXML <http://www.vxml.org/> or a derivative.
SMIL is also a possibility <http://www.w3.org/AudioVideo/>.  Also, note that
XML modularization <http://www.w3.org/TR/xhtml-modularization/> makes it
quite likely that new XML subtypes will be created that serve specific
needs, and encoding binary audio information is completely feasible.

Video/*:
SMIL and/or a derivative could be registered here or in Application/*
<http://www.w3.org/AudioVideo>

Model/*:
UML <http://www.omg.org/uml/> (may be under Application/*)
If VRML were being started from scratch today it would likely be built in
XML.  See <http://www.vrml.org/WorkingGroups/dbwork/vrmlxml.html> for some
integration discussion.

Image/*:
SVG <http://www.w3.org/Graphics/SVG/>

Text/*:
XHTML <http://www.w3.org/TR/xhtml1/>, though this is better under
Application/*

		- dan
--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-602-6222  fax:+1-425-602-6223
http://www.dankohn.com 


-----Original Message-----
From: Keith Moore [mailto:moore@cs.utk.edu]
Sent: Friday, 2000-03-10 11:38
To: Dan Kohn
Cc: Keith Moore; ietf-types@iana.org; Martin J. Duerst; MURATA Makoto;
ned.freed@INNOSOFT.COM; Donald E. Eastlake 3rd
Subject: Re: reason for application/iotp-xml (was RE: Registration of
MIME media type APPLICATION/IOTP)


> >okay - more generally, I don't think we should make a special case in 
> >the MIME content-type syntax (not even using an ad hoc mechanism)
> >for content-types that happen to be based on XML.  MIME already has
> >a mechanism to define default handling of certain classes of objects,
> >in the top-level content-type.  If there's really enough utility in
> >doing this for XML, we should create an xml/ top-level type.
> >We shouldn't create another separate syntactical convention to do this.
> 
> Keith, take two potential types, image/svg-xml and application/mathml-xml.
> In both cases, for dispatchers that don't understand or care about XML, the
> "-xml" suffix is completely opaque.

it may be opaque, but it still does harm if it constrains evolution of
the MIME content-type space.

> These subtypes need to be under
> existing top-level types in order to be dispatched correctly.  

why?  is there really much value in a default treatment of text/* xml 
documents as plain text?  (the default doesn't seem to work well in
practice for text/html) or is xml really likely to be used for
image/*, audio/*, video/*, or model/* content?

> A new XML
> top-level type doesn't provide any information of categorization, because
> XML-based MIME types can fall in any of the top level categories.

how do you figure that?  such categorization is the primary purpose of 
top-level types.  and while I grant that XML could be used to represent
images, audio, or video...it does not seem well-suited as a presentation
layer for these.

but this again begs the question about how much encoding information 
you want to make explicit in the content-type name.  and my main point
is that the answer is not obvious - embedding -xml in the name comes
at a cost.  we need to think carefully before either (a) establishing
yet another syntactic convention for content-type names, or (b) 
making XML explicit in a content-type name.  and this is a MIME
architectural issue - it isn't something that should be done just because
XML proponents think it is a good idea.

if there is going to be such a convention, it needs to be carefully
considered both for its impact on the MIME architecture and whether
it's actually desirable to do this for XML.  (the latter is easier to
justify than the former)

> In summary, there is no downside to using the "-xml" suffix

disagree. see above.

Keith



-----Original Message-----
From: Dan Kohn 
Sent: Friday, 2000-03-10 11:19
To: Keith Moore
Cc: ietf-types@iana.org; Martin J. Duerst; MURATA Makoto;
ned.freed@INNOSOFT.COM; Donald E. Eastlake 3rd
Subject: reason for application/iotp-xml (was RE: Registration of MIME
media type APPLICATION/IOTP)


Keith Moore said:

>okay - more generally, I don't think we should make a special case in 
>the MIME content-type syntax (not even using an ad hoc mechanism)
>for content-types that happen to be based on XML.  MIME already has
>a mechanism to define default handling of certain classes of objects,
>in the top-level content-type.  If there's really enough utility in
>doing this for XML, we should create an xml/ top-level type.
>We shouldn't create another separate syntactical convention to do this.

Keith, take two potential types, image/svg-xml and application/mathml-xml.
In both cases, for dispatchers that don't understand or care about XML, the
"-xml" suffix is completely opaque.  These subtypes need to be under
existing top-level types in order to be dispatched correctly.  A new XML
top-level type doesn't provide any information of categorization, because
XML-based MIME types can fall in any of the top level categories.

However, for dispatchers that do understand XML, and especially for those
that haven't seen a specific type before (such as application/iotp-xml), the
"-xml" suffix provides additional information that the content could instead
be dispatched to an XML browser or processed generically in various ways.
In the IOTP example, this could be very useful in allowing someone not
configured to deal with the subtype to at least have the information
displayed in an accessible way.  Of course, if the dispatcher is configured
to deal with IOTP, then the "-xml" remains opaque.
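The dispatch order described above (a specific handler first, then generic XML processing, with Paul Hoffman's write-to-disk behaviour as the last resort) can be sketched in Python as follows. The handler names are hypothetical labels standing in for real programs, and the suffix test assumes the */*-xml convention from the draft.

```python
# HYPOTHETICAL handler table; a real client would map these to programs.
SPECIFIC_HANDLERS = {
    "text/html": "html-browser",
    "image/gif": "image-viewer",
}

def choose_handler(media_type: str,
                   handlers: dict[str, str] = SPECIFIC_HANDLERS) -> str:
    """Pick a handler for a body labelled with the given media type."""
    base = media_type.split(";", 1)[0].strip().lower()
    # 1. A handler registered for the exact type wins; the suffix stays opaque.
    if base in handlers:
        return handlers[base]
    # 2. Unknown type, but the name says it is XML: offer generic processing.
    _, _, subtype = base.partition("/")
    if subtype.endswith("-xml"):
        return "generic-xml-viewer"
    # 3. Otherwise, save the object to disk and tell the user where it is.
    return "save-to-disk"
```

For example, an IOTP-aware client that registers "application/iotp-xml" never reaches step 2, while a client without that entry would offer a generic XML view instead of only saving the file.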

In summary, there is no downside to using the "-xml" suffix, but significant
potential functionality can be gained.

To quote <http://www.normos.org/ietf/draft/draft-murata-xml-02.txt>:

   While the benefits of specific MIME types for particular types of
   XML documents are significant, all XML documents share common
   structures and syntax that make possible common processing.

   Some areas where 'generic' processing is useful include:

   o  Browsing - An XML browser can display any XML document with a
      provided CSS[12] or XSLT[19] style sheet, whatever the vocabulary
      of that document.

   o  Editing - Any XML editor can read, modify, and save any XML
      document.

   o  Fragment identification - XPointers[16] can work with any XML
      document, whatever vocabulary it uses and whether or not it uses
      XPointer for its own fragment identification. [Editor's note: the
      use of non-XPointer fragment identifiers by XML vocabularies like
      SVG and SMIL requires further discussion.]

   o  Hypertext Linking - XLink[17] hypertext linking is designed to
      connect any XML documents, regardless of vocabulary.

   o  Searching - Search engines, agents, and XML-oriented query tools
      should be able to read XML documents and extract the content and
      names of elements and attributes even if they are ignorant of the
      particular vocabulary used for elements and attributes.

   o  Storage - XML-oriented storage systems, which keep XML documents
      internally in a parsed form, should similarly be able to process,
      store, and recreate any XML document.

Also, note that Murata-san's draft has specific "opt-out" language if your
XML-based MIME type is not suitable for generic processing:

   XML-generic processing is not always appropriate for XML-based media
   types.  For example, some such media types may require fragment
   identifiers different from XPointer.  By *not* following the naming
   convention */*-xml, such media types can avoid XML-generic
   processing.

However, I would suggest that application/iotp-xml is a perfect illustration
of the utility of suffix-based naming.

		- dan

P.S.  Please use ietf-types@iana.org, which is the permanent address for
this list, rather than ietf-types@uninett.no.

--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-602-6222  fax:+1-425-602-6223
http://www.dankohn.com 

