
Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8PKjlKP092906 for <ietf-xml-mime-bks@above.proper.com>; Thu, 25 Sep 2003 13:45:47 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8PKjlnp092905 for ietf-xml-mime-bks; Thu, 25 Sep 2003 13:45:47 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8PKjjKP092900 for <ietf-xml-mime@imc.org>; Thu, 25 Sep 2003 13:45:46 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 9C5D010322; Thu, 25 Sep 2003 13:45:42 -0700 (PDT)
Message-ID: <3F7353F6.5000308@textuality.com>
Date: Thu, 25 Sep 2003 13:45:42 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Martin Duerst <duerst@w3.org>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: AddDefaultCharset considered harmful (Re: Requesting a  revision of RFC3023)
References: <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <4.2.0.58.J.20030925161926.060d38d0@localhost>
In-Reply-To: <4.2.0.58.J.20030925161926.060d38d0@localhost>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Martin Duerst wrote:

> I have filed a bug with Apache bugzilla, asking that the
> AddDefaultCharset setting in the httpd.conf file when shipped
> be removed/commented out, and the comment fixed, at
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421.

+1; if I can help by chiming in anywhere, let me know.  BTW, I actually 
use this directive at 'ongoing' (I set it to UTF-8), and it works perfectly 
for a website which is a dictatorship where everything is under the steely 
control of one person and one publishing program, but I'm sure that is a 
minority situation.  -Tim




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8PKZdKP092614 for <ietf-xml-mime-bks@above.proper.com>; Thu, 25 Sep 2003 13:35:39 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8PKZdoM092613 for ietf-xml-mime-bks; Thu, 25 Sep 2003 13:35:39 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8PKZaKP092608 for <ietf-xml-mime@imc.org>; Thu, 25 Sep 2003 13:35:37 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 0964C1376E; Thu, 25 Sep 2003 16:32:40 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030925161926.060d38d0@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Thu, 25 Sep 2003 16:22:20 -0400
To: Tim Bray <tbray@textuality.com>
From: Martin Duerst <duerst@w3.org>
Subject: AddDefaultCharset considered harmful (Re: Requesting a revision of RFC3023)
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <3F69DF5E.3000102@textuality.com>
References: <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

I have filed a bug with Apache bugzilla, asking that the
AddDefaultCharset setting in the httpd.conf file when shipped
be removed/commented out, and the comment fixed, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421.

I have commented on a related bug that I found, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513.

I'm copying this list because what we have discussed here is
related. In particular, the comment in the httpd.conf file
seems to say some quite strange things:

 >>>>>>>>
#
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.
#
AddDefaultCharset ISO-8859-1
 >>>>>>>
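To illustrate why "does little harm" is doubtful: a server-wide label overrides whatever encoding a page really uses. A minimal Python sketch, assuming a hypothetical page saved as UTF-8 but served with charset=ISO-8859-1 because of the shipped AddDefaultCharset line (decode_as_labeled is an illustrative name only):

```python
# Hypothetical scenario: the author saved the page as UTF-8, but the
# server-wide "AddDefaultCharset ISO-8859-1" labels it as ISO-8859-1.

def decode_as_labeled(body: bytes, header_charset: str) -> str:
    """Decode the body the way a receiver that trusts the header would."""
    return body.decode(header_charset)

original_text = "café"
body = original_text.encode("utf-8")           # what the author actually saved
shown = decode_as_labeled(body, "iso-8859-1")  # what a label-trusting browser shows

# Each two-byte UTF-8 sequence is read as two separate Latin-1 characters.
print(repr(shown))  # 'cafÃ©'
```

A receiver that honours the label has no way to recover the author's text unless it ignores the label.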


Regards,    Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8OKMBKP010837 for <ietf-xml-mime-bks@above.proper.com>; Wed, 24 Sep 2003 13:22:11 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8OKMB8x010836 for ietf-xml-mime-bks; Wed, 24 Sep 2003 13:22:11 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from smtpwest.webmethods.com (smtpwest.webmethods.com [63.119.28.34]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8OKMAKP010803 for <ietf-xml-mime@imc.org>; Wed, 24 Sep 2003 13:22:11 -0700 (PDT) (envelope-from aphillips@webmethods.com)
Received: from godzilla.activesw.com (Not Verified[172.16.0.14]) by smtpwest.webmethods.com with NetIQ MailMarshal (v5.5.4.17) id <BB003c1178>; Wed, 24 Sep 2003 13:22:07 -0700
Received: from aphillipsC640 (eng-9.activesw.com [172.21.6.9]) by godzilla.activesw.com (8.9.3/8.9.3) with SMTP id NAA05858; Wed, 24 Sep 2003 13:21:52 -0700 (PDT)
Reply-To: <aphillips@webmethods.com>
From: "Addison Phillips [wM]" <aphillips@webmethods.com>
To: "MURATA Makoto" <murata@hokkaido.email.ne.jp>
Cc: "WWW-Tag" <www-tag@w3.org>, <ietf-xml-mime@imc.org>
Subject: RE: Requesting a revision of RFC3023
Date: Wed, 24 Sep 2003 13:20:38 -0700
Message-ID: <PNEHIBAMBMLHDMJDDFLHMEHOHAAA.aphillips@webmethods.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
In-Reply-To: <20030924223233.091E.MURATA@hokkaido.email.ne.jp>
Importance: Normal
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Hi Makoto,

Thanks for your response! I'm happy to contribute as best I can.
>
> This issue was discussed when RFC 2376 was developed.  I recall
> that my then-co-author (E. Whitehead) proposed exactly the same
> thing, but that proposal was not agreed.

I think it will keep coming back until the policy is changed.
Too many XML implementers are frustrated by it.
>
> In my understanding, MIME people in IETF would like to keep the charset
> parameter of text/* authoritative, since a number of mail programs rely
> only on the charset parameter for text/*.

I don't think it is a problem for the charset parameter *when it is present*
to be authoritative. I think the real problem here is that it is also
considered to be authoritative when it is NOT present--overriding other
potential means for identifying content encoding that are available. The
lack of information should not take precedence over positive information in
the content when the implementation can take advantage of said information.

That doesn't mean that implementations must interpret the content; there can
still be an outlet for implementations that choose not to. It only means that
implementations that can interpret the content should be allowed to do so.
>
> However, as people correctly pointed out, omission of the charset
> parameter of text/xml is typically caused by the fact that authors
> cannot change the configuration of WWW servers.  For this reason, W3C
> recommendations for CSS and HTML say something similar to your
> suggestion, but the IETF RFC for CSS does not say anything about the default.

Then each standard or RFC should be modified over time to include such
information, either directly or by reference to parent standards. For
example, SOAP does not include any statement whatsoever about the charset to
use in a SOAP message. XML 1.0 and the RFC for the media type contain that
information (and in the case of the media type, IIRC that is by reference to
RFC 3023). But certainly Ur-standards, like CSS, XML, HTML, and so on should
contain a default (and not leave unannounced values open to interpretation).

> The IETF RFC 2854 for HTML says the default is US-ASCII (MIME) or
> 8859-1 (HTTP), but also says that "the actual default is either a
> corporate character encoding or character encodings widely deployed in a
> certain national or regional community"

That last statement is quite sloppy, in my opinion. It basically says: the
default is ASCII/8859, except when it isn't (i.e. you cannot rely on there
being a default). This is akin to saying that the content should be
introspected by the receiver or user agent as best as possible.

I think it is reasonable (if occasionally a problem) to preserve the
authority of the charset tag when it is present in the media type.

I think that when it is not present, it should be permissible to detect the
encoding from the content. This suggests that implementations which cannot
determine the charset when transmitting content should be discouraged from
emitting the charset parameter (i.e. from "guessing").
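The precedence argued for above (label first, content second, default last) can be sketched in Python. This illustrates the proposal only, not text from any RFC; choose_charset, the detect callback, bom_only and the "us-ascii" fallback are all assumptions (the fallback value itself is debated in this thread):

```python
from typing import Callable, Optional

def choose_charset(charset_param: Optional[str],
                   body: bytes,
                   detect: Callable[[bytes], Optional[str]],
                   fallback: str = "us-ascii") -> str:
    """Pick the charset for a received text entity under the proposed policy."""
    if charset_param:               # 1. a label that is present stays authoritative
        return charset_param.lower()
    detected = detect(body)         # 2. no label: the receiver may look at the data
    if detected:
        return detected
    return fallback                 # 3. nothing found: apply the default

def bom_only(body: bytes) -> Optional[str]:
    """Hypothetical, deliberately minimal detector: UTF-8 byte order mark only."""
    return "utf-8" if body.startswith(b"\xef\xbb\xbf") else None

print(choose_charset("UTF-8", b"<a/>", bom_only))           # utf-8
print(choose_charset(None, b"\xef\xbb\xbf<a/>", bom_only))  # utf-8
print(choose_charset(None, b"<a/>", bom_only))              # us-ascii
```

Passing the detector in as a parameter keeps the "outlet" mentioned above: an implementation that does not inspect content can supply one that always returns None.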

Finally, I think that the various default encodings should be modified and
harmonized in a reasonable way.

Personally I think that references to US-ASCII as a default should be
replaced with UTF-8 wherever possible (UTF-8 is a superset of US-ASCII).
Since UTF-8 is highly patterned, it is also highly detectable. It could be
followed by a reasonable fallback (possibly US-ASCII or ISO-8859-1; at least
from ISO-8859-1 you can reconstruct the original bytes). The existing RFC 3023
contains the counter-argument to this, the logic of which I don't dispute. I'm
merely suggesting that a change in policy here might be a better "move towards
the future".
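The detectability argument can be sketched in Python: a strict UTF-8 decode rejects most non-UTF-8 text, and ISO-8859-1 as a fallback maps every byte to exactly one character, so no bytes are lost. A minimal sketch using only Python's built-in codecs; decode_utf8_or_latin1 is a hypothetical name:

```python
from typing import Tuple

def decode_utf8_or_latin1(data: bytes) -> Tuple[str, str]:
    """Try UTF-8 first; fall back to ISO-8859-1, which cannot fail."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps each byte 0x00-0xFF to exactly one code point,
        # so decoding always succeeds and is exactly reversible.
        return data.decode("iso-8859-1"), "iso-8859-1"

utf8_bytes = "Grüße".encode("utf-8")
latin1_bytes = "Grüße".encode("iso-8859-1")

print(decode_utf8_or_latin1(utf8_bytes))    # ('Grüße', 'utf-8')
print(decode_utf8_or_latin1(latin1_bytes))  # ('Grüße', 'iso-8859-1')

# Round trip: the fallback reading gives back the original bytes unchanged.
text, _ = decode_utf8_or_latin1(latin1_bytes)
assert text.encode("iso-8859-1") == latin1_bytes
```

The heuristic is not infallible: Latin-1 text that happens to form valid UTF-8 sequences (for instance the two characters "Ã©") would be read as UTF-8.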

So to reiterate, I think that the RFC should be modified to try to avoid
the deliberate loss of data (as when using US-ASCII with data that contains
non-US-ASCII sequences) where it is possible for the implementation to cope.
This move won't generate any new incompatibilities for existing
implementations that do not grope the content. I understand that MIME
applications in particular cannot always avoid this kind of data degradation,
and that's okay with me, so long as it isn't "wrong" to detect the encoding
when the charset value is *missing*.

I'll probably come to regret that paragraph ;-).

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility

432 Lakeside Drive, Sunnyvale, CA, USA
+1 408.962.5487 (office)  +1 408.210.3569 (mobile)
mailto:aphillips@webmethods.com

Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws

Internationalization is an architecture.
It is not a feature.



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8OE3MKP036807 for <ietf-xml-mime-bks@above.proper.com>; Wed, 24 Sep 2003 07:03:22 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8OE3MH7036806 for ietf-xml-mime-bks; Wed, 24 Sep 2003 07:03:22 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8OE3EKP036786 for <ietf-xml-mime@imc.org>; Wed, 24 Sep 2003 07:03:20 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (j083089.ppp.asahi-net.or.jp [61.213.83.89]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 1ECD1B37A; Wed, 24 Sep 2003 23:02:04 +0900 (JST)
Date: Wed, 24 Sep 2003 22:59:23 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: "Addison Phillips [wM]" <aphillips@webmethods.com>
Subject: Re: Requesting a revision of RFC3023
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <4.2.0.58.J.20030922164833.03c388b0@localhost>
References: <4.2.0.58.J.20030922164833.03c388b0@localhost>
Message-Id: <20030924223233.091E.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Addison,

Thanks for your contribution to this discussion.

> The problem with changing rfc3023 is that there are a number of
> implementations out there that adhere to the exact letter of the involved
> RFCs (3023/2045/2046/etc.). I seem to recall that there are implementations
> that require the charset parameter or which forcibly filter the data to
> ASCII (converting all 8th-bit bytes to the '?' character) and thus there are
> many implementations that, to get the right results with these, forcibly
> emit charset parameters.
> 
> Therefore, unless absolutely forbidden, implementations would still have to
> support the use of charset with both media types. And I don't see how we can
> forbid the use of the charset parameter given the need for
> interoperability with extant sensitive systems.

I think that this is a good reason to keep the charset parameter.

> <snip>
>        Conformant with [RFC2046], if a text/xml entity is received with
>        the charset parameter omitted, MIME processors and XML processors
>        MUST use the default charset value of "us-ascii"[ASCII].  In cases
>        where the XML MIME entity is transmitted via HTTP, the default
>        charset value is still "us-ascii".
> </snip>

This issue was discussed when RFC 2376 was developed.  I recall 
that my then-co-author (E. Whitehead) proposed exactly the same thing, but 
that proposal was not agreed.

In my understanding, MIME people in IETF would like to keep the charset 
parameter of text/* authoritative, since a number of mail programs rely only on 
the charset parameter for text/*.

However, as people correctly pointed out, omission of the charset 
parameter of text/xml is typically caused by the fact that authors 
cannot change the configuration of WWW servers.  For this reason, W3C 
recommendations for CSS and HTML say something similar to your suggestion, 
but the IETF RFC for CSS does not say anything about the default.  The 
IETF RFC 2854 for HTML says the default is US-ASCII (MIME) or 
8859-1 (HTTP), but also says that "the actual default is either a
corporate character encoding or character encodings widely deployed in a
certain national or regional community"


Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MKqHKP003881 for <ietf-xml-mime-bks@above.proper.com>; Mon, 22 Sep 2003 13:52:17 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8MKqHN8003880 for ietf-xml-mime-bks; Mon, 22 Sep 2003 13:52:17 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MKqHKP003874 for <ietf-xml-mime@imc.org>; Mon, 22 Sep 2003 13:52:17 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 4627513B33; Mon, 22 Sep 2003 16:50:08 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030922164833.03c388b0@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Mon, 22 Sep 2003 16:50:02 -0400
To: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
From: "Addison Phillips [wM]" <aphillips@webmethods.com>(by way of Martin Duerst <duerst@w3.org>)
Subject: Re: Requesting a revision of RFC3023
Cc: "Addison Phillips" <aphillips@webmethods.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Hi Martin, et al,

Having tracked the thread down and having read it I feel like I can
contribute something to it. This is a common and not very fun problem that
our implementations encounter frequently with XML documents transmitted to
our system by other products.

SOAP 1.2 recommends the use of application/soap+xml as the media type
(although it is not required, see section 7.1.4 of [SOAP12-PART2], it is
pretty close to a requirement for HTTP). Noah is correct that charset is
optional. In the absence of charset, the application/*xml types default to
the encoding embedded in the XML document itself, which I think is generally
seen to be the desirable way to go.

Various pre-1.2 SOAP implementations use various media types,
including text/xml, depending on the transport, etc.

The problem with changing rfc3023 is that there are a number of
implementations out there that adhere to the exact letter of the involved
RFCs (3023/2045/2046/etc.). I seem to recall that there are implementations
that require the charset parameter or which forcibly filter the data to
ASCII (converting all 8th-bit bytes to the '?' character) and thus there are
many implementations that, to get the right results with these, forcibly
emit charset parameters.
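The data loss described above is easy to model. A Python sketch of a hypothetical receiver that forces content to ASCII by replacing every byte with the 8th bit set by '?'; filter_to_ascii is an illustrative name, not any real product's code:

```python
def filter_to_ascii(data: bytes) -> bytes:
    """Hypothetical strict receiver: every byte >= 0x80 becomes '?' (0x3F)."""
    return bytes(b if b < 0x80 else 0x3F for b in data)

body = "<name>Dürst</name>".encode("utf-8")
print(filter_to_ascii(body))  # b'<name>D??rst</name>'
```

The two '?' characters replace the two bytes of the UTF-8 encoding of "ü"; once replaced, the original character cannot be recovered, which is why senders add a charset label to get the right results from such receivers.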

Therefore, unless absolutely forbidden, implementations would still have to
support the use of charset with both media types. And I don't see how we can
forbid the use of the charset parameter given the need for
interoperability with extant sensitive systems.

It would be nice if text/xml could be modified, since it is quite common to
get un-charset-labeled content that really is NOT US-ASCII. Since one can
always detect that a data stream is not US-ASCII, it has always seemed a bit
odd to me that the RFCs require the data to be destroyed when there is clear
evidence that one is losing something. I understand the reasoning, but I
think there is a difference between saying that omission of a charset
parameter invites data corruption (i.e. the MIME or XML processor is not
required to look at the XML content and thus MAY use US-ASCII to interpret
the data) and saying that it mandates corruption (i.e. the MIME or XML
processor is required to interpret the data using 7-bit US-ASCII).
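Checking whether a stream is US-ASCII amounts to a single byte-range test. A Python sketch (non_ascii_offsets is a hypothetical helper) showing that the evidence is present even though a strict US-ASCII reading still discards the data:

```python
def non_ascii_offsets(data: bytes) -> list:
    """Offsets of bytes outside US-ASCII (0x00-0x7F)."""
    return [i for i, b in enumerate(data) if b >= 0x80]

body = "<p>naïve</p>".encode("utf-8")

evidence = non_ascii_offsets(body)
print(evidence)  # [5, 6]: the two bytes of the UTF-8 encoding of "ï"

# The strict reading: interpret as US-ASCII regardless of that evidence.
# Each offending byte becomes one U+FFFD replacement character.
strict = body.decode("ascii", errors="replace")
print(strict)
```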

Perhaps we should focus on the semantics of charset not being present,
instead of focusing on forbidding/requiring charset itself. Consider this
paragraph of RFC3023:

<snip>
       Conformant with [RFC2046], if a text/xml entity is received with
       the charset parameter omitted, MIME processors and XML processors
       MUST use the default charset value of "us-ascii"[ASCII].  In cases
       where the XML MIME entity is transmitted via HTTP, the default
       charset value is still "us-ascii".
</snip>
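The RFC 3023 rules under discussion reduce to a small lookup when the charset parameter is absent. A Python sketch covering only text/xml and application/xml (other types, including +xml types, are deliberately left out); rfc3023_default_charset is a hypothetical name, and None stands for "determine the encoding from the document as in section 4.3.3 of XML 1.0":

```python
from typing import Optional

def rfc3023_default_charset(media_type: str) -> Optional[str]:
    """Charset to assume when the charset parameter is absent (sketch).

    Returns None when the XML processor is to determine the encoding from
    the document itself, following section 4.3.3 of XML 1.0.
    """
    base = media_type.split(";", 1)[0].strip().lower()
    if base == "text/xml":
        # RFC 3023: "us-ascii", and still "us-ascii" when sent via HTTP.
        return "us-ascii"
    if base == "application/xml":
        # RFC 3023: without a charset parameter, the XML rules decide.
        return None
    raise ValueError("media type not covered by this sketch: " + base)

print(rfc3023_default_charset("text/xml"))         # us-ascii
print(rfc3023_default_charset("application/xml"))  # None
```

Note that for text/xml this deliberately differs from HTTP/1.1, which otherwise defaults text/* to ISO-8859-1 (RFC 2616, section 3.7.1).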

This could be changed to something more friendly, like:

<snip>
       If a text/xml entity is received with
       the charset parameter omitted, MIME processors and XML processors
       MAY attempt to detect the charset from the XML content itself. Such
       detection MUST follow the requirements of section 4.3.3 of [XML].
       MIME and XML processors that do not attempt or are unable to detect
       the charset using this requirement must use US-ASCII (or UTF-8????)...
etc. and so forth...
</snip>

This allows receivers the leeway to detect errant senders (while leaving
errant senders of text/xml as non-conforming). This seems like a reasonable
compromise to me.
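The proposed wording could be implemented roughly as follows. This Python sketch is a simplification of the autodetection in XML 1.0 (section 4.3.3 and Appendix F): it recognizes only the UTF-8 and UTF-16 byte order marks and an ASCII-compatible encoding declaration, and ignores the UTF-32 and EBCDIC cases. The function names and the "us-ascii" fallback are illustrative assumptions:

```python
import re

# An encoding declaration at the very start of an ASCII-compatible document.
_ENC_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

def sniff_xml_encoding(data: bytes) -> str:
    """Simplified detection of the encoding from the XML content itself."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    m = _ENC_DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: assume UTF-8, as XML 1.0 does
    # in the absence of external encoding information.
    return "utf-8"

def charset_for_unlabeled_text_xml(data: bytes, attempt_detection: bool = True) -> str:
    """text/xml received without a charset parameter, under the proposed rule."""
    if attempt_detection:
        return sniff_xml_encoding(data)
    return "us-ascii"  # processors that do not detect; "(or UTF-8????)" stays open

print(charset_for_unlabeled_text_xml(b'<?xml version="1.0" encoding="Shift_JIS"?><a/>'))  # shift_jis
print(charset_for_unlabeled_text_xml(b"<a/>", attempt_detection=False))                   # us-ascii
```

A caller that decodes with the returned UTF-16 codec names still has to skip the byte order mark itself.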

[SOAP12-PART2] http://www.w3.org/TR/2003/PR-soap12-part2-20030507

Just my two cents. Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility

432 Lakeside Drive, Sunnyvale, CA, USA
+1 408.962.5487 (office)  +1 408.210.3569 (mobile)
mailto:aphillips@webmethods.com

Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws

Internationalization is an architecture.
It is not a feature.

 > -----Original Message-----
 > From: Martin Duerst [mailto:duerst@w3.org]
 > Sent: Friday, September 19, 2003 11:05 AM
 > To: Addison Phillips
 > Subject: Fwd: Re: Requesting a revision of RFC3023
 >
 >
 > Hello Addison,
 >
 > This is from two lists (ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>).
 > Re SOAP, I guess you might have some answer. If yes, can you send it to
 > those lists or to me for forwarding?
 >
 > Regards,    Martin.
 >
 >
 > >To: MURATA Makoto <murata@hokkaido.email.ne.jp>
 > >Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
 > >Subject: Re: Requesting a revision of RFC3023
 > >From: noah_mendelsohn@us.ibm.com
 > >Date: Fri, 19 Sep 2003 11:04:11 -0400
 > >Sender: owner-ietf-xml-mime@mail.imc.org
 > >List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
 > >List-ID: <ietf-xml-mime.imc.org>
 > >List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>
 >
 > >Murata Makoto writes:
 > >
 > > >> I believe that SOAP implementations use the
 > > >> charset parameter.  If we remove the charset
 > > >> parameter, we will make them non-conformant.
 > >
 > >This is not my area of expertise, but I note that the HTTP binding [1]
 > >provided by SOAP 1.2 Recommendation uses an application/soap+xml media
 > >type, a definition of which is at [2] (I believe it is working its way
 > >through the formal registration process.)  My reading is that the
 > >definition lists charset as optional, and makes clear that its proper use
 > >is to be found in RFC 3023.
 > >
 > >I am not aware of what typical implementations of SOAP 1.1 or SOAP 1.2 are
 > >doing, but the 1.2 spec at least seems to list it as optional. Again, I'm
 > >not expert in this stuff and am not offering an opinion, but I thought the
 > >links might be helpful.
 > >
 > >[1]http://www.w3.org/TR/soap12-part2/#soapinhttp
 > >[2] http://www.w3.org/TR/soap12-part2/#ietf-draft
 > >
 > >------------------------------------------------------------------
 > >Noah Mendelsohn                              Voice: 1-617-693-4036
 > >IBM Corporation                                Fax: 1-617-693-8676
 > >One Rogers Street
 > >Cambridge, MA 02142
 > >------------------------------------------------------------------
 > >
 > >
 > >
 > >
 > >
 > >
 > >
 > >MURATA Makoto <murata@hokkaido.email.ne.jp>
 > >Sent by: www-tag-request@w3.org
 > >09/19/03 08:10 AM
 > >
 > >
 > >         To:     ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
 > >         cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
 > >         Subject:        Re: Requesting a revision of RFC3023
 > >
 > >
 > >
 > >
 > >On Fri, 19 Sep 2003 03:50:11 +0200
 > >Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
 > >
 > > > You want to change something that has been STRONGLY RECOMMENDED for
 > > > over five years to (ideally) MUST NOT just because it could cause trouble
 > > > when used improperly or with broken implementations. Today I am good
 > > > with web standards if I use the charset parameter, tomorrow I am bad
 > > > with web standards if I do. What's next on #W3C? Use tables for layout
 > > > because people could get CSS wrong and old browsers get some CSS wrong?
 > > > I don't think this leads anywhere.
 > >
 > >I believe that SOAP implementations use the charset parameter.  If we
 > >remove the charset parameter, we will make them non-conformant.
 > >
 > >Cheers,
 > >
 > >--
 > >MURATA Makoto <murata@hokkaido.email.ne.jp>
 > >
 > >



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MEsLKP087256 for <ietf-xml-mime-bks@above.proper.com>; Mon, 22 Sep 2003 07:54:21 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8MEsLXd087255 for ietf-xml-mime-bks; Mon, 22 Sep 2003 07:54:21 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MEsKKP087250 for <ietf-xml-mime@imc.org>; Mon, 22 Sep 2003 07:54:20 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 1DC9E13A75; Mon, 22 Sep 2003 10:54:21 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030922103858.053698c8@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Mon, 22 Sep 2003 10:45:38 -0400
To: MURATA Makoto <murata@hokkaido.email.ne.jp>, "Aaron Swartz" <me@aaronsw.com>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <20030922205058.5084.MURATA@hokkaido.email.ne.jp>
References: <C38DC3DD-ECA9-11D7-BAAC-0003936780B2@aaronsw.com> <20030922090229.507C.MURATA@hokkaido.email.ne.jp> <C38DC3DD-ECA9-11D7-BAAC-0003936780B2@aaronsw.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 20:56 03/09/22 +0900, MURATA Makoto wrote:

>On 21 Sep 2003 22:06:30 -0500
>"Aaron Swartz" <me@aaronsw.com> wrote:
>
> > > I think that persuading users is more difficult than
> > > persuading programmers.  I have encouraged use of Unicode
> > > for XML in Japan, but nothing has happened.
> >
> > The users shouldn't have to know what a character encoding is! Their
> > software should just use UTF-8 automatically.
>
>Will users discard existing software, which supports legacy encodings, and
>existing data, which are in legacy encodings?  I am not saying Unicode
>everywhere is bad.  To the contrary, I think Unicode everywhere is better.
>But it is extremely unlikely.

New versions of software in many cases support Unicode. As users
upgrade, they will be more and more able to use Unicode. XML made
a big step forward by defining UTF-8 and UTF-16 as default encodings
that every parser is required to understand. This is helping to get
software upgraded.

At the next step, some formats may decide to rely even more on Unicode
(such as N3). So we are moving in the right direction, slower than
we would like, but moving nonetheless. Data will move less quickly
than software; that has always been the case. In the short term,
many things seem extremely unlikely. But in the long run, we will
get closer.

Regards,  Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MBx9KP081037 for <ietf-xml-mime-bks@above.proper.com>; Mon, 22 Sep 2003 04:59:09 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8MBx9Ix081036 for ietf-xml-mime-bks; Mon, 22 Sep 2003 04:59:09 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8MBx8KP081031 for <ietf-xml-mime@imc.org>; Mon, 22 Sep 2003 04:59:08 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (j106204.ppp.asahi-net.or.jp [61.213.106.204]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 7AB377AB7; Mon, 22 Sep 2003 20:58:40 +0900 (JST)
Date: Mon, 22 Sep 2003 20:56:02 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: "Aaron Swartz" <me@aaronsw.com>
Subject: Re: Requesting a revision of RFC3023
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <C38DC3DD-ECA9-11D7-BAAC-0003936780B2@aaronsw.com>
References: <20030922090229.507C.MURATA@hokkaido.email.ne.jp> <C38DC3DD-ECA9-11D7-BAAC-0003936780B2@aaronsw.com>
Message-Id: <20030922205058.5084.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On 21 Sep 2003 22:06:30 -0500
"Aaron Swartz" <me@aaronsw.com> wrote:

> > I think that persuading users is more difficult than
> > persuading programmers.  I have encouraged use of Unicode
> > for XML in Japan, but nothing has happened.
> 
> The users shouldn't have to know what a character encoding is! Their 
> software should just use UTF-8 automatically.

Will users discard existing software, which supports legacy encodings, and 
existing data, which are in legacy encodings?  I am not saying Unicode 
everywhere is bad.  To the contrary, I think Unicode everywhere is better.  
But it is extremely unlikely.

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8M36YKP001692 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 20:06:34 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8M36YWw001691 for ietf-xml-mime-bks; Sun, 21 Sep 2003 20:06:34 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from vorpal.notabug.com (qmailr@vorpal.notabug.com [63.149.73.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8M36WKP001686 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 20:06:33 -0700 (PDT) (envelope-from me@aaronsw.com)
Received: (qmail 25093 invoked from network); 22 Sep 2003 03:06:35 -0000
Received: (ofmipd 12.207.74.63); 22 Sep 2003 03:06:13 -0000
Date: 21 Sep 2003 22:06:30 -0500
Message-Id: <C38DC3DD-ECA9-11D7-BAAC-0003936780B2@aaronsw.com>
From: "Aaron Swartz" <me@aaronsw.com>
To: "MURATA Makoto" <murata@hokkaido.email.ne.jp>
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <20030922090229.507C.MURATA@hokkaido.email.ne.jp>
References: <20030921205754.506D.MURATA@hokkaido.email.ne.jp> <9D573628-EC4B-11D7-BAAC-0003936780B2@aaronsw.com> <20030922090229.507C.MURATA@hokkaido.email.ne.jp>
Mime-Version: 1.0 (Apple Message framework v591)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Requesting a revision of RFC3023
X-Mailer: Apple Mail (2.591)

[re UTF-8 being the only long-term solution]
> I think that persuading users is more difficult than
> persuading programmers.  I have encouraged use of Unicode
> for XML in Japan, but nothing has happened.

The users shouldn't have to know what a character encoding is! Their 
software should just use UTF-8 automatically.

-- 
Aaron Swartz: http://www.aaronsw.com/



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8M0PpKP096262 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 17:25:51 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8M0PpE0096261 for ietf-xml-mime-bks; Sun, 21 Sep 2003 17:25:51 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from serrano.hesketh.net (mail.hesketh.net [216.27.21.211]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8M0PoKP096255 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 17:25:50 -0700 (PDT) (envelope-from simonstl@simonstl.com)
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Originating-IP: [24.58.125.32]
Received: from 192.168.124.11 (syr-24-58-125-32.twcny.rr.com [24.58.125.32]) (authenticated bits=0) by serrano.hesketh.net (8.12.9p1/8.12.8) with ESMTP id h8M0PcWj004132; Sun, 21 Sep 2003 20:25:40 -0400
X-Spam-Filter: check_local@serrano.hesketh.net by digitalanswers.org
Date: Sun, 21 Sep 2003 20:25:45 -0400
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: Re: Requesting a revision of RFC3023
To: WWW-Tag <www-tag@w3.org>
cc: ietf-xml-mime@imc.org
X-Priority: 3
In-Reply-To: <20030922090229.507C.MURATA@hokkaido.email.ne.jp>
Message-ID: <r02000000-1026-4F221262EC9311D7B3530003937A08C2@[192.168.124.11]>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Mailer: Mailsmith 2.0 (Blindsider)

murata@hokkaido.email.ne.jp (MURATA Makoto) writes:
>> > I'm just saying UTF-8 everywhere is even  more unrealistic than any 
>> > other options at hand.
>> 
>> Too bad, because it's the only option that's remotely practical in
>> the long term. Do you really think every programmer who wants to
>> mung text is going to include code that supports not only the
>> hundreds of extant character encodings but also the seventeen kinds
>> of in-band and out-of-band declarations of them?
>
>I think that persuading users is more difficult than 
>persuading programmers.  I have encouraged use of Unicode 
>for XML in Japan, but nothing has happened.

Even in the US, when I talk to users about Unicode I get blank stares.
An awful lot of people have simply been told "UTF-8 is the same as
ASCII" and write XML in 100% ASCII, using character references or entity
references for anything outside of ASCII.  Heck, they don't even specify
encoding a lot of the time, just trusting blindly in ASCII magic.
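
The "100% ASCII plus character references" habit works because of two facts, shown here in a minimal Python sketch using only the standard library: a pure-ASCII byte sequence decodes to the same text under US-ASCII and UTF-8, and an XML parser expands numeric character references into the real characters. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A document made only of ASCII bytes; "&#233;" is a numeric character
# reference for U+00E9 LATIN SMALL LETTER E WITH ACUTE.
doc = b"<p>caf&#233;</p>"

# Any pure-ASCII byte sequence decodes to the same text as US-ASCII or UTF-8,
# which is the grain of truth behind "UTF-8 is the same as ASCII".
assert doc.decode("ascii") == doc.decode("utf-8")

# An XML parser expands the reference into the real character.
text = ET.fromstring(doc).text
assert text == "caf\u00e9"
```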

The ASCII-derived inertia in this country may favor UTF-8, but I don't
think that inertia is sufficient cause to drive the whole world toward
UTF-8.  UTF-16 I could see as plausible, as more and more tools cope
with it, but that's a real change.  

I thought XML was wise to demand support for at least two encodings
(UTF-8 and UTF-16) and leave the door open for others.  In any case, I
don't think a revision of RFC 3023 is the place to attempt to make the
whole world use UTF-8, whether or not it's a good idea.
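
The UTF-8/UTF-16 requirement is workable partly because a processor can tell the two apart from the first few bytes (XML 1.0, Appendix F). A simplified Python sketch of that sniffing, assuming only the cases relevant here; `sniff_xml_encoding` is a hypothetical helper, and UTF-32, EBCDIC and error handling are left out.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes,
    loosely following XML 1.0 Appendix F. Only UTF-8, UTF-16 and the
    ASCII-compatible family are distinguished."""
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"             # UTF-8 signature (BOM)
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"         # UTF-16 big-endian BOM
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"         # UTF-16 little-endian BOM
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"         # "<?" in UTF-16BE without a BOM
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"         # "<?" in UTF-16LE without a BOM
    if head.startswith(b"<?xm"):
        return "ascii-compatible"  # read the encoding declaration next
    return "utf-8"                 # no BOM, no declaration: must be UTF-8
```

In the "ascii-compatible" case the processor still has to read the encoding declaration to decide between UTF-8, ISO-8859-1, Shift_JIS and the rest.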

Simon St.Laurent
http://simonstl.com
http://monasticxml.org






Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8M0AoKP095822 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 17:10:50 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8M0Aolu095821 for ietf-xml-mime-bks; Sun, 21 Sep 2003 17:10:50 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail2.asahi-net.or.jp [202.224.39.198]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8M0AlKP095807 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 17:10:48 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 469E46186; Mon, 22 Sep 2003 09:10:49 +0900 (JST)
Date: Mon, 22 Sep 2003 09:08:11 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <9D573628-EC4B-11D7-BAAC-0003936780B2@aaronsw.com>
References: <20030921205754.506D.MURATA@hokkaido.email.ne.jp> <9D573628-EC4B-11D7-BAAC-0003936780B2@aaronsw.com>
Message-Id: <20030922090229.507C.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02

> > I'm just saying UTF-8 everywhere is even  more unrealistic than any 
> > other options at hand.
> 
> Too bad, because it's the only option that's remotely practical in the 
> long term. Do you really think every programmer who wants to mung text 
> is going to include code that supports not only the hundreds of extant 
> character encodings but also the seventeen kinds of in-band and 
> out-of-band declarations of them?

I think that persuading users is more difficult than 
persuading programmers.  I have encouraged use of Unicode 
for XML in Japan, but nothing has happened.

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFqcKP080575 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 08:52:38 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LFqcZf080574 for ietf-xml-mime-bks; Sun, 21 Sep 2003 08:52:38 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from vorpal.notabug.com (qmailr@vorpal.notabug.com [63.149.73.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8LFqbKP080568 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 08:52:37 -0700 (PDT) (envelope-from me@aaronsw.com)
Received: (qmail 5989 invoked from network); 21 Sep 2003 15:52:38 -0000
Received: (ofmipd 12.207.74.63); 21 Sep 2003 15:52:16 -0000
Date: 21 Sep 2003 10:52:33 -0500
Message-Id: <9D573628-EC4B-11D7-BAAC-0003936780B2@aaronsw.com>
From: "Aaron Swartz" <me@aaronsw.com>
To: "MURATA Makoto" <murata@hokkaido.email.ne.jp>
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <20030921205754.506D.MURATA@hokkaido.email.ne.jp>
References: <20030919211746.E240.MURATA@hokkaido.email.ne.jp> <3f804f54.1753359818@smtp.bjoern.hoehrmann.de> <20030921205754.506D.MURATA@hokkaido.email.ne.jp>
Mime-Version: 1.0 (Apple Message framework v591)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Requesting a revision of RFC3023
X-Mailer: Apple Mail (2.591)

> I'm just saying UTF-8 everywhere is even  more unrealistic than any 
> other options at hand.

Too bad, because it's the only option that's remotely practical in the 
long term. Do you really think every programmer who wants to mung text 
is going to include code that supports not only the hundreds of extant 
character encodings but also the seventeen kinds of in-band and 
out-of-band declarations of them?

Even if there were an easily usable, well-supported library to do all 
this, and you managed to get everyone to use it, the minor differences 
between encodings hidden by its abstractions would lead to subtle bugs 
that would continue to plague international users and force them to be 
second-class citizens forever.

If you want international users to be on the same level as those who 
just use plain ASCII, a drop-in solution is the only way to go, and 
UTF-8 is the obvious drop-in solution.

-- 
Aaron Swartz: http://www.aaronsw.com/

(For the purposes of this email, "international users" are users who 
need characters other than those in plain ASCII.)



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFnSKP080349 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 08:49:28 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LFnS4J080348 for ietf-xml-mime-bks; Sun, 21 Sep 2003 08:49:28 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFnQKP080343 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 08:49:27 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 7FF8D76C6; Mon, 22 Sep 2003 00:49:27 +0900 (JST)
Date: Mon, 22 Sep 2003 00:46:49 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: noah_mendelsohn@us.ibm.com
Subject: Re: Requesting a revision of RFC3023
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <OFF35186E2.2AF6FFA7-ON85256DA6.00528EF2@lotus.com>
References: <OFF35186E2.2AF6FFA7-ON85256DA6.00528EF2@lotus.com>
Message-Id: <20030922003738.5079.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02

On Fri, 19 Sep 2003 11:04:11 -0400
noah_mendelsohn@us.ibm.com wrote:

> I am not aware of what typical implementations of SOAP 1.1 or SOAP 1.2 are 
> doing, but the 1.2 spec at least seems to list it as optional.  Again, I'm 
> not expert in this stuff and am not offering an opinion, but I thought the 
> links might be helpful.

I googled the three words "charset", "soap", and "User-Agent" and found many 
WWW pages.  Apparently, some implementations (including Apache Axis) use the charset 
parameter.
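
For reference, the charset parameter is simply a parameter of the Content-Type header. A minimal sketch that reads it with Python's standard `email.message.Message`; the header values are made up for illustration, and this says nothing about how Apache Axis itself parses them.

```python
from email.message import Message

# A Content-Type header of the kind a SOAP-over-HTTP message might carry.
msg = Message()
msg["Content-Type"] = 'text/xml; charset="utf-8"'

print(msg.get_content_type())     # text/xml
print(msg.get_content_charset())  # utf-8

# Without a charset parameter there is nothing out-of-band to report.
plain = Message()
plain["Content-Type"] = "application/soap+xml"
print(plain.get_content_charset())  # None
```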

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFgSKP080138 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 08:42:28 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LFgSP0080137 for ietf-xml-mime-bks; Sun, 21 Sep 2003 08:42:28 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from vorpal.notabug.com (qmailr@vorpal.notabug.com [63.149.73.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8LFgQKP080132 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 08:42:26 -0700 (PDT) (envelope-from me@aaronsw.com)
Received: (qmail 5696 invoked from network); 21 Sep 2003 15:42:27 -0000
Received: (ofmipd 12.207.74.63); 21 Sep 2003 15:42:05 -0000
Date: 21 Sep 2003 10:42:22 -0500
Message-Id: <3118DB85-EC4A-11D7-BAAC-0003936780B2@aaronsw.com>
From: "Aaron Swartz" <me@aaronsw.com>
To: "Francois Yergeau" <FYergeau@alis.com>
Cc: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
In-Reply-To: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
Mime-Version: 1.0 (Apple Message framework v591)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Requesting a revision of RFC3023
X-Mailer: Apple Mail (2.591)

>> Programming languages are broken as designed?
> In this respect, yes.  All programming languages should provide for
> charset identification of their source files.  Alas, none do, AFAIK.

Python does; it uses an emacs convention:

     To define a source code encoding, a magic comment must
     be placed into the source files either as first or second
     line in the file:

           #!/usr/bin/python
           # -*- coding: <encoding name> -*-

http://www.python.org/peps/pep-0263.html
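
PEP 263 also gives a regular expression for recognizing the magic comment and limits the search to the first two lines. A minimal Python sketch built on that expression; `source_encoding` is a hypothetical name, and in current Python the standard library's `tokenize.detect_encoding` does the real job, including BOM handling.

```python
import re

# Regular expression from PEP 263 for the encoding declaration.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def source_encoding(source: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding of Python source, looking only at the
    first two lines as PEP 263 requires. The default is UTF-8 as in
    Python 3 (it was ASCII when the PEP was written)."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default

src = b"#!/usr/bin/python\n# -*- coding: iso-8859-15 -*-\nprint(1)\n"
print(source_encoding(src))  # iso-8859-15
```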

-- 
Aaron Swartz: http://www.aaronsw.com/



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFdvKP080064 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 08:39:57 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LFdvCV080063 for ietf-xml-mime-bks; Sun, 21 Sep 2003 08:39:57 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LFdtKP080058 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 08:39:56 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 751F376C6; Mon, 22 Sep 2003 00:39:56 +0900 (JST)
Date: Mon, 22 Sep 2003 00:37:18 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: Re: Requesting a revision of RFC3023
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <p06002001bb935fe56feb@[192.168.254.4]>
References: <20030921205754.506D.MURATA@hokkaido.email.ne.jp> <p06002001bb935fe56feb@[192.168.254.4]>
Message-Id: <20030922002118.5076.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02

> By Unicode signature, I'm guessing you mean the BOM? That problem 
> seems to have been easily dealt with by simply deciding to allow it 
> in UTF-8. It doesn't appear to have caused any problems in practice 
> today.

In the case of XML, I think you are right.  In general, however, see

http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-05.txt
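
The general problem with the signature is that a consumer which does not expect it sees an extra character at the start of the text. A minimal Python sketch of that difference using the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; the sample document is made up.

```python
import codecs

data = codecs.BOM_UTF8 + b"<doc/>"   # b"\xef\xbb\xbf<doc/>"

# Plain UTF-8 decoding keeps the signature as U+FEFF at the start of the
# text, which a consumer not expecting it may treat as content.
assert data.decode("utf-8") == "\ufeff<doc/>"

# The "utf-8-sig" codec strips a leading signature if one is present ...
assert data.decode("utf-8-sig") == "<doc/>"
# ... and is harmless when there is none.
assert b"<doc/>".decode("utf-8-sig") == "<doc/>"
```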

> I don't know what problems you refer to with "representation of 
> non-BMP characters". UTF-8 precisely specifies how these characters 
> are represented. There's no issue here. Did you mean something else?

Quite a few implementations use 6 bytes (rather than 4 bytes) to represent 
non-BMP characters.  See

http://www.unicode.org/reports/tr26/
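
The 6-byte form comes from encoding a supplementary character as a UTF-16 surrogate pair and then writing each surrogate as a 3-byte sequence, which UTR #26 names CESU-8. A minimal Python sketch comparing it with real UTF-8; `cesu8_encode` is an illustrative helper, not a standard codec.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (UTR #26): each supplementary character is
    split into a UTF-16 surrogate pair, and each surrogate is written as
    its own 3-byte UTF-8-style sequence (6 bytes in total)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            # "surrogatepass" lets Python's UTF-8 codec emit lone surrogates.
            out += (chr(high) + chr(low)).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)

ch = "\U00010000"  # the first non-BMP code point
print(ch.encode("utf-8").hex())  # f0908080      (4 bytes, UTF-8)
print(cesu8_encode(ch).hex())    # eda080edb080  (6 bytes, CESU-8)
```

A strict UTF-8 decoder rejects the 6-byte form, which is why mixing the two causes interoperability trouble.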

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LE8bKP076629 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 07:08:37 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LE8bge076628 for ietf-xml-mime-bks; Sun, 21 Sep 2003 07:08:37 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.speakeasy.net (mail15.speakeasy.net [216.254.0.215]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LE8aKP076616 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 07:08:36 -0700 (PDT) (envelope-from elharo@metalab.unc.edu)
Received: (qmail 26832 invoked from network); 21 Sep 2003 14:08:37 -0000
Received: from unknown (HELO [192.168.254.4]) ([216.254.85.72]) (envelope-sender <elharo@metalab.unc.edu>) by mail15.speakeasy.net (qmail-ldap-1.03) with RC4-SHA encrypted SMTP for <murata@hokkaido.email.ne.jp>; 21 Sep 2003 14:08:37 -0000
Mime-Version: 1.0
X-Sender: elharo@mail.ibiblio.org
Message-Id: <p06002001bb935fe56feb@[192.168.254.4]>
In-Reply-To: <20030921205754.506D.MURATA@hokkaido.email.ne.jp>
References: <20030919211746.E240.MURATA@hokkaido.email.ne.jp> <3f804f54.1753359818@smtp.bjoern.hoehrmann.de> <20030921205754.506D.MURATA@hokkaido.email.ne.jp>
Date: Sun, 21 Sep 2003 10:04:36 -0400
To: MURATA Makoto <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"

At 9:08 PM +0900 9/21/03, MURATA Makoto wrote:

>UTF-8 has its own technical problems (the Unicode signature, representation
>of non-BMP characters, etc.).

By Unicode signature, I'm guessing you mean the BOM? That problem 
seems to have been easily dealt with by simply deciding to allow it 
in UTF-8. It doesn't appear to have caused any problems in practice 
today.

I don't know what problems you refer to with "representation of 
non-BMP characters". UTF-8 precisely specifies how these characters 
are represented. There's no issue here. Did you mean something else?

-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Processing XML with Java (Addison-Wesley, 2002)
   http://www.cafeconleche.org/books/xmljava
   http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LCAwKP072720 for <ietf-xml-mime-bks@above.proper.com>; Sun, 21 Sep 2003 05:10:58 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8LCAwI5072719 for ietf-xml-mime-bks; Sun, 21 Sep 2003 05:10:58 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8LCAuKP072712 for <ietf-xml-mime@imc.org>; Sun, 21 Sep 2003 05:10:57 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 527F2733A; Sun, 21 Sep 2003 21:10:50 +0900 (JST)
Date: Sun, 21 Sep 2003 21:08:12 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
In-Reply-To: <3f804f54.1753359818@smtp.bjoern.hoehrmann.de>
References: <20030919211746.E240.MURATA@hokkaido.email.ne.jp> <3f804f54.1753359818@smtp.bjoern.hoehrmann.de>
Message-Id: <20030921205754.506D.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02

On Fri, 19 Sep 2003 20:55:46 +0200
Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> UTF-8 everywhere is a reasonable principle and much simpler to
> understand and implement than any means to specify use of legacy
> encoding schemes. 

UTF-8 has its own technical problems (the Unicode signature, representation 
of non-BMP characters, etc.).  Moreover, people do not throw away legacy 
encodings but stick to them.  For example, although I think that 
Unicode is better than Shift-JIS and I do have Unicode-aware text editors, I 
still use Shift-JIS, which is so convenient at present.

I am not saying UTF-8 is bad.  I'm just saying UTF-8 everywhere is even 
more unrealistic than any other options at hand.

>For inbound encoding declarations, generic syntax
> does not work. Whatever syntax you choose, it will look odd in many
> formats and many authors won't use it anyway.

Please see my mail to Martin.

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8KK8qKP073945 for <ietf-xml-mime-bks@above.proper.com>; Sat, 20 Sep 2003 13:08:52 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8KK8qdE073944 for ietf-xml-mime-bks; Sat, 20 Sep 2003 13:08:52 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8KK8oKP073935 for <ietf-xml-mime@imc.org>; Sat, 20 Sep 2003 13:08:50 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 94E251032B; Sat, 20 Sep 2003 13:08:50 -0700 (PDT)
Message-ID: <3F6CB3D2.3060500@textuality.com>
Date: Sat, 20 Sep 2003 13:08:50 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Bert Bos <bert@w3.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: transcoding nearly certainly wrong?
References: <000a01c37e78$6326fa60$6401a8c0@MasinterT40> <16236.45366.407330.193217@lanalana.inria.fr>
In-Reply-To: <16236.45366.407330.193217@lanalana.inria.fr>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Bert Bos wrote:

> But if it doesn't work, you can look more closely at the MIME type and
> use your knowledge about foo. If foo is css or html, you can convert
> from any encoding to any encoding (and I do so often). If foo is
> something+xml, you can still do it, as long as all element names
> and attribute names are in ASCII.

... and if you check the first few bytes and if they say, for example,

<?xml version='1.0' encoding='ISO-8859-1' ?>

then you also fix them.  -Tim
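
A minimal Python sketch of that extra step, assuming the caller already knows the source encoding and the document has no BOM; `transcode_xml` and `DECL_RE` are illustrative names, not part of any library.

```python
import re

# The encoding pseudo-attribute inside a leading XML declaration; the name
# pattern follows the EncName production of XML 1.0.
DECL_RE = re.compile(r"""^(<\?xml[^>]*?\bencoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2""")

def transcode_xml(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode an XML document from `source` to `target` and rewrite the
    encoding declaration, if any, so that it matches the new bytes."""
    text = data.decode(source)
    text = DECL_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}", text, count=1
    )
    return text.encode(target)

latin1 = "<?xml version='1.0' encoding='ISO-8859-1' ?><p>caf\u00e9</p>".encode("iso-8859-1")
utf8 = transcode_xml(latin1, "iso-8859-1")
print(utf8)  # b"<?xml version='1.0' encoding='utf-8' ?><p>caf\xc3\xa9</p>"
```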




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8KJvrKP073418 for <ietf-xml-mime-bks@above.proper.com>; Sat, 20 Sep 2003 12:57:53 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8KJvrU7073417 for ietf-xml-mime-bks; Sat, 20 Sep 2003 12:57:53 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from lanalana.inria.fr (IDENT:root@lanalana.inria.fr [138.96.74.2]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8KJvpKP073412 for <ietf-xml-mime@imc.org>; Sat, 20 Sep 2003 12:57:52 -0700 (PDT) (envelope-from Bert.Bos@sophia.inria.fr)
Received: (from bbos@localhost) by lanalana.inria.fr (8.12.10/8.12.5) id h8KJvqME011588; Sat, 20 Sep 2003 21:57:52 +0200
Message-ID: <16236.45366.407330.193217@lanalana.inria.fr>
Date: Sat, 20 Sep 2003 21:57:42 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
In-Reply-To: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
References: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
X-Mailer: VM 7.14 under 21.4 (patch 13) "Rational FORTRAN" XEmacs Lucid
X-Status: 
X-Keywords: 
X-UID: 1814
From: Bert Bos <bert@w3.org>
To: ietf-xml-mime@imc.org
Subject: Re: transcoding nearly certainly wrong?

Larry Masinter writes:
> 
> Why is transcoding nearly certain to be wrong with XML?

Can't be *that* wrong. I do it all the time...

> 
> Or, to put it another way, why not limit the use
> of text/xml to XML instances for which transcoding
> is certain not to be wrong, and for which US-ASCII
> is acceptable (because the XML uses numeric character
> references or character entities, is only used to
> code a limited schema with numeric data, etc.?)
> 
> Limiting the scope is less radical than deprecating.

I agree.

Every text/foo format can be transcoded from any encoding to UTF-8.
Something I do often. Going the other way doesn't always work, of
course.

But if it doesn't work, you can look more closely at the MIME type and
use your knowledge about foo. If foo is css or html, you can convert
from any encoding to any encoding (and I do so often). If foo is
something+xml, you can still do it, as long as all element names
and attribute names are in ASCII. Which happens to be the case with
all XML-based formats created by W3C and all formats that I work with,
fortunately. And thus my conversion program doesn't even check, but
blindly converts every non-ASCII character to &#nnn;. Very useful,
since I create non-English files by cut & paste and my Emacs only
understands Latin-1.
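
The blind conversion described above corresponds to Python's standard `xmlcharrefreplace` error handler. A minimal sketch; `to_ascii_with_char_refs` is an illustrative wrapper, and like the program described it does not check where in the document a character appears.

```python
def to_ascii_with_char_refs(text: str) -> bytes:
    """Encode text as US-ASCII, replacing every non-ASCII character with a
    decimal numeric character reference (&#nnn;). Only safe where XML
    recognizes references: character data and attribute values, not
    element or attribute names, comments, or CDATA sections."""
    return text.encode("ascii", "xmlcharrefreplace")

print(to_ascii_with_char_refs("<p>caf\u00e9 \u65e5\u672c</p>"))
# b'<p>caf&#233; &#26085;&#26412;</p>'
```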

So, for many XML-based formats, including all W3C's formats, using
text/foo instead of application/foo makes a lot of sense. Reserve
application/something+xml for formats that cannot be transcoded.
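
Part of why the choice between text/foo and application/foo matters is how RFC 3023 picks the character encoding. A simplified Python sketch of those rules, assuming the caller has already sniffed the in-document encoding (BOM or XML declaration); `effective_charset` is a hypothetical helper, and real processors must also deal with errors and conflicting declarations.

```python
from typing import Optional

def effective_charset(media_type: str, charset_param: Optional[str],
                      sniffed: str) -> str:
    """Pick the charset an RFC 3023 processor uses for an XML entity.

    media_type    -- an XML media type such as text/xml or application/xml
    charset_param -- value of the MIME charset parameter, or None if absent
    sniffed       -- encoding found from the BOM / XML encoding declaration
    """
    if charset_param is not None:
        # For text/xml and application/xml alike, an explicit charset
        # parameter is authoritative over anything inside the document.
        return charset_param.lower()
    if media_type.startswith("text/"):
        # text/xml (and text/*+xml) without a charset parameter defaults
        # to us-ascii, whatever the XML declaration says.
        return "us-ascii"
    # application/xml and friends: defer to the document itself.
    return sniffed

print(effective_charset("text/xml", None, "utf-8"))          # us-ascii
print(effective_charset("application/xml", None, "utf-8"))   # utf-8
print(effective_charset("text/xml", "UTF-8", "iso-8859-1"))  # utf-8
```

Under these rules a transcoder handling text/foo must also rewrite the charset parameter, which is the out-of-band label it is expected to maintain.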



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos/                              W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8K0SpKP038724 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 17:28:51 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8K0SpK3038722 for ietf-xml-mime-bks; Fri, 19 Sep 2003 17:28:51 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8K0SoKP038715 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 17:28:50 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 4AE5DAF2F; Sat, 20 Sep 2003 09:28:52 +0900 (JST)
Date: Sat, 20 Sep 2003 09:26:15 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <4.2.0.58.J.20030919131656.04e3c5f0@localhost>
References: <20030919211746.E240.MURATA@hokkaido.email.ne.jp> <4.2.0.58.J.20030919131656.04e3c5f0@localhost>
Message-Id: <20030920090615.5061.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02

> Can you explain what in particular you like about the CSS
> mechanism? 

I like the CSS mechanism because it appears at the very beginning.

> Do you expect that all formats would use exactly
> the same character sequence, or only that they would use
> the same structure?

An optimistic answer to the first question is yes, although it may 
be more practical to provide a bit of flexibility (e.g., @
as well as #).  The structure and the semantics should be standardized.  
At present, everything is a hopeless mess.

Although I suggested this approach, I still feel uneasy about in-band 
encoding declarations.  I'm trying to generalize them only because 
I believe ad-hoc mechanisms are worse.
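
What makes the CSS mechanism easy to find is that CSS 2.1 honors @charset only when it is the very first thing in the style sheet, written exactly as `@charset "name";`. A minimal Python sketch of that check for CSS alone, assuming an ASCII-compatible encoding (a UTF-16 style sheet would need its BOM checked first); `css_declared_charset` is a hypothetical helper and does not implement the generalized mechanism discussed in this thread.

```python
import re
from typing import Optional

# CSS 2.1 only honors @charset at byte 0, in exactly this form.
CSS_CHARSET_RE = re.compile(rb'^@charset "([A-Za-z0-9._:-]+)";')

def css_declared_charset(data: bytes) -> Optional[str]:
    """Return the charset named by a leading @charset rule, or None."""
    match = CSS_CHARSET_RE.match(data)
    return match.group(1).decode("ascii") if match else None

print(css_declared_charset(b'@charset "Shift_JIS";\nbody { color: black }'))  # Shift_JIS
print(css_declared_charset(b'/* note */ @charset "utf-8";'))  # None: not at the start
```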

Cheers,

Makoto



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JLS3KP030322 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 14:28:03 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JLS3wK030321 for ietf-xml-mime-bks; Fri, 19 Sep 2003 14:28:03 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (imap.gmx.net [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8JLS1KP030314 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 14:28:02 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 7842 invoked by uid 65534); 19 Sep 2003 21:27:58 -0000
Received: from pD903B3C2.dip.t-dialin.net (EHLO Voyager) (217.3.179.194) by mail.gmx.net (mp005) with SMTP; 19 Sep 2003 23:27:58 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: John Cowan <jcowan@reutershealth.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 23:27:45 +0200
Message-ID: <3f937355.1762576521@smtp.bjoern.hoehrmann.de>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <20030919161017.GE32762@skunk.reutershealth.com> <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de> <20030919192057.GQ32762@skunk.reutershealth.com> <3f8557f6.1755569315@smtp.bjoern.hoehrmann.de> <20030919210215.GT32762@skunk.reutershealth.com>
In-Reply-To: <20030919210215.GT32762@skunk.reutershealth.com>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

* John Cowan wrote:
>> And by the way, some formats you mention are binary
>> and not text formats and could thus not make use of file system encoding
>> information.
>
>Which?  Programming languages, text markup languages, scripts, or
>mbox-type message archives?

Mbox is a good example: it is text-separated binary data, not text.
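A minimal sketch of why a single file-level encoding cannot describe an mbox. It assumes a simple mbox where each message starts at a `From ` separator line and body lines beginning with "From " have already been escaped (as `>From `). Each message carries its own Content-Type charset, so one file can mix several encodings. The function names are hypothetical.

```python
import email
import re
from typing import List, Optional

def split_mbox(raw: bytes) -> List[bytes]:
    """Split raw mbox bytes on 'From ' separator lines (simplified)."""
    parts = re.split(rb"(?m)^From .*\n", raw)
    return [p for p in parts if p.strip()]   # drop the empty text before the first separator

def charsets_per_message(raw: bytes) -> List[Optional[str]]:
    """Return each message's declared charset, or None if it declares none."""
    return [email.message_from_bytes(m).get_content_charset() for m in split_mbox(raw)]

mbox = (b"From a@example.org Fri Sep 19 20:00:00 2003\n"
        b"Content-Type: text/plain; charset=iso-8859-1\n\nD\xfcrst\n\n"
        b"From b@example.org Fri Sep 19 21:00:00 2003\n"
        b"Content-Type: text/plain; charset=utf-8\n\nD\xc3\xbcrst\n")
print(charsets_per_message(mbox))  # ['iso-8859-1', 'utf-8']
```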


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JL3UKP028953 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 14:03:30 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JL3Tlk028952 for ietf-xml-mime-bks; Fri, 19 Sep 2003 14:03:29 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JL3SKP028941 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 14:03:28 -0700 (PDT) (envelope-from jcowan@reutershealth.com)
Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id QAA24253; Fri, 19 Sep 2003 16:58:53 -0400 (EDT)
Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Fri, 19 Sep 2003 17:02:15 -0400
Date: Fri, 19 Sep 2003 17:02:15 -0400
From: John Cowan <jcowan@reutershealth.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Message-ID: <20030919210215.GT32762@skunk.reutershealth.com>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <20030919161017.GE32762@skunk.reutershealth.com> <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de> <20030919192057.GQ32762@skunk.reutershealth.com> <3f8557f6.1755569315@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3f8557f6.1755569315@smtp.bjoern.hoehrmann.de>
User-Agent: Mutt/1.4.1i
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Bjoern Hoehrmann scripsit:

> No. I said that I consider formats that leave implementations to guess
> how to process instances of the format broken as designed. I did not
> suggest to fix them.

Okay, since this judgment has no real-world consequences I can live with it.

> And by the way, some formats you mention are binary
> and not text formats and could thus not make use of file system encoding
> information.

Which?  Programming languages, text markup languages, scripts, or
mbox-type message archives?

-- 
I marvel at the creature: so secret and         John Cowan
so sly as he is, to come sporting in the pool   jcowan@reutershealth.com
before our very window.  Does he think that     http://www.reutershealth.com
Men sleep without watch all night?  --Faramir   http://www.ccil.org/~cowan


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JKU5KP027100 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 13:30:05 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JKU5LX027096 for ietf-xml-mime-bks; Fri, 19 Sep 2003 13:30:05 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JKTvKP027076 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 13:30:04 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id F2939136E0; Fri, 19 Sep 2003 16:29:58 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030917113411.06cf3af8@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 16:29:55 -0400
To: Tim Bray <tbray@textuality.com>, ietf-xml-mime@imc.org
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <3F674D53.7080906@textuality.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Hello Tim, others,

At 10:50 03/09/16 -0700, Tim Bray wrote:

>[Some of you will get this twice, sorry; Larry Masinter pointed out that 
>my initial choice of destinations was poor.  I slightly revised the note 
>to provide more context.]
>
>The W3C TAG (http://www.w3.org/2001/tag/) has an open issue about proper 
>handling of MIME headers, with a draft in progress "Client Handling of 
>MIME Headers" (http://www.w3.org/2001/tag/doc/mime-respect.html);

Some comments on this in a separate mail.


>the draft finds some fault with the contents of RFC3023.
>
>I took an action item to ask about the chances of revising what 3023 says 
>about the charset parameter; while I'm not sure, I suspect that there may 
>actually be some level of consensus about the desirable changes:
>
>1. Deprecate text/* for anything that's in XML.

Agreed, in general.


>That's because it forces the provider to provide a charset header, because 
>in its absence the receiver is required to assume either ASCII or 8859 
>depending on the context,

Do you mean that it is US-ASCII for email and iso-8859-1 (there are many
parts of iso 8859) for HTTP? My understanding is that RFC 3023 says
that the US-ASCII default applies for all protocols.
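To make the two defaults in this question concrete, here is a rough sketch of charset resolution. It assumes RFC 3023's US-ASCII default for text/xml (and text/*+xml) on every protocol, HTTP/1.1's ISO-8859-1 default for other text/* types, and RFC 2046's US-ASCII default for text/* in mail. It covers only these cases. The function name is hypothetical.

```python
from email.message import Message
from typing import Optional

def effective_charset(content_type: str, via_http: bool = True) -> Optional[str]:
    """Charset a recipient must assume for a body with this Content-Type.
    None means "no MIME-level answer: let the XML processor autodetect"."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()      # lower-cased "type/subtype"
    charset = msg.get_content_charset()      # lower-cased, or None if absent
    if charset is not None:
        return charset                       # an explicit label always wins
    if media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    ):
        return "us-ascii"                    # RFC 3023 default, on every protocol
    if media_type.startswith("text/"):
        return "iso-8859-1" if via_http else "us-ascii"  # HTTP/1.1 vs. MIME mail
    return None                              # application/xml, application/*+xml
```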


>which has a very high probability of being wrong, which is irritating 
>because if there were no charset header the client would be certain of 
>either getting it right or failing deterministically.

That was part of the design of making US-ASCII the default, too,
similar to the design of XML in general: Either the charset parameter
is set (correctly), or it is missing, and then the parser has a very
easy job to detect the problem if the data is not US-ASCII.
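A small sketch of the "detect the problem" half of this design. Under a US-ASCII default, any byte at or above 0x80 in an unlabeled body is an error the receiver can report deterministically. The function name is hypothetical.

```python
def decode_as_default_ascii(body: bytes) -> str:
    """Decode a body that arrived without a charset parameter, assuming
    the US-ASCII default; any non-ASCII byte is reported, never guessed."""
    try:
        return body.decode("us-ascii")       # strict error handling by default
    except UnicodeDecodeError as err:
        raise ValueError(
            f"non-ASCII byte 0x{body[err.start]:02x} at offset {err.start}: "
            "charset label missing or wrong"
        ) from err
```

For example, a UTF-8 "<a>Dürst</a>" sent without a label fails at offset 4, where the first byte of "ü" appears.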

On the other hand, in some cases only a linguistic expert can detect
a mistaken iso-8859-1 label (e.g. on data that is actually
iso-8859-2).
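By contrast, a sketch of why the iso-8859-1 / iso-8859-2 confusion is silent. Every byte value is assigned in both encodings, so decoding under the wrong label never raises an error. It only produces wrong characters, which takes a reader to notice.

```python
from typing import Tuple

def decode_both(body: bytes) -> Tuple[str, str]:
    """Every byte 0x00-0xFF is assigned in both ISO-8859-1 and ISO-8859-2,
    so neither decode can fail: a wrong label yields wrong text, not an error."""
    return body.decode("iso-8859-1"), body.decode("iso-8859-2")

polish = "Łódź".encode("iso-8859-2")   # the bytes the author actually produced
as_latin1, as_latin2 = decode_both(polish)
print(as_latin1)   # £ód¼   (mislabelled, yet no error is raised)
print(as_latin2)   # Łódź
```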


>And forcing the server to provide a charset= is wrong; see the next point.
>
>2. Deprecate the charset parameter for application/xml and 
>application/*+xml.  I think that Roy Fielding would like to go far as to 
>simply outlaw it; I'd be fine with that too.

I think moving from the current wording which is very close to required
to a more optional wording is fine. However, I'm not sure about deprecating
it (which basically means that it is a bad idea but still tolerated for some
time), and I don't think outlawing is a good idea.


>The reason is that the client is almost certain to get it right, and will 
>fail deterministically if it doesn't.  For the server, on the other hand, 
>this is easy to get wrong, particularly with the introduction of various 
>kinds of filters in modern web servers.

It's not about client or server. It's about author or server, or
any other producer such as some software. The client (hopefully)
never does anything else than follow the spec.
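Tim's claim that a client is "almost certain to get it right" without a charset rests on XML's own autodetection. The following is a simplified sketch of that idea, loosely following XML 1.0 Appendix F. It is an assumption-laden subset: it ignores UTF-32 and EBCDIC, and it does not check that the declared name fits the detected byte pattern. The function name is hypothetical.

```python
import codecs
import re

# Encoding declaration in an ASCII-compatible XML declaration (EncName syntax).
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def sniff_xml_encoding(body: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_BE) or body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"                      # BOM tells the decoder the byte order
    if body.startswith(b"\x00<\x00?"):
        return "utf-16-be"                   # '<?' in UTF-16BE, no BOM
    if body.startswith(b"<\x00?\x00"):
        return "utf-16-le"                   # '<?' in UTF-16LE, no BOM
    m = _DECL.match(body)
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"                           # XML's default with no BOM and no declaration
```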


>And since the Web architecture and the XML spec both say that the server's 
>claim has to be taken as authoritative, this is really highly 
>dysfunctional.  At the very least, it should be made clear that nobody 
>sending a media-type should send a charset for an XML media-type unless it 
>REALLY REALLY KNOWS what it's sending, and in that case should consider 
>not sending it anyhow.

I don't disagree with this. But I wonder why this would have to be
stressed that much. It's a REALLY REALLY bad idea to create or send
non-wellformed XML, it's a REALLY REALLY bad idea to send documents
with a wrong mime type, and so on. Yet our specs don't contain
"REALLY REALLY bad" very often.


>Is there any chance we could do this?  It's going to be kind of 
>embarrassing for TAG findings and the Webarch doc to be saying "don't do 
>what this RFC says".

It would definitely be good for things to be in sync. And I'm sure
we can find an adequate compromise.


Regards,    Martin.





Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JKU5KP027099 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 13:30:05 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JKU5ha027097 for ietf-xml-mime-bks; Fri, 19 Sep 2003 13:30:05 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JKTvKP027077 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 13:30:04 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 1221113638; Fri, 19 Sep 2003 16:29:59 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919141133.05106ee8@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 16:16:45 -0400
To: WWW-Tag <www-tag@w3.org>
From: Martin Duerst <duerst@w3.org>
Subject: Comments on mime-respect
Cc: ietf-xml-mime@imc.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

These are my comments on http://www.w3.org/2001/tag/doc/mime-respect.html,
various issues mixed a bit, sorry.
[I have cross-posted to ietf-xml-mime@imc.org because some of them are relevant
to the recent discussion about the charset parameter on Content-Type.]

- Headings: Is this a completed finding, or a draft finding?

- "HTTP/1.1 a a response": word duplication

- Overall, it seems difficult to identify what is general architecture,
   and what is the way it is just because it is the way it (mostly) is.

- My understanding is that one origin of the 'charset' parameter was
   that it was useful to invoke different applications for different
   values. That was definitely the case 10 years or so ago when MIME
   was designed. I remember reading my email that way. This has gone away.
   It may happen that in a somewhat similar way, a lot of what we now
   see as different XML types, in need of different applications, may
   go away in a few years.

- Section 4: "The Unicode encoding of a message body (XML document) is
   inconsistent with the value of the charset parameter in the message
   headers."
   - Please replace 'Unicode encoding' with 'character encoding'.
     It would be strange to e.g. call iso-8859-1 a 'Unicode encoding'.
   - Please remove, or reword "XML document", to not give the impression
     that message bodies are always XML documents.
   - I'm not clear why this is in section 4, entitled "Why user agent
     behavior that misrepresents the user is harmful". This is a
     server problem, the user is not in any way misrepresented.

- The big problem with wrong encoding information for XML and other
   documents is not in a server-user context (where the user has
   to be able to read the document, such problems are usually
   discovered very quickly), but with XML sent between machines.
   This probably should be noted.

- The structure of sections 3 and 4 should be improved. It is good
   style to have an introductory paragraph or two before the subsections.
   It is confusing to have a few paragraphs in the first subsection
   of the section after a lot of text that is not in subsections.

- "For this reason, servers should only supply a character encoding
    header when there is complete certainty as to the encoding in use.
    Otherwise, an error will cause a perfectly usable representation
    to be rejected by an architecturally sound client."

    Why doesn't the document say e.g. that a mime type should only be
    supplied when there is complete certainty that this type is
    appropriate? Why does this text assume that the XML is 'perfectly
    usable'? It might not be valid, it might be the wrong mime type,
    or it might not have the right 'encoding' attribute.

- "Servers which generate representations MUST NOT generate the charset
    parameter unless there is certainty that the headers are correct.
    When correct, this information can be used by non-XML processors
    to determine authoritatively the character encoding of the XML MIME
    entity."

    How is a server ever going to know, or going to be able to check,
    what the right character encoding is? Making this a requirement
    on the server itself seems inadequate.

- Section 5: "For instance, the http-equiv attribute of the HTML meta
   element is intended for servers (not clients)."
   Please change 'is' to 'was'. In particular with respect to character
   encoding, current practice is that it's used on the client. If you
   think that this should change, you should say so.

- SMIL 2.0 is "outmoded": I would prefer a different word here.
   I strongly agree that what SMIL 2.0 is saying on content types
   is a very bad idea, and I have said so to the SMIL WG (and more
   recently the Voice Browser WG, I think). But given the 2001
   date, I don't think 'outmoded' is the right word, because it was
   never in fashion in the first place.

- Section 6: There is advice to server managers and authors. But
   I think we need to go one more step back, to server implementers
   and the default settings when servers are shipped.
   For example, some servers have an easy way to explore configurations
   and check settings. Others don't. Some servers come with default
   configurations that may be suboptimal. For example (not picking on
   it, just because that's the one I know), Apache at
   http://httpd.apache.org/docs-2.0/en/mod/core.html#adddefaultcharset
   says: "AddDefaultCharset On enables Apache's internal default charset
   of iso-8859-1 as required by the directive."
   Also, the default configuration file contains this:
    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset ISO-8859-1

   This seems to be 180 degrees opposite to what the TAG is saying.
   It is more about text/html,... than about application/...+xml, but
   there is considerable potential for harm here, too, in particular
   when combined with the default setting that Apache comes with that
   does not allow people managing a directory to override file info.
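A sketch of the harm, under stated assumptions. A UTF-8 XML document is served with the label that `AddDefaultCharset ISO-8859-1` adds. The recipient treats that label as authoritative, as RFC 3023 requires. Python's expat binding stands in for such a recipient: its `encoding` argument to `ParserCreate` overrides the document's own declaration. The helper name is hypothetical.

```python
import xml.parsers.expat
from typing import Optional

def text_content(body: bytes, header_charset: Optional[str] = None) -> str:
    """Parse XML, letting an external charset (standing in for the HTTP
    header) override the document's own declaration when one is given."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding=header_charset)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(body, True)
    return "".join(chunks)

doc = '<?xml version="1.0" encoding="UTF-8"?><name>Dürst</name>'.encode("utf-8")
print(text_content(doc))                 # Dürst   (declaration honoured)
print(text_content(doc, "ISO-8859-1"))   # DÃ¼rst  (server default label honoured)
```

Nothing fails here. The document parses cleanly and is silently corrupted, which is the worst outcome for XML exchanged between machines.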


Regards,     Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JJuUKP025102 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 12:56:30 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JJuU4L025101 for ietf-xml-mime-bks; Fri, 19 Sep 2003 12:56:30 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (pop.gmx.de [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8JJuTKP025093 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 12:56:29 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 22941 invoked by uid 65534); 19 Sep 2003 19:56:25 -0000
Received: from pD903B35C.dip.t-dialin.net (EHLO Voyager) (217.3.179.92) by mail.gmx.net (mp006) with SMTP; 19 Sep 2003 21:56:25 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: John Cowan <jcowan@reutershealth.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 21:56:14 +0200
Message-ID: <3f8557f6.1755569315@smtp.bjoern.hoehrmann.de>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <20030919161017.GE32762@skunk.reutershealth.com> <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de> <20030919192057.GQ32762@skunk.reutershealth.com>
In-Reply-To: <20030919192057.GQ32762@skunk.reutershealth.com>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* John Cowan wrote:
>> [A]nd even if they did, this would cause interoperability
>> problems with file systems and protocols which do not provide such
>> means. If you transfer the document using FTP to your web server the
>> information is lost and the document will break.
>
>No worse than today's situation, and FTP could be enhanced or abandoned
>in favor of HTTP PUT.

RFC 1342 is eleven years old and I would still see replies containing
Bj<garbage>rn H<garbage>hrmann if I made use of its features.
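For reference, this is a sketch of what a client that does implement the encoded-word syntax from RFC 1342 (now RFC 2047) does with such a header, compared with one that ignores the declared charset. It uses Python's `email.header` module. The header value is a constructed example, not one taken from this thread.

```python
from email.header import decode_header, make_header

raw = "=?iso-8859-1?q?Bj=F6rn_H=F6hrmann?="   # constructed example header value

def display_name(header_value: str) -> str:
    """Decode RFC 2047 encoded-words into a Unicode string."""
    return str(make_header(decode_header(header_value)))

print(display_name(raw))   # Björn Höhrmann

# A client that drops the declared charset and guesses UTF-8 shows garbage:
latin1_bytes = decode_header(raw)[0][0]
guessed = latin1_bytes.decode("utf-8", errors="replace")  # U+FFFD where ö was
```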

>*You* are suggesting that every text file format that has ever existed --
>innumerable assembly languages, C, C++, Java, Fortran, Lisp, Scheme, Prolog,
>Perl, Python, Smalltalk, awk, sed, ... sh, csh, bash, zsh, ... mail archives,
>news archives, ... Tex, LaTex, nroff/troff, ... -- be revised to find someplace
>to stuff a charset indication, and then that every one of the billions of
>documents in each of those formats be changed to carry that information.

No. I said that I consider formats that leave implementations to guess
how to process instances of the format broken as designed. I did not
suggest to fix them. And by the way, some formats you mention are binary
and not text formats and could thus not make use of file system encoding
information.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JJMBKP023128 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 12:22:11 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JJMBRa023127 for ietf-xml-mime-bks; Fri, 19 Sep 2003 12:22:11 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JJMAKP023121 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 12:22:10 -0700 (PDT) (envelope-from jcowan@reutershealth.com)
Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id PAA22104; Fri, 19 Sep 2003 15:17:34 -0400 (EDT)
Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Fri, 19 Sep 2003 15:20:57 -0400
Date: Fri, 19 Sep 2003 15:20:57 -0400
From: John Cowan <jcowan@reutershealth.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Message-ID: <20030919192057.GQ32762@skunk.reutershealth.com>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <20030919161017.GE32762@skunk.reutershealth.com> <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de>
User-Agent: Mutt/1.4.1i
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Bjoern Hoehrmann scripsit:

> Impractical. File systems commonly do not support encoding such
> information

In fact most file systems support extended attributes today.
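A rough sketch of what recording the encoding in an extended attribute could look like. It assumes Linux, where Python exposes `os.setxattr`/`os.getxattr`, and a file system that allows `user.*` attributes. The attribute name `user.charset` is hypothetical: no standard name exists. The code falls back to a default wherever attributes are unavailable.

```python
import os
from typing import Optional

# Hypothetical attribute name: no standard xattr for character encodings
# exists, so "user.charset" is an assumption made for this sketch.
CHARSET_ATTR = "user.charset"

def record_charset(path: str, charset: str) -> bool:
    """Store the encoding beside the file; False where that is not possible."""
    if not hasattr(os, "setxattr"):          # os.*xattr exist only on Linux
        return False
    try:
        os.setxattr(path, CHARSET_ATTR, charset.encode("ascii"))
        return True
    except OSError:                          # e.g. file system without user xattrs
        return False

def read_text(path: str, fallback: str = "utf-8") -> str:
    """Decode a file using its recorded charset, or `fallback` if none."""
    charset: Optional[str] = None
    if hasattr(os, "getxattr"):
        try:
            charset = os.getxattr(path, CHARSET_ATTR).decode("ascii")
        except OSError:                      # attribute absent or unsupported
            charset = None
    with open(path, encoding=charset or fallback) as f:
        return f.read()
```

A plain byte copy such as `shutil.copyfile` (which copies contents, not metadata), or an FTP upload, would not carry the attribute. The copy would then be read with the fallback encoding, which is the interoperability concern raised in the quoted reply.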

> [A]nd even if they did, this would cause interoperability
> problems with file systems and protocols which do not provide such
> means. If you transfer the document using FTP to your web server the
> information is lost and the document will break.

No worse than today's situation, and FTP could be enhanced or abandoned
in favor of HTTP PUT.

> Further, file system
> information is typically almost invisible to authors and would thus
> have the same problem as the charset parameter. If I edit a document
> in an XML unaware text editor, change the encoding declaration and
> some text nodes and save the file, file system and encoding declaration
> are likely to contradict each other and the document would break.

No worse than today's situation.

> You are basically suggesting to change all file systems and software
> that interacts with it and expect everyone to upgrade the software and
> the file system information of all documents.

*You* are suggesting that every text file format that has ever existed --
innumerable assembly languages, C, C++, Java, Fortran, Lisp, Scheme, Prolog,
Perl, Python, Smalltalk, awk, sed, ... sh, csh, bash, zsh, ... mail archives,
news archives, ... Tex, LaTex, nroff/troff, ... -- be revised to find someplace
to stuff a charset indication, and then that every one of the billions of
documents in each of those formats be changed to carry that information.

> If an applicable solution
> may go this far, you should rather suggest to outlaw all non-Unicode
> encodings, much simpler, more consistent and more interoperable. This
> would also work if the text is not stored in the file system but rather
> generated by software, something your solution does not consider.

Indeed, which is why Plan 9 sensibly makes everything UTF-8 and Windows NT/2K/XP
makes most things UTF-16, at least under the covers.

> >Otherwise, generic text-processing tools become impossible,
> 
> They are impossible today.

The impossible does not happen, but I usefully use generic text processing
tools every hour of every working day.

> They are not trying to read the format, they are trying to read byte
> streams as character streams. If they are trying to read the format,
> they have to support that format anyway, including mechanisms to
> determine the character encoding.

Not so.  If I want to process a Fortran 77 program as text (to find the
identifiers which occur only once, e.g.) then I can use generic tools
(tr, sort, uniq) and supply the character encoding out of band.  This is
annoying, but it works.  If the tools had to understand where backpatched
Fortran 77 text hides its in-band character encoding declaration, the
results would be as I describe: huge amounts of useless hair.
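A rough Python equivalent of the tr/sort/uniq approach, as a sketch. The caller passes the character encoding out of band, and the tool knows nothing about Fortran's syntax beyond what an identifier looks like. As with the shell pipeline, keywords such as PROGRAM and END are counted too. Case folding reflects Fortran 77's case-insensitivity. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import List

def identifiers_used_once(raw: bytes, encoding: str) -> List[str]:
    r"""Roughly what  tr -cs 'A-Za-z0-9' '\n' | sort | uniq -u  does, except
    that tokens must start with a letter and case is folded.  The encoding
    is supplied out of band by the caller; nothing in the file declares it."""
    text = raw.decode(encoding)
    words = re.findall(r"[A-Za-z][A-Za-z0-9]*", text)
    counts = Counter(w.upper() for w in words)
    return sorted(w for w, n in counts.items() if n == 1)

program = b"      PROGRAM P\n      X = 1\n      Y = X\n      END\n"
print(identifiers_used_once(program, "ascii"))  # ['END', 'P', 'PROGRAM', 'Y']
```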

> If you consider HTTP a file system, it
> already implements your solution; all text is identified using text/*
> types and either the file system provides encoding information (charset
> parameter) or text processors are required to treat the document as
> ISO-8859-1 encoded. Text processors would actually only get character
> streams from the HTTP implementation and would not have to worry about
> character encodings and stuff. Does it work? No. 

It does not work because HTTP is layered over file systems which don't bother
to support the notion of encoding declarations persistently.

-- 
"We are lost, lost.  No name, no business, no Precious, nothing.  Only empty.
Only hungry: yes, we are hungry.  A few little fishes, nassty bony little
fishes, for a poor creature, and they say death.  So wise they are; so just,
so very just."  --Gollum        jcowan@reutershealth.com  www.ccil.org/~cowan


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JJ4kKP022192 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 12:04:46 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JJ4k9N022191 for ietf-xml-mime-bks; Fri, 19 Sep 2003 12:04:46 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JJ4jKP022185 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 12:04:45 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 1CA4E133A3; Fri, 19 Sep 2003 15:04:46 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919144223.04fbb920@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 15:03:31 -0400
To: Tim Bray <tbray@textuality.com>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: John Cowan <jcowan@reutershealth.com>, Francois Yergeau <FYergeau@alis.com>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <3F6B493B.9040104@textuality.com>
References: <4.2.0.58.J.20030919124030.04fbfd20@localhost> <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <4.2.0.58.J.20030919124030.04fbfd20@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 11:21 03/09/19 -0700, Tim Bray wrote:

>Martin Duerst wrote:
>
>>This will of course take some time. As one example, N3 just says
>>that it's in UTF-8. For other new formats, that may make sense,
>>too.
>
>In principle, I agree.  In practice, there's a problem, for example I 
>actually don't know how, in my favorite editor on Mac OS X,

What editor is that?


>to type in the u-umlaut in Martin's last name,

How to type it shouldn't depend on the editor, but should be
something the OS deals with. And for an average Mac application,
it actually does.

On the Mac, go to Control Panel -> International -> Input Menu.
You then get a list of keyboards and other input methods.
One that is particularly useful for occasional input of characters
is the Character palette. As soon as you have selected more than
one keyboard/..., you get an additional menu item where you can
switch these.

The old ResEdit also had a way to inspect and change your keyboard
configurations, but I'm not sure this is still available.


Regards,    Martin.


>and how to force saving a file in UTF-8.  This isn't a problem for me 
>because I'm usually writing XML and so I just write D&uuml;rst.  So I 
>actually have trouble making N3 assertions about Martin.
>
>If we could make the whole world always use UTF-8 for everything life 
>would be so much simpler :)
>--
>Cheers, Tim Bray
>         (ongoing fragmented essay: http://www.tbray.org/ongoing/)
>
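Tim's workaround keeps the file pure ASCII and lets the XML parser supply the character. Note that `&uuml;` is an HTML entity, not one of XML's five predefined ones, so a plain XML document would need a DTD declaring it. The sketch below uses the numeric reference `&#252;` instead, which needs no declaration. It also shows that ElementTree's default US-ASCII serialization writes the same reference back out.

```python
import xml.etree.ElementTree as ET

source = "<name>D&#252;rst</name>"     # pure ASCII on disk, whatever the editor does
elem = ET.fromstring(source)
print(elem.text)                       # Dürst

# tostring() defaults to US-ASCII and escapes anything outside it as &#NNN;
ascii_bytes = ET.tostring(elem)
```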



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JIu6KP021707 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:56:06 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JIu6r0021706 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:56:06 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (pop.gmx.de [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8JIu4KP021696 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:56:05 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 7619 invoked by uid 65534); 19 Sep 2003 18:56:00 -0000
Received: from pD903B35C.dip.t-dialin.net (EHLO Voyager) (217.3.179.92) by mail.gmx.net (mp003) with SMTP; 19 Sep 2003 20:56:00 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 20:55:46 +0200
Message-ID: <3f804f54.1753359818@smtp.bjoern.hoehrmann.de>
References: <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de> <20030919211746.E240.MURATA@hokkaido.email.ne.jp>
In-Reply-To: <20030919211746.E240.MURATA@hokkaido.email.ne.jp>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* MURATA Makoto wrote:
>> Depends on the format. Formats should provide means to specify
>> the encoding, if they do not they are BAD, broken as designed.
>
>If we really promote in-band declarations, we should recommend 
>one mechanism for all textual formats.  I think that CSS style 
>declaration (@charset) is the best I have seen.  If the TAG recommends 
>this mechanism for all formats except XML and HTML, that would 
>be one reasonable principle.

UTF-8 everywhere is a reasonable principle and much simpler to
understand and implement than any means to specify use of legacy
encoding schemes. For in-band encoding declarations, generic syntax
does not work. Whatever syntax you choose, it will look odd in many
formats and many authors won't use it anyway.
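For comparison, here is a sketch of how narrow CSS's in-band `@charset` rule is, the mechanism proposed above as a general model. In CSS 2.1 it is honoured only at the very start of the style sheet and only in exactly the form `@charset "name";`. This sketch ignores a leading BOM and non-ASCII-compatible encodings. The function name is hypothetical.

```python
import re
from typing import Optional

# Exactly: @charset "<name>";  (one space, double quotes, at byte 0).
# [!#-~] = printable ASCII other than space and the double quote.
_CHARSET_RULE = re.compile(rb'@charset "([!#-~]+)";')

def css_inband_charset(stylesheet: bytes) -> Optional[str]:
    """Return the declared charset, or None if there is no valid @charset rule."""
    m = _CHARSET_RULE.match(stylesheet)      # match() anchors at byte 0
    return m.group(1).decode("ascii").lower() if m else None
```

Reusing this exact syntax in other formats is where Bjoern's objection bites. In a language where `@charset "..."` is not a comment or a no-op, the declaration would change the meaning of the file.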


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JItxKP021695 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:55:59 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JItx3i021694 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:55:59 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (pop.gmx.net [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8JItwKP021688 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:55:58 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 6982 invoked by uid 65534); 19 Sep 2003 18:55:52 -0000
Received: from pD903B35C.dip.t-dialin.net (EHLO Voyager) (217.3.179.92) by mail.gmx.net (mp003) with SMTP; 19 Sep 2003 20:55:52 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: John Cowan <jcowan@reutershealth.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 20:55:39 +0200
Message-ID: <3f7e48c4.1751679261@smtp.bjoern.hoehrmann.de>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <20030919161017.GE32762@skunk.reutershealth.com>
In-Reply-To: <20030919161017.GE32762@skunk.reutershealth.com>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* John Cowan wrote:
>Rather than having thousands of ad hoc mechanisms for encoding declarations
>in each of the thousands of text formats now extant, file systems should have
>a convenient mechanism for recording the encoding of each file, and character
>processing libraries should have convenient reading and writing operations that
>do the necessary conversions.

Impractical. File systems commonly do not support storing such
information, and even if they did, this would cause interoperability
problems with file systems and protocols which do not provide such
means. If you transfer the document to your web server using FTP, the
information is lost and the document will break. Further, file system
information is typically almost invisible to authors and would thus
have the same problem as the charset parameter. If I edit a document
in an XML-unaware text editor, change the encoding declaration and
some text nodes, and save the file, the file system information and the
encoding declaration are likely to contradict each other, and the
document would break.

You are basically suggesting changing all file systems and the software
that interacts with them, and expecting everyone to upgrade the software
and the file system information of all documents. If a solution may go
this far, you might as well suggest outlawing all non-Unicode encodings,
which is much simpler, more consistent and more interoperable. This
would also work if the text is not stored in the file system but rather
generated by software, something your solution does not consider.

>Otherwise, generic text-processing tools become impossible,

They are impossible today.

>because each tool has to have a vast library that understands the
>mechanics of the encoding declaration specific to the format it is trying to
>read.

They are not trying to read the format; they are trying to read byte
streams as character streams. If they are trying to read the format,
they have to support that format anyway, including its mechanisms to
determine the character encoding. If you consider HTTP a file system, it
already implements your solution: all text is identified using text/*
types, and either the file system provides encoding information (the
charset parameter) or text processors are required to treat the document
as ISO-8859-1 encoded. Text processors would actually only get character
streams from the HTTP implementation and would not have to worry about
character encodings and stuff. Does it work? No, especially since
W3C publishes Recommendations that make it impossible to write
conforming HTTP implementations. That way madness lies.
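A minimal Java sketch of the HTTP rule described above, assuming only the HTTP/1.1 convention that a text/* body without a charset parameter is ISO-8859-1. The class and method names (HttpTextCharset, resolve) are hypothetical, and the parameter parsing is deliberately naive: it splits on ';' and does not handle semicolons inside quoted values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of any library: chooses the charset for
// decoding an HTTP body from its Content-Type header value.
final class HttpTextCharset {
    private HttpTextCharset() {}

    static Charset resolve(String contentType) {
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // malformed parameter: skip it
            }
            if (param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // Throws if the label is unknown to this JVM.
                return Charset.forName(value);
            }
        }
        // No charset parameter: HTTP/1.1 defaults text/* to ISO-8859-1.
        if (mediaType.startsWith("text/")) {
            return StandardCharsets.ISO_8859_1;
        }
        // Any other type: the format itself has to say how to decode it.
        return null;
    }
}
```

Returning null for non-text types is a design choice of this sketch: it leaves encoding detection to a format-aware parser, which is the division of labor argued for above.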


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JILnKP019679 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:21:49 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JILnOZ019678 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:21:49 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JILlKP019673 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:21:47 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id D04C710382; Fri, 19 Sep 2003 11:21:48 -0700 (PDT)
Message-ID: <3F6B493B.9040104@textuality.com>
Date: Fri, 19 Sep 2003 11:21:47 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Martin Duerst <duerst@w3.org>
Cc: John Cowan <jcowan@reutershealth.com>, Francois Yergeau <FYergeau@alis.com>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <4.2.0.58.J.20030919124030.04fbfd20@localhost>
In-Reply-To: <4.2.0.58.J.20030919124030.04fbfd20@localhost>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Martin Duerst wrote:

> This will of course take some time. As one example, N3 just says
> that it's in UTF-8. For other new formats, that may make sense,
> too.

In principle, I agree.  In practice, there's a problem: for example, I
actually don't know how, in my favorite editor on Mac OS X, to type in
the u-umlaut in Martin's last name, or how to force saving a file in
UTF-8.  This isn't a problem for me because I'm usually writing XML, and
so I just write D&uuml;rst.  So I actually have trouble making N3
assertions about Martin.

If we could make the whole world always use UTF-8 for everything, life
would be so much simpler :)
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JIHcKP019434 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:17:38 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JIHcce019433 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:17:38 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JIHaKP019427 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:17:36 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id DBB5F10382; Fri, 19 Sep 2003 11:17:37 -0700 (PDT)
Message-ID: <3F6B4840.9060106@textuality.com>
Date: Fri, 19 Sep 2003 11:17:36 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Martin Duerst <duerst@w3.org>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
References: <3F69DF5E.3000102@textuality.com> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com> <4.2.0.58.J.20030919132600.05094c10@localhost>
In-Reply-To: <4.2.0.58.J.20030919132600.05094c10@localhost>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Martin Duerst wrote:

>> I can live with removing the STRONGLY RECOMMENDED status and an
>> informative note that you typically do not need to specify the
>> charset parameter but anything beyond that goes much too far.
> 
> I quite agree with this statement.

At the very least, you have to explain the harmful consequences of 
getting the charset wrong and the fact that in normal operational 
scenarios, if you leave it off you are less likely to have problems than 
if you provide it. -Tim
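To make "the harmful consequences of getting the charset wrong" concrete, here is a small Java sketch (the class name MislabelDemo is hypothetical) of what a receiver sees when a UTF-8 document is labeled ISO-8859-1, for instance by a server-wide default charset. Each non-ASCII character turns into two wrong ones, and nothing reports an error.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decode a body using whatever charset its label claims.
final class MislabelDemo {
    private MislabelDemo() {}

    static String decodeAsLabeled(byte[] body, Charset label) {
        // ISO-8859-1 maps every byte to a character, so a wrong label
        // produces garbled text silently instead of an error.
        return new String(body, label);
    }

    public static void main(String[] args) {
        byte[] body = "D\u00fcrst".getBytes(StandardCharsets.UTF_8); // bytes 44 C3 BC 72 73 74
        String seen = decodeAsLabeled(body, StandardCharsets.ISO_8859_1);
        // The u-umlaut has become two characters: A-tilde and one-quarter.
        System.out.println(seen.length()); // 6
    }
}
```

Left unlabeled, the same bytes would be read by an XML parser as UTF-8 (the default when no encoding declaration is present) and come out correctly.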




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5JKP018841 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:05:19 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JI5J5v018838 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:05:19 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5HKP018816 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:05:17 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 923A713A47; Fri, 19 Sep 2003 14:05:18 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919132600.05094c10@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 13:40:35 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>, Tim Bray <tbray@textuality.com>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
References: <3F69DF5E.3000102@textuality.com> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 03:50 03/09/19 +0200, Bjoern Hoehrmann wrote:

>You want to change something that has been STRONGLY RECOMMENDED for over
>five years to (ideally) MUST NOT just because it could cause trouble
>when used improperly or with broken implementations. Today I am good
>with web standards if I use the charset parameter, tomorrow I am bad
>with web standards if I do. What's next on #W3C? Use tables for layout
>because people could get CSS wrong and old browsers get some CSS wrong?
>I don't think this leads anywhere.
>
>The charset parameter is useful if you cannot or do not want to use an
>encoding declaration,

Yes. One particular example that came up recently is the case of IE
going into quirks mode when seeing an XML declaration on an XHTML file.
I guess we can assume that those sites serving different content types
based on browser type can somehow set the charset parameter.


>for content negotiation, for view source
>functionality, if you perform protocol operations that change the
>encoding without changing the document or if you have to deal with
>legacy applications that could break your document if no charset
>parameter is present.

I'd also want to mention server technology that links the 'charset'
parameter with the actual encoding. For example, for Java servlets,
saying
     response.setContentType("type/foo;charset=encoding");
will not only produce the relevant header, it will also make
sure that the right conversion (from the internal UTF-16 to
the specified encoding) happens. Since this works, it would be
a bad idea to disallow it.
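The servlet API is not part of the Java standard library, so as a stand-in the following sketch (class and method names hypothetical) shows the property relied on here: a single Charset value produces both the header label and the conversion from Java's internal UTF-16 strings to bytes, so the two cannot disagree.

```java
import java.nio.charset.Charset;

// Hypothetical stand-in for a servlet response: the declared charset
// drives both the Content-Type label and the bytes actually sent.
final class LabeledBody {
    final String contentType;
    final byte[] bytes;

    private LabeledBody(String contentType, byte[] bytes) {
        this.contentType = contentType;
        this.bytes = bytes;
    }

    static LabeledBody of(String mediaType, Charset charset, String text) {
        // String.getBytes converts from UTF-16 to the target charset;
        // unmappable characters are replaced with the charset's default
        // replacement bytes rather than raising an error.
        byte[] encoded = text.getBytes(charset);
        return new LabeledBody(mediaType + ";charset=" + charset.name(), encoded);
    }
}
```

For example, `LabeledBody.of("text/plain", StandardCharsets.UTF_8, "D\u00fcrst")` yields the label `text/plain;charset=UTF-8` together with six UTF-8 bytes.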


>I admit that there is probably no strong enough
>use case to introduce it, but we have the parameter already and it has
>been STRONGLY RECOMMENDED for ages across various W3C technologies.
>
>I can live with removing the STRONGLY RECOMMENDED status and an
>informative note that you typically do not need to specify the
>charset parameter but anything beyond that goes much too far.

I quite agree with this statement.

Regards,    Martin.



> >To put it another way, quoting Larry Wall: "An XML document knows what
> >encoding it's in."
>
><http://www.w3.org/People/Bos/DesignGuide/stability>:
>
>   ...
>   Having to re-learn how to do something is costly, creating new
>   programs to do the same thing in a different way is costly, and
>   converting existing documents and other resources to a different
>   format is also costly, so changes with little or no benefit should
>   be avoided.
>   ...



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5JKP018842 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:05:19 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JI5JE2018832 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:05:19 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5HKP018815 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:05:17 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 838C013919; Fri, 19 Sep 2003 14:05:18 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919131656.04e3c5f0@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 13:19:46 -0400
To: MURATA Makoto <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <20030919211746.E240.MURATA@hokkaido.email.ne.jp>
References: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 21:25 03/09/19 +0900, MURATA Makoto wrote:


>On Fri, 19 Sep 2003 03:50:11 +0200
>Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
> > Depends on the format. Formats should provide means to specify
> > the encoding, if they do not they are BAD, broken as designed.
>
>If we really promote in-band declarations, we should recommend
>one mechanism for all textual formats.  I think that CSS style
>declaration (@charset) is the best I have seen.

Can you explain what in particular you like about the CSS
mechanism? Do you expect that all formats would use exactly
the same character sequence, or only that they would use
the same structure?

As an example, should C/C++ use '@charset', or should they
use a more C-like '#charset'?
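For reference, a CSS-style rule is easy to detect precisely because it is a fixed byte sequence at the very start of the file. This Java sketch (names hypothetical) handles only ASCII-compatible encodings and ignores BOMs; a hypothetical '#charset' variant would, under the same assumptions, change only the prefix bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical detector for a CSS-style '@charset "name";' rule that must
// be the very first bytes of the file.
final class CssCharsetRule {
    private static final byte[] PREFIX = "@charset \"".getBytes(StandardCharsets.US_ASCII);

    private CssCharsetRule() {}

    /** Returns the declared name, or null if the file does not start with the rule. */
    static String declaredCharset(byte[] file) {
        if (file.length < PREFIX.length
                || !Arrays.equals(Arrays.copyOfRange(file, 0, PREFIX.length), PREFIX)) {
            return null;
        }
        for (int i = PREFIX.length; i + 1 < file.length; i++) {
            if (file[i] == '"') {
                // The closing quote must be followed immediately by ';'.
                return file[i + 1] == ';'
                        ? new String(file, PREFIX.length, i - PREFIX.length, StandardCharsets.US_ASCII)
                        : null;
            }
        }
        return null;
    }
}
```

Requiring the exact same bytes in every format is what would make such a check generic; a per-format syntax would need a separate detector for each format.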

Regards,    Martin.



>If the TAG recommends
>this mechanism for all formats except XML and HTML, that would
>be one reasonable principle.
>
>Cheers,
>
>--
>MURATA Makoto <murata@hokkaido.email.ne.jp>



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5JKP018833 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 11:05:19 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JI5IDm018830 for ietf-xml-mime-bks; Fri, 19 Sep 2003 11:05:18 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JI5HKP018817 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 11:05:17 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id A4849142AA; Fri, 19 Sep 2003 14:05:18 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919134634.04e231b0@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 13:53:08 -0400
To: Tim Bray <tbray@textuality.com>, Bjoern Hoehrmann <derhoermi@gmx.net>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <3F69DF5E.3000102@textuality.com>
References: <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 09:37 03/09/18 -0700, Tim Bray wrote:

>The argument is precisely that it is not in the slightest useful.
>Please read appendix F to the XML specification.  Then please suggest a 
>plausible scenario in which an XML instance unaccompanied by a charset 
>parameter can cause breakage.  You'll have to work hard.

No, I think it's quite easy. A user buys an English XML book.
The book has examples with and without XML declarations (and those
examples with an encoding declaration will use any of 'US-ASCII',
'iso-8859-1', and 'UTF-8'). The book probably has a note
somewhere explaining what the encoding declaration means, but
that might not be featured prominently, because it's not usually
that important for English.

The user copies and adapts the examples in a text editor, adding text
(e.g. Polish, Chinese, or whatever). The XML file is now broken,
because its declared encoding no longer matches the bytes the editor saved.


>To put it another way, quoting Larry Wall: "An XML document knows what 
>encoding it's in."

In the above example, it pretends to know, but it is wrong.
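The scenario above can be sketched in Java. The class and method names are hypothetical, and detection is simplified to one branch of XML Appendix F (an ASCII-compatible stream without a BOM): read the encoding declaration, then check whether the bytes really decode under it. A strict US-ASCII decoder catches Polish text that the editor saved as UTF-8; a wrong 'iso-8859-1' declaration would pass silently, since every byte is valid ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker: does an XML document's declared encoding match its bytes?
final class XmlDeclCheck {
    // Simplified EncodingDecl inside an XML declaration at the very start.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private XmlDeclCheck() {}

    static String declaredEncoding(byte[] doc) {
        // ISO-8859-1 maps every byte to one char, so ASCII markup reads correctly.
        String head = new String(doc, 0, Math.min(doc.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(head);
        // Without a BOM or an encoding declaration, XML defaults to UTF-8.
        return m.find() ? m.group(1) : "UTF-8";
    }

    static boolean decodesCleanly(byte[] doc, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(doc));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```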


Regards,    Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JHt9KP018315 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 10:55:09 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JHt999018314 for ietf-xml-mime-bks; Fri, 19 Sep 2003 10:55:09 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JHt7KP018306 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 10:55:07 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 520CF10382; Fri, 19 Sep 2003 10:55:04 -0700 (PDT)
Message-ID: <3F6B42F6.1040102@textuality.com>
Date: Fri, 19 Sep 2003 10:55:02 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Larry Masinter <LMM@acm.org>
Cc: ietf-xml-mime@imc.org
Subject: Re: transcoding nearly certainly wrong?
References: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
In-Reply-To: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Larry Masinter wrote:

> Why is transcoding nearly certain to be wrong with XML?

First, transcoding XML is technically difficult, because you can't just
do a dumb byte-level job; you also have to take care of
the BOM & encoding declaration.  Second, given the wide range of
encodings supported by popular deployed XML software, it is rarely the 
case that transcoding offers any benefit.  Something that is technically 
difficult (i.e. easy to get wrong) and offers little practical benefit 
is nearly certain to be wrong.
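What taking care of the BOM and encoding declaration involves can be sketched as follows. This is a hypothetical Java helper, not a library API; it assumes the source charset is already known and handles only the common cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical transcoder showing the two XML-specific steps a dumb
// byte-level converter would miss.
final class XmlTranscoder {
    private static final Pattern XML_DECL = Pattern.compile("^<\\?xml\\s");
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^(<\\?xml[^>]*?\\sencoding\\s*=\\s*)([\"'])[A-Za-z][A-Za-z0-9._-]*\\2");

    private XmlTranscoder() {}

    static byte[] transcode(byte[] doc, Charset from, Charset to) {
        String text = new String(doc, from);
        // Step 1: drop a leading BOM (U+FEFF); Java's UTF-8 decoder keeps it as a character.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        // Step 2: make the encoding declaration name the new encoding.
        Matcher m = ENCODING_DECL.matcher(text);
        boolean needsNoDecl = to.equals(StandardCharsets.UTF_8) || to.equals(StandardCharsets.UTF_16);
        if (m.find()) {
            String quote = m.group(2);
            text = m.group(1) + quote + to.name() + quote + text.substring(m.end());
        } else if (!needsNoDecl) {
            if (XML_DECL.matcher(text).find()) {
                // A declaration without encoding="..." is not handled in this sketch.
                throw new IllegalArgumentException("declaration lacks an encoding pseudo-attribute");
            }
            text = "<?xml version=\"1.0\" encoding=\"" + to.name() + "\"?>" + text;
        }
        // Java's "UTF-16" encoder writes a big-endian BOM, which XML requires for UTF-16.
        // Characters the target cannot represent are replaced (typically with '?').
        return text.getBytes(to);
    }
}
```

Even this much only covers the easy cases; the silent replacement of unrepresentable characters is one more way for transcoding to go wrong.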

> Or, to put it another way, why not limit the use
> of text/xml to XML instances for which transcoding
> is certain not to be wrong, and for which US-ASCII
> is acceptable (because the XML uses numeric character
> references or character entities, is only used to
> code a limited schema with numeric data, etc.?)

This is plausible, I guess, except that if you avoid text/*, you just
don't need to worry whether they used entities the right way or what
kind of data it is.  Essentially, application/* with no charset is
highly robust.  Using text/* or adding a charset reduces that robustness
and is just bad practice.
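For comparison, the kind of content that a text/xml restricted to US-ASCII would carry can be produced with a helper like this hypothetical Java sketch: every non-ASCII character in character data becomes a numeric character reference, so the result is pure US-ASCII and survives transcoding. It covers text content only; element and attribute names cannot be escaped this way, and control characters that XML 1.0 forbids are not checked.

```java
// Hypothetical helper: escape a string for XML character data using only US-ASCII.
final class AsciiXmlText {
    private AsciiXmlText() {}

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // codePoints() keeps surrogate pairs together, so U+1F600 becomes one reference.
        text.codePoints().forEach(cp -> {
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';'); // decimal numeric character reference
            }
        });
        return sb.toString();
    }
}
```

The catch is the one noted above: a receiver labeled this way still has to trust that the sender actually escaped everything, whereas application/* with no charset needs no such check.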

> Limiting the scope is less radical than deprecating.

True, but practices that cause problems while offering no practical 
benefits should be deprecated.
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JGo6KP015089 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 09:50:06 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JGo6xS015088 for ietf-xml-mime-bks; Fri, 19 Sep 2003 09:50:06 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JGo5KP015082 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 09:50:05 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 838E713714; Fri, 19 Sep 2003 12:50:06 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919124030.04fbfd20@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 12:50:00 -0400
To: John Cowan <jcowan@reutershealth.com>, Francois Yergeau <FYergeau@alis.com>
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <20030919161017.GE32762@skunk.reutershealth.com>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain> <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

In the long term, I think the fundamental problem is with the large
number of encodings, not with the encoding identifications. The
example of US-ASCII very clearly shows the huge advantages of
having a single encoding.

This will of course take some time. As one example, N3 just says
that it's in UTF-8. For other new formats, that may make sense,
too.
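A format that, like N3, is simply defined to be UTF-8 needs no declaration at all; a reader only has to decode strictly. A minimal Java sketch (names hypothetical) that rejects input which is not well-formed UTF-8 rather than guessing:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical reader for a UTF-8-only format: no labels, no sniffing.
final class Utf8Only {
    private Utf8Only() {}

    /** Returns the decoded text, or empty if the bytes are not valid UTF-8. */
    static Optional<String> decodeStrict(byte[] input) {
        try {
            return Optional.of(StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }
}
```

A file accidentally saved in ISO-8859-1 (where u-umlaut is the single byte 0xFC) is rejected instead of silently misread.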

Regards,     Martin.

At 12:10 03/09/19 -0400, John Cowan wrote:

>Francois Yergeau scripsit:
>
> > In this respect, yes.  All programming languages should provide for charset
> > identification of their source files.  Alas, none do, AFAIK.
>
>I almost, but not quite, entirely disagree with this position.
>
>Rather than having thousands of ad hoc mechanisms for encoding declarations
>in each of the thousands of text formats now extant, file systems should have
>a convenient mechanism for recording the encoding of each file, and character
>processing libraries should have convenient reading and writing operations that
>do the necessary conversions.  Otherwise, generic text-processing tools become
>impossible, because each tool has to have a vast library that understands the
>mechanics of the encoding declaration specific to the format it is trying to
>read.  That way madness lies.



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JGC3KP013100 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 09:12:03 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JGC3Bo013099 for ietf-xml-mime-bks; Fri, 19 Sep 2003 09:12:03 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JGC1KP013094 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 09:12:01 -0700 (PDT) (envelope-from jcowan@reutershealth.com)
Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id MAA20350; Fri, 19 Sep 2003 12:06:54 -0400 (EDT)
Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Fri, 19 Sep 2003 12:10:17 -0400
Date: Fri, 19 Sep 2003 12:10:17 -0400
From: John Cowan <jcowan@reutershealth.com>
To: Francois Yergeau <FYergeau@alis.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Message-ID: <20030919161017.GE32762@skunk.reutershealth.com>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
User-Agent: Mutt/1.4.1i
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Francois Yergeau scripsit:

> In this respect, yes.  All programming languages should provide for charset
> identification of their source files.  Alas, none do, AFAIK.

I almost, but not quite, entirely disagree with this position.

Rather than having thousands of ad hoc mechanisms for encoding declarations
in each of the thousands of text formats now extant, file systems should have
a convenient mechanism for recording the encoding of each file, and character
processing libraries should have convenient reading and writing operations that
do the necessary conversions.  Otherwise, generic text-processing tools become
impossible, because each tool has to have a vast library that understands the
mechanics of the encoding declaration specific to the format it is trying to
read.  That way madness lies.

-- 
As you read this, I don't want you to feel      John Cowan 
sorry for me, because, I believe everyone       jcowan@reutershealth.com
will die someday.    -- From a Nigerian-type    http://www.reutershealth.com
                        scam spam I got         http://www.ccil.org/~cowan


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JF5JKP009239 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 08:05:19 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JF5JV1009238 for ietf-xml-mime-bks; Fri, 19 Sep 2003 08:05:19 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from lotus.lotus.com (lotus.lotus.com [129.42.250.41]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JF5HKP009229 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 08:05:17 -0700 (PDT) (envelope-from noah_mendelsohn@us.ibm.com)
Received: from internet1.lotus.com (internet1 [172.16.131.235]) by lotus.lotus.com (8.12.10/8.12.9) with ESMTP id h8JEvRsQ027738; Fri, 19 Sep 2003 10:58:11 -0400 (EDT)
Received: from wtfmail05a.lotus.com (wtfmail05a.lotus.com [9.33.9.125]) by internet1.lotus.com (8.12.9/8.12.6) with ESMTP id h8JF4A3i009244; Fri, 19 Sep 2003 11:04:11 -0400 (EDT)
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 5.0.8  June 18, 2001
Message-ID: <OFF35186E2.2AF6FFA7-ON85256DA6.00528EF2@lotus.com>
From: noah_mendelsohn@us.ibm.com
Date: Fri, 19 Sep 2003 11:04:11 -0400
X-MIMETrack: Serialize by Router on WTFMAIL05a/WTF/M/Lotus(Release 6.0.1|February 07, 2003) at 09/19/2003 11:04:11 AM, Serialize complete at 09/19/2003 11:04:11 AM
Content-Type: text/plain; charset="us-ascii"
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Murata Makoto writes:

>> I believe that SOAP implementations use the 
>> charset parameter.  If we remove the charset 
>> parameter, we will make them non-conformant.

This is not my area of expertise, but I note that the HTTP binding [1]
provided by the SOAP 1.2 Recommendation uses an application/soap+xml media
type, a definition of which is at [2] (I believe it is working its way 
through the formal registration process.)  My reading is that the 
definition lists charset as optional, and makes clear that its proper use 
is to be found in RFC 3023. 

I am not aware of what typical implementations of SOAP 1.1 or SOAP 1.2 are 
doing, but the 1.2 spec at least seems to list it as optional.  Again, I'm 
not expert in this stuff and am not offering an opinion, but I thought the 
links might be helpful.

[1] http://www.w3.org/TR/soap12-part2/#soapinhttp
[2] http://www.w3.org/TR/soap12-part2/#ietf-draft

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------







MURATA Makoto <murata@hokkaido.email.ne.jp>
Sent by: www-tag-request@w3.org
09/19/03 08:10 AM

 
        To:     ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Re: Requesting a revision of RFC3023




On Fri, 19 Sep 2003 03:50:11 +0200
Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> You want to change something that has been STRONGLY RECOMMENDED for over
> five years to (ideally) MUST NOT just because it could cause trouble
> when used improperly or with broken implementations. Today I am good
> with web standards if I use the charset parameter, tomorrow I am bad
> with web standards if I do. What's next on #W3C? Use tables for layout
> because people could get CSS wrong and old browsers get some CSS wrong?
> I don't think this leads anywhere.

I believe that SOAP implementations use the charset parameter.  If we 
remove the charset parameter, we will make them non-conformant.

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>







Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JEO0KP006878 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 07:24:00 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JEO06E006877 for ietf-xml-mime-bks; Fri, 19 Sep 2003 07:24:00 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JENxKP006872 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 07:23:59 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id EF4F71425E; Fri, 19 Sep 2003 10:23:59 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030919095146.04e22050@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Fri, 19 Sep 2003 10:02:05 -0400
To: Larry Masinter <LMM@acm.org>, Tim Bray <tbray@textuality.com>
From: Martin Duerst <duerst@w3.org>
Subject: Re: transcoding nearly certainly wrong?
Cc: ietf-xml-mime@imc.org
In-Reply-To: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Hello Larry,

At 23:36 03/09/18 -0700, Larry Masinter wrote:

>Why is transcoding nearly certain to be wrong with XML?

The argument is that people are not able to set up
their servers correctly.


>Or, to put it another way, why not limit the use
>of text/xml to XML instances for which transcoding
>is certain not to be wrong, and for which US-ASCII
>is acceptable (because the XML uses numeric character
>references or character entities, is only used to
>code a limited schema with numeric data, etc.?)
>
>Limiting the scope is less radical than deprecating.

I'm not sure this makes sense. For types with mostly
just numeric data, text/... may be the wrong type.

And defining types that are XML, but restricted to
US-ASCII would be a really bad idea for types that
actually can contain text.

Regards,    Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JEBGKP006314 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 07:11:16 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JEBGA9006313 for ietf-xml-mime-bks; Fri, 19 Sep 2003 07:11:16 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JEBEKP006305 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 07:11:15 -0700 (PDT) (envelope-from FYergeau@alis.com)
Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p1/8.12.8) with ESMTP id h8JEB8VR063068; Fri, 19 Sep 2003 14:11:08 GMT (envelope-from FYergeau@alis.com)
Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id <TDZ6Q8A9>; Fri, 19 Sep 2003 10:11:10 -0400
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB73660EB386@alis-2k.alis.domain>
From: Francois Yergeau <FYergeau@alis.com>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: RE: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 10:11:09 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8JEBFKP006309
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> If we really promote in-band declarations, we should recommend 
> one mechanism for all textual formats.  I think that CSS style 
> declaration (@charset) is the best I have seen.  If the TAG recommends 
> this mechanism for all formats except XML and HTML, that would 
> be one reasonable principle.

The Character Model section 3.6.2 [1] addresses this question but stops
short of recommending a single mechanism.  I guess you think it should.

[1] http://www.w3.org/TR/charmod/#sec-EncodingIdent

-- 
François Yergeau



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JE8nKP006204 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 07:08:49 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JE8nGW006203 for ietf-xml-mime-bks; Fri, 19 Sep 2003 07:08:49 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JE8mKP006190 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 07:08:48 -0700 (PDT) (envelope-from FYergeau@alis.com)
Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p1/8.12.8) with ESMTP id h8JE8fVR062946; Fri, 19 Sep 2003 14:08:41 GMT (envelope-from FYergeau@alis.com)
Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id <TDZ6Q8AY>; Fri, 19 Sep 2003 10:08:43 -0400
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB73660EB385@alis-2k.alis.domain>
From: Francois Yergeau <FYergeau@alis.com>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: RE: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 10:08:42 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8JE8mKP006199
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

John Cowan wrote:
>Programming languages are broken as designed?

In this respect, yes.  All programming languages should provide for charset
identification of their source files.  Alas, none do, AFAIK.

-- 
François



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JDfeKP004911 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 06:41:40 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JDfeO3004910 for ietf-xml-mime-bks; Fri, 19 Sep 2003 06:41:40 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8JDfdKP004904 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 06:41:39 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 7716 invoked by uid 65534); 19 Sep 2003 13:41:34 -0000
Received: from pD903BA43.dip.t-dialin.net (EHLO Voyager) (217.3.186.67) by mail.gmx.net (mp010) with SMTP; 19 Sep 2003 15:41:34 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: John Cowan <cowan@mercury.ccil.org>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 15:41:27 +0200
Message-ID: <3f7c00e0.1733275227@smtp.bjoern.hoehrmann.de>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de> <20030919042853.GD28272@mercury.ccil.org>
In-Reply-To: <20030919042853.GD28272@mercury.ccil.org>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* John Cowan wrote:
>Bjoern Hoehrmann scripsit:
>
>> Depends on the format. Formats should provide means to specify
>> the encoding, if they do not they are BAD, broken as designed.
>
>Plain text is broken as designed?  Programming languages are broken
>as designed?

If a processor is required to guess how to process instances of a
format, the format is clearly broken as designed, yes.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JCRhKP098194 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 05:27:43 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JCRhZP098193 for ietf-xml-mime-bks; Fri, 19 Sep 2003 05:27:43 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail2.asahi-net.or.jp [202.224.39.198]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JCRgKP098188 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 05:27:42 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id B87706D9A; Fri, 19 Sep 2003 21:27:42 +0900 (JST)
Date: Fri, 19 Sep 2003 21:25:06 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
References: <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
Message-Id: <20030919211746.E240.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Fri, 19 Sep 2003 03:50:11 +0200
Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> Depends on the format. Formats should provide means to specify
> the encoding, if they do not they are BAD, broken as designed.

If we really promote in-band declarations, we should recommend 
one mechanism for all textual formats.  I think that CSS style 
declaration (@charset) is the best I have seen.  If the TAG recommends 
this mechanism for all formats except XML and HTML, that would 
be one reasonable principle.
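For illustration only, a minimal Python sketch of how a reader might pick up the CSS in-band declaration mentioned above. It assumes the CSS 2.1 rule that `@charset "name";` must appear verbatim at the very start of the stylesheet (after an optional UTF-8 BOM), and it handles only ASCII-compatible encodings; `css_charset` is a hypothetical helper name, not an existing API.

```python
import re
from typing import Optional

# Assumed CSS 2.1 form: the exact text '@charset "' at offset 0, then an
# encoding name and '";'. Whitespace before the rule is not allowed.
_CHARSET_RULE = re.compile(rb'@charset "([A-Za-z0-9._:-]+)";')


def css_charset(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading @charset rule, or None."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # skip a UTF-8 byte order mark
    m = _CHARSET_RULE.match(data)  # match() anchors at the start
    return m.group(1).decode("ascii") if m else None
```

A UTF-16 stylesheet would encode the same rule with different bytes, which is one reason an in-band mechanism still has to be paired with some byte-level sniffing.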

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JCDTKP096584 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 05:13:29 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JCDTJq096583 for ietf-xml-mime-bks; Fri, 19 Sep 2003 05:13:29 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JCDSKP096578 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 05:13:28 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id A3A1C6741; Fri, 19 Sep 2003 21:13:28 +0900 (JST)
Date: Fri, 19 Sep 2003 21:10:52 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
References: <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
Message-Id: <20030919210739.E23D.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Fri, 19 Sep 2003 03:50:11 +0200
Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> You want to change something that has been STRONGLY RECOMMENDED for over
> five years to (ideally) MUST NOT just because it could cause trouble
> when used improperly or with broken implementations. Today I am good
> with web standards if I use the charset parameter, tomorrow I am bad
> with web standards if I do. What's next on #W3C? Use tables for layout
> because people could get CSS wrong and old browsers get some CSS wrong?
> I don't think this leads anywhere.

I believe that SOAP implementations use the charset parameter.  If we 
remove the charset parameter, we will make them non-conformant.

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JBwlKP095202 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 04:58:47 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JBwlv3095201 for ietf-xml-mime-bks; Fri, 19 Sep 2003 04:58:47 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail2.asahi-net.or.jp [202.224.39.198]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JBwiKP095193 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 04:58:46 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 256FF6221; Fri, 19 Sep 2003 20:58:44 +0900 (JST)
Date: Fri, 19 Sep 2003 20:56:08 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org, "WWW-Tag" <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <bkcge5.2f8.1@mail.christoph.schneegans.de>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <bkcge5.2f8.1@mail.christoph.schneegans.de>
Message-Id: <20030919081334.E237.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> > [5] http://www.asahi-net.or.jp/~eb2m-mrt/charsetDetection.html
> 
> I was quite surprised you didn't address the role of byte order marks.
> BOMs are not limited to XML, neither do they depend on MIME or HTTP. Of
> course, they can only be used for UTF encodings.

Thanks for this comment.  I think that the use of the BOM or Unicode signature 
is an example of charset sniffing based on byte patterns.  However, I agree that 
my document was not clear enough.  I revised it.
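A minimal sketch of the byte-pattern sniffing described above, restricted to Unicode signatures (BOMs). The name `sniff_bom` is hypothetical. Note that FF FE 00 00 is treated here as UTF-32LE, which is ambiguous with a UTF-16LE BOM followed by U+0000; checking the longer marks first is an assumption about how to resolve that.

```python
from typing import Optional

# Signatures ordered longest first, so FF FE 00 00 (UTF-32LE) is tested
# before the shorter FF FE (UTF-16LE) that it begins with.
_BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

As the quoted comment says, this only helps for UTF encodings: a Latin-1 or Shift_JIS file has no signature, so `sniff_bom` returns None for it.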

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JBueKP094969 for <ietf-xml-mime-bks@above.proper.com>; Fri, 19 Sep 2003 04:56:40 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8JBuevG094967 for ietf-xml-mime-bks; Fri, 19 Sep 2003 04:56:40 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8JBucKP094958 for <ietf-xml-mime@imc.org>; Fri, 19 Sep 2003 04:56:39 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (i217217.ppp.asahi-net.or.jp [61.125.217.217]) by mail.asahi-net.or.jp (Postfix) with ESMTP id A2BF26DA8; Fri, 19 Sep 2003 20:56:38 +0900 (JST)
Date: Fri, 19 Sep 2003 20:54:01 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
In-Reply-To: <F7D4BDA0E5A1D14B99D32C022AEB73660EB380@alis-2k.alis.domain>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB380@alis-2k.alis.domain>
Message-Id: <20030919204327.E23A.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
X-Mailer: Becky! ver. 2.06.02
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8JBudKP094962
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> I happen to agree with that.  But please consider that saying 'XML is text'
> is not at all the same thing as saying 'XML is "text/"', as the latter is a
> MIME concept that carries a lot of additional (and unfortunate) baggage.

I agree with François because I have learned from Ned Freed that text/* is not 
always appropriate for textual data.

http://www.imc.org/ietf-xml-mime/mail-archive/msg00203.html

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>





Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8J6b1KP030703 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 23:37:01 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8J6b1MJ030702 for ietf-xml-mime-bks; Thu, 18 Sep 2003 23:37:01 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from smtp-relay-7.sea.adobe.com (smtp-relay-7.adobe.com [192.150.22.7]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8J6b0KP030688 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 23:37:00 -0700 (PDT) (envelope-from LMM@acm.org)
Received: from inner-relay-3.corp.adobe.com (inner-relay-3 [153.32.251.51]) by smtp-relay-7.sea.adobe.com (8.12.10/8.12.10) with ESMTP id h8J6aq7c019001; Thu, 18 Sep 2003 23:36:53 -0700 (PDT)
Received: from mailsjtest (mailsjtest.corp.adobe.com [153.32.1.139]) by inner-relay-3.corp.adobe.com (8.12.9/8.12.9) with ESMTP id h8J6agps029278; Thu, 18 Sep 2003 23:36:52 -0700 (PDT)
Received: from MasinterT40 ([130.248.182.45]) by mailsjtest.corp.adobe.com (iPlanet Messaging Server 5.2 Patch 1 (built Aug 19 2002)) with ESMTP id <0HLG00BPD7QDS4@mailsjtest.corp.adobe.com>; Thu, 18 Sep 2003 23:37:25 -0700 (PDT)
Date: Thu, 18 Sep 2003 23:36:41 -0700
From: Larry Masinter <LMM@acm.org>
Subject: transcoding nearly certainly wrong?
To: Tim Bray <tbray@textuality.com>
Cc: ietf-xml-mime@imc.org
Message-id: <000a01c37e78$6326fa60$6401a8c0@MasinterT40>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
X-Mailer: Microsoft Outlook, Build 10.0.4024
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Importance: Normal
X-Priority: 3 (Normal)
X-MSMail-priority: Normal
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Why is transcoding nearly certain to be wrong with XML?

Or, to put it another way, why not limit the use
of text/xml to XML instances for which transcoding
is certain not to be wrong, and for which US-ASCII
is acceptable (because the XML uses numeric character
references or character entities, is only used to
code a limited schema with numeric data, etc.?)

Limiting the scope is less radical than deprecating.
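To make the transcoding question concrete, here is a hedged Python sketch of one failure mode. It assumes a hypothetical intermediary that re-encodes a UTF-8 XML body as ISO-8859-1 and relabels the charset parameter but leaves the XML declaration untouched. Afterwards the header and the in-band declaration disagree, and only one of them can be right. The sketch also checks the special case raised above: an all-ASCII instance (non-ASCII characters written as numeric character references) has identical bytes in both encodings.

```python
from typing import Optional


def decode_or_none(data: bytes, encoding: str) -> Optional[str]:
    """Decode data, returning None instead of raising on malformed input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'
sent = doc.encode("utf-8")

# Hypothetical intermediary: transcodes to Latin-1 and sets
# charset=iso-8859-1, but the declaration still says UTF-8.
received = sent.decode("utf-8").encode("iso-8859-1")

via_header = decode_or_none(received, "iso-8859-1")  # recovers the text
via_declaration = decode_or_none(received, "utf-8")  # None: lone 0xE9 byte

# All-ASCII variant: the same transcoding changes no bytes at all.
ascii_doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf&#233;</p>'
ascii_unchanged = ascii_doc.encode("utf-8") == ascii_doc.encode("iso-8859-1")
```

Whichever label a receiver trusts, the non-ASCII case depends on the intermediary having relabelled consistently; the ASCII case does not.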








Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8J4SqKP091483 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 21:28:52 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8J4SqEw091482 for ietf-xml-mime-bks; Thu, 18 Sep 2003 21:28:52 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mercury.ccil.org (mercury.ccil.org [192.190.237.100]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8J4SpKP091474 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 21:28:52 -0700 (PDT) (envelope-from cowan@mercury.ccil.org)
Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 1A0Ct3-0000br-00; Fri, 19 Sep 2003 00:28:53 -0400
Date: Fri, 19 Sep 2003 00:28:53 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Message-ID: <20030919042853.GD28272@mercury.ccil.org>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com> <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
User-Agent: Mutt/1.3.28i
From: John Cowan <cowan@mercury.ccil.org>
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Bjoern Hoehrmann scripsit:

> Depends on the format. Formats should provide means to specify
> the encoding, if they do not they are BAD, broken as designed.

Plain text is broken as designed?  Programming languages are broken
as designed?

-- 
John Cowan  jcowan@reutershealth.com  www.ccil.org/~cowan  www.reutershealth.com
"In computer science, we stand on each other's feet."
        --Brian K. Reid


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8J1oeKP082443 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 18:50:40 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8J1oebs082442 for ietf-xml-mime-bks; Thu, 18 Sep 2003 18:50:40 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (pop.gmx.de [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8J1ocKP082431 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 18:50:39 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 25637 invoked by uid 65534); 19 Sep 2003 01:50:30 -0000
Received: from pD903B302.dip.t-dialin.net (EHLO Voyager) (217.3.179.2) by mail.gmx.net (mp023) with SMTP; 19 Sep 2003 03:50:30 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: Tim Bray <tbray@textuality.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Fri, 19 Sep 2003 03:50:11 +0200
Message-ID: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de> <3F69DF5E.3000102@textuality.com>
In-Reply-To: <3F69DF5E.3000102@textuality.com>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* Tim Bray wrote:
>Agreed, which is another of the advantages of XML, since it doesn't need 
>a charset parameter.  You are right about the shortcomings of the 
>charset parameter but for the moment it's the best tool we have.

Depends on the format. Formats should provide means to specify
the encoding, if they do not they are BAD, broken as designed.

>>>I agree, but for XML formats, I still think the charset parameter is 
>>>actively harmful and should be deprecated or even forbidden.
>> 
>> Deprecating something useful just because it could cause trouble when
>> used improperly does not make sense to me.
>
>The argument is precisely that it is not in the slightest useful.

Which makes me wonder why there is such a parameter. I think W3C should
have raised this concern during IESG review of RFC 2376. Complaining
about it now seems a bit late.

>Please read appendix F to the XML specification.  Then please suggest a 
>plausible scenario in which an XML instance unaccompanied by a charset 
>parameter can cause breakage.  You'll have to work hard.  Then suggest a 
>dozen ways in which deployed software is known to get the charset wrong. 
>You'll have no trouble.

Nor will I have trouble suggesting ways in which deployed software
is known to get something wrong when the encoding declaration or the
byte order mark is involved, especially if those are used improperly.
But my logical conclusion is not to forbid the byte order mark or the
encoding declaration.

You want to change something that has been STRONGLY RECOMMENDED for over
five years to (ideally) MUST NOT just because it could cause trouble
when used improperly or with broken implementations. Today I am good
with web standards if I use the charset parameter, tomorrow I am bad
with web standards if I do. What's next on #W3C? Use tables for layout
because people could get CSS wrong and old browsers get some CSS wrong?
I don't think this leads anywhere.

The charset parameter is useful if you cannot or do not want to use an
encoding declaration, for content negotiation, for view source
functionality, if you perform protocol operations that change the
encoding without changing the document or if you have to deal with
legacy applications that could break your document if no charset
parameter is present. I admit that there is probably no strong enough
use case to introduce it, but we have the parameter already and it has
been STRONGLY RECOMMENDED for ages across various W3C technologies.

I can live with removing the STRONGLY RECOMMENDED status and adding an
informative note that you typically do not need to specify the
charset parameter, but anything beyond that goes much too far.
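For readers following the precedence argument, a minimal sketch of how a receiver picks an encoding under RFC 3023 as I read it: a charset parameter, when present, is authoritative; for text/* types without one, the MIME default of us-ascii applies; for application/* types without one, the processor falls back on the XML specification's own detection. `effective_encoding` and the `sniff` callback are hypothetical names, not part of any library.

```python
from typing import Callable, Optional


def effective_encoding(media_type: str, charset: Optional[str],
                       sniff: Callable[[], str]) -> str:
    """Pick the encoding for an XML entity per (my reading of) RFC 3023.

    sniff is a caller-supplied fallback that inspects the bytes, e.g.
    an implementation of XML 1.0 Appendix F.
    """
    media_type = media_type.strip().lower()
    if charset:
        return charset  # the external label wins whenever it is given
    if media_type.startswith("text/"):
        return "us-ascii"  # MIME default for text/*, ignoring the content
    return sniff()  # application/*: the document describes itself
```

This is why the two positions in the thread pull in opposite directions: omitting the parameter is harmless for application/xml but forces us-ascii on text/xml, while supplying a wrong one overrides a correct in-band declaration for both.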

>To put it another way, quoting Larry Wall: "An XML document knows what 
>encoding it's in."

<http://www.w3.org/People/Bos/DesignGuide/stability>:

  ...
  Having to re-learn how to do something is costly, creating new
  programs to do the same thing in a different way is costly, and
  converting existing documents and other resources to a different
  format is also costly, so changes with little or no benefit should
  be avoided.
  ...


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IGbsKP055630 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 09:37:54 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8IGbsFI055628 for ietf-xml-mime-bks; Thu, 18 Sep 2003 09:37:54 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IGbqKP055621 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 09:37:53 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 215CE102EE; Thu, 18 Sep 2003 09:37:52 -0700 (PDT)
Message-ID: <3F69DF5E.3000102@textuality.com>
Date: Thu, 18 Sep 2003 09:37:50 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com> <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de>
In-Reply-To: <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Bjoern Hoehrmann wrote:

> The charset parameter does not work as web servers or web server
> configurations often do not allow content providers to set up this
> parameter. In general, technical solutions should either work
> automagically behind the scene or be visible to those who need to
> be aware; HTTP headers are invisible to most content providers so
> I doubt that this is an architecturally-sound position.

Agreed, which is another of the advantages of XML, since it doesn't need 
a charset parameter.  You are right about the shortcomings of the 
charset parameter but for the moment it's the best tool we have.

>>I agree, but for XML formats, I still think the charset parameter is 
>>actively harmful and should be deprecated or even forbidden.
> 
> Deprecating something useful just because it could cause trouble when
> used improperly does not make sense to me.

The argument is precisely that it is not in the slightest useful. 
Please read appendix F to the XML specification.  Then please suggest a 
plausible scenario in which an XML instance unaccompanied by a charset 
parameter can cause breakage.  You'll have to work hard.  Then suggest a 
dozen ways in which deployed software is known to get the charset wrong. 
You'll have no trouble.  Given that Web Architecture says that the 
charset takes precedence *if provided*, and given that this can add no 
useful information and can easily cause breakage if (as is known to 
happen) it is wrong, there is normally no good reason to provide a 
charset parameter for a media type which is known to be XML.
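A rough Python sketch of the XML 1.0 Appendix F procedure referred to above: look at the first four bytes to choose an encoding family, and for ASCII-compatible input read the encoding declaration. It is a simplification under stated assumptions: EBCDIC (4C 6F A7 94) is not handled, BOM-less 16-bit input is reported as UTF-16 even though Appendix F says the declaration must still be read to tell it apart from other 16-bit encodings, the declaration regex is deliberately strict, and `xml_autodetect` is a hypothetical name.

```python
import re

# Encoding declaration in an ASCII-compatible XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_ENC_DECL = re.compile(
    rb'<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# Four-byte patterns from Appendix F, with and without a BOM.
_FOUR_BYTE = {
    b"\x00\x00\xfe\xff": "UTF-32BE",  # UCS-4 big-endian BOM
    b"\xff\xfe\x00\x00": "UTF-32LE",  # UCS-4 little-endian BOM
    b"\x00\x00\x00<": "UTF-32BE",     # '<' in UCS-4 BE, no BOM
    b"<\x00\x00\x00": "UTF-32LE",     # '<' in UCS-4 LE, no BOM
    b"\x00<\x00?": "UTF-16BE",        # '<?' in UTF-16 BE, no BOM
    b"<\x00?\x00": "UTF-16LE",        # '<?' in UTF-16 LE, no BOM
}


def xml_autodetect(data: bytes) -> str:
    """Best-guess encoding of an XML entity with no external charset."""
    head = data[:4]
    if head in _FOUR_BYTE:
        return _FOUR_BYTE[head]
    if head.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # ASCII-compatible family: the encoding declaration, if any, decides.
    m = _ENC_DECL.match(data)
    if m:
        return m.group(1).decode("ascii")
    # No BOM and no encoding declaration: XML requires UTF-8.
    return "UTF-8"
```

The point of the argument above is visible in the last two branches: without external help, the bytes plus the declaration already settle the encoding, so a charset parameter can only confirm it or contradict it.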

Francois Yergeau managed to dream up a couple of scenarios where you 
might want a charset with XML, but they were somewhere on the border 
between obscure and bad practice (as I'm sure Francois would agree).

To put it another way, quoting Larry Wall: "An XML document knows what 
encoding it's in."
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IEreKP050307 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 07:53:40 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8IEre80050306 for ietf-xml-mime-bks; Thu, 18 Sep 2003 07:53:40 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from serrano.hesketh.net (mail.hesketh.net [216.27.21.211]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IErcKP050297 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 07:53:38 -0700 (PDT) (envelope-from simonstl@simonstl.com)
X-Received-From: simonstl@simonstl.com
X-Delivered-To: ietf-xml-mime@imc.org
X-Originating-IP: [24.58.125.32]
Received: from 192.168.124.11 (syr-24-58-125-32.twcny.rr.com [24.58.125.32]) (authenticated bits=0) by serrano.hesketh.net (8.12.9p1/8.12.8) with ESMTP id h8IErQE1024305; Thu, 18 Sep 2003 10:53:29 -0400
X-Spam-Filter: check_local@serrano.hesketh.net by digitalanswers.org
Date: Thu, 18 Sep 2003 10:53:30 -0400
From: "Simon St.Laurent" <simonstl@simonstl.com>
Subject: RE: Requesting a revision of RFC3023
cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
X-Priority: 3
In-Reply-To: <F7D4BDA0E5A1D14B99D32C022AEB73660EB380@alis-2k.alis.domain>
Message-ID: <r02000000-1026-DE57E6A6E9E711D7870D0003937A08C2@[192.168.124.11]>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Mailer: Mailsmith 2.0 (Blindsider)
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

[Should we be cross-posting?  I suspect ietf-xml-mime is probably a
better forum for detailed discussion, but as TAG made the initial
request, maybe cross-posting is appropriate.  Someone please advise.]

FYergeau@alis.com (Francois Yergeau) writes:
>Tim Bray wrote:
>> I agree entirely with Michael and feel that the, er, 
>> textuality of XML is at the centre of everything.
>
>I happen to agree with that.  But please consider that saying 'XML is
>text' is not at all the same thing as saying 'XML is "text/"', as the
>latter is a MIME concept that carries a lot of additional (and
>unfortunate) baggage.

That's my position as well, and a nice way to say it.  While in more
general terms I feel very strongly that XML is text, I don't feel that
XML is text/.

It's unfortunate that text/ doesn't correspond cleanly with text, but
there are lots of good historical reasons for the baggage.  The cost of
that baggage for XML is such that I'm happy to move to application/.



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IEJDKP048588 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 07:19:13 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8IEJDMk048587 for ietf-xml-mime-bks; Thu, 18 Sep 2003 07:19:13 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IEJCKP048581 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 07:19:12 -0700 (PDT) (envelope-from FYergeau@alis.com)
Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p1/8.12.8) with ESMTP id h8IEH0bp036862; Thu, 18 Sep 2003 14:17:27 GMT (envelope-from FYergeau@alis.com)
Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id <TDZ6Q6JG>; Thu, 18 Sep 2003 10:16:48 -0400
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB73660EB380@alis-2k.alis.domain>
From: Francois Yergeau <FYergeau@alis.com>
To: "'Tim Bray'" <tbray@textuality.com>, "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: MURATA Makoto <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: RE: Requesting a revision of RFC3023
Date: Thu, 18 Sep 2003 10:16:48 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8IEJDKP048583
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Tim Bray wrote:
> I agree entirely with Michael and feel that the, er, 
> textuality of XML is at the centre of everything.

I happen to agree with that.  But please consider that saying 'XML is text'
is not at all the same thing as saying 'XML is "text/"', as the latter is a
MIME concept that carries a lot of additional (and unfortunate) baggage.

> Thus, I disagree with Francois and 
> Makoto in their contention that XML is not usefully 
> considered as text for humans to look at.

I'm afraid this is the case for most humans in most situations.  Sure, there
are people who like to view and edit XML as plain text (pleading guilty
here;-), but deprecating "text/*xml" won't stop me: save to disk and fire up
the editor.  I don't feel I'm going to lose anything of great pragmatic
value, although there is indeed a loss of conceptual value in not calling
XML what it is: text.

> Having said that, I feel that the use of media 
> types beginning with "text/" remains inappropriate, but primarily 
> because of the charset defaulting baggage that comes with those five 
> characters.

Yes.  Oh, and "text/" also implies the MIME canonical form with short lines
delimited by CR-LF, making UTF-16 and UTF-32 impossible, etc.  Broken.  To
be deprecated.
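
Why CR-LF canonical form and UTF-16 don't mix can be shown with a small
Python sketch.  `naive_mime_canonicalize` is a hypothetical gateway step,
assumed to rewrite every bare LF octet into CR LF; it is an illustration of
the problem, not code from any real MIME agent.

```python
import re


def naive_mime_canonicalize(data: bytes) -> bytes:
    """Hypothetical gateway step: turn every bare LF octet into CR LF.

    This only makes sense for ASCII-compatible encodings, where LF is
    always the single octet 0A and never part of another character.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


# ASCII-compatible text survives and gains a proper CRLF.
print(naive_mime_canonicalize("<a>\n</a>".encode("utf-8")))  # b'<a>\r\n</a>'

# In UTF-16BE, U+0A05 (GURMUKHI LETTER A) is the octets 0A 05, so the
# rewrite corrupts a character that is not a line break at all.  (A real
# LF, 00 0A, would become 00 0D 0A, which is just as wrong.)
utf16_doc = "<a>\u0a05</a>".encode("utf-16-be")
mangled = naive_mime_canonicalize(utf16_doc)
print(len(utf16_doc), len(mangled))  # 16 17
```

The mangled result has an odd number of octets and no longer decodes as
UTF-16 at all.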

-- 
François



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IClOKP043806 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 05:47:24 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8IClOgi043805 for ietf-xml-mime-bks; Thu, 18 Sep 2003 05:47:24 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from qhmail2.colt1.inetserver.de (qhmail2.colt1.inetserver.de [195.234.228.78]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8IClNKP043771 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 05:47:23 -0700 (PDT) (envelope-from Christoph@Schneegans.de)
Received: from GG57.m1.mailkunden.de (qhmail1.colt1.inetserver.de [195.234.228.77]) by qhmail2.colt1.inetserver.de (Postfix) with ESMTP id E1B55AB74F; Thu, 18 Sep 2003 14:47:18 +0200 (CEST)
Received: from christoph [217.226.216.165] by GG57.m1.mailkunden.de with ESMTP (SMTPD32-6.06) id A930161400B4; Thu, 18 Sep 2003 14:46:40 +0200
From: Christoph Schneegans <Christoph@Schneegans.de>
To: <ietf-xml-mime@imc.org>
Cc: "WWW-Tag" <www-tag@w3.org>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
Subject: Re: Requesting a revision of RFC3023
Date: Thu, 18 Sep 2003 12:45:27 GMT
MIME-Version: 1.0
Content-Type: text/plain;charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-ID: <bkcge5.2f8.1@mail.christoph.schneegans.de>
X-Mailer: Outlook Express/6.00.2800.1123, Hamster/2.0.0.1, Korrnews/4.2
Lines: 12
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:

> At present, different technologies have introduced different ad-hoc
> solutions  [5].
>
> [5] http://www.asahi-net.or.jp/~eb2m-mrt/charsetDetection.html

I was quite surprised you didn't address the role of byte order marks.
BOMs are not limited to XML, neither do they depend on MIME or HTTP. Of
course, they can only be used for UTF encodings.
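
A minimal Python sketch of BOM sniffing, using the BOM constants from the
standard `codecs` module; `sniff_bom` is a hypothetical helper name.  It
depends on nothing but the leading bytes, which illustrates both halves of
the point: it works for any format, and it says nothing about non-UTF data.

```python
import codecs
from typing import Optional, Tuple

# Longest BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so checking UTF-16LE first would misdetect it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a Unicode BOM.

    Needs no MIME or HTTP metadata, but returns None whenever there is no
    BOM, which is always the case for non-UTF encodings.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"<p/>"))         # ('utf-8', 3)
print(sniff_bom("caf\u00e9".encode("iso-8859-1")))  # None
```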

-- 
<http://schneegans.de/>


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8ICKeKP042289 for <ietf-xml-mime-bks@above.proper.com>; Thu, 18 Sep 2003 05:20:40 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8ICKdXK042288 for ietf-xml-mime-bks; Thu, 18 Sep 2003 05:20:39 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8ICKcKP042282 for <ietf-xml-mime@imc.org>; Thu, 18 Sep 2003 05:20:39 -0700 (PDT) (envelope-from duerst@w3.org)
Received: from enoshima (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id CD2D313473; Thu, 18 Sep 2003 08:20:38 -0400 (EDT)
Message-Id: <4.2.0.58.J.20030918081150.06c9b690@localhost>
X-Sender: duerst@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J 
Date: Thu, 18 Sep 2003 08:17:05 -0400
To: MURATA Makoto <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org
From: Martin Duerst <duerst@w3.org>
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
In-Reply-To: <20030918074726.E228.MURATA@hokkaido.email.ne.jp>
References: <3F689C27.50407@textuality.com> <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

At 08:34 03/09/18 +0900, MURATA Makoto wrote:

> > Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan
> > or Chris or TimBL could take this up internally.  I disagree that this
> > should be frozen at the moment, since the TAG is quite likely to publish
> > a document saying "RFC 3023 is wrong".
>
>Since the TAG and the W3C team share some members, please have some
>internal discussion.  Our hands are tied, and yet we have also been asked to
>take some action.  Please do not blame us.

Tim is currently in the UK. I'll check directly with him when he is
back next week. The idea is that we can pool various needed updates
rather than reissuing the RFC several times.


Regards,    Martin.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8I5i3KP068151 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 22:44:03 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8I5i319068150 for ietf-xml-mime-bks; Wed, 17 Sep 2003 22:44:03 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8I5i1KP068137 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 22:44:01 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 07AE910321; Wed, 17 Sep 2003 22:44:05 -0700 (PDT)
Message-ID: <3F694624.9080500@textuality.com>
Date: Wed, 17 Sep 2003 22:44:04 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: MURATA Makoto <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37C@alis-2k.alis.domain>	 <20030918083238.E22E.MURATA@hokkaido.email.ne.jp> <1063857247.2449.14.camel@localhost>
In-Reply-To: <1063857247.2449.14.camel@localhost>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

C. M. Sperberg-McQueen wrote:

> I'm sorry to see two people whose opinions I value so highly
> agreeing with a position that so troubles me.  One of the 
> most important characteristics of XML, as compared with many,
> many competing formats for the storage and/or transmission of
> data is that it is textual

I agree entirely with Michael and feel that the, er, textuality of XML 
is at the centre of everything.  Thus, I disagree with Francois and 
Makoto in their contention that XML is not usefully considered as text 
for humans to look at.  Having said that, I feel that the use of media 
types beginning with "text/" remains inappropriate, but primarily 
because of the charset defaulting baggage that comes with those five 
characters.  And secondarily because of the fact that the rules allow 
transcoding, which is nearly certain to be wrong with XML.  The second 
exercises me less because I don't think it's actually happening, so the 
core issue is really the charset default stuff.
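
The charset defaulting difference can be sketched in a few lines of Python.
This is a minimal sketch of the precedence as I read RFC 3023 (sections 3.1
and 3.2), not a complete implementation: `effective_charset` is a
hypothetical name, and the result of XML's own detection is passed in as
`sniffed` rather than computed.

```python
from typing import Optional


def effective_charset(media_type: str, charset_param: Optional[str],
                      sniffed: str) -> str:
    """Pick the charset a receiver would use under RFC 3023's rules (sketch).

    `sniffed` stands for whatever XML's own mechanism (BOM, encoding
    declaration, appendix F) yields.
    """
    if charset_param:
        # An explicit parameter is authoritative for every XML media type.
        return charset_param.lower()
    if media_type.lower().startswith("text/"):
        # text/xml without a parameter defaults to US-ASCII; the document's
        # own encoding declaration is not consulted.
        return "us-ascii"
    # application/xml and application/*+xml: let the XML processor decide.
    return sniffed


print(effective_charset("application/xml", None, "utf-8"))  # utf-8
print(effective_charset("text/xml", None, "utf-8"))         # us-ascii
```

The same UTF-8 bytes that decode cleanly as application/xml fail (or, in
a lenient client, get mangled) as parameterless text/xml.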
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8I3sRKP045292 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 20:54:27 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8I3sRFd045291 for ietf-xml-mime-bks; Wed, 17 Sep 2003 20:54:27 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from dr-nick.w3.org (dr-nick.w3.org [18.29.1.73]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8I3sQKP045284 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 20:54:26 -0700 (PDT) (envelope-from cmsmcq@acm.org)
Received: from localhost (homer.w3.org [18.29.0.30]) by dr-nick.w3.org (Postfix) with ESMTP id 7BD6313533; Wed, 17 Sep 2003 23:54:28 -0400 (EDT)
Subject: Re: Requesting a revision of RFC3023
From: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In-Reply-To: <20030918083238.E22E.MURATA@hokkaido.email.ne.jp>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37C@alis-2k.alis.domain> <20030918083238.E22E.MURATA@hokkaido.email.ne.jp>
Content-Type: text/plain
Organization: 
Message-Id: <1063857247.2449.14.camel@localhost>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2-3mdk 
Date: 17 Sep 2003 21:54:07 -0600
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Wed, 2003-09-17 at 17:34, MURATA Makoto wrote:
> On Wed, 17 Sep 2003 14:10:53 -0400
> Francois Yergeau <FYergeau@alis.com> wrote:
> 
> > But stating that "most XML is not text for casual users" says that there is
> > no loss in deprecating text/*xml (save perhaps transition issues), the text/
> > top-level buys nothing of value.
> 
> Agreed.

I'm sorry to see two people whose opinions I value so highly
agreeing with a position that so troubles me.  One of the 
most important characteristics of XML, as compared with many,
many competing formats for the storage and/or transmission of
data is that it is textual (in the sense of being conceptually
a sequence of characters, and represented on the wire -- at least
so far -- as such).  Since much of the XML which I care about
is also a digital representation of texts (in the sense of
being natural-language utterances with a certain degree of
intra-document linguistic and thematic cohesion), it troubles
me to think that labeling XML as text buys us nothing of
value.  On the contrary, I think: it stresses two important
facts.

I assume that both of you are thinking primarily of browser 
fallback behavior, and reflecting your view that users will
be able to make nothing of XML source if they are confronted
with it -- in that context, I understand, the proposition you
endorse is at least plausible.  Even there, though, I don't
find it compelling:  much XML is quite legible to naive humans --
as legible as anything displayed without much intelligent
formatting will ever be.  And for some humans, almost all XML
is legible without special tools.  (I recognize that the latter
group is a relatively small minority of the human population.)

-C. M. Sperberg-McQueen




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8I0haej026449 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 17:43:36 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8I0haYv026448 for ietf-xml-mime-bks; Wed, 17 Sep 2003 17:43:36 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.gmx.net (pop.gmx.net [213.165.64.20]) by above.proper.com (8.12.9/8.12.8) with SMTP id h8I0hYej026442 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 17:43:35 -0700 (PDT) (envelope-from derhoermi@gmx.net)
Received: (qmail 19712 invoked by uid 65534); 18 Sep 2003 00:43:21 -0000
Received: from pD903BA07.dip.t-dialin.net (EHLO Voyager) (217.3.186.7) by mail.gmx.net (mp023) with SMTP; 18 Sep 2003 02:43:21 +0200
X-Authenticated: #723575
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: Tim Bray <tbray@textuality.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
Date: Thu, 18 Sep 2003 02:42:53 +0200
Message-ID: <3f83f70c.1599687809@smtp.bjoern.hoehrmann.de>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com>
In-Reply-To: <3F689C27.50407@textuality.com>
X-Mailer: Forte Agent 1.92/32.572
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

* Tim Bray wrote:
>I read [3] and while I agree with much of it, it's obviously far too 
>late to change the XML encoding declaration.  For the moment, I think 
>that the architecturally-sound position is, for Web data formats, either 
>(a) use XML, or (b) use the charset parameter.

The charset parameter does not work as web servers or web server
configurations often do not allow content providers to set up this
parameter. In general, technical solutions should either work
automagically behind the scene or be visible to those who need to
be aware; HTTP headers are invisible to most content providers so
I doubt that this is an architecturally-sound position.

>I agree, but for XML formats, I still think the charset parameter is 
>actively harmful and should be deprecated or even forbidden.

Deprecating something useful just because it could cause trouble when
used improperly does not make sense to me.


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HNb4ej023688 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 16:37:04 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HNb3Rx023687 for ietf-xml-mime-bks; Wed, 17 Sep 2003 16:37:03 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HNb2ej023681 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 16:37:02 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (h220208.ppp.asahi-net.or.jp [61.114.220.208]) by mail.asahi-net.or.jp (Postfix) with ESMTP id BFEC3AC40; Thu, 18 Sep 2003 08:37:04 +0900 (JST)
Date: Thu, 18 Sep 2003 08:34:30 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
In-Reply-To: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37C@alis-2k.alis.domain>
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37C@alis-2k.alis.domain>
Message-Id: <20030918083238.E22E.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

On Wed, 17 Sep 2003 14:10:53 -0400
Francois Yergeau <FYergeau@alis.com> wrote:

> But stating that "most XML is not text for casual users" says that there is
> no loss in deprecating text/*xml (save perhaps transition issues), the text/
> top-level buys nothing of value.

Agreed.

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HNavej023677 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 16:36:57 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HNavXv023676 for ietf-xml-mime-bks; Wed, 17 Sep 2003 16:36:57 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail1.asahi-net.or.jp [202.224.39.197]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HNatej023671 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 16:36:56 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (h220208.ppp.asahi-net.or.jp [61.114.220.208]) by mail.asahi-net.or.jp (Postfix) with ESMTP id 241BB7DDE; Thu, 18 Sep 2003 08:36:55 +0900 (JST)
Date: Thu, 18 Sep 2003 08:34:21 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
In-Reply-To: <3F689C27.50407@textuality.com>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp> <3F689C27.50407@textuality.com>
Message-Id: <20030918074726.E228.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-Mailer: Becky! ver. 2.06.02
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8HNauej023672
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

> Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan 
> or Chris or TimBL could take this up internally.  I disagree that this 
> should be frozen at the moment, since the TAG is quite likely to publish 
> a document saying "RFC 3023 is wrong".

Since the TAG and the W3C team share some members, please have some 
internal discussion.  Our hands are tied, and yet we have also been asked to 
take some action.  Please do not blame us.

> > As for the charset parameter, I am still uneasy to disallow or
> > deprecate it.  
...
> 
> I think I provided a detailed explanation of why the charset is in fact 
> actively harmful in the context of XML.  If you're not convinced it 
> would be helpful if you could address those points.  If you already 
> have, my apologies, perhaps you could give a pointer.

First, I agree with your request beginning "it should be made clear 
that nobody sending a media-type should send a charset for an XML 
media-type unless it REALLY REALLY KNOWS what it's sending".  Is 
this an acceptable position?

You explained that an ad-hoc mechanism (i.e., encoding declarations) works
almost perfectly for XML, and I agree.  The above position implies that
(1) the ad-hoc mechanism is allowed, (2) the generic mechanism is
recommended but optional, and (3) if specified, the generic mechanism takes
precedence.  You provided a detailed explanation of (1), but your argument
against (3) is not persuasive.
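
What point (3) does when the generic label is wrong can be shown with a
short Python sketch.  The scenario is an assumption for illustration: a
document whose bytes and encoding declaration are both UTF-8, served with a
hypothetical server-wide "charset=iso-8859-1" that nobody chose for it.  It
illustrates the failure mode under discussion; it does not settle whether
(3) is the right rule.

```python
# The document states its own encoding, and its bytes really are UTF-8.
body = '<?xml version="1.0" encoding="UTF-8"?><name>Fran\u00e7ois</name>'.encode("utf-8")

# Hypothetical server-wide default applied to every response.
header_charset = "iso-8859-1"

declared = body.decode("utf-8")         # what the encoding declaration says
labelled = body.decode(header_charset)  # what "the parameter wins" yields

print(declared.endswith("<name>Fran\u00e7ois</name>"))        # True
print(labelled.endswith("<name>Fran\u00c3\u00a7ois</name>"))  # True
```

Because ISO-8859-1 maps every byte to some character, the wrong label
raises no error at all; the receiver silently sees "FranÃ§ois".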

> I read [3] and while I agree with much of it, it's obviously far too 
> late to change the XML encoding declaration.  

True.  But what happens to upcoming non-XML formats?

>For the moment, I think 
> that the architecturally-sound position is, for Web data formats, either 
> (a) use XML, or (b) use the charset parameter.  

It would be great if the TAG finding explicitly stated this.  If 
we reach a consensus on this principle, my worries will be lessened.  
I am worried because I think the TAG is trying to deprecate the generic
mechanism without establishing any principle.

> > 	(1) non-self-describing data formats should rely on the
> >             charset parameter, and
> > 	(2) self-describing data formats should introduce their own
> > 	    mechanism for specifying charsets.
> 
> I'll review the webarch doc, I suspect we haven't thought closely enough 
> about this.

Please think about this.  Frankly, on this issue, I think that the 
TAG does not have an architecture but only an ad-hoc solution.  Does 
the I18N WG agree with the position of the TAG?

> I agree, but for XML formats, I still think the charset parameter is 
> actively harmful and should be deprecated or even forbidden.  This is 
> orthogonal to the larger question you (correctly) raise, of charset 
> detection for non-XML formats.

I do not think that this is orthogonal.  This is the point.  Should we 
deprecate the generic mechanism in the particular case that an 
ad-hoc mechanism works better?  WWW programmers are already tired of 
ad-hoc mechanisms.

Cheers,

Makoto





Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HIB6ej011042 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 11:11:06 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HIB6p1011041 for ietf-xml-mime-bks; Wed, 17 Sep 2003 11:11:06 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HIB5ej011030 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 11:11:05 -0700 (PDT) (envelope-from FYergeau@alis.com)
Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p1/8.12.8) with ESMTP id h8HIAv1M026620; Wed, 17 Sep 2003 18:10:58 GMT (envelope-from FYergeau@alis.com)
Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id <R4CRD7Q5>; Wed, 17 Sep 2003 14:10:56 -0400
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37C@alis-2k.alis.domain>
From: Francois Yergeau <FYergeau@alis.com>
To: "'MURATA Makoto'" <murata@hokkaido.email.ne.jp>, ietf-xml-mime@imc.org
Cc: WWW-Tag <www-tag@w3.org>
Subject: RE: Requesting a revision of RFC3023
Date: Wed, 17 Sep 2003 14:10:53 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8HIB5ej011037
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:
> and to deprecate text/xml not
> because the charset parameter is harmful but because most XML is not
> text for casual users.

It's not the charset parameter that's harmful, it's its absence.  Or rather
the mandated policy when it is absent: MUST assume ASCII.

But stating that "most XML is not text for casual users" says that there is
no loss in deprecating text/*xml (save perhaps transition issues), the text/
top-level buys nothing of value.

Regards,

-- 
François Yergeau



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HI4Uej010854 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 11:04:30 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HI4UAd010853 for ietf-xml-mime-bks; Wed, 17 Sep 2003 11:04:30 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mtl.alis.com (mtl.alis.com [199.84.165.71]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HI4Tej010844 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 11:04:29 -0700 (PDT) (envelope-from FYergeau@alis.com)
Received: from alis-2k.alis.domain (alis-2k.alis.com [199.84.165.130]) by mtl.alis.com (8.12.8p1/8.12.8) with ESMTP id h8HI4M1M026527; Wed, 17 Sep 2003 18:04:23 GMT (envelope-from FYergeau@alis.com)
Received: by alis-2k.alis.domain with Internet Mail Service (5.5.2653.19) id <R4CRD7Q2>; Wed, 17 Sep 2003 14:04:16 -0400
Message-ID: <F7D4BDA0E5A1D14B99D32C022AEB73660EB37B@alis-2k.alis.domain>
From: Francois Yergeau <FYergeau@alis.com>
To: "'Tim Bray'" <tbray@textuality.com>, ietf-xml-mime@imc.org
Cc: WWW-Tag <www-tag@w3.org>
Subject: RE: Requesting a revision of RFC3023
Date: Wed, 17 Sep 2003 14:04:12 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h8HI4Uej010848
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

Tim Bray wrote:
> I took an action item to ask about the chances of revising what 3023 
> says about the charset parameter; while I'm not sure, I suspect that 
> there may actually be some level of consensus about the 
> desirable changes:

Quite possible.

> 1. Deprecate text/* for anything that's in XML.

Yes, please!

> 2. Deprecate the charset parameter for application/xml and 
> application/*+xml.

This would hurt some legitimate (but probably rare, perhaps unimportant) use
cases where:

1) a server generates some XML without an encoding decl, knowing that it
will send an authoritative charset parameter, or

2) a proxy transcodes an XML entity without fixing the encoding decl
therein, knowing that it will send an authoritative charset parameter.

I'm not saying this kills the proposal, just that it needs to be weighed in.
Perhaps it will make the difference between "deprecate", "outlaw" or various
strengths of "discourage".
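
Use case 2) can be sketched in Python as follows. The proxy function and the regular expression are hypothetical simplifications: the pattern only handles encoding declarations that are readable as ASCII bytes. The point is that after such a transcoding step, the charset parameter is the only accurate label, so deprecating it would break this setup.

```python
import re
from typing import Optional, Tuple

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
# Only valid when the declaration itself is readable as ASCII bytes.
ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes) -> Optional[str]:
    m = ENCODING_DECL.match(body)
    return m.group(1).decode("ascii") if m else None


def naive_transcoding_proxy(body: bytes, source: str, target: str) -> Tuple[bytes, str]:
    """Hypothetical proxy: re-encodes the entity but leaves the encoding
    declaration untouched, relying on the charset parameter it will send."""
    return body.decode(source).encode(target), target


original = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")
body, charset_param = naive_transcoding_proxy(original, "utf-8", "iso-8859-1")

print(declared_encoding(body))      # UTF-8 (now stale)
print(charset_param)                # iso-8859-1 (accurate)
print(body.decode(charset_param))   # decodes to the original text

# Trusting the stale declaration instead fails on the re-encoded bytes.
try:
    body.decode(declared_encoding(body))
except UnicodeDecodeError:
    print("declaration no longer matches the bytes")
```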

> For the server, on the other hand, 
> this is easy to get wrong, particularly with the introduction 
> of various kinds of filters in modern web servers.

No need to invoke filters; older and simpler servers have historically been
known to get it wrong most of the time.
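
The following Python snippet illustrates this server-side failure. It assumes, hypothetically, a server configured to attach charset=iso-8859-1 to every response while the file on disk is UTF-8. Because the header is authoritative, the receiver decodes the bytes as labelled. Nothing raises an error, so the damage is silent.

```python
# A UTF-8 XML file on disk, with an accurate encoding declaration.
xml_bytes = '<?xml version="1.0" encoding="UTF-8"?><city>Montréal</city>'.encode("utf-8")

# Hypothetical server configuration: a blanket default charset on every response.
server_charset = "iso-8859-1"

# The receiver must treat the header as authoritative and decode as labelled.
# Every byte is valid ISO-8859-1, so this "succeeds" with the wrong characters.
as_labelled = xml_bytes.decode(server_charset)

# Honouring the document's own declaration yields the intended text.
as_declared = xml_bytes.decode("utf-8")

print(as_labelled)  # ends with <city>MontrÃ©al</city>
print(as_declared)  # ends with <city>Montréal</city>
```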

> it should be made clear that nobody sending a media-type 
> should send a charset for an XML media-type unless it REALLY REALLY KNOWS 
> what it's sending, and in that case should consider not sending it anyhow.

Yes.

-- 
François Yergeau



Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HHv5ej010422 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 10:57:05 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HHv5RO010421 for ietf-xml-mime-bks; Wed, 17 Sep 2003 10:57:05 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HHv4ej010414 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 10:57:04 -0700 (PDT) (envelope-from jcowan@reutershealth.com)
Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id NAA27897; Wed, 17 Sep 2003 13:52:16 -0400 (EDT)
Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Wed, 17 Sep 2003 13:55:41 -0400
Date: Wed, 17 Sep 2003 13:55:41 -0400
From: John Cowan <jcowan@reutershealth.com>
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, www-tag@w3.org
Subject: Re: Requesting a revision of RFC3023
Message-ID: <20030917175541.GC29372@skunk.reutershealth.com>
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
User-Agent: Mutt/1.4.1i
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto scripsit:

> This implies that XQuery should introduce its own mechanism (see [4])
> and that the compact syntax of RELAX NG should introduce another.

I don't see that either of these is self-describing in the relevant sense,
any more than a C program is.

> As far as I know, the charset parameter is the only generic mechanism.  I 
> know the charset parameter is not working well, but I do not see any other 
> generic mechanisms.

It's an acceptable mechanism when there is no internal declaration present.
It causes problems when internal declarations are or may be present.
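
One way to make that distinction mechanical is a check that compares the external charset parameter with any internal encoding declaration. This Python sketch rests on stated assumptions. label_status is a hypothetical name. The regular expression only reads declarations in ASCII-compatible encodings. codecs.lookup stands in for proper IANA charset-name matching.

```python
import codecs
import re
from typing import Optional

# Simplified: only matches declarations readable as ASCII bytes.
ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    m = ENCODING_DECL.match(body)
    return m.group(1).decode("ascii") if m else None


def same_encoding(a: str, b: str) -> bool:
    """Compare labels via Python's codec aliases (an approximation of
    IANA charset-name matching); fall back to a case-insensitive compare."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def label_status(charset_param: Optional[str], body: bytes) -> str:
    """Hypothetical classifier for an XML entity's two possible labels."""
    decl = declared_encoding(body)
    if charset_param is None:
        return "no external label"
    if decl is None:
        # The case where the parameter is an acceptable mechanism.
        return "external label only"
    return "consistent" if same_encoding(charset_param, decl) else "CONFLICT"


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
print(label_status("latin1", doc))       # consistent
print(label_status("utf-8", doc))        # CONFLICT
print(label_status("utf-8", b"<a/>"))    # external label only
```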

-- 
John Cowan  jcowan@reutershealth.com  www.ccil.org/~cowan  www.reutershealth.com
"In computer science, we stand on each other's feet."
        --Brian K. Reid


Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HHciej009782 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 10:38:44 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HHcivY009781 for ietf-xml-mime-bks; Wed, 17 Sep 2003 10:38:44 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HHchej009776 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 10:38:43 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id 689EF10313; Wed, 17 Sep 2003 10:38:42 -0700 (PDT)
Message-ID: <3F689C27.50407@textuality.com>
Date: Wed, 17 Sep 2003 10:38:47 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Subject: Re: Requesting a revision of RFC3023
References: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
In-Reply-To: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

MURATA Makoto wrote:

> First, Simon and I were asked by the W3C team not to take any action
> on RFC 3023.  This is because the MIME type registration procedure was
> expected to change (see [1] and [2]).  So, Simon, Dan, and I can't do 
> anything right now.

Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan 
or Chris or TimBL could take this up internally.  I disagree that this 
should be frozen at the moment, since the TAG is quite likely to publish 
a document saying "RFC 3023 is wrong".

> As for the charset parameter, I am still uneasy about disallowing or
> deprecating it.  But I agree to make "clear that nobody sending a
> media-type should send a charset for an XML media-type unless it
> REALLY REALLY KNOWS what it's sending," and to deprecate text/xml not
> because the charset parameter is harmful but because most XML is not
> text for casual users.

I think I provided a detailed explanation of why the charset is in fact 
actively harmful in the context of XML.  If you're not convinced it 
would be helpful if you could address those points.  If you already 
have, my apologies, perhaps you could give a pointer.

> I have repeatedly asked (e.g., [3]) what the TAG's position is on
> charset detection for non-XML formats.  The latest version of the TAG
> finding document "Client handling of MIME headers" appears to
> recommend:

I read [3] and while I agree with much of it, it's obviously far too 
late to change the XML encoding declaration.  For the moment, I think 
that the architecturally-sound position is, for Web data formats, either 
(a) use XML, or (b) use the charset parameter.  I'm generally in favor 
of a general-purpose encoding-detection scheme such as you propose, but 
I'm pessimistic about getting it widely deployed for legacy formats.

> 	(1) non-self-describing data formats should rely on the
>             charset parameter, and
> 	(2) self-describing data formats should introduce their own
> 	    mechanism for specifying charsets.

I'll review the webarch doc, I suspect we haven't thought closely enough 
about this.

> As far as I know, the charset parameter is the only generic mechanism.  I 
> know the charset parameter is not working well, but I do not see any other 
> generic mechanisms.

I agree, but for XML formats, I still think the charset parameter is 
actively harmful and should be deprecated or even forbidden.  This is 
orthogonal to the larger question you (correctly) raise, of charset 
detection for non-XML formats.

-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HGXgeo007055 for <ietf-xml-mime-bks@above.proper.com>; Wed, 17 Sep 2003 09:33:42 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8HGXglD007054 for ietf-xml-mime-bks; Wed, 17 Sep 2003 09:33:42 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.asahi-net.or.jp (mail2.asahi-net.or.jp [202.224.39.198]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8HGXfeo007048 for <ietf-xml-mime@imc.org>; Wed, 17 Sep 2003 09:33:41 -0700 (PDT) (envelope-from murata@hokkaido.email.ne.jp)
Received: from [127.0.0.1] (h220208.ppp.asahi-net.or.jp [61.114.220.208]) by mail.asahi-net.or.jp (Postfix) with ESMTP id A99AB611A; Thu, 18 Sep 2003 01:33:28 +0900 (JST)
Date: Thu, 18 Sep 2003 01:30:54 +0900
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
To: ietf-xml-mime@imc.org
Subject: Re: Requesting a revision of RFC3023
Cc: WWW-Tag <www-tag@w3.org>
Message-Id: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.06.02
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

First, Simon and I were asked by the W3C team not to take any action
on RFC 3023.  This is because the MIME type registration procedure was
expected to change (see [1] and [2]).  So, Simon, Dan, and I can't do 
anything right now.

As for the charset parameter, I am still uneasy about disallowing or
deprecating it.  But I agree to make "clear that nobody sending a
media-type should send a charset for an XML media-type unless it
REALLY REALLY KNOWS what it's sending," and to deprecate text/xml not
because the charset parameter is harmful but because most XML is not
text for casual users.

I have repeatedly asked (e.g., [3]) what the TAG's position is on
charset detection for non-XML formats.  The latest version of the TAG
finding document "Client handling of MIME headers" appears to
recommend:

	(1) non-self-describing data formats should rely on the
            charset parameter, and

	(2) self-describing data formats should introduce their own
	    mechanism for specifying charsets.

This implies that XQuery should introduce its own mechanism (see [4])
and that the compact syntax of RELAX NG should introduce another.
CSS, HTML, and XML already have different mechanisms.  I personally
think that this approach will make the current situation even worse.
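
To show how different these in-band mechanisms already are, here is a Python sketch that pulls the declared charset out of each format. The patterns are deliberately simplified and are not full grammars. The HTML case assumes the HTML 4 meta http-equiv form. The function name in_band_charset is hypothetical.

```python
import re
from typing import Optional

# Simplified, illustrative patterns for three different in-band declarations.
PATTERNS = {
    # XML: encoding pseudo-attribute in the XML declaration at offset 0.
    "xml": re.compile(
        rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
    ),
    # CSS: an @charset rule that must be the very first thing in the sheet.
    "css": re.compile(rb'^@charset "([^"]+)";'),
    # HTML 4: a <meta http-equiv="Content-Type"> element anywhere in the page.
    "html": re.compile(
        rb'<meta\s+http-equiv=["\']?Content-Type["\']?\s+'
        rb'content=["\'][^"\']*charset=([A-Za-z0-9._-]+)',
        re.IGNORECASE,
    ),
}


def in_band_charset(fmt: str, body: bytes) -> Optional[str]:
    """Return the charset a document declares about itself, or None."""
    m = PATTERNS[fmt].search(body)
    return m.group(1).decode("ascii") if m else None


print(in_band_charset("xml", b'<?xml version="1.0" encoding="windows-1252"?><a/>'))
print(in_band_charset("css", b'@charset "EUC-JP";\nbody { color: black }'))
print(in_band_charset("html", b'<head><meta http-equiv="Content-Type" '
                              b'content="text/html; charset=Shift_JIS"></head>'))
```

Each format needs its own parser, and none of them helps with formats that have no such declaration at all.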

Much textual data on the WWW requires charset detection.  For example:

1) plain text, XML, HTML, CSS, XQuery, VBScript, JavaScript, JSP, Perl, 
   RNG compact schemas, etc. on the server side,

2) textual data generated by CGI programs, Servlets, Applets, XSLT 
   stylesheets, etc. on the server side,

3) text typed into HTML forms and sent as multipart/form-data via 
   HTTP.

At present, different technologies have introduced different ad hoc 
solutions [5].  As a result, it is VERY HARD to create well-internationalized 
WWW applications: you have to use several mechanisms correctly.  Nobody
is trying to provide a generalized solution.

As far as I know, the charset parameter is the only generic mechanism.  I 
know the charset parameter is not working well, but I do not see any other 
generic mechanisms.

[1] http://lists.w3.org/Archives/Public/public-ietf-w3c/2003Jul/0000.html
[2] http://www.ietf.org/internet-drafts/draft-freed-mime-p4-02.txt
[3] http://lists.w3.org/Archives/Public/www-tag/2003Apr/0104.html
[4] http://www.w3.org/TR/xquery/#xquery-encoding
[5] http://www.asahi-net.or.jp/~eb2m-mrt/charsetDetection.html

Cheers,

Makoto
-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>




Received: from above.proper.com (localhost [127.0.0.1]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8GHqbeo095193 for <ietf-xml-mime-bks@above.proper.com>; Tue, 16 Sep 2003 10:55:08 -0700 (PDT) (envelope-from owner-ietf-xml-mime@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.9/8.12.9/Submit) id h8GHqbXO095192 for ietf-xml-mime-bks; Tue, 16 Sep 2003 10:52:37 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-xml-mime@mail.imc.org using -f
Received: from mail.dev.antarcti.ca (gt.antarcti.ca [209.17.183.233]) by above.proper.com (8.12.9/8.12.8) with ESMTP id h8GHo6eo095112 for <ietf-xml-mime@imc.org>; Tue, 16 Sep 2003 10:52:36 -0700 (PDT) (envelope-from tbray@textuality.com)
Received: from textuality.com (dev1.dev.antarcti.ca [10.1.1.8]) by mail.dev.antarcti.ca (Postfix) with ESMTP id DA1A010369 for <ietf-xml-mime@imc.org>; Tue, 16 Sep 2003 10:50:07 -0700 (PDT)
Message-ID: <3F674D53.7080906@textuality.com>
Date: Tue, 16 Sep 2003 10:50:11 -0700
From: Tim Bray <tbray@textuality.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: ietf-xml-mime@imc.org
Subject: Requesting a revision of RFC3023
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-ietf-xml-mime@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
List-ID: <ietf-xml-mime.imc.org>
List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>

[Some of you will get this twice, sorry; Larry Masinter pointed out that 
my initial choice of destinations was poor.  I slightly revised the note 
to provide more context.]

The W3C TAG (http://www.w3.org/2001/tag/) has an open issue about proper 
handling of MIME headers, with a draft in progress "Client Handling of 
MIME Headers" (http://www.w3.org/2001/tag/doc/mime-respect.html); the 
draft finds some fault with the contents of RFC3023.

I took an action item to ask about the chances of revising what 3023 
says about the charset parameter; while I'm not sure, I suspect that 
there may actually be some level of consensus about the desirable changes:

1. Deprecate text/* for anything that's in XML.  That's because it 
forces the provider to provide a charset header: in its absence, 
the receiver is required to assume either ASCII or 8859, depending on the 
context, which has a very high probability of being wrong.  This is 
irritating because, if there were no charset header at all, the client would be 
certain of either getting it right or failing deterministically.  And 
forcing the server to provide a charset= is wrong; see the next point.

2. Deprecate the charset parameter for application/xml and 
application/*+xml.  I think that Roy Fielding would like to go as far as to 
simply outlaw it; I'd be fine with that too.  The reason is that the 
client is almost certain to get it right, and will fail 
deterministically if it doesn't.  For the server, on the other hand, 
this is easy to get wrong, particularly with the introduction of various 
kinds of filters in modern web servers.  And since the Web architecture 
and the XML spec both say that the server's claim has to be taken as 
authoritative, this is really highly dysfunctional.  At the very least, 
it should be made clear that nobody sending a media-type should send a 
charset for an XML media-type unless it REALLY REALLY KNOWS what it's 
sending, and in that case should consider not sending it anyhow.
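
Taken together, points 1 and 2 describe a receiver like the following Python sketch. It is an illustration, not normative text. effective_charset, detect_xml_encoding and decode_xml_entity are hypothetical names. The detection step is a simplified reading of the autodetection heuristics in Appendix F of the XML 1.0 Recommendation, with EBCDIC and the unusual UCS-4 byte orders omitted. The encoding-declaration pattern only handles ASCII-compatible encodings.

```python
import codecs
import re
from typing import Optional

XML_AUTODETECT = "xml-autodetect"  # marker meaning "apply the XML 1.0 rules"

ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(media_type: str, charset_param: Optional[str]) -> str:
    """RFC 3023-style choice: an external charset always wins; without one,
    text/*xml defaults to us-ascii (a generic HTTP/1.1 text/* handler would
    assume iso-8859-1 instead), while application/*xml defers to XML."""
    top, _, sub = media_type.strip().lower().partition("/")
    if sub not in ("xml", "xml-external-parsed-entity", "xml-dtd") and not sub.endswith("+xml"):
        raise ValueError("not an XML media type: " + media_type)
    if charset_param:
        return charset_param.lower()
    return "us-ascii" if top == "text" else XML_AUTODETECT


def detect_xml_encoding(data: bytes) -> str:
    """Simplified Appendix F heuristics: BOM, then '<?' layout, then declaration."""
    # Byte order marks; the 4-byte UTF-32 marks must be tested first.
    for bom, name in (
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ):
        if data.startswith(bom):
            return name
    # No BOM: the layout of '<?' in the first four bytes reveals the code-unit width.
    no_bom = {
        b"\x00\x00\x00<": "utf-32-be",
        b"<\x00\x00\x00": "utf-32-le",
        b"\x00<\x00?": "utf-16-be",
        b"<\x00?\x00": "utf-16-le",
    }
    if data[:4] in no_bom:
        return no_bom[data[:4]]
    # ASCII-compatible bytes: the encoding declaration, if present, decides.
    m = ENCODING_DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    # No BOM and no declaration: UTF-8 is the only permitted default.
    return "utf-8"


def decode_xml_entity(media_type: str, charset_param: Optional[str], data: bytes) -> str:
    charset = effective_charset(media_type, charset_param)
    if charset == XML_AUTODETECT:
        charset = detect_xml_encoding(data)
    # Strict decoding: bytes that do not fit the chosen encoding raise an error.
    text = data.decode(charset)
    return text[1:] if text.startswith("\ufeff") else text
```

With no charset parameter, an application/xml entity either decodes to the intended text or raises an error, which is the deterministic behaviour argued for in point 2. A wrong external label, by contrast, is applied as given. For a charset such as ISO-8859-1, where every byte is valid, it produces wrong text without any error.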

Is there any chance we could do this?  It's going to be kind of 
embarrassing for TAG findings and the Webarch doc to be saying "don't do 
what this RFC says".
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)





