
From: clive@demon.net (Clive D.W. Feather)
Date: Tue, 6 Jan 2004 10:34:32 +0000
Subject: [xml2rfc] Re: Abbreviation or end of sentence?
In-Reply-To: <5.1.0.14.2.20040105124833.00ba3138@127.0.0.1>
References: <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com> <5.1.0.14.2.20040105124833.00ba3138@127.0.0.1>
Message-ID: <20040106103432.GD51961@finch-staff-1.thus.net>

Graham Klyne said:
>> I might want to mechanically put <abbrev> tags round each i.e. and e.g.
>> and Dr. and Mrs. in my text because, after all, they're abbreviations. I
>> could then automatically expand "<abbrev>Dr.</abbrev>" to "Doctor" in some
>> situations. When I do so, I don't want the sentence-end semantics to
>> change.
> 
> [Having very little to do with XML2RFC ...]
> 
> I learned [1] that a period is not used following a contraction (where the 
> last letter of the abbreviation is also the last letter of the full 
> word).  So "Dr" and "Mrs" above would properly not be followed by a period, 
> unless (in some contorted way?), they appear at the end of a sentence.

Even if so, there are similar abbreviations such as "Rev." which this
clearly doesn't apply to.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 17:15:12 +0000
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF5AFC4.4060906@gmx.de>
References: <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <3FF58380.2030201@gmx.de> <20040102172621.72261265.henrik@levkowetz.com> <3FF5AFC4.4060906@gmx.de>
Message-ID: <20040105171512.GD2703@finch-staff-1.thus.net>

Julian Reschke said:
> I could live with &nbsp;, although I think it doesn't exactly mean what 
> we need (after all it means "no break here", and we don't want to forbid 
> a break, we just don't want an additional *space*).

My reading is that &nbsp; means "no break and no expansion". The latter is
what we want after an abbreviation; I would normally want the former as
well.

If you want a "can break but can't expand" character, then &ensp; (U+2002)
seems the right choice, since one en is the traditional width of an
inter-word space.

If you want a "can't break but can expand" space, then put a fixed-width
space like &ensp; followed by &zwsp; (U+200B), a zero-width space that is
allowed to expand, i.e. it marks where space can be added.

I would recommend to Marshall that he implement all of these:

   &nbsp;  U+00A0      single space that never breaks or widens to two
   &ensp;  U+2002      single space that can break but never widens to two
   &emsp;  U+2003      double space that can break but never shrinks to one
   &fsp;   U+2007      space always exactly the same width as a digit
   &zwsp;  U+200B      place where space can be added when justifying

(&emsp; and &fsp; are useful semantic objects).
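
As a rough illustration of the list above, here is a minimal Python sketch
that maps the five entity names to their code points and expands them in a
string. The names SPACE_ENTITIES and expand_entities are hypothetical, and
&fsp; and &zwsp; are the names suggested in this message rather than
standard entity names; this is not how xml2rfc itself handles them.

```python
import re
import unicodedata

# Entity names as suggested in this message, mapped to their code points.
SPACE_ENTITIES = {
    "nbsp": "\u00a0",  # never breaks, never widens
    "ensp": "\u2002",  # can break, never widens
    "emsp": "\u2003",  # can break, never shrinks to one space
    "fsp": "\u2007",   # same width as a digit
    "zwsp": "\u200b",  # place where space can be added when justifying
}

_ENTITY_RE = re.compile(r"&(" + "|".join(SPACE_ENTITIES) + r");")


def expand_entities(text):
    """Replace the suggested space entities with their Unicode characters."""
    return _ENTITY_RE.sub(lambda m: SPACE_ENTITIES[m.group(1)], text)


if __name__ == "__main__":
    # Print each entity with its code point and official Unicode name.
    for name, char in SPACE_ENTITIES.items():
        print(f"&{name};  U+{ord(char):04X}  {unicodedata.name(char)}")
```

Any other entity, such as &amp;, is left untouched by this sketch.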

> Anyway, I'm not going to support a grammar element that is defined to 
> have "no" meaning except for a strange side effect...

Agreed.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Mon, 5 Jan 2004 16:25:50 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105151402.GI77123@finch-staff-1.thus.net>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com> <20040105145024.GG77123@finch-staff-1.thus.net> <20040105160741.7d639afa.henrik@levkowetz.com> <20040105151402.GI77123@finch-staff-1.thus.net>
Message-ID: <20040105162550.2c22b141.henrik@levkowetz.com>

--Signature=_Mon__5_Jan_2004_16_25_50_+0100_b=GOnnbEt8_qT8R7
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Clive,

  Going by the excellent exposition given by John Klensin earlier,
two of your sentence ending examples below are not sentence endings
in either the British or the logical style, only in the traditional
American style:

Monday  5 January 2004, Clive D.W. Feather wrote:
...
> Um, why? The commonest cases are:
> 
>   Sentence end:
>     He said "I am here."
Not end of sentence except in traditional US style

>     He said "I am here".
>     He said "What is it?"
Not end of sentence except in traditional US style

>     He said "What is it?".
>     She said 'he said "I am here"'.
...

In view of this, it seems simpler to let the editor add the necessary
end of sentence indications according to which style he's following.

	Henrik


--Signature=_Mon__5_Jan_2004_16_25_50_+0100_b=GOnnbEt8_qT8R7--


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 15:14:02 +0000
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105160741.7d639afa.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com> <20040105145024.GG77123@finch-staff-1.thus.net> <20040105160741.7d639afa.henrik@levkowetz.com>
Message-ID: <20040105151402.GI77123@finch-staff-1.thus.net>

Henrik Levkowetz said:
>> I'm suggesting that markup is the wrong thing to do.
> Well, we seem to pretty consistently agree to disagree on this point :-)

Indeed.

> I think markup is right and adding significance to particular characters
> in the manner you propose is bad.  

Yet the primary problem arises because particular characters (dot etc.) have
significance over and above their use as a graphic.

>> (4) A "sentence end sequence" is one of the characters . ! or ? followed by
>> zero or more of the characters ' or ".
> Yes, but you'll still have this come out wrong for some styles.

Um, why? The commonest cases are:

  Sentence end:
    He said "I am here."
    He said "I am here".
    He said "What is it?"
    He said "What is it?".
    She said 'he said "I am here"'.
  Not sentence end:
    Pilate asked 'What is truth?' and did not wait for an answer.

My proposal would get all of these right without problem.

About the only hard case I can come up with is:

    I go ... I come back.

where I would not expect the ... to end the sentence but my rules would.
And that's only because of the capital letter. In my proposal, you'd have
to write it as:

    I go ... &ensp; I come back.

> We might
> just as well leave it to explicit markup.

It ought to work right for as many unaltered texts as possible. Whatever
the override mechanism, it should only be for the corner cases.
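
The simplified rule (4), combined with the "followed by whitespace and then
an uppercase letter" condition, can be sketched in Python roughly as follows.
The names SENTENCE_END and sentence_end_offsets are hypothetical, the
uppercase test is limited to ASCII A-Z for brevity, and the end-of-paragraph
case is deliberately left out.

```python
import re

# Rule (4), simplified: one of . ! ? followed by zero or more ' or ".
# The sequence ends a sentence only when ordinary whitespace (space, tab,
# newline) and then an uppercase letter follow it.
SENTENCE_END = re.compile(r"""[.!?]['"]*(?=[ \t\n]+[A-Z])""")


def sentence_end_offsets(text):
    """Return the offset just past each detected sentence end sequence."""
    return [m.end() for m in SENTENCE_END.finditer(text)]


if __name__ == "__main__":
    samples = [
        'He said "I am here." Then he left.',
        "Pilate asked 'What is truth?' and did not wait for an answer.",
        "I go ... I come back.",          # the hard case: detected as an end
        "I go ...\u2002I come back.",      # en space suppresses detection
    ]
    for sample in samples:
        print(sentence_end_offsets(sample), repr(sample))
```

Because U+2002 is not ordinary whitespace in this sketch, writing the en
space directly after the ellipsis is enough to suppress the false positive.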

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Mon, 5 Jan 2004 16:11:02 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <3FF97B5A.6000705@gmx.de>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com> <3FF977A9.50909@gmx.de> <20040105155047.3dd41400.henrik@levkowetz.com> <3FF97B5A.6000705@gmx.de>
Message-ID: <20040105161102.054176e8.henrik@levkowetz.com>

--Signature=_Mon__5_Jan_2004_16_11_02_+0100_jet.ZTFkpqhty6aq
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Monday  5 January 2004, Julian Reschke wrote:
...
> >>However, I think in reality it's more complex than that. Should there be 
> >>an automatism to detect
> >>
> >>	<spanx>Foo.</spanx>
> >>
> >>as sentence end, or should that be handled through the overrides?
> > 
> > 
> > I haven't used <spanx />, so don't have a strong opinion on this.
> > What would you suggest?
> 
> Whatever we define, it shouldn't be a special case. Either say that 
> sentence ends are only detected automatically when there's no 
> interleaved markup (easy) or state that sentence end detection happens 
> logically during the output state (that would be harder).

Ok.

> Personally I think that the easy approach is good enough and more likely 
> to work robustly. If we have a manual override, that should be enough, i.e.
> 
> 	<spanx>Foo.</spanx><eos/>

Sounds good.

> BTW: I can live with <eos/> and <noeos/> (now that these elements *do* 
> have semantics :-), but I think I still prefer PIs.

Ok. I guess we leave it up to Marshall then...

	Henrik

--Signature=_Mon__5_Jan_2004_16_11_02_+0100_jet.ZTFkpqhty6aq--


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Mon, 5 Jan 2004 16:07:41 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105145024.GG77123@finch-staff-1.thus.net>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com> <20040105145024.GG77123@finch-staff-1.thus.net>
Message-ID: <20040105160741.7d639afa.henrik@levkowetz.com>

--Signature=_Mon__5_Jan_2004_16_07_41_+0100_dh.LzwFho9o0AfUg
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Monday  5 January 2004, Clive D.W. Feather wrote:
> Henrik Levkowetz said:
> >     I think that your proposal below could be made to work, but I also
> > think that the rules are too complex and non-intuitive for the casual
> > user.
> 
> I don't.
> 
> > Having a basic rule of 
> >   a) any of  "."  "!"  "?"  followed by whitespace is sentence end
> 
> My basic rule allows closing quotes after them, but otherwise agrees. Oh,
> only if followed by a word beginning with a capital letter.
> 
> >   b) override markup <eos/> and <noeos/> 
> 
> I'm suggesting that markup is the wrong thing to do.

Well, we seem to pretty consistently agree to disagree on this point :-)

I think markup is right and adding significance to particular characters
in the manner you propose is bad.  


> Rather, use a wide
> space character to force end of sentence, and a non-break space to force
> not-end-of-sentence.
> The rest of my proposal is gravy.
> 
> >> (4) The characters . ! and ? are "sentence end characters". Any sequence of
> >> one or more sentence end characters, possibly mixed in with zero or more '
> >> or " characters, is a "sentence end sequence". [So both ". and ." are such
> >> sequences, but "' without one of the other three isn't.]
> 
> I just realized that could be simplified to:
> 
> (4) A "sentence end sequence" is one of the characters . ! or ? followed by
> zero or more of the characters ' or ".

Yes, but you'll still have this come out wrong for some styles.  We might
just as well leave it to explicit markup.

	Henrik


--Signature=_Mon__5_Jan_2004_16_07_41_+0100_dh.LzwFho9o0AfUg--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Mon, 05 Jan 2004 15:57:30 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105155047.3dd41400.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>	<20040105121141.GE77123@finch-staff-1.thus.net>	<20040105153408.4cd888d0.henrik@levkowetz.com>	<3FF977A9.50909@gmx.de> <20040105155047.3dd41400.henrik@levkowetz.com>
Message-ID: <3FF97B5A.6000705@gmx.de>

Henrik Levkowetz wrote:

> Monday  5 January 2004, Julian Reschke wrote:
> 
>>Henrik Levkowetz wrote:
> 
> ...
> 
>>>    I think that your proposal below could be made to work, but I also
>>>think that the rules are too complex and non-intuitive for the casual
>>>user. Having a basic rule of 
>>>  a) any of  "."  "!"  "?"  followed by whitespace is sentence end
>>>and 
>>>  b) override markup <eos/> and <noeos/> 
>>>is a bit easier to both explain and use, I think.
>>
>>However, I think in reality it's more complex than that. Should there be 
>>an automatism to detect
>>
>>	<spanx>Foo.</spanx>
>>
>>as sentence end, or should that be handled through the overrides?
> 
> 
> I haven't used <spanx />, so don't have a strong opinion on this.
> What would you suggest?

Whatever we define, it shouldn't be a special case. Either say that 
sentence ends are only detected automatically when there's no 
interleaved markup (easy) or state that sentence end detection happens 
logically during the output state (that would be harder).

Personally I think that the easy approach is good enough and more likely 
to work robustly. If we have a manual override, that should be enough, i.e.

	<spanx>Foo.</spanx><eos/>


BTW: I can live with <eos/> and <noeos/> (now that these elements *do* 
have semantics :-), but I think I still prefer PIs.

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Mon, 5 Jan 2004 15:50:47 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <3FF977A9.50909@gmx.de>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com> <3FF977A9.50909@gmx.de>
Message-ID: <20040105155047.3dd41400.henrik@levkowetz.com>

--Signature=_Mon__5_Jan_2004_15_50_47_+0100_DHv=wwtROPQg8JDV
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Monday  5 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
...
> >     I think that your proposal below could be made to work, but I also
> > think that the rules are too complex and non-intuitive for the casual
> > user. Having a basic rule of 
> >   a) any of  "."  "!"  "?"  followed by whitespace is sentence end
> > and 
> >   b) override markup <eos/> and <noeos/> 
> > is a bit easier to both explain and use, I think.
> 
> However, I think in reality it's more complex than that. Should there be 
> an automatism to detect
> 
> 	<spanx>Foo.</spanx>
> 
> as sentence end, or should that be handled through the overrides?

I haven't used <spanx />, so don't have a strong opinion on this.
What would you suggest?

	Henrik

--Signature=_Mon__5_Jan_2004_15_50_47_+0100_DHv=wwtROPQg8JDV--


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 14:50:25 +0000
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105153408.4cd888d0.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com>
Message-ID: <20040105145024.GG77123@finch-staff-1.thus.net>

Henrik Levkowetz said:
>     I think that your proposal below could be made to work, but I also
> think that the rules are too complex and non-intuitive for the casual
> user.

I don't.

> Having a basic rule of 
>   a) any of  "."  "!"  "?"  followed by whitespace is sentence end

My basic rule allows closing quotes after them, but otherwise agrees. Oh,
only if followed by a word beginning with a capital letter.

>   b) override markup <eos/> and <noeos/> 

I'm suggesting that markup is the wrong thing to do. Rather, use a wide
space character to force end of sentence, and a non-break space to force
not-end-of-sentence.

The rest of my proposal is gravy.

>> (4) The characters . ! and ? are "sentence end characters". Any sequence of
>> one or more sentence end characters, possibly mixed in with zero or more '
>> or " characters, is a "sentence end sequence". [So both ". and ." are such
>> sequences, but "' without one of the other three isn't.]

I just realized that could be simplified to:

(4) A "sentence end sequence" is one of the characters . ! or ? followed by
zero or more of the characters ' or ".

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: julian.reschke@gmx.de (Julian Reschke)
Date: Mon, 05 Jan 2004 15:41:45 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105153408.4cd888d0.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>	<20040105121141.GE77123@finch-staff-1.thus.net> <20040105153408.4cd888d0.henrik@levkowetz.com>
Message-ID: <3FF977A9.50909@gmx.de>

Henrik Levkowetz wrote:

> Hi Clive,
> 
>     I think that your proposal below could be made to work, but I also
> think that the rules are too complex and non-intuitive for the casual
> user. Having a basic rule of 
>   a) any of  "."  "!"  "?"  followed by whitespace is sentence end
> and 
>   b) override markup <eos/> and <noeos/> 
> is a bit easier to both explain and use, I think.

However, I think in reality it's more complex than that. Should there be 
an automatism to detect

	<spanx>Foo.</spanx>

as sentence end, or should that be handled through the overrides?

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Mon, 5 Jan 2004 15:34:08 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040105121141.GE77123@finch-staff-1.thus.net>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <20040105121141.GE77123@finch-staff-1.thus.net>
Message-ID: <20040105153408.4cd888d0.henrik@levkowetz.com>

--Signature=_Mon__5_Jan_2004_15_34_08_+0100_if4DuIeGfT78B6Nx
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Hi Clive,

    I think that your proposal below could be made to work, but I also
think that the rules are too complex and non-intuitive for the casual
user. Having a basic rule of 
  a) any of  "."  "!"  "?"  followed by whitespace is sentence end
and 
  b) override markup <eos/> and <noeos/> 
is a bit easier to both explain and use, I think.
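
A minimal Python sketch of rules (a) and (b) might look like the following.
It assumes, since the thread does not pin this down, that an override
element is written immediately before the whitespace it applies to; the
function name space_sentences is hypothetical and this is not xml2rfc code.

```python
import re

# An optional . ! or ?, an optional override element, then whitespace.
_GAP = re.compile(r"([.!?]?)(<eos/>|<noeos/>)?\s+")
# Override elements not followed by whitespace are simply dropped.
_OVERRIDE = re.compile(r"<(?:no)?eos/>")


def space_sentences(text):
    """Emit two spaces at sentence ends and one space everywhere else."""

    def gap(m):
        punct, override = m.group(1), m.group(2)
        if override is not None:
            # Rule (b): explicit markup always wins.
            is_end = override == "<eos/>"
        else:
            # Rule (a): . ! or ? followed by whitespace is a sentence end.
            is_end = punct != ""
        return punct + ("  " if is_end else " ")

    spaced = _GAP.sub(gap, text.strip())
    return _OVERRIDE.sub("", spaced)


if __name__ == "__main__":
    print(space_sentences("See e.g.<noeos/> RFC 2629. It works."))
    print(space_sentences("<spanx>Foo.</spanx><eos/> Bar"))
```

Without the <noeos/> override, rule (a) alone would put two spaces after
"e.g.", which is the case the override exists for.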

	Henrik


Monday  5 January 2004, Clive D.W. Feather wrote:
>
> (1) Within a <t /> (and anything else equivalent) all sequences of ordinary
> spaces, tabs, and newlines are treated as equivalent. I call them
> "whitespace" in the rest of this proposal.
> 
> (2) The following characters are recognised but are *not* whitespace:
> 
>     &nbsp;    U+00A0
>     &ensp;    U+2002
>     &emsp;    U+2003
>     &fsp;     U+2007
> 
> These are "other space characters". Their semantics are described below.
> [&fsp; is optional.]
> 
> (3) [Optional, to aid readability] Whitespace adjacent to other space
> characters is ignored.
> 
> (4) The characters . ! and ? are "sentence end characters". Any sequence of
> one or more sentence end characters, possibly mixed in with zero or more '
> or " characters, is a "sentence end sequence". [So both ". and ." are such
> sequences, but "' without one of the other three isn't.]
> 
> (5) An actual sentence ending occurs in three places:
>   (A) At the end of a paragraph, irrespective of the last character;
>   (B) Immediately before an &emsp; character;
>   (C) At the end of a sentence end sequence followed by whitespace and
>       then an uppercase letter.
> Case (A) can be ignored. In the case of (B) or (C), the &emsp; or whitespace
> is a "sentence gap".
> 
> (6) [Optional, to allow more control over spacing] The following character
> is recognised:
> 
>     &zwsp;    U+200B
> 
> If it occurs within whitespace it is ignored. If it occurs elsewhere it has
> semantics described below.
> 
> (7) The visual appearance, line-break, and justification properties of the
> various characters and sequences are:
> 
>                         Appearance       Break            Expand
>     sentence gap        2 spaces         yes              yes
>     other whitespace    1 space          yes              yes
>     &nbsp;              1 space          no               no
>     &ensp;              1 space          yes              no
>     &fsp;               1 space          no               no
>     &zwsp;              nothing          no               yes
> 
> "Break" means that the space (and any adjacent space) can be replaced by
> a line-break.
> "Expand" means that extra space can be added after the space to provide
> justification.
> 
> [The difference between &nbsp; and &fsp; is semantic - a &fsp; is a digit
> that happens to be all white, rather than a space character. In particular,
> it should in general not be viewed as a word separator.]
> 
> (8) [Optional, to provide user convenience] Define a processing directive
> something like:
>     <?rfc notse='i.e.'?>
> indicating that "i.e." is not a sentence ending string. You might also want
> a predefined list of these ("i.e.", "e.g.", "Dr.", "Mrs.", "etc.", etc.),
> and a way of turning that list off.


--Signature=_Mon__5_Jan_2004_15_34_08_+0100_if4DuIeGfT78B6Nx--


From: GK@ninebynine.org (Graham Klyne)
Date: Mon, 05 Jan 2004 13:00:54 +0000
Subject: [xml2rfc] Abbreviation or end of sentence?
In-Reply-To: <20040105113421.GB77123@finch-staff-1.thus.net>
References: <20040103171959.18366810.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com>
Message-ID: <5.1.0.14.2.20040105124833.00ba3138@127.0.0.1>

At 11:34 05/01/04 +0000, Clive D.W. Feather wrote:
>I might want to mechanically put <abbrev> tags round each i.e. and e.g.
>and Dr. and Mrs. in my text because, after all, they're abbreviations. I
>could then automatically expand "<abbrev>Dr.</abbrev>" to "Doctor" in some
>situations. When I do so, I don't want the sentence-end semantics to
>change.

[Having very little to do with XML2RFC ...]

I learned [1] that a period is not used following a contraction (where the 
last letter of the abbreviation is also the last letter of the full 
word).  So "Dr" and "Mrs" above would properly not be followed by a period, 
unless (in some contorted way?), they appear at the end of a sentence.

#g
--

[1] Bill Bryson, "Dictionary of Troublesome Words", section "abbreviations, 
contractions, acronyms"


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact



From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 12:11:41 +0000
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040103213016.21c87c2a.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>
Message-ID: <20040105121141.GE77123@finch-staff-1.thus.net>

Henrik Levkowetz said:
> So, here's yet another end of sentence handling proposal.

And here is mine, based on using the various Unicode spaces.

(1) Within a <t /> (and anything else equivalent) all sequences of ordinary
spaces, tabs, and newlines are treated as equivalent. I call them
"whitespace" in the rest of this proposal.

(2) The following characters are recognised but are *not* whitespace:

    &nbsp;    U+00A0
    &ensp;    U+2002
    &emsp;    U+2003
    &fsp;     U+2007

These are "other space characters". Their semantics are described below.
[&fsp; is optional.]

(3) [Optional, to aid readability] Whitespace adjacent to other space
characters is ignored.

(4) The characters . ! and ? are "sentence end characters". Any sequence of
one or more sentence end characters, possibly mixed in with zero or more '
or " characters, is a "sentence end sequence". [So both ". and ." are such
sequences, but "' without one of the other three isn't.]

(5) An actual sentence ending occurs in three places:
  (A) At the end of a paragraph, irrespective of the last character;
  (B) Immediately before an &emsp; character;
  (C) At the end of a sentence end sequence followed by whitespace and
      then an uppercase letter.
Case (A) can be ignored. In the case of (B) or (C), the &emsp; or whitespace
is a "sentence gap".

(6) [Optional, to allow more control over spacing] The following character
is recognised:

    &zwsp;    U+200B

If it occurs within whitespace it is ignored. If it occurs elsewhere it has
semantics described below.

(7) The visual appearance, line-break, and justification properties of the
various characters and sequences are:

                        Appearance       Break            Expand
    sentence gap        2 spaces         yes              yes
    other whitespace    1 space          yes              yes
    &nbsp;              1 space          no               no
    &ensp;              1 space          yes              no
    &fsp;               1 space          no               no
    &zwsp;              nothing          no               yes

"Break" means that the space (and any adjacent space) can be replaced by
a line-break.
"Expand" means that extra space can be added after the space to provide
justification.

[The difference between &nbsp; and &fsp; is semantic - a &fsp; is a digit
that happens to be all white, rather than a space character. In particular,
it should in general not be viewed as a word separator.]

(8) [Optional, to provide user convenience] Define a processing directive
something like:
    <?rfc notse='i.e.'?>
indicating that "i.e." is not a sentence ending string. You might also want
a predefined list of these ("i.e.", "e.g.", "Dr.", "Mrs.", "etc.", etc.),
and a way of turning that list off.
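
To make the interaction of the rules concrete, here is a small Python
sketch of how a paragraph might be rendered to fixed-width text under rules
(1), (3), (5)(B), (5)(C) and (8). The names render and NOT_SENTENCE_END are
hypothetical, the uppercase test is limited to ASCII A-Z, and line breaking
and justification (rule 7) are left out entirely.

```python
import re

EMSP = "\u2003"

# Rule (8): strings that do not end a sentence (a predefined list).
NOT_SENTENCE_END = ("i.e.", "e.g.", "Dr.", "Mrs.", "etc.")

# Rule (5)(C): a word ending in a sentence end sequence, then ordinary
# whitespace, then an uppercase letter.
_GAP_C = re.compile(r"""(\S*[.!?]['"]*)[ \t\n]+(?=[A-Z])""")


def render(paragraph, exceptions=NOT_SENTENCE_END):
    """Return the paragraph with two spaces at each sentence gap."""
    # Rule (3): whitespace adjacent to an em space is ignored.
    text = re.sub(r"[ \t\n]*\u2003[ \t\n]*", EMSP, paragraph.strip())
    # Rule (1): every run of spaces, tabs and newlines is equivalent.
    text = re.sub(r"[ \t\n]+", " ", text)

    def gap(m):
        word = m.group(1)
        if word.endswith(tuple(exceptions)):
            return word + " "   # rule (8): not a sentence end
        return word + "  "      # rule (5)(C): sentence gap

    text = _GAP_C.sub(gap, text)
    # Rule (5)(B): an em space is itself a sentence gap.
    return text.replace(EMSP, "  ")


if __name__ == "__main__":
    print(render("See Dr. Smith.  He is in.\nAsk e.g. Bob"))
    print(render("He left etc. \u2003 Then we ate."))
```

Passing an empty tuple as exceptions corresponds to turning the predefined
list off; an abbreviation that really does end a sentence then needs an
explicit em space, as in the second example.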

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 11:42:55 +0000
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <526732190.1073164446@scan.jck.com>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com>
Message-ID: <20040105114255.GD77123@finch-staff-1.thus.net>

John C Klensin said:
> The problem/mess is that the conventions for quotes at 
> sentence-end differ depending on which side of the pond one is 
> on.  If I recall, so do the quotes.  Specifically, the form
> 
> 	Sentence ends "here."
> 
> is, indeed, standard  usage.

Please don't characterise this as a US v UK issue, since it's much more
complex than that (as some of your other quotes show).

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 11:34:21 +0000
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103171959.18366810.henrik@levkowetz.com>
References: <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com>
Message-ID: <20040105113421.GB77123@finch-staff-1.thus.net>

Henrik Levkowetz said:
>  * To avoid abbreviations triggering this, they may be enclosed in
>    <abbrev> </abbrev>.  Abbreviations ending a sentence can be left
>    without markup.

I don't like this, because it overloads the meaning.

I might want to mechanically put <abbrev> tags round each i.e. and e.g.
and Dr. and Mrs. in my text because, after all, they're abbreviations. I
could then automatically expand "<abbrev>Dr.</abbrev>" to "Doctor" in some
situations. When I do so, I don't want the sentence-end semantics to
change.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: clive@demon.net (Clive D.W. Feather)
Date: Mon, 5 Jan 2004 11:29:18 +0000
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103011917.4d57f586.henrik@levkowetz.com>
References: <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <3FF58380.2030201@gmx.de> <20040102172621.72261265.henrik@levkowetz.com> <3FF5AFC4.4060906@gmx.de> <20040102233932.7a95820f.henrik@levkowetz.com> <3FF5F9E1.8070008@gmx.de> <20040103011917.4d57f586.henrik@levkowetz.com>
Message-ID: <20040105112918.GA77123@finch-staff-1.thus.net>

Henrik Levkowetz said:
> Julian, I've clarified what semantics <nul /> has, not said that it has
> no semantics. Please. 
> 
> Supposing you have an input symbol stream in which "." is followed
> immediately by WSP, or contrarily one in which "." is followed
> immediately by a <nul /> symbol and then WSP, or any other tag and then
> WSP.  

Since <nul /> has no semantics, that means you're using the presence or
absence of markup to decide where you have a sentence end - your "any other
tag".

And I've already shown that that's broken.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: braden@ISI.EDU (Bob Braden)
Date: Sun, 4 Jan 2004 14:21:38 -0800 (PST)
Subject: [xml2rfc] end of sentence: two spaces?
Message-ID: <200401042221.i04MLc800806@boreas.isi.edu>

  *> > 
  *> Of COURSE I do agree with the desire/need to have xml2rfc do the RIGHT 
  *> and CORRECT thing. 
  *> 
  *> All I was saying is that it seems wasting cycles if (RFC-)editor 
  *> (or any person for that matter) goes through a document in a very 
  *> detailed way and just add extra space in between sentences.
  *> 
  *> Bert
  *> 

Bert,

Sorry, we don't get your point.  RFCs have always used two spaces after
periods separating sentences.  It makes them more readable, in the
absence of a variable-width font.  This was a Jon Postel convention.

The RFC Editor has an automatic tool to find exceptions to this rule,
but it cannot correct them entirely automatically because of
ambiguities that require humans for resolution.  But this is no big
deal; it represents a negligible part of the editorial effort.  OTOH, I
would be astonished to learn that xml2rfc does not already take care of
this.
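
As a rough illustration of why such a check cannot be fully automatic, here is a minimal Python sketch. It is not the RFC Editor's tool; the pattern and the function name are assumptions made for this example. It flags sentence punctuation followed by exactly one space and a capital letter.

```python
import re

# Assumed heuristic: ".", "!" or "?" followed by one space and then a
# capital letter is a candidate for a missing second space.
SUSPECT = re.compile(r"[.!?] (?=[A-Z])")


def find_single_space_gaps(line: str) -> list[int]:
    """Return the offsets of punctuation marks that look under-spaced."""
    return [m.start() for m in SUSPECT.finditer(line)]


line = "Dr. Smith agrees. The rest follows.  Good."
print(find_single_space_gaps(line))
```

Both the period after "Dr" and the one after "agrees" are reported, and only a human can say that the first is an abbreviation rather than a sentence end.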

Happy New Year to all,

Bob Braden for the RFC Editor



From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sun, 4 Jan 2004 23:09:31 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040104101138.4b0e7525.mrose+internet.xml2rfc@dbc.mtview.ca.us>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com> <3FF7F796.6040305@gmx.de> <20040104161641.124e78fb.henrik@levkowetz.com> <20040104101138.4b0e7525.mrose+internet.xml2rfc@dbc.mtview.ca.us>
Message-ID: <20040104230931.1720c1d9.henrik@levkowetz.com>

--Signature=_Sun__4_Jan_2004_23_09_31_+0100_v8Z8/4UresmiPY=O
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit


Sunday  4 January 2004, Marshall Rose wrote:
> so, what is the proposal currently under consideration, as stated in
> its most concise form?

This is the current proposal, amended based on comments from Alex
Rousskov and Julian.  Julian has not indicated agreement on using
markup rather than processing instructions, but seems OK with the
proposal otherwise.  

 * "." | "!" | "?" followed by one or more whitespace characters is  
   identified as sentence end.   (This also includes endings like
   "...", "?!", "!..").  

 * Sentences ending in a quote, where the punctuation is placed inside
   the quote (as in: Sentence blah blah "quote.") will not be 
   automatically identified as end of sentence, but will need explicit
   markup instead.

 * Forced sentence ending (needed e.g. when a sentence ends with
   a quote) is indicated by something like <hspace sentenceEnd="yes" />
   or <eos />, with preference for a short form markup like <eos />.

 * Whitespace adjacent to sentence end determined either implicitly or
   explicitly by markup will not be preserved; it will be possible to 
   write (space represented as "_"): "
_____This sentence ends with a quote "To be or not to be."
_____<eos />  And here's the next sentence."
   or: "
_____Foo_bar.
_____Foo_bar."
   and have it come out with exactly 2 spaces between sentences.

 * Ignore sentence end (needed after abbreviations ending in ".")
   is indicated by something like <hspace sentenceEnd="no" /> or
   <noeos /> with preference for a short form markup like <noeos />

 * Sentence end is rendered by the .txt renderer as "." SP SP
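
The rules above can be sketched in a few lines of Python. This is only an illustration of the proposal as stated, not xml2rfc code: the function name and the placeholder characters are assumptions, and <eos /> and <noeos /> are the short-form markers suggested above.

```python
import re

# Placeholder characters (assumed not to occur in the input text).
NOEOS = "\x00"  # becomes a single space: sentence end suppressed
EOS = "\x01"    # becomes two spaces: sentence end


def space_sentences(text: str) -> str:
    """Apply the proposed sentence-end rules to one paragraph of text."""
    # Explicit markers first; whitespace adjacent to them is not preserved.
    text = re.sub(r"\s*<noeos\s*/>\s*", NOEOS, text)
    text = re.sub(r"\s*<eos\s*/>\s*", EOS, text)
    # Implicit rule: ".", "!" or "?" followed by whitespace ends a sentence.
    text = re.sub(r"(?<=[.!?])\s+", EOS, text)
    # Every other whitespace run collapses to a single space.
    text = re.sub(r"\s+", " ", text)
    return text.replace(NOEOS, " ").replace(EOS, "  ").strip()


print(space_sentences("\n     Foo bar.\n     Foo bar."))
```

Because the implicit rule looks only at the last punctuation character before the whitespace, endings such as "..." and "?!" are covered without extra cases. An abbreviation followed by a space is still treated as a sentence end unless it is marked with <noeos />.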

	Henrik

--Signature=_Sun__4_Jan_2004_23_09_31_+0100_v8Z8/4UresmiPY=O
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/+I8beVhrtTJkXCMRAjxqAJ9nbBq5tApbYZzddFaB6t8EprTPmQCgnrPT
x3xMymb8/oxBmehwH+9XPLI=
=2PaY
-----END PGP SIGNATURE-----

--Signature=_Sun__4_Jan_2004_23_09_31_+0100_v8Z8/4UresmiPY=O--


From: swb@employees.org (Scott W Brim)
Date: Sun, 4 Jan 2004 13:58:27 -0500
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040104101138.4b0e7525.mrose+internet.xml2rfc@dbc.mtview.ca.us>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com> <3FF7F796.6040305@gmx.de> <20040104161641.124e78fb.henrik@levkowetz.com> <20040104101138.4b0e7525.mrose+internet.xml2rfc@dbc.mtview.ca.us>
Message-ID: <20040104185827.GC1496@sbrim-w2k01>

On Sun, Jan 04, 2004 10:11:38AM -0800, Marshall Rose allegedly wrote:
> so, what is the proposal currently under consideration, as stated in
> its most concise form?

I suggest leaving things as they are for the time being.


From: mrose+internet.xml2rfc@dbc.mtview.ca.us (Marshall Rose)
Date: Sun, 4 Jan 2004 10:11:38 -0800
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040104161641.124e78fb.henrik@levkowetz.com>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com> <3FF7F796.6040305@gmx.de> <20040104161641.124e78fb.henrik@levkowetz.com>
Message-ID: <20040104101138.4b0e7525.mrose+internet.xml2rfc@dbc.mtview.ca.us>

> I now agree.  Thanks John, for the clarifying information!

so, what is the proposal currently under consideration, as stated in its most
concise form?

thanks,

/mtr


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sun, 4 Jan 2004 16:16:41 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <3FF7F796.6040305@gmx.de>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com> <3FF7F796.6040305@gmx.de>
Message-ID: <20040104161641.124e78fb.henrik@levkowetz.com>

Sunday  4 January 2004, Julian Reschke wrote:
> John,
> 
> thanks for the explanation. I was suspecting that this was partly an 
> Americanism :-)
> 
> So I'd vote for xml2rfc *not* to use these additional rules.

I now agree.  Thanks John, for the clarifying information!

	Henrik


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sun, 04 Jan 2004 12:23:02 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <526732190.1073164446@scan.jck.com>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us> <526732190.1073164446@scan.jck.com>
Message-ID: <3FF7F796.6040305@gmx.de>

John,

thanks for the explanation. I was suspecting that this was partly an 
Americanism :-)

So I'd vote for xml2rfc *not* to use these additional rules.

Regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sun, 04 Jan 2004 12:19:40 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040104010119.69d1ae43.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>	<Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>	<3FF73EBF.7020905@gmx.de> <20040104010119.69d1ae43.henrik@levkowetz.com>
Message-ID: <3FF7F6CC.6060608@gmx.de>

Henrik Levkowetz wrote:

> Oh, you can rest assured, this will be enforced. Haven't you noticed
> the third paragraph of a draft, in the "status of this memo" section?

Indeed. Needless to say, when I wrote rfc2629.xslt I thought that was a 
typo and "fixed" it.

> ...
>>>Something short like <eos /> would be more likely to be used,
>>>IMHO. This should cover cases like (note the markup on a different
>>>line, with indentation)
>>>
>>>_____This sentence ends with the following URL: http://foo.org/
>>>_____<eos />
>>
>>Interesting example. I'd say that a sentence never can end in anything 
>>other than a punctuation mark, and that the above should be written as:
>>
>>_____This sentence ends with the following URL: <http://foo.org/>.
> 
> 
> ?????  If you insert markup saying "the sentence ends here", it
> shouldn't be conditional on the particular preceding character!  Maybe
> I'm not understanding you correctly here.

Yep. I didn't intend to say that.

What I wanted to say is that I don't understand that example. A sentence 
should always end with punctuation, thus if a sentence ends in a URL 
it should be written as:

	See also http://xml.resource.org.

Or:

	See also <http://xml.resource.org>.

But never:

	See also http://xml.resource.org

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sun, 04 Jan 2004 12:08:53 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <8169.1073173309@marajade.sandelman.ottawa.on.ca>
References: <8169.1073173309@marajade.sandelman.ottawa.on.ca>
Message-ID: <3FF7F445.8040004@gmx.de>

Michael Richardson wrote:
> Um. How important are two spaces between sentences to the RFC-editor.

That was indeed the first question asked, and the answer was that the 
RFC Editor indeed inserts the spaces. This is a bad thing because

- the RFC Editor needs to spend time on things that should be automated and

- the xml2rfc txt output differs from published material, causing for 
instance differences in page breaks, TOC and index getting out of sync 
and so on...

> We seem to be doing a lot of thinking here.
> 
> If it is really worth keeping the two spaces, then we should have sentences
> marked, as well as paragraphs. I.e. <s>Subject Verb Noun</s>

But we don't, and it seems that (almost) nobody is willing to add that 
markup.

> I know how ugly that could be.
> (Frankly, I wish XML was more like latex, with a blank line between paragraphs!)

That's not XML's fault. The RFC2629 could be that way, but it isn't.

> Is there any way we can go the other way? Less markup rather than more?
> I say this as someone that liked the HTML 0.9 revision :-)
> 
> I think many of us do not edit the XML with anything other than Emacs or equivalent.

Yes, that's why we try to achieve the result with no additional markup 
(except in edge cases).

Julian


-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: john+xml@jck.com (John C Klensin)
Date: Sat, 03 Jan 2004 21:14:06 -0500
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us>
References: <20040104000200.19036.69117.Mailman@qawoor.dbc.mtview.ca.us>
Message-ID: <526732190.1073164446@scan.jck.com>

Folks, speaking as a native speaker of one of the languages 
called "English", we are rapidly headed into a mess here.  It 
would be good if the RFC Editor would say something definitive 
but, in the past, presumably out of respect to good sense in one 
instance and to our British colleagues in the other, both sets 
of forms have, I believe, been permitted.

The problem/mess is that the conventions for quotes at 
sentence-end differ depending on which side of the pond one is 
on.  If I recall, so do the quotes.  Specifically, the form

	Sentence ends "here."

is, indeed, standard  usage.  It is also, as some native 
speakers of English and virtually every non-native speaker has 
observed, bizarre -- bizarre enough that many technical 
publications have adopted style manuals that insist on the other 
form to avoid massive confusion.  Consider the sentence

	The name of the most-heavily-populated TLD is "COM.".

Now, neither of those periods is superfluous.  The second one 
ends the sentence, the first one is the formal notation for the 
root.  This could, of course, rather easily appear in an RFC 
(and probably does, somewhere).  Strict application of the rule 
would yield

	The name of the most-heavily-populated TLD is "COM.."

which defies all reason and is confusing and ambiguous.  Even 
Fowler (a, if not the, standard reference for anal-compulsive 
English) doesn't like the situation.  He says

	Questions of order between inverted commas and stops are
	much debated and a writer's personal preference often
	conflicts with the style rules of editors and
	publishers.  There are two schools of thought, which
	might be called the conventional and the logical.  The
	conventional prefers to put stops within the inverted
	commas, if it can be done without ambiguity, on the
	ground that this has a more pleasing appearance.  The
	logical punctuates according to sense, and puts them
	outside except when they actually form part of the
	quotation.   [...]
	
	In the treatment of question and exclamation marks the
	systems tend to merge, perhaps because those symbols
	show up so glaringly the illogicality of the
	conventional one.  In the following examples, the
	punctuation is standard under either system:
[...]
	I said 'Am I my brother's keeper?'
	Did you say 'Am I my brother's keeper'?

he then proceeds to say

	[...] The conventional system flouts common sense, and
	it is not easy to see what merit it is supposed to have
	to outweigh that defect; even the more pleasing
	appearance claimed for it is not likely to go
	unquestioned.

So this argument has been going on for at least 80 years or so. 
Even the strongest advocates of the conventional system believe 
that it should yield to clarity when that is necessary, and it 
quite often is necessary in technical writing.

Any sort of heuristics must, I think, either prefer the 
"logical" form or regularly end up in deep trouble, 
requiring yet more markup.

And, just to make this more exciting, the convention in American 
English is to use double quotes to set off quotations, using 
single quotes only within them for nesting, e.g.,

   Tom said "Dick claims 'Harry is incoherent'".

while, if one crosses the pond (I suspect either pond, but am 
not sure and am now curious), single quotes are the preferred 
form, with the double ones being nested, e.g.,

   Tom said 'Dick claims "Harry is incoherent"'.

And, just in case we aren't having enough fun yet, the 
most-cited style manual for American English (the "Chicago 
Manual..."), at least in my early edition, claims that the 
examples given by Fowler many years earlier are incorrect, since 
they want quotations in running text to start with a lower-case 
letter even if the original was a complete sentence, e.g.,

	I said "am I my brother's keeper?"

Sigh.

      john


Henrik Levkowetz <henrik@levkowetz.com>(I think) wrote:

>> It looks like the above does not cover the strange English
>> quotation rules where you are supposed to write
>>
>> 	Sentence ends "here."
>> and not
>> 	Sentence ends "here".
>
> True. I thought of those anomalies, but thought I'd not bring
> that into the discussion yet...  ,:-)
>
>> The same set of rules probably includes things like
>>
>> 	Does this sentence end with "foo?"
>
> Right.
>
>> but a native speaker should double check that.
>>
>> I suspect single quotes should be handled the same way.
>
> Possibly.  In this case, I simply don't know.






From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sun, 4 Jan 2004 01:01:19 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <3FF73EBF.7020905@gmx.de>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com> <3FF73EBF.7020905@gmx.de>
Message-ID: <20040104010119.69d1ae43.henrik@levkowetz.com>

Saturday  3 January 2004, Julian Reschke wrote:
> Alex Rousskov wrote:
> 
> > I assume the above already covers endings like "...", "?!", and
> > "!.."
> > 
> > It looks like the above does not cover the strange English quotation
> > rules where you are supposed to write
> > 
> > 	Sentence ends "here."
> > and not
> > 	Sentence ends "here".
> 
> I'm starting to think that RFCs should be written in Latin :-) Honestly, 
> I hope that the RFC Editor doesn't enforce *these* rules as well (insert 
> flame war about America being old-fashioned, the metric system and date 
> formats here...).

Oh, you can rest assured, this will be enforced. Haven't you noticed
the third paragraph of a draft, in the "status of this memo" section?

Personally, I think this is perverse and misguided if not outright
unethical :-) but if the style guide says that's the way it should be,
we'll probably follow that till an RFC says otherwise or the manual of
style is changed.

> > The same set of rules probably includes things like
> > 
> > 	Does this sentence end with "foo?"
> > 
> > but a native speaker should double check that.
> 
> Seems to be another good reason to have manual overrides. I'd certainly 
> *not* want to put that burden onto the RFC2629 formatting engine.

That's not particularly hard. If you can recognise
    ("." | "?" | "!")(" ")
you certainly can recognise
    ("." | "?" | "!")(" " | "\"")
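
In Python regex notation the two patterns could be written as follows. This is just a sketch of the recognition step; the names PLAIN and WITH_QUOTE are made up for this example.

```python
import re

# ("." | "?" | "!")(" ")
PLAIN = re.compile(r'[.?!] ')

# ("." | "?" | "!")(" " | "\"")
WITH_QUOTE = re.compile(r'[.?!][ "]')

text = 'Sentence ends "here." Next one.'

# The plain pattern misses the period inside the closing quote;
# the extended pattern sees it.
print(PLAIN.findall(text))
print(WITH_QUOTE.findall(text))
```

Recognising the punctuation before the quote is the easy part. The formatter would still have to put the two spaces after the closing quote rather than after the period.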

> > Something short like <eos /> would be more likely to be used,
> > IMHO. This should cover cases like (note the markup on a different
> > line, with indentation)
> > 
> > _____This sentence ends with the following URL: http://foo.org/
> > _____<eos />
> 
> Interesting example. I'd say that a sentence never can end in anything 
> other than a punctuation mark, and that the above should be written as:
> 
> _____This sentence ends with the following URL: <http://foo.org/>.

?????  If you insert markup saying "the sentence ends here", it
shouldn't be conditional on the particular preceding character!  Maybe
I'm not understanding you correctly here.

	Henrik


From: mcr@sandelman.ottawa.on.ca (Michael Richardson)
Date: Sat, 03 Jan 2004 18:41:49 -0500
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: Your message of "Sat, 03 Jan 2004 01:19:17 +0100." <20040103011917.4d57f586.henrik@levkowetz.com>
Message-ID: <8169.1073173309@marajade.sandelman.ottawa.on.ca>

-----BEGIN PGP SIGNED MESSAGE-----


Um. How important are two spaces between sentences to the RFC-editor.

We seem to be doing a lot of thinking here.

If it is really worth keeping the two spaces, then we should have sentences
marked, as well as paragraphs. I.e. <s>Subject Verb Noun</s>

I know how ugly that could be.
(Frankly, I wish XML was more like latex, with a blank line between paragraphs!)

Is there any way we can go the other way? Less markup rather than more?
I say this as someone that liked the HTML 0.9 revision :-)

I think many of us do not edit the XML with anything other than Emacs or equivalent.

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson,    Xelerance Corporation, Ottawa, ON    |net architect[
] mcr@xelerance.com      http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Finger me for keys

iQCVAwUBP/dTO4qHRg3pndX9AQG81wP/f9aNCtI3R/rNK3WPsYJ9ZV55SB62p7T8
3uXtnY12BSjop1AFVmdfeokg6u5GqlqfAnBu0Dg/dj3EFR56NBHFdq9DOd85+O+F
mygHvY/RGfCLf3mI3Sd3zcBEcN6aTLrgR3inNZITmjzCZ3coijTC0KCxTOUFq70s
WNWOYWm51sA=
=seN6
-----END PGP SIGNATURE-----


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 23:41:20 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <3FF73D66.2080706@gmx.de>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <3FF73D66.2080706@gmx.de>
Message-ID: <20040103234120.62f72a22.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_23_41_20_+0100__h+2yaaQyagd2uzZ
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
> > So, here's yet another end of sentence handling proposal. The first
> > name I've proposed for the markup element is inspired by the <vspace />
> > element, but that this be the exact name isn't essential:
> > 
> >  * "." | "!" | "?" followed by one or more whitespace characters is  
> >    identified as sentence end.   (Should any other characters trigger
> >    this?)
> > 
> >  * Forced sentence ending (needed when a sentence ends with an
> >    abbreviation) : <hspace sentenceEnd="yes" /> or maybe <eos />
> > 
> >  * Ignore sentence end (needed after abbreviations ending in "."):  
> >    <hspace sentenceEnd="no" /> or maybe <noeos />
> > 
> > ...
> 
> I'd like to understand why you feel that a simple PI such as <?eos?> (or 
> <?neos?>) wouldn't work. Do you feel that PIs in some way are 
> second-class XML features?

I'm sure it would work. I think style-wise it's inappropriate. 

I've been using PI's together with markup in files used to generate
parts of my website for something like 4 years, and definitely don't
think they're second class.  I just 1. think it's not the most
appropriate in this case and also 2. can just see how often I'll be
writing draft text, working with markup, and write <eos /> instead of
<?eos?> (or more probably <?rfc eos="true"?>).

> In this particular case they seem to me the exactly right thing to use. 
> RFC2629 doesn't care about marking up sentence boundaries, and all 
> output formats except one (text) don't care about it, so this is a 
> specific processing instruction (in the non-XML meaning) for a specific 
> formatter.

Well, as a draft editor I see it as part of writing the draft, rather
than part of indicating how I'd like it generated (which is the case for
most of the rfc PIs), so I guess we'll just have to continue to disagree
on this point.

	Henrik

--Signature=_Sat__3_Jan_2004_23_41_20_+0100__h+2yaaQyagd2uzZ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/90UQeVhrtTJkXCMRAnHyAKC66LAbWSIkAuET6tX3t2YB85y4UQCdFuMT
WW2mRtxCZPp6upk56gRk3Bc=
=EYBh
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_23_41_20_+0100__h+2yaaQyagd2uzZ--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 23:14:23 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>
Message-ID: <3FF73EBF.7020905@gmx.de>

Alex Rousskov wrote:

> I assume the above already covers endings like "...", "?!", and
> "!.."
> 
> It looks like the above does not cover the strange English quotation
> rules where you are supposed to write
> 
> 	Sentence ends "here."
> and not
> 	Sentence ends "here".

I'm starting to think that RFCs should be written in Latin :-) Honestly, 
I hope that the RFC Editor doesn't enforce *these* rules as well (insert 
flame war about America being old-fashioned, the metric system and date 
formats here...).

> The same set of rules probably includes things like
> 
> 	Does this sentence end with "foo?"
> 
> but a native speaker should double check that.

Seems to be another good reason to have manual overrides. I'd certainly 
*not* want to put that burden onto the RFC2629 formatting engine.

> Something short like <eos /> would be more likely to be used,
> IMHO. This should cover cases like (note the markup on a different
> line, with indentation)
> 
> _____This sentence ends with the following URL: http://foo.org/
> _____<eos />

Interesting example. I'd say that a sentence never can end in anything 
other than a punctuation mark, and that the above should be written as:

_____This sentence ends with the following URL: <http://foo.org/>.

> Should HTML renderer do the same? Should it insert sentence markup
> (e.g., <span class="sentence">sentence.</span>) so that users can
> control spacing with CSS?

In rfc2629.xslt I'd probably add an empty span element that can be made 
visible for debugging purposes.

> Thanks guys for converging on a solution before killing each other!

Still working on it. In the end Marshall needs to decide whether he 
wants to implement all of that, and how to markup the special cases.


-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 23:08:38 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040103213016.21c87c2a.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>
Message-ID: <3FF73D66.2080706@gmx.de>

Henrik Levkowetz wrote:
> So, here's yet another end of sentence handling proposal. The first
> name I've proposed for the markup element is inspired by the <vspace />
> element, but that this be the exact name isn't essential:
> 
>  * "." | "!" | "?" followed by one or more whitespace characters is  
>    identified as sentence end.   (Should any other characters trigger
>    this?)
> 
>  * Forced sentence ending (needed when a sentence ends with an
>    abbreviation) : <hspace sentenceEnd="yes" /> or maybe <eos />
> 
>  * Ignore sentence end (needed after abbreviations ending in "."):  
>    <hspace sentenceEnd="no" /> or maybe <noeos />
> 
> ...

I'd like to understand why you feel that a simple PI such as <?eos?> (or 
<?neos?>) wouldn't work. Do you feel that PIs in some way are 
second-class XML features?

In this particular case they seem to me the exactly right thing to use. 
RFC2629 doesn't care about marking up sentence boundaries, and all 
output formats except one (text) don't care about it, so this is a 
specific processing instruction (in the non-XML meaning) for a specific 
formatter.

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 23:05:41 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103212822.240867e3.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net>	<20040103002408.381f6c59.henrik@levkowetz.com>	<3FF6B71D.9080007@gmx.de>	<20040103142444.24bded48.henrik@levkowetz.com>	<3FF6C463.9070702@gmx.de>	<20040103145204.18880b92.henrik@levkowetz.com>	<3FF6CE94.5060805@gmx.de>	<20040103171959.18366810.henrik@levkowetz.com>	<3FF6F3D1.6040505@gmx.de>	<20040103191152.6dc6128a.henrik@levkowetz.com>	<3FF70E2E.9070209@gmx.de> <20040103212822.240867e3.henrik@levkowetz.com>
Message-ID: <3FF73CB5.2070607@gmx.de>

Henrik Levkowetz wrote:
>>Sure. For instance (using "_" instead of SP), in
>>
>><t>
>>___Foo_bar.
>>___Foo_bar.
>></t>
>>
>>I'd like to see xml2rfc to produce:
>>
>>Foo_bar.__Foo_bar.
>>
>>Not:
>>
>>___Foo_bar.____Foo_bar.
> 
> 
> Thanks for the example.  For the end-of-sentence part I agree, for
> any other case there should be no change to the current processing.

Correct. That *wouldn't* be a change. I'd just like that to be clarified 
(multiple whitespace characters in text content are treated as a single 
one).
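
A small Python sketch of the behaviour described in the example, under the assumption that sentence ends are detected only by ".", "!" or "?" followed by whitespace (the function name is made up for this illustration):

```python
import re


def render_paragraph(content: str) -> str:
    """Collapse whitespace runs to one space, then widen sentence gaps to two."""
    # Multiple whitespace characters in text content count as a single one,
    # and leading/trailing whitespace inside <t> is dropped.
    collapsed = " ".join(content.split())
    # A space directly after ".", "!" or "?" becomes two spaces.
    return re.sub(r"(?<=[.!?]) ", "  ", collapsed)


content = "\n   Foo bar.\n   Foo bar.\n"
print(render_paragraph(content).replace(" ", "_"))
```

Collapsing first and widening afterwards is what keeps the source indentation from leaking into the output as extra spaces between the sentences.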

>>Thus
>>
>>- we need to describe the rules that xml2rfc can use to detect sentence 
>>endings to do "the right thing", and
>>
>>- possibly we need explicit "instructions" by which the default rules 
>>can be overridden (elements, PIs or special characters have been discussed).
>>
>>Of course it makes sense to minimize the number of these special cases, 
>>therefore the idea of making abbreviations explicit. That's useful 
>>anyway, because processors may be able to take advantage of that when 
>>producing other outputs, such as HTML.
> 
> 
> Could be so, sure. But this is an addition to what we set out to solve.

Agreed. We can discuss this independently.

> ...

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 22:55:21 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com> <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>
Message-ID: <20040103225521.22c90987.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_22_55_21_+0100_YMuyO5q+hlG..VjM
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Alex Rousskov wrote:
> 
> On Sat, 3 Jan 2004, Henrik Levkowetz wrote:
> 
> > So, here's yet another end of sentence handling proposal. The first
> > name I've proposed for the markup element is inspired by the <vspace
> > /> element, but that this be the exact name isn't essential:
> >
> >  * "." | "!" | "?" followed by one or more whitespace characters is
> >    identified as sentence end.   (Should any other characters trigger
> >    this?)
> 
> I assume the above already covers endings like "...", "?!", and
> "!.."

Right.

> It looks like the above does not cover the strange English quotation
> rules where you are supposed to write
> 
> 	Sentence ends "here."
> and not
> 	Sentence ends "here".

True. I thought of those anomalies, but thought I'd not bring that
into the discussion yet...  ,:-)

> The same set of rules probably includes things like
> 
> 	Does this sentence end with "foo?"

Right.

> but a native speaker should double check that.
> 
> I suspect single quotes should be handled the same way.

Possibly.  In this case, I simply don't know.

> >  * Forced sentence ending (needed when a sentence ends with an
> >    abbreviation) : <hspace sentenceEnd="yes" /> or maybe <eos />
> 
> Something short like <eos /> would be more likely to be used,
> IMHO. This should cover cases like (note the markup on a different
> line, with indentation)
> 
> _____This sentence ends with the following URL: http://foo.org/
> _____<eos />

Yes.

> >  * Ignore sentence end (needed after abbreviations ending in "."):
> >    <hspace sentenceEnd="no" /> or maybe <noeos />
> >
> >  * Sentence end is rendered by the .txt renderer as "." SP SP
> 
> Should HTML renderer do the same? Should it insert sentence markup
> (e.g., <span class="sentence">sentence.</span>) so that users can
> control spacing with CSS?

I don't think so, but that's not a strong opinion.

> Thanks guys for converging on a solution before killing each other!

We were close, weren't we ,,:-)

	Henrik

--Signature=_Sat__3_Jan_2004_22_55_21_+0100_YMuyO5q+hlG..VjM
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9zpJeVhrtTJkXCMRAjhFAJ0Y7Gb/3lgQNXcdirduXwdVWaQQ+ACfaK4u
XGQvQ7A66ojIhqH+whxhfyY=
=RnN5
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_22_55_21_+0100_YMuyO5q+hlG..VjM--


From: rousskov@measurement-factory.com (Alex Rousskov)
Date: Sat, 3 Jan 2004 14:38:21 -0700 (MST)
Subject: [xml2rfc] Another end of sentence handling proposal
In-Reply-To: <20040103213016.21c87c2a.henrik@levkowetz.com>
References: <20040103213016.21c87c2a.henrik@levkowetz.com>
Message-ID: <Pine.BSF.4.58.0401031426320.51127@measurement-factory.com>

On Sat, 3 Jan 2004, Henrik Levkowetz wrote:

> So, here's yet another end of sentence handling proposal. The first
> name I've proposed for the markup element is inspired by the <vspace
> /> element, but that this be the exact name isn't essential:
>
>  * "." | "!" | "?" followed by one or more whitespace characters is
>    identified as sentence end.   (Should any other characters trigger
>    this?)

I assume the above already covers endings like "...", "?!", and
"!.."

It looks like the above does not cover the strange English quotation
rules where you are supposed to write

	Sentence ends "here."
and not
	Sentence ends "here".

The same set of rules probably includes things like

	Does this sentence end with "foo?"

but a native speaker should double check that.

I suspect single quotes should be handled the same way.

>  * Forced sentence ending (needed when a sentence ends with an
>    abbreviation) : <hspace sentenceEnd="yes" /> or maybe <eos />

Something short like <eos /> would be more likely to be used,
IMHO. This should cover cases like (note the markup on a different
line, with indentation)

_____This sentence ends with the following URL: http://foo.org/
_____<eos />

>  * Ignore sentence end (needed after abbreviations ending in "."):
>    <hspace sentenceEnd="no" /> or maybe <noeos />
>
>  * Sentence end is rendered by the .txt renderer as "." SP SP

Should HTML renderer do the same? Should it insert sentence markup
(e.g., <span class="sentence">sentence.</span>) so that users can
control spacing with CSS?

Thanks guys for converging on a solution before killing each other!

Alex.


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 21:30:16 +0100
Subject: [xml2rfc] Another end of sentence handling proposal
Message-ID: <20040103213016.21c87c2a.henrik@levkowetz.com>

So, here's yet another end of sentence handling proposal. The first
name I've proposed for the markup element is inspired by the <vspace />
element, but that this be the exact name isn't essential:

 * "." | "!" | "?" followed by one or more whitespace characters is  
   identified as sentence end.   (Should any other characters trigger
   this?)

 * Forced sentence ending (needed when a sentence ends with an
   abbreviation) : <hspace sentenceEnd="yes" /> or maybe <eos />

 * Ignore sentence end (needed after abbreviations ending in "."):  
   <hspace sentenceEnd="no" /> or maybe <noeos />

 * Sentence end is rendered by the .txt renderer as "." SP SP
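
The four rules above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the markers are matched as literal
strings in the text rather than parsed as XML, the short names <eos /> and
<noeos /> are used, and render_txt is a hypothetical helper, not an
xml2rfc function.

```python
import re

EOS = "<eos />"      # hypothetical marker: force a sentence end here
NOEOS = "<noeos />"  # hypothetical marker: the preceding "." ends no sentence

WORD_SPACE = "\x00"      # placeholder for a protected single space
SENTENCE_SPACE = "\x01"  # placeholder for a forced sentence gap


def render_txt(text: str) -> str:
    """Render paragraph text with two spaces after each sentence end."""
    # Collapse every whitespace run (including newlines) to one space.
    text = " ".join(text.split())
    # Swap the override markers, plus any space around them, for placeholders
    # so the default rule below cannot touch those positions.
    text = re.sub(r" ?" + re.escape(NOEOS) + r" ?", WORD_SPACE, text)
    text = re.sub(r" ?" + re.escape(EOS) + r" ?", SENTENCE_SPACE, text)
    # Default rule: "." | "!" | "?" followed by whitespace is a sentence end.
    text = re.sub(r"([.!?]) ", r"\1  ", text)
    # Turn the placeholders into one space and two spaces respectively.
    text = text.replace(WORD_SPACE, " ").replace(SENTENCE_SPACE, "  ")
    return text.rstrip()


print(render_txt("It works.  Use e.g.<noeos /> this!\n   Done? Yes."))
print(render_txt("See http://foo.org/\n     <eos />\nNext one."))
```

The first call keeps a single space after "e.g." and the second forces a
two-space gap after the URL, which the default rule alone would miss.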


	Henrik


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 21:28:22 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF70E2E.9070209@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com> <3FF6F3D1.6040505@gmx.de> <20040103191152.6dc6128a.henrik@levkowetz.com> <3FF70E2E.9070209@gmx.de>
Message-ID: <20040103212822.240867e3.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_21_28_22_+0100_zSS7l1U_cGfF2/++
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
> 
> >>I'm not sure why it's relevant whether it has the same name in HTML. In 
> >>fact, if it means the same thing, I'd *prefer* it to use the same name. 
> >>But that's not really important.
> > 
> > 
> > As we're defining markup for xml2rfc and related processors here, we don't
> > necessarily want to bind this element to have exactly the same semantics
> > as <abbr />.  
> 
> I didn't say that I *necessarily* want the same syntax. I just find it 
> pointless to avoid identical names. RFC2629 is not HTML, and should 
> there ever be a need to mix both vocabularies, this should be done using 
> XML namespaces.
> 
> BTW: DocBook has an <abbrev> element 
> (<http://docbook.org/tdg/en/html/abbrev.html>) which -- surprise -- 
> seems to have the same semantics.

Well, I'd be happy to leave this up to Marshall, if he chooses to
incorporate something like this.

> >>> * "." | "!" | "?" followed by whitespace is identified as sentence end.
> >>>   (any others?)
> >>
> >>Plus:
> >>
> >>* Sequences of multiple whitespace characters inside text (t, spanx, 
> >>annotation...) are treated as a single character.
>  >
>  >
>  > Could you give an example of what you mean?
> 
> Sure. For instance (using "_" instead of SP), in
> 
> <t>
> ___Foo_bar.
> ___Foo_bar.
> </t>
> 
> I'd like to see xml2rfc produce:
> 
> Foo_bar.__Foo_bar.
> 
> Not:
> 
> ___Foo_bar.____Foo_bar.

Thanks for the example.  For the end-of-sentence part I agree, for
any other case there should be no change to the current processing.

> >>Well. In fact they must be left without markup, unless no sentence end 
> >>will be detected (at least as far as I understand the model you're 
> >>proposing). Obviously (not being able to use <abbrev> because it occurred 
> >>at a sentence end) that would be a bad thing.
> > 
> > 
> > The purpose of using <abbrev /> would be to avoid interpreting "." SP
> > as sentence end.  So if you don't need to avoid it because you _have_
> > sentence end, you're fine, no?  Unless you want to give
> > <abbrev /> additional semantics, of course.
> 
> Sure. That's the whole point. If we introduce a new element to indicate 
> "this is an abbreviation" it needs to work everywhere. I was just 
> proposing it because in many cases it would be enough for disambiguating 
>   the types of dots.
> 
> The issue we need to solve is caused by the fact that RFC2629 (like 
> almost all other vocabularies, btw) does not specifically mark up 
> sentence boundaries. This seems to be a non-issue, unless you have 
> indeed to produce monospaced output and are stuck with specific 
> formatting requirements.
> 
> It seems that we all agree that
> 
> - we don't want to break existing files,

Agreed

> - we don't want to do any additional typing unless it's completely 
> unavoidable.

Agreed

> Thus
> 
> - we need to describe the rules that xml2rfc can use to detect sentence 
> endings to do "the right thing", and
>
> - possibly we need explicit "instructions" by which the default rules 
> can be overridden (elements, PIs or special characters have been discussed).
> 
> Of course it makes sense to minimize the number of these special cases, 
> therefore the idea of making abbreviations explicit. That's useful 
> anyway, because processors may be able to take advantage of that when 
> producing other outputs, such as HTML.

Could be so, sure. But this is an addition to what we set out to solve.

> >>> * Sentence end is rendered by the .txt renderer as "." SP SP
> >>
> >>That still leaves the issue open how to handle things like.
> >>
> >>	<spanx>First sentence.</spanx> Second sentence.
> >>
> >>I think in this case the processor should be able to determine that 
> >>there was in fact a sentence ending here.
> 
> OK, so we finally seem to converge on the same plan.
> 
> However, I think we haven't fully described that algorithm yet. I'll 
> have to do some more experimentation with example texts to see which 
> cases we haven't considered.
> 
> Generally I think we'll need overrides for both cases (sentence end not 
> detected, or sentence end detected where there wasn't any). These 
> situations should be rare, and I think in these cases special PIs are 
> the least intrusive solution.

1. If you want to provide both overrides, we can dispense with the
   <abbrev /> element; that is superfluous in this case, and becomes
   a separate proposition.

2. I still think markup rather than PIs is more appropriate for this.

So, if we can't agree to let <abbrev /> be the override, and <nul />
used to override sentence end is too hard to understand, I'll be
posting another proposal.

	Henrik

--Signature=_Sat__3_Jan_2004_21_28_22_+0100_zSS7l1U_cGfF2/++
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9yXmeVhrtTJkXCMRAthaAJ9phXAPFVEsqaTEodniBI4Cdf08QQCeIK00
6RuMhZQYKmTyMaf2fwS6Tdw=
=Z1JE
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_21_28_22_+0100_zSS7l1U_cGfF2/++--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 19:47:10 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103191152.6dc6128a.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net>	<20040103002408.381f6c59.henrik@levkowetz.com>	<3FF6B71D.9080007@gmx.de>	<20040103142444.24bded48.henrik@levkowetz.com>	<3FF6C463.9070702@gmx.de>	<20040103145204.18880b92.henrik@levkowetz.com>	<3FF6CE94.5060805@gmx.de>	<20040103171959.18366810.henrik@levkowetz.com>	<3FF6F3D1.6040505@gmx.de> <20040103191152.6dc6128a.henrik@levkowetz.com>
Message-ID: <3FF70E2E.9070209@gmx.de>

Henrik Levkowetz wrote:

>>I'm not sure why it's relevant whether it has the same name in HTML. In 
>>fact, if it means the same thing, I'd *prefer* it to use the same name. 
>>But that's not really important.
> 
> 
> As we're defining markup for xml2rfc and related processors here, we don't
> necessarily want to bind this element to have exactly the same semantics
> as <abbr />.  

I didn't say that I *necessarily* want the same syntax. I just find it 
pointless to avoid identical names. RFC2629 is not HTML, and should 
there ever be a need to mix both vocabularies, this should be done using 
XML namespaces.

BTW: DocBook has an <abbrev> element 
(<http://docbook.org/tdg/en/html/abbrev.html>) which -- surprise -- 
seems to have the same semantics.

>>> * "." | "!" | "?" followed by whitespace is identified as sentence end.
>>>   (any others?)
>>
>>Plus:
>>
>>* Sequences of multiple whitespace characters inside text (t, spanx, 
>>annotation...) are treated as a single character.
 >
 >
 > Could you give an example of what you mean?

Sure. For instance (using "_" instead of SP), in

<t>
___Foo_bar.
___Foo_bar.
</t>

I'd like to see xml2rfc produce:

Foo_bar.__Foo_bar.

Not:

___Foo_bar.____Foo_bar.
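
As a rough sketch of that behaviour (an illustration only, with made-up
helper names, not how xml2rfc is implemented): collapse all whitespace
runs first, then widen the gap after sentence-ending punctuation.

```python
import re


def render_paragraph(raw: str) -> str:
    """Collapse whitespace runs, then put two spaces after each sentence end."""
    collapsed = " ".join(raw.split())
    return re.sub(r"([.!?]) ", r"\1  ", collapsed)


def show_spaces(text: str) -> str:
    """Make spaces visible, using "_" instead of SP as in the example above."""
    return text.replace(" ", "_")


content_of_t = "\n   Foo bar.\n   Foo bar.\n"
print(show_spaces(render_paragraph(content_of_t)))  # Foo_bar.__Foo_bar.
```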

>>Well. In fact they must be left without markup, unless no sentence end 
>>will be detected (at least as far as I understand the model you're 
>>proposing). Obviously (not being able to use <abbrev> because it occurred 
>>at a sentence end) that would be a bad thing.
> 
> 
> The purpose of using <abbrev /> would be to avoid interpreting "." SP
> as sentence end.  So if you don't need to avoid it because you _have_
> sentence end, you're fine, no?  Unless you want to give
> <abbrev /> additional semantics, of course.

Sure. That's the whole point. If we introduce a new element to indicate 
"this is an abbreviation" it needs to work everywhere. I was just 
proposing it because in many cases it would be enough for disambiguating 
  the types of dots.

The issue we need to solve is caused by the fact that RFC2629 (like 
almost all other vocabularies, btw) does not specifically mark up 
sentence boundaries. This seems to be a non-issue, unless you have 
indeed to produce monospaced output and are stuck with specific 
formatting requirements.

It seems that we all agree that

- we don't want to break existing files,

- we don't want to do any additional typing unless it's completely 
unavoidable.

Thus

- we need to describe the rules that xml2rfc can use to detect sentence 
endings to do "the right thing", and

- possibly we need explicit "instructions" by which the default rules 
can be overridden (elements, PIs or special characters have been discussed).

Of course it makes sense to minimize the number of these special cases, 
therefore the idea of making abbreviations explicit. That's useful 
anyway, because processors may be able to take advantage of that when 
producing other outputs, such as HTML.

>>> * Sentence end is rendered by the .txt renderer as "." SP SP
>>
>>That still leaves the issue open how to handle things like.
>>
>>	<spanx>First sentence.</spanx> Second sentence.
>>
>>I think in this case the processor should be able to determine that 
>>there was in fact a sentence ending here.

OK, so we finally seem to converge on the same plan.

However, I think we haven't fully described that algorithm yet. I'll 
have to do some more experimentation with example texts to see which 
cases we haven't considered.

Generally I think we'll need overrides for both cases (sentence end not 
detected, or sentence end detected where there wasn't any). These 
situations should be rare, and I think in these cases special PIs are 
the least intrusive solution.

Julian



-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 19:11:52 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF6F3D1.6040505@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com> <3FF6F3D1.6040505@gmx.de>
Message-ID: <20040103191152.6dc6128a.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_19_11_52_+0100_7TlsSP6MTU8af=6K
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
> 
> > In that case, let's use this, except maybe call it <abbrev /> since html
> > has an <abbr /> element.  Seems workable to me, it is simple and when I
> > think through the different cases it seems they can all be covered by
> > this.
> 
> I'm not sure why it's relevant whether it has the same name in HTML. In 
> fact, if it means the same thing, I'd *prefer* it to use the same name. 
> But that's not really important.

As we're defining markup for xml2rfc and related processors here, we don't
necessarily want to bind this element to have exactly the same semantics
as <abbr />.  

> >  * "." | "!" | "?" followed by whitespace is identified as sentence end.
> >    (any others?)
> 
> Plus:
> 
> * Sequences of multiple whitespace characters inside text (t, spanx, 
> annotation...) are treated as a single character.

Could you give an example of what you mean?

> >  * To avoid abbreviations triggering this, they may be enclosed in
> >    <abbrev> </abbrev>.  Abbreviations ending a sentence can be left
> >    without markup.
> 
> Well. In fact they must be left without markup, unless no sentence end 
> will be detected (at least as far as I understand the model you're 
> proposing). Obviously (not being able to use <abbrev> because it occurred 
> at a sentence end) that would be a bad thing.

The purpose of using <abbrev /> would be to avoid interpreting "." SP
as sentence end.  So if you don't need to avoid it because you _have_
sentence end, you're fine, no?  Unless you want to give
<abbrev /> additional semantics, of course.

> >  * Sentence end is rendered by the .txt renderer as "." SP SP
> 
> That still leaves the issue open how to handle things like.
> 
> 	<spanx>First sentence.</spanx> Second sentence.
> 
> I think in this case the processor should be able to determine that 
> there was in fact a sentence ending here.

Fine.

	Henrik

--Signature=_Sat__3_Jan_2004_19_11_52_+0100_7TlsSP6MTU8af=6K
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9wXoeVhrtTJkXCMRAv4JAKDnPhcRehOCFPeBjW/A4PZ5iNNaDACg5RnR
t3zpEV7sHR+HZqd+qwTqOw0=
=XcsI
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_19_11_52_+0100_7TlsSP6MTU8af=6K--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 17:54:41 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103171959.18366810.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net>	<20040103002408.381f6c59.henrik@levkowetz.com>	<3FF6B71D.9080007@gmx.de>	<20040103142444.24bded48.henrik@levkowetz.com>	<3FF6C463.9070702@gmx.de>	<20040103145204.18880b92.henrik@levkowetz.com>	<3FF6CE94.5060805@gmx.de> <20040103171959.18366810.henrik@levkowetz.com>
Message-ID: <3FF6F3D1.6040505@gmx.de>

Henrik Levkowetz wrote:

> In that case, let's use this, except maybe call it <abbrev /> since html
> has an <abbr /> element.  Seems workable to me, it is simple and when I
> think through the different cases it seems they can all be covered by
> this.

I'm not sure why it's relevant whether it has the same name in HTML. In 
fact, if it means the same thing, I'd *prefer* it to use the same name. 
But that's not really important.

>  * "." | "!" | "?" followed by whitespace is identified as sentence end.
>    (any others?)

Plus:

* Sequences of multiple whitespace characters inside text (t, spanx, 
annotation...) are treated as a single character.

>  * To avoid abbreviations triggering this, they may be enclosed in
>    <abbrev> </abbrev>.  Abbreviations ending a sentence can be left
>    without markup.

Well. In fact they must be left without markup, unless no sentence end 
will be detected (at least as far as I understand the model you're 
proposing). Obviously (not being able to use <abbrev> because it occurred 
at a sentence end) that would be a bad thing.

>  * Sentence end is rendered by the .txt renderer as "." SP SP

That still leaves the issue open how to handle things like.

	<spanx>First sentence.</spanx> Second sentence.

I think in this case the processor should be able to determine that 
there was in fact a sentence ending here.

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 17:19:59 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF6CE94.5060805@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com> <3FF6CE94.5060805@gmx.de>
Message-ID: <20040103171959.18366810.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_17_19_59_+0100_=ScbyYpJMe_xLM4.
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit


Saturday  3 January 2004, Julian Reschke wrote:
> > I see you snipped the <abbr>e.g.</abbr> proposal.
> 
> Because we agreed. Why would I repeat everything?

In that case, let's use this, except maybe call it <abbrev /> since html
has an <abbr /> element.  Seems workable to me, it is simple and when I
think through the different cases it seems they can all be covered by
this.

 * "." | "!" | "?" followed by whitespace is identified as sentence end.
   (any others?)

 * To avoid abbreviations triggering this, they may be enclosed in
   <abbrev> </abbrev>.  Abbreviations ending a sentence can be left
   without markup.

 * Sentence end is rendered by the .txt renderer as "." SP SP
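
A minimal sketch of the three rules, assuming a hypothetical <abbrev>
element that appears as a direct child of <t>. Nested cases are left out
on purpose, and render_t is an illustrative name rather than anything in
xml2rfc.

```python
import re
import xml.etree.ElementTree as ET

PROTECTED = "\x00"  # stands in for a space that must not be widened


def render_t(source: str) -> str:
    """Render a <t> element, skipping sentence ends right after <abbrev>."""
    root = ET.fromstring(source)
    parts = [root.text or ""]
    for child in root:
        # Text inside the child element itself (its tail is handled below).
        parts.append("".join(child.itertext()))
        tail = child.tail or ""
        if child.tag == "abbrev":
            # Whitespace directly after an abbreviation is a plain word space.
            tail = re.sub(r"^\s+", PROTECTED, tail)
        parts.append(tail)
    text = " ".join("".join(parts).split())
    text = re.sub(r"([.!?]) ", r"\1  ", text)
    return text.replace(PROTECTED, " ")


print(render_t("<t>Use a tool, <abbrev>e.g.</abbrev> xml2rfc. It helps.</t>"))
print(render_t("<t>Use a tool, e.g. xml2rfc. It helps.</t>"))
```

The marked-up paragraph keeps one space after "e.g.", while the unmarked
one gets two, which is the misdetection the element is meant to avoid.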


	Henrik

--Signature=_Sat__3_Jan_2004_17_19_59_+0100_=ScbyYpJMe_xLM4.
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9uuveVhrtTJkXCMRAreKAJ9q3d+GoPPg21Tdahe2ov9OjHdH/QCfR45w
nMZDJOLx8C08Vfpklv1AAl8=
=UpBV
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_17_19_59_+0100_=ScbyYpJMe_xLM4.--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 15:15:48 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103145204.18880b92.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net>	<20040103002408.381f6c59.henrik@levkowetz.com>	<3FF6B71D.9080007@gmx.de>	<20040103142444.24bded48.henrik@levkowetz.com>	<3FF6C463.9070702@gmx.de> <20040103145204.18880b92.henrik@levkowetz.com>
Message-ID: <3FF6CE94.5060805@gmx.de>

Henrik Levkowetz wrote:

>>I don't understand this statement. The raw text emitted for any sequence 
>>of xml2rfc markup is (more or less) well-defined, so it's possible to 
>>state whether a specific element will cause text ending with 
>>punctuation to be output. So yes, this is possible. It's just non-trivial.
> 
> 
> Well, think about it a bit more.  The whole reason for this discussion
> is exactly that the pure text does not contain enough information to
> consistently determine whether you have an end of sentence or not. If
> you look at the text produced by the markup it's too late, you've lost
> too much information.

I didn't say that. What I said is that you can *also* look at that text, 
this way you won't have lost that information. For instance, for any 
given text node (XPath-wise), I can determine the preceding node and 
decide whether that is rendered with a trailing punctuation mark.

> I see you snipped the <abbr>e.g.</abbr> proposal.

Because we agreed. Why would I repeat everything?

> ...

>>That's correct, and this is why I'd prefer either a P.I. or a Unicode 
>>character that doesn't have exactly the NBSP semantics. For instance, we 
>>could just use one from the private-use area (that's what it's for).
> 
> 
> Except that what we want is really markup that will save us from the
> need to put <sentence>Da da da.</sentence> around each sentence.  It's
> not strictly a character we want, and I'm doubtful whether it's a
> processing instruction.

Please stop making wrong assumptions about what I was suggesting. Neither a 
specific control character nor a P.I. forces you to mark up sentences with 
start/end tags, nor did anybody suggest that you need to mark up *all* 
sentence endings. Maybe all the confusion is caused by you not having 
understood what Clive and I are suggesting?

To summarize: I'd like the processor to

- ignore excess whitespace in <t> elements (just like it should unless 
xml:space is "preserve")

- do a best-effort guess about when a 
punctuation-mark-followed-by-white-space sequence is a sentence end and 
in particular make that algorithm smart enough to handle cases like the 
spanx example

- have explicit syntax (P.I. or special character) to indicate a) "yes, 
this is a sentence end although it may not look like it" and/or b) "no, 
this isn't, although it does look like it" (we may need both a) and b) 
to cover all cases)
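
A sketch of the first two points, assuming the processor looks at the
text produced by the markup rather than at individual text nodes. The
helper name render_t is hypothetical, and the explicit overrides from the
third point are not modelled here.

```python
import re
import xml.etree.ElementTree as ET


def render_t(source: str) -> str:
    """Flatten a <t> element to text and mark sentence ends with two spaces."""
    root = ET.fromstring(source)
    # itertext() yields all text in document order, so <spanx> contributes
    # its content and an empty <iref /> contributes nothing.
    flat = "".join(root.itertext())
    # Ignore excess whitespace, as for any element without xml:space="preserve".
    flat = " ".join(flat.split())
    return re.sub(r"([.!?]) ", r"\1  ", flat)


print(render_t("<t><spanx>First sentence.</spanx> Second sentence.</t>"))
print(render_t("<t>need a mumble.<iref item='mumble' subitem='needed' />"
               " Meanwhile, we see</t>"))
```

Both examples are detected as sentence ends because the element
boundaries disappear before the punctuation rule is applied.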


Regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 14:52:04 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF6C463.9070702@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com> <3FF6C463.9070702@gmx.de>
Message-ID: <20040103145204.18880b92.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_14_52_04_+0100_Cs6vJH4GZMZ0A4a9
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
> 
>  > ...
> >>However, from a pure markup point of view, only the text produced by the 
> >>markup (ideally identical to the text *content* of the markup) should count.
> > 
> > No. As long as we want to detect sentence end _without having markup for it_
> > this is not strictly possible.  We are effectively saying that a certain form
> > of text input has markup implications.  We can't have the cake and eat it too.
> 
> I don't understand this statement. The raw text emitted for any sequence 
> of xml2rfc markup is (more or less) well-defined, so it's possible to 
> state whether a specific element will cause text ending with 
> punctuation to be output. So yes, this is possible. It's just non-trivial.

Well, think about it a bit more.  The whole reason for this discussion
is exactly that the pure text does not contain enough information to
consistently determine whether you have an end of sentence or not. If
you look at the text produced by the markup it's too late, you've lost
too much information.

> > ...
> >>While the iref example can simply be rewritten, the spanx example can't. 
> >>So *if* we choose that as a solution, there should be a way to enforce a 
> >>sentence end no matter what the surrounding markup is.
> > 
> > 
> > Agreed.
> 
> Great.
> 

I see you snipped the <abbr>e.g.</abbr> proposal.

> >>>That's OK; we have different opinions then.  I think <nul /> would be
> >>>better because it would have absolutely no added implications, while
> >>>".&nbsp; Xxxx" or ".&nbsp;Xxxx" would have some.
> >>
> >>Such as...?
> > 
> > 
> > Sigh, do you really need this spelled out?
> 
> Yes, because if you say "has implications" without saying which it's 
> impossible to address them.

Some things are sufficiently obvious that it breaks up a discussion into
details to go into them.  I'm pretty sure you could have written the
two paragraphs below yourself.

> >   ".&nbsp; Xxxx" :
> >      You have &nbsp; binding together a "." and a " ", which would render
> >      as "." SP SP inside a line, and prevent "." from coming in the last
> >      text position on a line, as &nbsp; and " " would be bound to it,
> >      with possibly changed line breaking as a result.
> > 
> >   ".&nbsp;Xxxx" :
> >      Again, you'd get changed line breaking as a line break between "." 
> >      and "X" would not be permitted.
> > 
> > Essentially, with this it would not be possible to produce an
> > abbreviation followed by one space, then more text, which would permit
> > line breaking at that space.
> 
> That's correct, and this is why I'd prefer either a P.I. or a Unicode 
> character that doesn't have exactly the NBSP semantics. For instance, we 
> could just use one from the private-use area (that's what it's for).

Except that what we want is really markup that will save us from the
need to put <sentence>Da da da.</sentence> around each sentence.  It's
not strictly a character we want, and I'm doubtful whether it's a
processing instruction.

	Henrik

--Signature=_Sat__3_Jan_2004_14_52_04_+0100_Cs6vJH4GZMZ0A4a9
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9skFeVhrtTJkXCMRAg7LAKCaiG2MO6cbZ6iVmEjgp/8IMGlH2QCgpAcq
1AcpFAw3X3Y5UO2AoTOFzD8=
=hKFd
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_14_52_04_+0100_Cs6vJH4GZMZ0A4a9--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 14:32:19 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103142444.24bded48.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net>	<20040103002408.381f6c59.henrik@levkowetz.com>	<3FF6B71D.9080007@gmx.de> <20040103142444.24bded48.henrik@levkowetz.com>
Message-ID: <3FF6C463.9070702@gmx.de>

Henrik Levkowetz wrote:

 > ...
>>However, from a pure markup point of view, only the text produced by the 
>>markup (ideally identical to the text *content* of the markup) should count.
> 
> 
> No. As long as we want to detect sentence end _without having markup for it_
> this is not strictly possible.  We are effectively saying that a certain form
> of text input has markup implications.  We can't have the cake and eat it too.

I don't understand this statement. The raw text emitted for any sequence 
of xml2rfc markup is (more or less) well-defined, so it's possible to 
state whether a specific element will cause text ending with 
punctuation to be output. So yes, this is possible. It's just non-trivial.

> ...
>>While the iref example can simply be rewritten, the spanx example can't. 
>>So *if* we choose that as a solution, there should be a way to enforce a 
>>sentence end no matter what the surrounding markup is.
> 
> 
> Agreed.

Great.

> ...
>>>That's OK; we have different opinions then.  I think <nul /> would be
>>>better because it would have absolutely no added implications, while
>>>".&nbsp; Xxxx" or ".&nbsp;Xxxx" would have some.
>>
>>Such as...?
> 
> 
> Sigh, do you really need this spelled out?

Yes, because if you say "has implications" without saying which it's 
impossible to address them.

>   ".&nbsp; Xxxx" :
>      You have &nbsp; binding together a "." and a " ", which would render
>      as "." SP SP inside a line, and prevent "." from coming in the last
>      text position on a line, as &nbsp; and " " would be bound to it,
>      with possibly changed line breaking as a result.
> 
>   ".&nbsp;Xxxx" :
>      Again, you'd get changed line breaking as a line break between "." 
>      and "X" would not be permitted.
> 
> Essentially, with this it would not be possible to produce an
> abbreviation followed by one space, then more text, which would permit
> line breaking at that space.

That's correct, and this is why I'd prefer either a P.I. or a Unicode 
character that doesn't have exactly the NBSP semantics. For instance, we 
could just use one from the private-use area (that's what it's for).
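
The line-breaking side effect can be shown with a small sketch. It uses
Python's textwrap module as a stand-in for a text renderer, which is an
assumption made purely for illustration: textwrap only breaks lines at
ASCII whitespace, so a no-break space (U+00A0) glues its neighbours
together much as described above.

```python
import textwrap


def wrap_demo(text: str, width: int) -> list[str]:
    """Wrap text to the given width and return the resulting lines."""
    return textwrap.wrap(text, width=width)


# An ordinary space after the abbreviation allows a break there.
print(wrap_demo("see e.g. this", 10))
# A no-break space after the abbreviation forbids the break, so the
# line has to be broken earlier instead.
print(wrap_demo("see e.g.\u00a0this", 10))
```

With the ordinary space the first line is "see e.g."; with the no-break
space the break moves back to just after "see".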

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 14:24:44 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF6B71D.9080007@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com> <3FF6B71D.9080007@gmx.de>
Message-ID: <20040103142444.24bded48.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_14_24_44_+0100_kQSO9hCZQM35AQjB
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
>  > ...
> >>If so, what about:
> >>
> >>    need a mumble. <xref target='RFC1234'>RFC 1234</xref> says this.
> > 
> > 
> > Here you have "." followed by whitespace. So we have an end of sentence.
> > 
> > 
> >>or
> >>
> >>    need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we see
> > 
> > 
> > Here we don't have "." followed by whitespace. This will not generate "." SP SP
> 
> And this seems to be problematic.

No, it's straightforward.

> What we're doing here is letting the detection of a sentence end rely on 
> the fact that neither the end of the previous sentence
> 
> 	<spanx>I don't like that.</spanx> So foo bar...
> 
> nor the space between the sentences (as in the iref example) contains 
> markup (so from an XML point of view the end of the previous sentence 
> and the start of the next sentence are in a single piece of running text).

Yes.

> However, from a pure markup point of view, only the text produced by the 
> markup (ideally identical to the text *content* of the markup) should count.

No. As long as we want to detect sentence end _without having markup for it_
this is not strictly possible.  We are effectively saying that a certain form
of text input has markup implications.  We can't have our cake and eat it too.

> So
> 
>    need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we
> 
> would be a sentence end (because the iref expands to zero text) and
>
>    <spanx>I don't like that.</spanx>  So foo bar...
> 
> as well (as the spanx has text content expanding to something ending in 
> a dot, and the subsequent text starts with whitespace).
> 
> While the iref example can simply be rewritten, the spanx example can't. 
> So *if* we choose that as a solution, there should be a way to enforce a 
> sentence end no matter what the surrounding markup is.

Agreed.

> I just spent some time trying sentence-end detection in XSLT, and here 
> are some more things to consider:
> 
> - Sentences may also stop with other punctuation (such as !).

Yes, that should be taken into consideration as well.

> - If the main issue is not to assume sentence ends because of previous 
> abbreviations, the best solution may be to add explicit markup for that 
> situation, such as: <abbr>e.g.</abbr>. However, there'd still be an issue 
> if the abbreviation were indeed at a sentence end.

<abbr>e.g.</abbr>, and an explicit sentence end markup would be OK with me.

>  > ...
> > That's OK; we have different opinions then.  I think <nul /> would be
> > better because it would have absolutely no added implications, while
> > ".&nbsp; Xxxx" or ".&nbsp;Xxxx" would have some.
> 
> Such as...?

Sigh, do you really need this spelled out?

  ".&nbsp; Xxxx" :
     You have &nbsp; binding together a "." and a " ", which would render
     as "." SP SP inside a line, and prevent "." from coming in the last
     text position on a line, as &nbsp; and " " would be bound to it,
     with possibly changed line breaking as a result.

  ".&nbsp;Xxxx" :
     Again, you'd get changed line breaking as a line break between "." 
     and "X" would not be permitted.

Essentially, with this it would not be possible to produce an
abbreviation followed by one space, then more text, which would permit
line breaking at that space.
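
The line-breaking effect of the second form can be tried out with a small
sketch. It assumes Python's textwrap module as a stand-in for the text
renderer (it is not the xml2rfc formatter); the sample sentence, the width
of 25 and the wrap() helper are made up for illustration.

```python
import textwrap

NBSP = "\u00a0"  # NO-BREAK SPACE, the character &nbsp; stands for

def wrap(text, width=25):
    """Fill text to the given width; textwrap only breaks at ordinary
    ASCII whitespace, so a NO-BREAK SPACE never becomes a line break."""
    return textwrap.wrap(text, width=width)

# Ordinary space after the abbreviation: a break is allowed there.
with_space = wrap("see the abbreviation i.e. Xxxx")

# &nbsp; after the abbreviation: "i.e." and "Xxxx" must stay together.
with_nbsp = wrap("see the abbreviation i.e." + NBSP + "Xxxx")

for line in with_space + with_nbsp:
    print(repr(line))
```

With the ordinary space, "i.e." can end the first line; with the no-break
space, "i.e." and "Xxxx" move to the next line as one unit, which is the
changed line breaking described above.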

	Henrik


--Signature=_Sat__3_Jan_2004_14_24_44_+0100_kQSO9hCZQM35AQjB
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9sKceVhrtTJkXCMRAqmzAKDTTo6nv3nHELGFGkPZDXSdyKnkjQCg5B97
r6JMBVp/yqCl/5FR+QNyKyc=
=6ljh
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_14_24_44_+0100_kQSO9hCZQM35AQjB--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 13:35:41 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040103002408.381f6c59.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<20040102144320.GJ75652@finch-staff-1.thus.net> <20040103002408.381f6c59.henrik@levkowetz.com>
Message-ID: <3FF6B71D.9080007@gmx.de>

Henrik Levkowetz wrote:
 > ...
>>If so, what about:
>>
>>    need a mumble. <xref target='RFC1234'>RFC 1234</xref> says this.
> 
> 
> Here you have "." followed by whitespace. So we have an end of sentence.
> 
> 
>>or
>>
>>    need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we see
> 
> 
> Here we don't have "." followed by whitespace. This will not generate "." SP SP

And this seems to be problematic.

What we're doing here is letting the detection of a sentence end rely on 
the fact that neither the end of the previous sentence

	<spanx>I don't like that.</spanx> So foo bar...

nor the space between the sentences (as in the iref example) contains 
markup (so from an XML point of view the end of the previous sentence 
and the start of the next sentence are in a single piece of running text).

However, from a pure markup point of view, only the text produced by the 
markup (ideally identical to the text *content* of the markup) should count.

So

   need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we

would be a sentence end (because the iref expands to zero text) and

   <spanx>I don't like that.</spanx>  So foo bar...

as well (as the spanx has text content expanding to something ending in 
a dot, and the subsequent text starts with whitespace).

While the iref example can simply be rewritten, the spanx example can't. 
So *if* we choose that as a solution, there should be a way to enforce a 
sentence end no matter what the surrounding markup is.
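
A minimal sketch of the "only the text content counts" rule, assuming
Python and its standard ElementTree parser rather than XSLT. The function
names are hypothetical, and the rule checks only "." and "!" followed by
whitespace:

```python
import re
import xml.etree.ElementTree as ET

def flattened_text(t_xml):
    """Return the text content of a <t> element, ignoring all tags."""
    return "".join(ET.fromstring(t_xml).itertext())

def sentence_end_offsets(text):
    """Offsets of each whitespace character that directly follows "." or "!"."""
    return [m.start() for m in re.finditer(r"(?<=[.!])\s", text)]

iref_case = "<t>need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we</t>"
spanx_case = "<t><spanx>I don't like that.</spanx>  So foo bar...</t>"

for case in (iref_case, spanx_case):
    text = flattened_text(case)
    print(repr(text), sentence_end_offsets(text))
```

Both examples yield a sentence end once the markup is flattened away. Note
that this sketch would also flag the space after "e.g.", so it does not
address the abbreviation problem by itself.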

I just spent some time trying sentence-end detection in XSLT, and here 
are some more things to consider:

- Sentences may also stop with other punctuation (such as !).

- If the main issue is not to assume sentence ends because of previous 
abbreviations, the best solution may be to add explicit markup for that 
situation, such as: <abbr>e.g.</abbr>. However, there'd still be an issue 
if the abbreviation were indeed at a sentence end.

 > ...
> That's OK; we have different opinions then.  I think <nul /> would be
> better because it would have absolutely no added implications, while
> ".&nbsp; Xxxx" or ".&nbsp;Xxxx" would have some.

Such as...?

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 01:19:17 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF5F9E1.8070008@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <3FF58380.2030201@gmx.de> <20040102172621.72261265.henrik@levkowetz.com> <3FF5AFC4.4060906@gmx.de> <20040102233932.7a95820f.henrik@levkowetz.com> <3FF5F9E1.8070008@gmx.de>
Message-ID: <20040103011917.4d57f586.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_01_19_17_+0100_zAxrsjBca+h8I2_A
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Saturday  3 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
> > Friday  2 January 2004, Julian Reschke wrote:
> > 
> >>Anyway, I'm not going to support a grammar element that is defined to 
> >>have "no" meaning except for a strange side effect...
> > 
> > Very well, don't support it.  The side effect is analogous to that of
> > using zeroes when writing the number 1000.  The contribution of each
> > zero is in its placement, but it neither adds nor subtracts from the
> > total value by itself.  Very strange, I'm sure.
> 
> I fully understand the concept. I just feel that it doesn't have any 
> place in designing markup vocabularies.

So we disagree.

> If you really think we should proceed that way, please make a complete 
> proposal that fully explains exactly when the processor should add the 
> additional space. Keep in mind that -- as you say that <nop/> has in 
> fact no semantics -- you'll have to define that for any combination of 
> legal markup inside the <t> element (remember Clive's questions)?

Julian, I've clarified what semantics <nul /> has, not said that it has
no semantics. Please. 

Suppose you have an input symbol stream in which "." is followed
immediately by WSP, or conversely one in which "." is followed
immediately by a <nul /> symbol and then WSP, or by any other tag and
then WSP.  

Throughout lexing and parsing of the symbol stream, you won't
gratuitously throw away information until the point where you make
decisions about the output rendering; but at that point the
transformation probably becomes irreversible.  

Identifying the symbol (or token) sequence "." immediately followed by
WSP as end of sentence can be done at any point before or at the point
where you reduce the information content (by doing the rendering or
otherwise). If the token sequence has anything but "." preceding WSP,
you don't have end of sentence.  If you have end of sentence, you insert
an end-of-sentence (EOS) token in your token stream at this point.

Subsequently, if you come across a <nul /> token at the point of output
rendering, that token in no way changes the state of the renderer, but
is discarded.  If you come across some other token, you render it
according to its specification.  If you come across an end-of-sentence
token, you render it as SP (which makes "." WSP EOS come out as "." SP
SP) in an RFC txt renderer.  A renderer of some other format may render
the EOS token in some other, appropriate manner.

(Of course, the detailed mechanics of this can be done differently. You
could replace the "." WSP token sequence by a single EOS, which is
subsequently rendered as "." SP SP; or variations on this theme.)
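
A minimal sketch of the steps above, in Python. The token names (TAG, DOT,
WSP, TEXT, EOS), the regular expression and the function names are
assumptions made for illustration; none of them come from xml2rfc.

```python
import re

# Hypothetical lexer: a tag, a ".", a run of whitespace, or other text.
TOKEN_RE = re.compile(
    r"(?P<TAG><[^>]*>)|(?P<DOT>\.)|(?P<WSP>\s+)|(?P<TEXT>[^<.\s]+)"
)

def lex(source):
    """Split the input into (kind, value) tokens, keeping every tag."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

def mark_sentence_ends(tokens):
    """Insert an EOS token after each WSP that immediately follows a "."."""
    out = []
    for i, (kind, value) in enumerate(tokens):
        out.append((kind, value))
        if kind == "WSP" and i > 0 and tokens[i - 1][0] == "DOT":
            out.append(("EOS", ""))
    return out

def render_txt(tokens):
    """Text rendering: WSP becomes one SP, EOS adds one more SP."""
    parts = []
    for kind, value in tokens:
        if kind in ("WSP", "EOS"):
            parts.append(" ")
        elif kind == "TAG":
            # <nul /> is simply discarded. A real renderer would render
            # other tags per their specification; this sketch drops them.
            continue
        else:
            parts.append(value)
    return "".join(parts)

sample = "We need this. See i.e.<nul /> that."
print(render_txt(mark_sentence_ends(lex(sample))))
```

The EOS decision is taken while the tags are still in the token stream, so
the <nul /> between "." and WSP prevents the match even though it renders
as nothing.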

> I'll make a similar proposal for a special character code and/or PI 
> based solution (both would work almost identically) in the next few days.

Sure, go ahead.

	Henrik.

--Signature=_Sat__3_Jan_2004_01_19_17_+0100_zAxrsjBca+h8I2_A
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9gqFeVhrtTJkXCMRAmYdAJ94txy4kqgg5xDoy5Frzd05t+/qoACeJTew
wb7yg6WljPnoG/ebz+ynIYQ=
=qt+B
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_01_19_17_+0100_zAxrsjBca+h8I2_A--


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Sat, 3 Jan 2004 00:24:08 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102144320.GJ75652@finch-staff-1.thus.net>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <20040102144320.GJ75652@finch-staff-1.thus.net>
Message-ID: <20040103002408.381f6c59.henrik@levkowetz.com>

--Signature=_Sat__3_Jan_2004_00_24_08_+0100_4R6M58x3aGhysi7.
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Friday  2 January 2004, Clive D.W. Feather wrote:
> Henrik Levkowetz said:
> >> But what does the null directive *mean*?
> > The <nul /> element would mean nothing. A markup element being present, but
> > not in itself causing anything to be rendered. But by being there, you
> > wouldn't have "." WSP, but "." <nul/> WSP, which would avoid triggering
> > the rendering of "." WSP as ".  "
> 
> So you're suggesting that any directive after a full stop would have this
> effect?

Yes.

> If so, what about:
> 
>     need a mumble. <xref target='RFC1234'>RFC 1234</xref> says this.

Here you have "." followed by whitespace. So we have an end of sentence.

> or
> 
>     need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we see

Here we don't have "." followed by whitespace. This will not generate "." SP SP

> If not, you need to decide which ones have this effect and which don't. And
> you need to give <nul /> specific semantics.

I'd propose no exceptions, no additional rules, as indicated above.

The semantics would be indeed to add nothing, subtract nothing, the only effect
being that the sequence of symbols in the input stream ".<nul /> " would not have
"." immediately before " ".

> > It would be similar to the 'nop' processing instruction, which sometimes
> > is useful in programming.
> 
> But it isn't a NOP.

No, it would be a <nul />

> I still think that &nbsp; instead of the space is the right approach.

That's OK; we have different opinions then.  I think <nul /> would be
better because it would have absolutely no added implications, while
".&nbsp; Xxxx" or ".&nbsp;Xxxx" would have some.

	Henrik

--Signature=_Sat__3_Jan_2004_00_24_08_+0100_4R6M58x3aGhysi7.
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9f2ZeVhrtTJkXCMRAnMNAJwKxokrrrqE4J5tt6DAz8c/xRG6FACg6bgL
PfjUOowQMwzhMe6NMQNx1Kc=
=IJzb
-----END PGP SIGNATURE-----

--Signature=_Sat__3_Jan_2004_00_24_08_+0100_4R6M58x3aGhysi7.--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Sat, 03 Jan 2004 00:08:17 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102233932.7a95820f.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<3FF58380.2030201@gmx.de>	<20040102172621.72261265.henrik@levkowetz.com>	<3FF5AFC4.4060906@gmx.de> <20040102233932.7a95820f.henrik@levkowetz.com>
Message-ID: <3FF5F9E1.8070008@gmx.de>

Henrik Levkowetz wrote:
> Friday  2 January 2004, Julian Reschke wrote:
> 
>>Anyway, I'm not going to support a grammar element that is defined to 
>>have "no" meaning except for a strange side effect...
> 
> 
> Very well, don't support it.  The side effect is analogous to that of
> using zeroes when writing the number 1000.  The contribution of each
> zero is in its placement, but it neither adds nor subtracts from the
> total value by itself.  Very strange, I'm sure.

I fully understand the concept. I just feel that it doesn't have any 
place in designing markup vocabularies.

If you really think we should proceed that way, please make a complete 
proposal that fully explains exactly when the processor should add the 
additional space. Keep in mind that -- as you say that <nop/> has in 
fact no semantics -- you'll have to define that for any combination of 
legal markup inside the <t> element (remember Clive's questions)?

I'll make a similar proposal for a special character code and/or PI 
based solution (both would work almost identically) in the next few days.

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Fri, 2 Jan 2004 23:39:32 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF5AFC4.4060906@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <3FF58380.2030201@gmx.de> <20040102172621.72261265.henrik@levkowetz.com> <3FF5AFC4.4060906@gmx.de>
Message-ID: <20040102233932.7a95820f.henrik@levkowetz.com>

--Signature=_Fri__2_Jan_2004_23_39_32_+0100_Jch5_EiKBBhMT029
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Friday  2 January 2004, Julian Reschke wrote:
> Anyway, I'm not going to support a grammar element that is defined to 
> have "no" meaning except for a strange side effect...

Very well, don't support it.  The side effect is analogous to that of
using zeroes when writing the number 1000.  The contribution of each
zero is in its placement, but it neither adds nor subtracts from the
total value by itself.  Very strange, I'm sure.

	Henrik

--Signature=_Fri__2_Jan_2004_23_39_32_+0100_Jch5_EiKBBhMT029
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9fMkeVhrtTJkXCMRAhvlAKDxMCUZerlyijhWIGgY1RuPWnFLVQCdEwaV
zDK7ehTGh71UM/ksFOLc9xw=
=NqFk
-----END PGP SIGNATURE-----

--Signature=_Fri__2_Jan_2004_23_39_32_+0100_Jch5_EiKBBhMT029--


From: julian.reschke@gmx.de (Julian Reschke)
Date: Fri, 02 Jan 2004 18:52:04 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102172621.72261265.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net>	<20040102153617.32014ac8.henrik@levkowetz.com>	<3FF58380.2030201@gmx.de> <20040102172621.72261265.henrik@levkowetz.com>
Message-ID: <3FF5AFC4.4060906@gmx.de>

Henrik Levkowetz wrote:

> Maybe tone down your arrogance a notch?  The element would indeed mean

I'm not going to reply to that...

> "nothing", that _is_ its semantics. Its only function would be to be
> there at this particular position, so that you don't have "." adjacent
> to WSP.  

So it does *not* mean "nothing", right?

> Anyway, I'd be perfectly happy with an empty <spanx /> if that wouldn't
> have any side effects. The whole idea is to make the code that looks for
> end of sentence ( "." WSP ) not see it where one wants to avoid that.  

As Clive showed, making this depend on the occurrence of *any* XML 
element is not going to work. So we need either an element, a 
P.I. or a special Unicode character that carries *exactly* that semantics.

I could live with &nbsp;, although I think it doesn't exactly mean what 
we need (after all it means "no break here", and we don't want to forbid 
a break, we just don't want an additional *space*). Thus a Processing 
Instruction may make more sense.

Anyway, I'm not going to support a grammar element that is defined to 
have "no" meaning except for a strange side effect...

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Fri, 2 Jan 2004 17:26:21 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <3FF58380.2030201@gmx.de>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com> <3FF58380.2030201@gmx.de>
Message-ID: <20040102172621.72261265.henrik@levkowetz.com>

--Signature=_Fri__2_Jan_2004_17_26_21_+0100_=1VFmRTAK3asknVG
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Friday  2 January 2004, Julian Reschke wrote:
> Henrik Levkowetz wrote:
...
> >>But what does the null directive *mean*?
> > 
> > The <nul /> element would mean nothing. A markup element being present, but
> > not in itself causing anything to be rendered. But by being there, you
> > wouldn't have "." WSP, but "." <nul/> WSP, which would avoid triggering
> > the rendering of "." WSP as ".  "
> > 
> > It would be similar to the 'nop' processing instruction, which sometimes
> > is useful in programming.
> > 
> > If the DTD permitted <t /> nested inside <t> </t>, you could have
> > achieved the same thing by saying  "<t> Ta dum, ta dum (<t>i.e.</t> ta
> > dum) teedeelee</t>".  You mostly can work around not having a null
> > element, but sometimes it's quite useful.
> 
> Come on. If it's defined to mean "nothing", it's useless. What you are 
> proposing has semantics (otherwise it wouldn't do anything). If we 
> wanted to go that way, we could always use an empty <spanx/>. On the other 
> hand, &nbsp; looks almost exactly right.

Maybe tone down your arrogance a notch?  The element would indeed mean
"nothing", that _is_ its semantics. Its only function would be to be
there at this particular position, so that you don't have "." adjacent
to WSP.  

Anyway, I'd be perfectly happy with an empty <spanx /> if that wouldn't
have any side effects. The whole idea is to make the code that looks for
end of sentence ( "." WSP ) not see it where one wants to avoid that.  

	Henrik

--Signature=_Fri__2_Jan_2004_17_26_21_+0100_=1VFmRTAK3asknVG
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9ZuteVhrtTJkXCMRAojYAKCACIXel35h9IJlqGb3wWD4bnr3TQCghrWM
qWYnhHuR3E5EXaaiHKwp8tA=
=qrdD
-----END PGP SIGNATURE-----

--Signature=_Fri__2_Jan_2004_17_26_21_+0100_=1VFmRTAK3asknVG--


From: clive@demon.net (Clive D.W. Feather)
Date: Fri, 2 Jan 2004 14:43:20 +0000
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102153617.32014ac8.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com>
Message-ID: <20040102144320.GJ75652@finch-staff-1.thus.net>

Henrik Levkowetz said:
>> But what does the null directive *mean*?
> The <nul /> element would mean nothing. A markup element being present, but
> not in itself causing anything to be rendered. But by being there, you
> wouldn't have "." WSP, but "." <nul/> WSP, which would avoid triggering
> the rendering of "." WSP as ".  "

So you're suggesting that any directive after a full stop would have this
effect?

If so, what about:

    need a mumble. <xref target='RFC1234'>RFC 1234</xref> says this.

or

    need a mumble.<iref item='mumble' subitem='needed' /> Meanwhile, we see

If not, you need to decide which ones have this effect and which don't. And
you need to give <nul /> specific semantics.

> It would be similar to the 'nop' processing instruction, which sometimes
> is useful in programming.

But it isn't a NOP.

I still think that &nbsp; instead of the space is the right approach.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646


From: julian.reschke@gmx.de (Julian Reschke)
Date: Fri, 02 Jan 2004 15:43:12 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102153617.32014ac8.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de>	<20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FEFFFD7.6020001@gmx.de>	<20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us>	<3FF0A5C5.6090800@gmx.de>	<20031230105949.GH76219@finch-staff-1.thus.net>	<20031231005501.31892d3b.henrik@levkowetz.com>	<20040102111006.GF75652@finch-staff-1.thus.net> <20040102153617.32014ac8.henrik@levkowetz.com>
Message-ID: <3FF58380.2030201@gmx.de>

Henrik Levkowetz wrote:

> Friday  2 January 2004, Clive D.W. Feather wrote:
> 
>>>I go back to stating a preference for markup, something like <nul />,
>>>e.g. "i.e.<nul /> " to avoid triggering the rendering of "." WSP as
>>>"." SP SP
>>
>>But what does the null directive *mean*?
> 
> 
> The <nul /> element would mean nothing. A markup element being present, but
> not in itself causing anything to be rendered. But by being there, you
> wouldn't have "." WSP, but "." <nul/> WSP, which would avoid triggering
> the rendering of "." WSP as ".  "
> 
> It would be similar to the 'nop' processing instruction, which sometimes
> is useful in programming.
> 
> If the DTD permitted <t /> nested inside <t> </t>, you could have
> achieved the same thing by saying  "<t> Ta dum, ta dum (<t>i.e.</t> ta
> dum) teedeelee</t>".  You mostly can work around not having a null
> element, but sometimes it's quite useful.

Come on. If it's defined to mean "nothing", it's useless. What you are 
proposing has semantics (otherwise it wouldn't do anything). If we 
wanted to go that way, we could always use an empty <spanx/>. On the other 
hand, &nbsp; looks almost exactly right.

Julian


-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


From: henrik@levkowetz.com (Henrik Levkowetz)
Date: Fri, 2 Jan 2004 15:36:17 +0100
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20040102111006.GF75652@finch-staff-1.thus.net>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com> <20040102111006.GF75652@finch-staff-1.thus.net>
Message-ID: <20040102153617.32014ac8.henrik@levkowetz.com>

--Signature=_Fri__2_Jan_2004_15_36_17_+0100_z4HDCpTAaAutzY8Q
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Friday  2 January 2004, Clive D.W. Feather wrote:
> > I go back to stating a preference for markup, something like <nul />,
> > e.g. "i.e.<nul /> " to avoid triggering the rendering of "." WSP as
> > "." SP SP
> 
> But what does the null directive *mean*?

The <nul /> element would mean nothing. A markup element being present, but
not in itself causing anything to be rendered. But by being there, you
wouldn't have "." WSP, but "." <nul/> WSP, which would avoid triggering
the rendering of "." WSP as ".  "

It would be similar to the 'nop' processing instruction, which sometimes
is useful in programming.

If the DTD permitted <t /> nested inside <t> </t>, you could have
achieved the same thing by saying  "<t> Ta dum, ta dum (<t>i.e.</t> ta
dum) teedeelee</t>".  You mostly can work around not having a null
element, but sometimes it's quite useful.

	Henrik


--Signature=_Fri__2_Jan_2004_15_36_17_+0100_z4HDCpTAaAutzY8Q
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/9YHreVhrtTJkXCMRAmsrAJ40zVGPiqcvm7VgRHdKbs5JmYrYDQCg7L1v
yxnCElIv9u8r8X+kN1htxqs=
=H7II
-----END PGP SIGNATURE-----

--Signature=_Fri__2_Jan_2004_15_36_17_+0100_z4HDCpTAaAutzY8Q--


From: clive@demon.net (Clive D.W. Feather)
Date: Fri, 2 Jan 2004 11:10:06 +0000
Subject: [xml2rfc] end of sentence: two spaces?
In-Reply-To: <20031231005501.31892d3b.henrik@levkowetz.com>
References: <3FEEDA07.3070900@gmx.de> <20031228170447.5f6f7254.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FEFFFD7.6020001@gmx.de> <20031229071158.6985f569.mrose+internet.xml2rfc@dbc.mtview.ca.us> <3FF0A5C5.6090800@gmx.de> <20031230105949.GH76219@finch-staff-1.thus.net> <20031231005501.31892d3b.henrik@levkowetz.com>
Message-ID: <20040102111006.GF75652@finch-staff-1.thus.net>

Henrik Levkowetz said:
>> Unicode has several special characters, so it's a question of picking the
>> right one.
>> 
>> I would argue that things like "i.e." need to be written as *either*
>>     i.e.&nbsp;no line break is permitted, but justification space is.
>>     i.e.&zwj; the zero width joiner shows a closer association.
>> 
>> &nbsp; is &#x00A0; or &#160;   NO-BREAK SPACE
>> &zwj;  is &#x200D; or &#8205;  ZERO WIDTH JOINER
> 
> Good exposition.  
> 
> However, the special character option is starting to look like a rathole
> to me. What if somebody actually needs to use this character without
> implying the additional semantics we've added to it?

Well, these characters *do* have semantics already.

NO-BREAK SPACE means "a space that can not be changed to a line break".
On further reading, it turns out that I was wrong in assuming it was
otherwise like a space - Unicode treats space as a gap between words or
sentences, but NO-BREAK SPACE only as the former. In particular, you are
supposed to use it in contexts like: "Dr.&nbsp;Jones" where two words
must remain visually together. I would argue that abbreviations are an
exactly similar situation.

So this strikes me as the right thing for the job.

ZERO WIDTH JOINER means "the things either side are associated more closely
than normal". In this case, we're saying that the dot and space are
associated more closely than normal - the space is attached to the dot
rather than the dot being the end of a sentence. People can still use ZWJ
for other purposes without affecting us; it would only have our special
semantic ("not the end of a sentence") when joining dot to space. However,
I only suggested it because I misunderstood NO-BREAK SPACE.
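
For anyone who wants to check the two code points, here is a small sketch
using Python's standard library. It assumes the HTML entity names
(&nbsp; and &zwj;) as known to Python's html module; an XML document would
need those entities declared. The describe() helper is hypothetical.

```python
import html
import unicodedata

def describe(entity):
    """Return (code point as U+XXXX, Unicode character name) for an entity."""
    char = html.unescape(entity)
    return f"U+{ord(char):04X}", unicodedata.name(char)

for entity in ("&nbsp;", "&zwj;"):
    code_point, name = describe(entity)
    print(entity, code_point, name)
```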

> I go back to stating a preference for markup, something like <nul />,
> e.g. "i.e.<nul /> " to avoid triggering the rendering of "." WSP as
> "." SP SP

But what does the null directive *mean*?

Or is the rule rather "dot followed by space is the end of a sentence if
and only if there is no intervening directive"? That's surely more
complicated than using &nbsp;.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646

