From owner-tcp-impl@lerc.nasa.gov  Sun Nov  1 14:31:10 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA03513
	for <tcpimpl-archive@lists.ietf.org>; Sun, 1 Nov 1998 14:31:10 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA25605; Sun, 1 Nov 1998 12:59:59 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from zephyr.isi.edu (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA25601; Sun, 1 Nov 1998 12:59:57 -0500 (EST)
From: braden@ISI.EDU
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id JAA02678;
	Sun, 1 Nov 1998 09:59:55 -0800 (PST)
Date: Sun, 1 Nov 1998 09:57:17 -0800
Posted-Date: Sun, 1 Nov 1998 09:57:17 -0800
Message-Id: <199811011757.AA07037@gra.isi.edu>
Received: by gra.isi.edu (5.65c/4.0.3-6)
	id <AA07037>; Sun, 1 Nov 1998 09:57:17 -0800
To: ehall@ehsco.com
Subject: Re: old TCP options
Cc: tcp-impl@lerc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


  *> >
  *> >Did anybody ever make use of any of the experimental or legacy TCP
  *> >options, such as NAK, Echo and Alternative Checksums? Are these options
  *> >seen much today, still?
  *> >
  *> >Thanks
  *> >
  *> >-- 
  *> >Eric A. Hall                                            ehall@ehsco.com
  *> >+1-650-685-0557                                    http://www.ehsco.com


I don't recognize NAK (see list below, taken from the IANA page).
Echo has been superseded by TSOPT, which certainly is implemented.

Bob Braden


Kind   Length   Meaning                           Reference
----   ------   -------------------------------   ---------
  0        -    End of Option List                 [RFC793]
  1        -    No-Operation                       [RFC793]
  2        4    Maximum Segment Size               [RFC793]
  3        3    WSOPT - Window Scale              [RFC1323]
  4        2    SACK Permitted                    [RFC1072]
  5        N    SACK                              [RFC1072]
  6        6    Echo (obsoleted by option 8)      [RFC1072]
  7        6    Echo Reply (obsoleted by option 8)[RFC1072]
  8       10    TSOPT - Time Stamp Option         [RFC1323]
  9        2    Partial Order Connection Permitted[RFC1693]
 10        3    Partial Order Service Profile     [RFC1693]
 11             CC                                 [Braden]
 12             CC.NEW                             [Braden]
 13             CC.ECHO                            [Braden]
 14        3    TCP Alternate Checksum Request    [RFC1146]
 15        N    TCP Alternate Checksum Data       [RFC1146]
 16             Skeeter                           [Knowles]
 17             Bubba                             [Knowles]
 18        3    Trailer Checksum Option    [Subbu & Monroe]
 19       18    MD5 Signature Option              [RFC2385]
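
For readers who want to see how the Kind and Length columns are used
on the wire, here is a minimal sketch of walking a TCP options field.
It assumes the usual layout (kinds 0 and 1 are single bytes, every
other option carries a length byte that counts the kind and length
bytes themselves). The function name and the sample bytes are
illustrative only, not taken from any particular stack.

```python
def parse_tcp_options(data: bytes) -> list:
    """Walk a TCP options field and return a list of (kind, value) pairs."""
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:
            # End of Option List: nothing after this is an option.
            options.append((0, b""))
            break
        if kind == 1:
            # No-Operation: a single padding byte with no length field.
            options.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("option truncated before its length byte")
        length = data[i + 1]  # counts the kind and length bytes too
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length")
        options.append((kind, data[i + 2:i + length]))
        i += length
    return options


# Hypothetical sample: MSS (kind 2), NOP, Window Scale (kind 3), EOL.
sample = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7, 0])
parsed = parse_tcp_options(sample)
```

An unknown kind (say, one of the legacy options in the table) is
simply skipped over by its length, which is why unused option numbers
tend to be harmless to receivers that do not implement them.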


From owner-tcp-impl@lerc.nasa.gov  Sun Nov  1 14:33:43 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA03524
	for <tcpimpl-archive@lists.ietf.org>; Sun, 1 Nov 1998 14:33:42 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA27348; Sun, 1 Nov 1998 13:25:59 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA27344; Sun, 1 Nov 1998 13:25:57 -0500 (EST)
Received: from ehsco.com (Ferret.EHSco.com [209.31.7.45])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA1FC4; Sun, 1 Nov 1998 10:25:52 -0800
Message-ID: <363CA7A3.2414CDB@ehsco.com>
Date: Sun, 01 Nov 1998 10:25:39 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.5b2 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: braden@ISI.EDU
CC: tcp-impl@lerc.nasa.gov
Subject: Re: old TCP options
References: <199811011757.AA07037@gra.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> I don't recognize NAK (see list below, taken from the IANA page).

NAK was defined in 1106, along with "Big Windows," an alternative to
1072's original Window Scale Option.

I know RFC 1106 is dead, but am wondering if these (and any of the other
"non-standard") options are seen much. Has anybody ever done much with
any of them (like maybe Tandem, the employer of 1106's author)?

-- 
Eric A. Hall                                            ehall@ehsco.com
+1-650-685-0557                                    http://www.ehsco.com


From owner-tcp-impl@lerc.nasa.gov  Sun Nov  1 17:48:39 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id RAA04313
	for <tcpimpl-archive@lists.ietf.org>; Sun, 1 Nov 1998 17:48:39 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id QAA07723; Sun, 1 Nov 1998 16:22:37 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from zephyr.isi.edu (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id QAA07719; Sun, 1 Nov 1998 16:22:35 -0500 (EST)
From: braden@ISI.EDU
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id NAA06599;
	Sun, 1 Nov 1998 13:22:34 -0800 (PST)
Date: Sun, 1 Nov 1998 13:19:57 -0800
Posted-Date: Sun, 1 Nov 1998 13:19:57 -0800
Message-Id: <199811012119.AA07167@gra.isi.edu>
Received: by gra.isi.edu (5.65c/4.0.3-6)
	id <AA07167>; Sun, 1 Nov 1998 13:19:57 -0800
To: braden@ISI.EDU, ehall@ehsco.com
Subject: Re: old TCP options
Cc: tcp-impl@lerc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

  *> 
  *> 
  *> > I don't recognize NAK (see list below, taken from the IANA page).
  *> 
  *> NAK was defined in 1106, along with "Big Windows," an alternative to
  *> 1072's original Window Scale Option.
  *> 
  *> I know RFC 1106 is dead, but am wondering if these (and any of the other
  *> "non-standard") options are seen much. Has anybody ever done much with
  *> any of them (like maybe Tandem, the employer of 1106's author)?
  *> 
  *> -- 
  *> Eric A. Hall                                            ehall@ehsco.com
  *> +1-650-685-0557                                    http://www.ehsco.com
  *> 
Hmmm, curious.  RFC 1106 does not even appear in the pantheon
of IETF protocols (RFC 2400); it does not seem to exist at all.
There is no evidence that any option codes were ever assigned
to it, so implementation seems unlikely.

Bob


From owner-tcp-impl@cthulhu.engr.sgi.com  Mon Nov  2 22:31:10 1998
Received: from sgi.sgi.com (SGI.COM [192.48.153.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id WAA03164
	for <tcpimpl-archive@odin.ietf.org>; Mon, 2 Nov 1998 22:31:09 -0500 (EST)
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id TAA04084; Mon, 2 Nov 1998 19:24:49 -0800 (PST)
	mail_from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	id TAA49742
	for tcp-impl-list;
	Mon, 2 Nov 1998 19:19:44 -0800 (PST)
	mail_from (owner-tcp-impl@relay.engr.sgi.com)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id TAA52897
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 2 Nov 1998 19:19:42 -0800 (PST)
	mail_from (cardwell@cs.washington.edu)
Received: from sake.cs.washington.edu (sake.cs.washington.edu [128.95.4.55]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id TAA01208
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 2 Nov 1998 19:19:41 -0800 (PST)
	mail_from (cardwell@cs.washington.edu)
Received: from localhost (cardwell@localhost) by sake.cs.washington.edu (8.8.8+CS/7.2ws+) with SMTP id TAA28406; Mon, 2 Nov 1998 19:19:41 -0800
Date: Mon, 2 Nov 1998 19:19:40 -0800 (PST)
From: Neal Cardwell <cardwell@cs.washington.edu>
To: tcp-impl@cthulhu.engr.sgi.com
cc: Neal Cardwell <cardwell@cs.washington.edu>
Subject: delayed ACKs for retransmitted packets: ouch!
Message-ID: <Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@cthulhu.engr.sgi.com
Precedence: bulk


Recently i've been looking at a scenario where i'm seeing delayed ACKs for
retransmitted packets really destroy the performance of New Reno.

The TCP congestion control draft (draft-ietf-tcpimpl-cong-control-00.txt)
specifies that "Out-of-order data segments SHOULD be acknowledged
immediately, in order to trigger the fast retransmit algorithm." Many
implementations -- at least FreeBSD 3.0 and Linux 2.1, and probably most
others, i'm guessing -- interpret this by sending an immediate
acknowledgment only if a data segment they receive is above a hole in
their receive queue. That is, they ACK only if the sequence number is above
and not equal to rcv_next (see Figure 27.15 in Stevens vol 2 for the code
snippet that does this in Net/3 and FreeBSD).

Unfortunately, this means that if the sender retransmits a single segment
which fills in a hole, then the receiver finds that this segment fits in
nicely at rcv_next. So the receiver will sit around until its delayed ACK
timer expires, possibly hundreds of ms later. Only then will it ACK to the
sender that the hole has successfully been filled, and only then will the
sender be able to continue on, perhaps filling other holes.
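
To make the two interpretations concrete, here is a rough sketch of
the receiver's decision. It is not the Net/3 or Linux code; the
function and parameter names are made up for illustration, and
sequence-number wraparound is ignored for brevity.

```python
def should_ack_immediately(seg_seq, rcv_nxt, queue_has_data, ack_hole_fillers):
    """Decide whether a received data segment gets an immediate ACK.

    seg_seq          -- starting sequence number of the arriving segment
    rcv_nxt          -- next in-order sequence number the receiver expects
    queue_has_data   -- True if out-of-order data is already queued
                        above rcv_nxt (i.e. a hole exists)
    ack_hole_fillers -- hypothetical policy switch: also ACK at once
                        when the segment fills a hole
    """
    if seg_seq > rcv_nxt:
        # Segment lands above a hole: the behaviour described above.
        return True
    if ack_hole_fillers and seg_seq == rcv_nxt and queue_has_data:
        # Segment fits at rcv_nxt but queued data sits above it,
        # so it is filling a hole rather than arriving in order.
        return True
    # Plain in-order data: leave it to the delayed ACK timer.
    return False


# A retransmission that fills a hole, under each policy.
narrow = should_ack_immediately(1000, 1000, True, ack_hole_fillers=False)
wide = should_ack_immediately(1000, 1000, True, ack_hole_fillers=True)
```

With the narrow policy the hole-filling segment waits for the delayed
ACK timer; with the wider one it is acknowledged at once.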

Consider the following sequence plots of tcpdumps of two TCP connections:

http://www.cs.washington.edu/homes/cardwell/misc/xfer1.ps
http://www.cs.washington.edu/homes/cardwell/misc/xfer2.ps

These show a Linux 2.1.126 sender at UW sending 100KB to my Linux 2.0.32
machine at home over my 440Kbps DSL line. The traces are from the
perspective of the sender.  The RTT is about 22ms for short packets, and
the MSS is about 1460 bytes.

These transfers should have taken about 2 seconds, judging from the slope
of the ACKs during slow start. But of course slow start overshoots, and
there are many losses at around the 1 second mark in both traces. Now
because the Linux 2.1.126 sender is using New Reno, it spends several
painful seconds in Fast Recovery filling in the holes, one segment at a
time. As a result the second transfer, for instance, spends nearly 5
seconds in Fast Recovery; during this period I'm getting about 30Kbps on
average, and not so happy about the $ i forked over for DSL buying me
modem performance!

Why does it spend so long in Fast Recovery? I think the main problem is
that the receiver is delaying its ACKs for the retransmitted segments that
are nicely filling holes in its receive queue. It happens to be delaying
them by a lot, due to the particular delayed ACK implementation in Linux
2.0. But i think the point is that delaying acknowledgments is a very bad
idea when the sender is filling in holes one packet at a time, as it will
tend to do in Fast Recovery, or immediately after an RTO (assuming no
SACK).
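
A back-of-the-envelope model shows why the delay matters so much. It
assumes the sender repairs one hole per round, and that each round
costs one RTT plus whatever time the receiver holds back its ACK. The
hole count and the 200 ms delay below are hypothetical values chosen
for illustration, not numbers read off the traces, and 100KB is
assumed to mean 100 * 1024 bytes.

```python
def ideal_transfer_time(size_bytes, link_bps):
    """Seconds needed to move size_bytes over a link of link_bps."""
    return size_bytes * 8 / link_bps


def recovery_time(holes, rtt, ack_delay):
    """Seconds spent filling holes one at a time.

    Each hole costs one round trip plus the receiver's ACK delay,
    because the sender cannot retransmit the next hole until the
    ACK for the previous retransmission arrives.
    """
    return holes * (rtt + ack_delay)


# 100KB over the 440Kbps line, ignoring headers and slow start.
baseline = ideal_transfer_time(100 * 1024, 440_000)

# Hypothetical: 20 holes, 22 ms RTT, with and without a 200 ms ACK delay.
delayed = recovery_time(20, 0.022, 0.200)
immediate = recovery_time(20, 0.022, 0.0)
```

Under these assumptions the delayed case spends roughly ten times as
long in recovery as the immediate-ACK case, which is the same order of
slowdown as seen in the second trace.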

So what i'm asking is this: is it a good idea to clarify or extend the
notion of "out-of-order" data that should be ACKed immediately, in such a
way that data segments that fill in a hole in the receive queue should be
ACKed immediately? This would seem to alleviate this problem with New
Reno. Are there other scenarios where it would make things worse instead?

neal




From owner-tcp-impl@lerc.nasa.gov  Tue Nov  3 16:19:48 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA13902
	for <tcpimpl-archive@lists.ietf.org>; Tue, 3 Nov 1998 16:19:47 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA03482; Tue, 3 Nov 1998 14:08:13 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from mercury.Sun.COM (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with SMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA03458; Tue, 3 Nov 1998 14:08:06 -0500 (EST)
Received: from Eng.Sun.COM (engmail4 [129.144.134.6]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA05400; Tue, 3 Nov 1998 11:08:02 -0800
Received: from shield.eng.sun.com (shield.Eng.Sun.COM [129.146.85.114])
	by Eng.Sun.COM (SMI-8.6/SMI-5.3) with ESMTP id LAA02718;
	Tue, 3 Nov 1998 11:08:01 -0800
Received: from shield.eng.sun.com (shield.Eng.Sun.COM [129.146.85.114])
	by shield.eng.sun.com (8.9.1b+Sun/8.9.1) with SMTP id LAA16017;
	Tue, 3 Nov 1998 11:08:01 -0800 (PST)
Date: Tue, 3 Nov 1998 11:08:00 -0800 (PST)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: delayed ACKs for retransmitted packets: ouch!
To: Neal Cardwell <cardwell@cs.washington.edu>
Cc: tcp-impl@lerc.nasa.gov
In-Reply-To: "Your message with ID" <Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu>
Message-ID: <Roam.SIMCSD.2.0.4.910120080.27665.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> The TCP congestion control draft (draft-ietf-tcpimpl-cong-control-00.txt)
> specifies that "Out-of-order data segments SHOULD be acknowledged
> immediately, in order to trigger the fast retransmit algorithm." Many
> implementations -- at least FreeBSD 3.0 and Linux 2.1, and probably most
> others, i'm guessing -- interpret this by sending an immediate
> acknowledgment only if a data segment they receive is above a hole in
> their receive queue. That is, they ACK only if the sequence number is above
> and not equal to rcv_next (see Figure 27.15 in Stevens vol 2 for the code
> snippet that does this in Net/3 and FreeBSD).

That is quite interesting.  Solaris sends an ACK immediately in this case.
That is why we never saw this problem when we added NewReno to Solaris 2.6.
Maybe you can try to use a Solaris 2.6 or 7 as the receiver and see what
will happen.  Also compare the result using a Solaris sender.

> So what i'm asking is this: is it a good idea to clarify or extend the
> notion of "out-of-order" data that should be ACKed immediately, in such a
> way that data segments that fill in a hole in the receive queue should be
> ACKed immediately? This would seem to alleviate this problem with New
> Reno. Are there other scenarios where it would make things worse instead?

Actually, I have always considered a segment which fills a hole to be an
out-of-order segment.  IMHO, this looks like a bug in the implementation.
Maybe this can be another item in the known problems draft?

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 12:43:08 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id MAA12987
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 12:43:08 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id KAA22920; Wed, 4 Nov 1998 10:53:38 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id KAA22915; Wed, 4 Nov 1998 10:53:36 -0500 (EST)
Received: from ehsco.com (Ferret.EHSco.com [209.31.7.45])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA2501 for <tcp-impl@lerc.nasa.gov>;
          Wed, 4 Nov 1998 07:53:33 -0800
Message-ID: <36407877.43D48BA3@ehsco.com>
Date: Wed, 04 Nov 1998 07:53:27 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.5b2 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP Implementations <tcp-impl@lerc.nasa.gov>
Subject: MTU, MRU, and MSS
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


Question about PPP MTU/MRU sizes and how they affect the MSS being
advertised in various implementations:

RFC 793 states unequivocally that the local network's MRU should be used
for the value that is advertised in the MSS:

        Maximum Segment Size Option Data:  16 bits

          If this option is present, then it communicates the maximum
          receive segment size at the TCP which sends this segment.

However, many of the systems I've been testing don't seem to do this,
instead choosing to advertise a (predetermined) fixed size that has no
relation to the negotiated MRU. Also, some systems seem to use the MTU
for the MSS, even though MRU should be the deciding factor since A) the
spec says so, and B) anything larger would fragment.
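
As a sketch of what I mean, assuming IPv4 and TCP headers of 20 bytes
each with no options (the function names here are just for
illustration):

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def advertised_mss(link_mru):
    """MSS to advertise: the largest segment we can actually receive."""
    return link_mru - IP_HEADER - TCP_HEADER


def effective_send_mss(peer_mss, link_mtu):
    """Largest segment we should send: bounded by the peer's
    advertised MSS and by what our own link can transmit."""
    return min(peer_mss, link_mtu - IP_HEADER - TCP_HEADER)


ethernet = advertised_mss(1500)  # MSS for a 1500-byte MRU
small_ppp = advertised_mss(576)  # MSS for a 576-byte MRU
bounded = effective_send_mss(peer_mss=536, link_mtu=1500)
```

The point is that the receive-side limit drives what gets advertised,
while the transmit-side limit only caps what the host itself sends.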

Does anybody have a good reason for not defining MSS based on MRU?

-- 
Eric A. Hall                                            ehall@ehsco.com
+1-650-685-0557                                    http://www.ehsco.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 13:02:10 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA13709
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 13:02:09 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id LAA27530; Wed, 4 Nov 1998 11:28:25 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from snowcrash.cymru.net (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id LAA27522; Wed, 4 Nov 1998 11:28:22 -0500 (EST)
Received: from the-village.bc.nu (lightning.swansea.uk.linux.org [194.168.151.1]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id QAA27375; Wed, 4 Nov 1998 16:28:18 GMT
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
	id m0zb6f8-0007U6C; Wed, 4 Nov 98 17:24 GMT
Message-Id: <m0zb6f8-0007U6C@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: MTU, MRU, and MSS
To: ehall@ehsco.com (Eric A. Hall)
Date: Wed, 4 Nov 1998 17:24:05 +0000 (GMT)
Cc: tcp-impl@lerc.nasa.gov
In-Reply-To: <36407877.43D48BA3@ehsco.com> from "Eric A. Hall" at Nov 4, 98 07:53:27 am
Content-Type: text
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> relation to the negotiated MRU. Also, some systems seem to use the MTU
> for the MSS, even though MRU should be the deciding factor since A) the
> spec says so, and B) anything larger would fragment.
> 
> Does anybody have a good reason for not defining MSS based on MRU?

Pre MTU BSD always advertised a lower MSS for 'non local' connections.
Linux uses the MTU but allows the user to override the value, SGI's 
appear to advertise a fixed max of 48K on HIPPI. 

There are plenty of reasons for adjusting the MSS size:

o	To keep window > 2*mss for performance
o	To get optimal sized buffers from the remote end for local
	host speed
o	To avoid fragmentation on mixed media paths without MTU
	discovery

Alan



From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 13:36:36 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA14596
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 13:36:35 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id LAA00658; Wed, 4 Nov 1998 11:52:24 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from snowcrash.cymru.net (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id LAA00646; Wed, 4 Nov 1998 11:52:21 -0500 (EST)
Received: from the-village.bc.nu (lightning.swansea.uk.linux.org [194.168.151.1]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id QAA27840; Wed, 4 Nov 1998 16:52:18 GMT
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
	id m0zb72N-0007U5C; Wed, 4 Nov 98 17:48 GMT
Message-Id: <m0zb72N-0007U5C@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: MTU, MRU, and MSS
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Date: Wed, 4 Nov 1998 17:48:06 +0000 (GMT)
Cc: ehall@ehsco.com, tcp-impl@lerc.nasa.gov
In-Reply-To: <m0zb6f8-0007U6C@the-village.bc.nu> from "Alan Cox" at Nov 4, 98 05:24:05 pm
Content-Type: text
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Pre MTU BSD always advertised a lower MSS for 'non local' connections.
   ------^
 discovery




From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 14:16:32 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA15345
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 14:16:31 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA06695; Wed, 4 Nov 1998 12:45:52 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from calcite.rhyolite.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA06679; Wed, 4 Nov 1998 12:45:46 -0500 (EST)
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.8.5/calcite) id KAA04980
	env-from <vjs>;
	Wed, 4 Nov 1998 10:45:16 -0700 (MST)
Date: Wed, 4 Nov 1998 10:45:16 -0700 (MST)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <199811041745.KAA04980@calcite.rhyolite.com>
To: alan@lxorguk.ukuu.org.uk, ehall@ehsco.com
Subject: Re: MTU, MRU, and MSS
Cc: tcp-impl@lerc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> To: ehall@ehsco.com (Eric A. Hall)
> Cc: tcp-impl@lerc.nasa.gov

>                                                         ...   SGI's 
> appear to advertise a fixed max of 48K on HIPPI. 
> ...

The last time I looked at SGI's source, that's not at all how it works.
Instead, a slightly modified version of the familiar BSD computes a nice
value for the MSS that is based on the link MTU, other things such as
socket option, the global configuration switch, the MMU page size and the
sizes of various kinds of buffers.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 14:53:58 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA16005
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 14:53:57 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA09043; Wed, 4 Nov 1998 13:06:26 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA09033; Wed, 4 Nov 1998 13:06:22 -0500 (EST)
Received: from ehsco.com (Arachnid.EHSco.com [209.31.7.46])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA2552; Wed, 4 Nov 1998 10:06:18 -0800
Message-ID: <36409799.48348AE4@ehsco.com>
Date: Wed, 04 Nov 1998 10:06:17 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.05 [en] (X11; I; Linux 2.0.34 i686)
MIME-Version: 1.0
To: Vernon Schryver <vjs@calcite.rhyolite.com>
CC: tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
References: <199811041745.KAA04980@calcite.rhyolite.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Instead, a slightly modified version of the familiar BSD computes a nice
> value for the MSS that is based on the link MTU

Well, that's the question. Shouldn't this be calculated according to
MRU, particularly in those situations where MTU and MRU differ (PPP
circuits with different MTU and MRU sizes, TR nets with fixed MTU but
variable MRUs, etc)? 

In such cases, the MSS should be defined according to the MRU of the
local network, according to RFC 793 anyway. What value is there in
deriving MSS from MTU instead?


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 14:59:27 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA16217
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 14:59:26 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA11209; Wed, 4 Nov 1998 13:21:50 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from calcite.rhyolite.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA11203; Wed, 4 Nov 1998 13:21:47 -0500 (EST)
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.8.5/calcite) id LAA05777
	for tcp-impl@lerc.nasa.gov  env-from <vjs>;
	Wed, 4 Nov 1998 11:21:45 -0700 (MST)
Date: Wed, 4 Nov 1998 11:21:45 -0700 (MST)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <199811041821.LAA05777@calcite.rhyolite.com>
To: tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> From: "Eric A. Hall" <ehall@ehsco.com>

> > Instead, a slightly modified version of the familiar BSD computes a nice
> > value for the MSS that is based on the link MTU
>
> Well, that's the question. Shouldn't this be calculated according to
> MRU, particularly in those situations where MTU and MRU differ (PPP
> circuits with different MTU and MRU sizes, TR nets with fixed MTU but
> variable MRUs, etc)? 
>
> In such cases, the MSS should be defined according to the MRU of the
> local network, according to RFC 793 anyway. What value is there in
> deriving MSS from MTU instead?

In the case of PPP, things are not as simple as you imply.  It is
fairly common that the MTU and MRU change during the lifetime of
the connection.  Consider the effects of negotiating CCP or ECP on
the PPP MTU and MRU.  Consider also what happens if your PPP code
drops in and out of MP encapsulation as the number of links increases
to more than one and returns to one.

Then there is the fact that absolutely every conformant PPP
installation has an MRU of at least 1500, but may have a much
smaller configured MTU.  (While the PPP chitchat allows a PPP
system to ask for an MRU of less than 1500, it can always be forced
to accept 1500 by the peer Configure-Rejecting the MRU option.)

Hosts tend to not talk directly to the boxes that know about the worst
complications of MTUs and MRUs.  They tend to talk to a LAN that has a
single value for the MRU and MTU.  (If your IP/802.5 code has differing
MTU and MRU, then I would not call that a wonderful feature.)  In practice,
using a single value for both the MRU and MTU is good enough.  That it is
called the MTU reflects the fact that the MTU is the constraint on what
the system can choose to do, while the MRU is an implementation limitation.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:06:27 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA16426
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:06:26 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA11544; Wed, 4 Nov 1998 13:25:21 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from zephyr.isi.edu (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA11532; Wed, 4 Nov 1998 13:25:17 -0500 (EST)
From: braden@ISI.EDU
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id KAA05488;
	Wed, 4 Nov 1998 10:25:13 -0800 (PST)
Date: Wed, 4 Nov 1998 10:22:33 -0800
Posted-Date: Wed, 4 Nov 1998 10:22:33 -0800
Message-Id: <199811041822.AA09022@gra.isi.edu>
Received: by gra.isi.edu (5.65c/4.0.3-6)
	id <AA09022>; Wed, 4 Nov 1998 10:22:33 -0800
To: tcp-impl@lerc.nasa.gov, ehall@ehsco.com
Subject: Re: MTU, MRU, and MSS
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


  *> 
  *> RFC 793 states unequivocally that the local network's MRU should be used
  *> for the value that is advertised in the MSS:
  *> 
  *>         Maximum Segment Size Option Data:  16 bits
  *> 
  *>           If this option is present, then it communicates the maximum
  *>           receive segment size at the TCP which sends this segment.
  *> 

Note that RFC793 is not the latest word; this issue is discussed more fully
in RFC1122 (section 4.2.2.6).

However, I am not aware of any consideration of the possibility
that MTU and MRU (which I assume means Maximum Receive Unit?) might
be different.  Is this really a problem?

Bob Braden


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:31:14 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17009
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:31:13 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA14611; Wed, 4 Nov 1998 13:48:13 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA14606; Wed, 4 Nov 1998 13:48:11 -0500 (EST)
Received: from ehsco.com (Ferret.EHSco.com [209.31.7.45])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA2578; Wed, 4 Nov 1998 10:48:07 -0800
Message-ID: <3640A166.151CA47D@ehsco.com>
Date: Wed, 04 Nov 1998 10:48:06 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.5b2 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: braden@ISI.EDU
CC: tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
References: <199811041822.AA09022@gra.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> Note that RFC793 is not the latest word; this issue is discussed more
> fully in RFC1122 (section 4.2.2.6).

1122 clarifies that systems must use an MSS of 536 if none is specified.
It doesn't change the notion of MRU being the seed for MSS (when it does
get specified). Indeed, it clarifies that MSS should be restricted to
MMS_R, which is just another way of saying MRU.

> However, I am not aware of any consideration of the possibility
> that MTU and MRU (which I assume means Maximum Receive Unit?) might
> be different.  Is this really a problem?

It's a problem when the implementing TCP insists on sending an MSS of
1460, even though the MRU of the PPP link may only be capable of 536
byte segments (as is found with all Microsoft-native PPP clients).

Other situations (like TR's variable-length MRU) may not be "problems"
so much as they are "less efficient" solutions. An arguable position.

-- 
Eric A. Hall                                            ehall@ehsco.com
+1-650-685-0557                                    http://www.ehsco.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:37:32 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17110
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:37:32 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA13071; Wed, 4 Nov 1998 13:38:00 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from gecko.nas.nasa.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA13065; Wed, 4 Nov 1998 13:37:58 -0500 (EST)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.9.1a/NAS8.8.7n) with ESMTP id KAA21781;
	Wed, 4 Nov 1998 10:37:56 -0800 (PST)
Message-Id: <199811041837.KAA21781@gecko.nas.nasa.gov>
To: "Eric A. Hall" <ehall@ehsco.com>
cc: TCP Implementations <tcp-impl@lerc.nasa.gov>
Subject: Re: MTU, MRU, and MSS 
In-reply-to: Your message of "Wed, 04 Nov 1998 07:53:27 PST."
             <36407877.43D48BA3@ehsco.com> 
Date: Wed, 04 Nov 1998 10:37:56 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

In message <36407877.43D48BA3@ehsco.com>"Eric A. Hall" writes
>
>Question about PPP MTU/MRU sizes and how they affect the MSS being
>advertised in various implementations:
>
>RFC 793 states unequivocally that the local network's MRU should be used
>for the value that is advertised in the MSS:
>
>        Maximum Segment Size Option Data:  16 bits
>
>          If this option is present, then it communicates the maximum
>          receive segment size at the TCP which sends this segment.

Doesn't this mean the largest segment that can be reassembled on
that host, rather than the largest frame that can be received over
that link?

RFC1191 (Path MTU Discovery) suggests:

   The MSS option should be 40 octets less than the
   size of the largest datagram the host is able to reassemble (MMS_R,
   as defined in [1]); in many cases, this will be the architectural
   limit of 65495 (65535 - 40) octets.  A host MAY send an MSS value
   derived from the MTU of its connected network (the maximum MTU over
   its connected networks, for a multi-homed host); this should not
   cause problems for PMTU Discovery, and may dissuade a broken peer
   from sending enormous datagrams.

          Note: At the moment, we see no reason to send an MSS greater
          than the maximum MTU of the connected networks, and we
          recommend that hosts do not use 65495.  It is quite possible
          that some IP implementations have sign-bit bugs that would be
          tickled by unnecessary use of such a large MSS.

Given the number of asymmetric routes out there, it seems to make
some sense to send the maximum of the MRUs of the interfaces
on the host.  This is what we do in NetBSD right now, although
I've heard reasonable arguments for using the MRU of the interface.
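
The two ways of choosing the advertised MSS mentioned above can be
sketched as follows. This is only an illustration: the function names,
interface names, and sizes are invented for the example and are not
taken from NetBSD or any other stack, and 20-octet IP and TCP headers
without options are assumed.

```python
# Illustrative sketch only; names and sizes below are assumptions.
IP_HEADER = 20   # octets, IPv4 header without options
TCP_HEADER = 20  # octets, TCP header without options


def mss_from_reassembly_limit(max_reassembled_datagram):
    """RFC 1191 wording: 40 octets less than the largest datagram
    the host is able to reassemble."""
    return max_reassembled_datagram - IP_HEADER - TCP_HEADER


def mss_from_interfaces(interface_mrus):
    """Alternative: derive the MSS from the largest MRU among the
    host's attached interfaces."""
    return max(interface_mrus.values()) - IP_HEADER - TCP_HEADER


# Hypothetical host with an Ethernet and a PPP interface.
mrus = {"ether0": 1500, "ppp0": 576}
print(mss_from_reassembly_limit(65535))  # 65495, the architectural limit
print(mss_from_interfaces(mrus))         # 1460
```

Using the MRU of the one interface the SYN arrived on would replace the
max() with a lookup of that interface.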

Given the number of run-time configurable network interfaces out there,
I wonder how useful the MSS is at all?  What if I boot my system on the
Ethernet, establish a connection, and then fire up my [mythical]
PPP-over-SONET interface?  It'd be nice for the connection to adjust to
use all of the available MTU.  Wouldn't this be useful for mobile
applications as well?

Kevin
kml@nas.nasa.gov


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:47:46 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17190
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:47:45 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA15418; Wed, 4 Nov 1998 13:52:02 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from zephyr.isi.edu (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA15408; Wed, 4 Nov 1998 13:51:56 -0500 (EST)
From: braden@ISI.EDU
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id KAA06915;
	Wed, 4 Nov 1998 10:51:52 -0800 (PST)
Date: Wed, 4 Nov 1998 10:49:12 -0800
Posted-Date: Wed, 4 Nov 1998 10:49:12 -0800
Message-Id: <199811041849.AA09058@gra.isi.edu>
Received: by gra.isi.edu (5.65c/4.0.3-6)
	id <AA09058>; Wed, 4 Nov 1998 10:49:12 -0800
To: braden@ISI.EDU, ehall@ehsco.com
Subject: Re: MTU, MRU, and MSS
Cc: tcp-impl@lerc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk



  *> 
  *> It's a problem when the implementing TCP insists on sending an MSS of
  *> 1460, even though the MRU of the PPP link may only be capable of 536
  *> byte segments (as is found with all Microsoft-native PPP clients).
  *> 

Sorry, I don't understand whether it is Microsoft-native PPP
that is broken here, or TCP.

Bob


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:52:56 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17267
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:52:55 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA18810; Wed, 4 Nov 1998 14:15:13 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from gecko.nas.nasa.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA18798; Wed, 4 Nov 1998 14:15:11 -0500 (EST)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.9.1a/NAS8.8.7n) with ESMTP id LAA22111;
	Wed, 4 Nov 1998 11:14:39 -0800 (PST)
Message-Id: <199811041914.LAA22111@gecko.nas.nasa.gov>
To: "Eric A. Hall" <ehall@ehsco.com>
cc: TCP Implementations <tcp-impl@lerc.nasa.gov>
Subject: Re: MTU, MRU, and MSS 
In-reply-to: Your message of "Wed, 04 Nov 1998 10:54:43 PST."
             <3640A2F3.3783B588@ehsco.com> 
Date: Wed, 04 Nov 1998 11:14:39 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

In message <3640A2F3.3783B588@ehsco.com>"Eric A. Hall" writes
>
>> > If this option is present, then it communicates the
>> > maximum receive segment size at the TCP which sends this segment.
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>My interpretation is that this means "one whole segment," but I could be
>wrong. If it is as you say, the question is moot.
>
>>    The MSS option should be 40 octets less than the
>>    size of the largest datagram the host is able to reassemble (MMS_R,
>>    as defined in [1])
>
>Okay, so 1191 and 793/1122 are in conflict. Which wins?

Actually, I don't think there is really much of a conflict.
RFC 1122 says:

            The MSS value to be sent in an MSS option must be less than 
            or equal to:

               MMS_R - 20

            where MMS_R is the maximum size for a transport-layer
            message that can be received (and reassembled).  TCP obtains
            MMS_R and MMS_S from the IP layer; see the generic call
            GET_MAXSIZES in Section 3.4.

Kevin
kml@nas.nasa.gov


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:56:07 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17302
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:56:06 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA15760; Wed, 4 Nov 1998 13:54:51 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA15754; Wed, 4 Nov 1998 13:54:49 -0500 (EST)
Received: from ehsco.com (Ferret.EHSco.com [209.31.7.45])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA2545; Wed, 4 Nov 1998 10:54:44 -0800
Message-ID: <3640A2F3.3783B588@ehsco.com>
Date: Wed, 04 Nov 1998 10:54:43 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.5b2 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: "Kevin M. Lahey" <kml@nas.nasa.gov>
CC: TCP Implementations <tcp-impl@lerc.nasa.gov>
Subject: Re: MTU, MRU, and MSS
References: <199811041837.KAA21781@gecko.nas.nasa.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> Doesn't this mean the largest segment that can be reassembled on
> that host, rather than the largest frame that can be received over
> that link?

> > If this option is present, then it communicates the
> > maximum receive segment size at the TCP which sends this segment.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

My interpretation is that this means "one whole segment," but I could be
wrong. If it is as you say, the question is moot.

>    The MSS option should be 40 octets less than the
>    size of the largest datagram the host is able to reassemble (MMS_R,
>    as defined in [1])

Okay, so 1191 and 793/1122 are in conflict. Which wins?

-- 
Eric A. Hall                                            ehall@ehsco.com
+1-650-685-0557                                    http://www.ehsco.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 15:58:29 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA17326
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 15:58:29 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA20461; Wed, 4 Nov 1998 14:26:58 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from calcite.rhyolite.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA20457; Wed, 4 Nov 1998 14:26:55 -0500 (EST)
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.8.5/calcite) id MAA06887
	env-from <vjs>;
	Wed, 4 Nov 1998 12:26:51 -0700 (MST)
Date: Wed, 4 Nov 1998 12:26:51 -0700 (MST)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <199811041926.MAA06887@calcite.rhyolite.com>
To: ehall@ehsco.com, tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> From: braden@ISI.EDU

> However, I am not aware of any consideration of the possibility
> that MTU and MRU (which I assume means Maximum Receive Unit?) might
> be different.  Is this really a problem?

Please excuse me for trying again.
I think I've finally understood the point and figured out a good response.

Yes, MTU's and MRU's do sometimes differ, but in essentially all cases,
the MTU is no larger than the MRU.  You can control the sizes of your
transmitted packets, but not the sizes of the packets you receive.
Reasonable systems are prepared for the worst ('be generous ...')
and so able to swallow the largest possible packet size, even when
configured or built to send something smaller ('be conservative ...').
If you tell the TCP peer to limit its segment size, you'll naturally
compute a value based on your configured transmit limit instead of your
worst case nightmare receive size.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 16:03:29 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA17453
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 16:03:28 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA21855; Wed, 4 Nov 1998 14:37:44 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from Krill.EHSco.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA21845; Wed, 4 Nov 1998 14:37:41 -0500 (EST)
Received: from ehsco.com (Ferret.EHSco.com [209.31.7.45])
          by Krill.EHSco.com (Netscape Messaging Server 3.5)  with ESMTP
          id AAA25B9; Wed, 4 Nov 1998 11:37:37 -0800
Message-ID: <3640AD00.5C9B5DE5@ehsco.com>
Date: Wed, 04 Nov 1998 11:37:36 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.5b2 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Vernon Schryver <vjs@calcite.rhyolite.com>
CC: tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
References: <199811041926.MAA06887@calcite.rhyolite.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> If you tell the TCP peer to limit its segment size, you'll naturally
> compute a value based on your configured transmit limit instead of
> your worst case nightmare receive size.

This makes sense, but is it always efficient, particularly when the
MTU is considerably smaller than the MRU?

Sometimes you want a very small MTU (fill faster) but a very large MRU
(receive larger). Forcing MSS to the (much smaller) MTU prevents this
from happening.

-- 
Eric A. Hall                                            ehall@ehsco.com
+1-650-685-0557                                    http://www.ehsco.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 16:25:37 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA17866
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 16:25:36 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA23208; Wed, 4 Nov 1998 14:48:57 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from calcite.rhyolite.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA23196; Wed, 4 Nov 1998 14:48:54 -0500 (EST)
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.8.5/calcite) id MAA07376
	for tcp-impl@lerc.nasa.gov  env-from <vjs>;
	Wed, 4 Nov 1998 12:48:53 -0700 (MST)
Date: Wed, 4 Nov 1998 12:48:53 -0700 (MST)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <199811041948.MAA07376@calcite.rhyolite.com>
To: tcp-impl@lerc.nasa.gov
Subject: Re: MTU, MRU, and MSS
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Okay, so 1191 and 793/1122 are in conflict. Which wins?

Whatever works.

Consider the large computers of a few years ago with their 100 MByte/sec
"slow speed channels" connected to FDDI interfaces that necessarily used
the FDDI MTU but a TCP MSS of 32K to talk to nearby systems.  That was done
for good and sufficient real life reasons.  4.3BSD (perhaps only after Reno)
could both send and receive such fragmented segments.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 16:29:25 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA17945
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 16:29:25 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id PAA25208; Wed, 4 Nov 1998 15:00:13 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from frantic.bsdi.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id PAA24981; Wed, 4 Nov 1998 15:00:08 -0500 (EST)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.9.0/8.9.0) id OAA01385;
	Wed, 4 Nov 1998 14:00:00 -0600 (CST)
Date: Wed, 4 Nov 1998 14:00:00 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199811042000.OAA01385@frantic.bsdi.com>
To: ehall@ehsco.com
Subject: Re: MTU, MRU, and MSS
Cc: tcp-impl@lerc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Date: Wed, 04 Nov 1998 10:48:06 -0800
> From: "Eric A. Hall" <ehall@ehsco.com>
> Subject: Re: MTU, MRU, and MSS
>
>
> > Note that RFC793 is not the latest word; this issue is discussed more
> > fully in RFC1122 (section 4.2.2.6).
>
> 1122 clarifies that systems must use MSS of 536 if not specified. It
> doesn't change the notion of MRU being the seed for MSS (when it does
> get specified). Indeed, it clarifies that MSS should be restricted to
> MSS_S, which is just another way of saying MRU.
  ^^^^^
A nit, that should be MMS_R, the maximum receive transport-message size.

> > However, I am not aware of any consideration of the possibility
> > that MTU and MRU (which I assume means Maximum Receive Unit?) might
> > be different.  Is this really a problem?
>
> It's a problem when the implementing TCP insists on sending an MSS of
> 1460, even though the MRU of the PPP link may only be capable of 536
> byte segments (as is found with all Microsoft-native PPP clients).

This should not be a problem unless the TCP receiving the MSS has bugs.

1)  On the machine that has the PPP link, the PPP layer should be
    communicating up to TCP that it received an MRU of 536, and TCP
    should then limit itself to sending packets no bigger than that, no
    matter how big the value is in the MSS option that it received.

2a) If the TCP is not the machine that has the PPP link, then if it
    implements PMTU Discovery, it shouldn't matter that the received MSS
    is larger than the MTU of one of the links in the path; that will be
    discovered.

2b) Otherwise, if it doesn't implement PMTU Discovery, then it doesn't
    know about the MTUs along the path, so since the remote machine is
    not on a local network, the TCP should be limiting its packets to
    EMTU_S <= 576, which implies MMS_S = 556, or an Eff.snd.MSS = 536.

See RFC 1122 for more details.

In a nutshell, a well implemented TCP/IP should handle this situation
just fine.  Valid configurations that expose bugs in implementations
should not invalidate those configurations; the implementations should
be fixed.
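
As a rough illustration of case 2b above, here is the effective send
MSS calculation from RFC 1122 section 4.2.2.6,
Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize,
written out in Python. The function and parameter names are made up
for this sketch, and MMS_S is taken to be EMTU_S minus a 20-octet IP
header.

```python
# Sketch of the RFC 1122 effective send MSS formula.
# Function and parameter names are illustrative assumptions.

def eff_snd_mss(send_mss, emtu_s, tcp_hdr=20, ip_hdr=20, ip_options=0):
    """send_mss: MSS value received from the peer.
    emtu_s: largest datagram the IP layer is willing to send."""
    mms_s = emtu_s - ip_hdr  # largest transport-layer message we may send
    return min(send_mss + 20, mms_s) - tcp_hdr - ip_options


# Peer advertised 1460, but the destination is off-net and no PMTU
# Discovery is in use, so EMTU_S is held to 576.
print(eff_snd_mss(1460, 576))   # 536

# Same peer reached over a local Ethernet (EMTU_S = 1500).
print(eff_snd_mss(1460, 1500))  # 1460
```

The received MSS only ever acts as an upper bound; the sender's own
limit wins whenever it is smaller.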

		-David Borman, dab@bsdi.com


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 17:33:15 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id RAA18824
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 17:33:14 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id PAA02638; Wed, 4 Nov 1998 15:58:17 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from shasta-pc.shastanets.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id PAA02632; Wed, 4 Nov 1998 15:58:14 -0500 (EST)
Received: from stevea-pc (stevea-pc.shastanets.com [209.31.29.163]) by shasta-pc.shastanets.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.1960.3)
	id V69YYTHL; Wed, 4 Nov 1998 12:57:17 -0800
Reply-To: <stevea@shastanets.com>
From: "Steve Alexander" <stevea@shastanets.com>
To: <tcp-impl@lerc.nasa.gov>
Subject: RE: MTU, MRU, and MSS
Date: Wed, 4 Nov 1998 12:58:55 -0800
Message-ID: <000001be0835$f00ec020$a31d1fd1@stevea-pc.shastanets.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2232.26
Importance: Normal
In-Reply-To: <3640A166.151CA47D@ehsco.com>
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk



> 
> It's a problem when the implementing TCP insists on sending an MSS of
> 1460, even though the MRU of the PPP link may only be capable of 536
> byte segments (as is found with all Microsoft-native PPP clients).

My recollection from implementing path MTU discovery some years back
is that the only way to get any reasonable performance from MTU
discovery is to break the segment size / MTU / MRU correlation that
seems so natural and always offer an MSS that is the biggest thing that
you can atomically receive on the link from which you received the SYN.

Otherwise, you always offer something either too big, or too small.

Unfortunately, since paths can change after the 3-way handshake is
completed, this is simply a SWAG over the life of the connection.
At some point, it might be cool if IP could indicate whether the
datagram was reassembled so that TCP could renegotiate the MSS during
the life of the connection.  Unfortunately, I think that you can't do
that by law, and if you could, most implementations would dry heave
regardless.

-- Steve


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 18:23:42 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id SAA19396
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 18:23:41 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id RAA12295; Wed, 4 Nov 1998 17:13:38 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from atlrel2.hp.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id RAA12289; Wed, 4 Nov 1998 17:13:36 -0500 (EST)
Received: from loiter.cup.hp.com (root@loiter.cup.hp.com [15.13.104.252])
	by atlrel2.hp.com (8.8.6/8.8.5tis) with ESMTP id RAA03340
	for <tcp-impl@lerc.nasa.gov>; Wed, 4 Nov 1998 17:13:25 -0500 (EST)
Received: from cup.hp.com (raj@loiter [15.13.104.252]) by loiter.cup.hp.com with ESMTP (8.8.6/8.7.3 TIS Messaging 5.0) id OAA00682 for <tcp-impl@lerc.nasa.gov>; Wed, 4 Nov 1998 14:13:33 -0800 (PST)
Message-ID: <3640D18C.13F88AD6@cup.hp.com>
Date: Wed, 04 Nov 1998 14:13:32 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: SNSL
X-Mailer: Mozilla 4.5 [en] (X11; U; HP-UX B.10.20 9000/735)
X-Accept-Language: en
MIME-Version: 1.0
To: tcp-impl@lerc.nasa.gov
Subject: probablility of successful TIME_WAIT reuse
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

If I have:

*) a passive acceptor of connections on a single port and IP address
*) an active connector to that tuple cycling through 60K port id's
*) roughly 15KB of data transferred on average on each connection
*) the acceptor being the initiator of graceful close

what are the chances of the active connector's SYN segments having ISN's
meeting the requirements for successful transition from TIME_WAIT to
ESTABLISHED (SYN_RECVD?) such that the active connector could generate
say 2000 connections per second over a fifteen minute interval?

rick jones

-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, or post, but please do not do both...
my email address is raj in the cup.hp.com domain...


From owner-tcp-impl@lerc.nasa.gov  Wed Nov  4 19:48:40 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id TAA20030
	for <tcpimpl-archive@lists.ietf.org>; Wed, 4 Nov 1998 19:48:39 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id SAA18048; Wed, 4 Nov 1998 18:28:42 -0500 (EST)
X-Authentication-Warning: assateague.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from frantic.bsdi.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id SAA18044; Wed, 4 Nov 1998 18:28:39 -0500 (EST)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.9.0/8.9.0) id RAA01558;
	Wed, 4 Nov 1998 17:19:43 -0600 (CST)
Date: Wed, 4 Nov 1998 17:19:43 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199811042319.RAA01558@frantic.bsdi.com>
To: raj@cup.hp.com, tcp-impl@lerc.nasa.gov
Subject: Re: probablility of successful TIME_WAIT reuse
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Date: Wed, 04 Nov 1998 14:13:32 -0800
> From: Rick Jones <raj@cup.hp.com>
> Subject: probablility of successful TIME_WAIT reuse
>
> If I have:
>
> *) a passive acceptor of connections on a single port and IP address
> *) an active connector to that tuple cycling through 60K port id's
> *) roughly 15KB of data transferred on average on each connection
> *) the acceptor being the initiator of graceful close
>
> what are the chances of the active connector's SYN segments having ISN's
> meeting the requirements for successful transition from TIME_WAIT to
> ESTABLISHED (SYN_RECVD?) such that the active connector could generate
> say 2000 connections per second over a fifteen minute interval?

In BSD/OS ISS increments 250KB/second, plus 1/4 of that per connection.
With 60000 ports and 2000 connections per second, it'll take 30 seconds
to reuse a connection.  That means the ISS will go up by:
   60000 * 250K/4 + 30 * 250K, => 250K*(15000+30) => 3847680000.
The upper half of the 32 bit sequence space is 2147483648-4294967295,
so you fall right into it (the 15KB of transferred data is in the
noise), and the checks for allowing the transition
from TIME_WAIT -> SYN_RCVD will fail.  So, probability 0%.
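
The arithmetic above can be checked with a short Python sketch. It
assumes K = 1024, ignores the data bytes on the old connection (they
are "in the noise"), and models the acceptance test as a signed 32-bit
"new ISS is greater than old ISS" comparison in the style of BSD's
SEQ_GT; the variable and function names are invented for this example.

```python
# Sketch of the ISS advance between two uses of the same port pair.
# Names are illustrative; K = 1024 is an assumption.
K = 1024
ISS_PER_SECOND = 250 * K                  # ISS clock increment per second
ISS_PER_CONNECTION = ISS_PER_SECOND // 4  # extra increment per connection

ports = 60000                  # ports cycled through before reuse
rate = 2000                    # connections per second
reuse_seconds = ports // rate  # 30 seconds until a port is reused

advance = ports * ISS_PER_CONNECTION + reuse_seconds * ISS_PER_SECOND
print(advance)  # 3847680000


def seq_gt(a, b):
    """True if a is 'after' b in 32-bit sequence space, i.e. the
    difference taken as a signed 32-bit integer is positive."""
    diff = (a - b) & 0xFFFFFFFF
    return 0 < diff < 0x80000000


old_iss = 0  # arbitrary starting point
new_iss = (old_iss + advance) & 0xFFFFFFFF
print(seq_gt(new_iss, old_iss))  # False: the new SYN looks "old"
```

Because the advance lands in the upper half of the sequence space, the
signed comparison sees the new ISS as behind the old one.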

Of course, if you are putting timestamps into the packets, then
with PAWS the sequence wrap around isn't an issue, and all new
connections should be accepted (and as I say this, (1) I look at the
4.4 BSD code and note that it doesn't do a PAWS check for the
TIME_WAIT -> SYN_RCVD transition, and (2) RFC 1323 doesn't address
this issue.)

Is that what you were looking for?

			-David Borman, dab@bsdi.com

PS: at 15KB/connection, and 2000 connections/second, you will be
transferring over 245 Mbits/second.


From owner-tcp-impl@lerc.nasa.gov  Thu Nov  5 13:35:04 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA08909
	for <tcpimpl-archive@lists.ietf.org>; Thu, 5 Nov 1998 13:35:03 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id LAA06672; Thu, 5 Nov 1998 11:12:53 -0500 (EST)
Received: from mail.msen.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id LAA06667; Thu, 5 Nov 1998 11:12:51 -0500 (EST)
Received: (from mjo@localhost)
	by mail.msen.com (8.8.8/8.8.5) id LAA11604
	for tcp-impl@lerc.nasa.gov; Thu, 5 Nov 1998 11:12:50 -0500 (EST)
X-Authentication-Warning: conch.msen.com: mjo set sender to mjo@dojo.mi.org using -f
Subject: TOS bits on three-way handshake
To: tcp-impl@lerc.nasa.gov
Date: Thu, 5 Nov 1998 11:12:49 -0500 (EST)
From: "Mike O'Connor" <mjo@dojo.mi.org>
Reply-To: "Mike O'Connor" <mjo@dojo.mi.org>
Message-Id: <981105111249.mjo@dojo.mi.org>
X-Organization: :noitazinagrO-X
Content-Type: text
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Why is the TOS bit not honored on the initial ACK to a SYN in the
three-way handshake between a client and server that employ TOS?  My
setsockopt() calls appear to be correct, and the TOS bit shows up as
set in every other packet, as I believe that it should.  I've tested
with other services that use TOS and see the same behavior in traces
when connecting to IRIX, Solaris, and HP-UX systems.  I've looked at
RFC1349 for insight, but I don't see anything obvious that explains
this behavior.  

-- 
 Michael J. O'Connor | WWW: http://dojo.mi.org/~mjo/ | Email: mjo@dojo.mi.org
 InterNIC WHOIS: MJO | (has my PGP & Geek Code info) | Phone: +1 248-848-4481
 =--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--=
"Ok, so I was wrong for once in my life!  Shut up."                   -Calvin


From owner-tcp-impl@cthulhu.engr.sgi.com  Fri Nov  6 18:40:36 1998
Received: from sgi.sgi.com (SGI.COM [192.48.153.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id SAA08951
	for <tcpimpl-archive@odin.ietf.org>; Fri, 6 Nov 1998 18:40:36 -0500 (EST)
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id PAA05447; Fri, 6 Nov 1998 15:34:35 -0800 (PST)
	mail_from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	id PAA78628
	for tcp-impl-list;
	Fri, 6 Nov 1998 15:25:06 -0800 (PST)
	mail_from (owner-tcp-impl@relay.engr.sgi.com)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id PAA55096
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Fri, 6 Nov 1998 15:25:04 -0800 (PST)
	mail_from (cardwell@cs.washington.edu)
Received: from sake.cs.washington.edu (sake.cs.washington.edu [128.95.4.55]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id PAA08341
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 6 Nov 1998 15:25:03 -0800 (PST)
	mail_from (cardwell@cs.washington.edu)
Received: from localhost (cardwell@localhost) by sake.cs.washington.edu (8.8.8+CS/7.2ws+) with SMTP id PAA20336 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 6 Nov 1998 15:25:02 -0800
Date: Fri, 6 Nov 1998 15:25:02 -0800 (PST)
From: Neal Cardwell <cardwell@cs.washington.edu>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs for retransmitted packets: ouch!
Message-ID: <Pine.LNX.4.02A.9811061346470.19242-100000@sake.cs.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@cthulhu.engr.sgi.com
Precedence: bulk


Although Linux 2.0 appears to perform poorly in acking retransmitted
packets in some cases, just for the record i thought i'd note that Luigi
Rizzo pointed out to me that BSD does actually do the right thing when it
gets a data segment that falls below previously-received data:

Luigi said:
> Fig27.15 in Stevens deals with TCP_REASS, and in order to have a
> DELACK you need that the reassembly queue be empty, and this does
> not happen when the received pkt fills a hole (you know it's a hole
> because you already have a pkt _after_ it).

thanks, Luigi!
neal



From owner-tcp-impl@cthulhu.engr.sgi.com  Sat Nov  7 10:07:51 1998
Received: from sgi.sgi.com (SGI.COM [192.48.153.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id KAA23469
	for <tcpimpl-archive@odin.ietf.org>; Sat, 7 Nov 1998 10:07:50 -0500 (EST)
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id HAA02409; Sat, 7 Nov 1998 07:04:08 -0800 (PST)
	mail_from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	id GAA82204
	for tcp-impl-list;
	Sat, 7 Nov 1998 06:57:45 -0800 (PST)
	mail_from (owner-tcp-impl@relay.engr.sgi.com)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id GAA82035
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 7 Nov 1998 06:57:43 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) 
	by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id GAA03045
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 7 Nov 1998 06:57:41 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id GAA06241;
	Sat, 7 Nov 1998 06:55:48 -0800
Date: Sat, 7 Nov 1998 06:55:48 -0800
Message-Id: <199811071455.GAA06241@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: cardwell@cs.washington.edu
CC: tcp-impl@cthulhu.engr.sgi.com
In-reply-to: 
	<Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu>
	(message from Neal Cardwell on Mon, 2 Nov 1998 19:19:40 -0800 (PST))
Subject: Re: delayed ACKs for retransmitted packets: ouch!
References:  <Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu>
Sender: owner-tcp-impl@cthulhu.engr.sgi.com
Precedence: bulk

   Date: Mon, 2 Nov 1998 19:19:40 -0800 (PST)
   From: Neal Cardwell <cardwell@cs.washington.edu>

   during this period I'm getting about 30Kbps on average, and not so
   happy about the $ i forked over for DSL buying me modem
   performance!

Sorry I did not respond sooner.

Pick your poison, here is the fix for both 2.0.x and 2.1.x
Linux TCP stacks.  First 2.0.x:

--- net/ipv4/tcp_input.c.~1~	Tue Jul 21 08:13:48 1998
+++ net/ipv4/tcp_input.c	Sat Nov  7 06:48:59 1998
@@ -1929,8 +1929,14 @@
 		 * Delay the ack if possible.  Send ack's to
 		 * fin frames immediately as there shouldn't be
 		 * anything more to come.
+		 *
+		 * ACK immediately if we still have any out of
+		 * order data.  This is because we desire "maximum
+		 * feedback during loss".  --DaveM
 		 */
-		if (!sk->delay_acks || th->fin) {
+		if (!sk->delay_acks || th->fin ||
+		    ((sk->acked_seq == skb->end_seq) &&
+		     (skb->next != (struct sk_buff *) &sk->receive_queue))) {
 			tcp_send_ack(sk);
 		} else {
 			/*

And here is the same fix for current 2.1.x Linux TCP:

Index: net/ipv4/tcp_input.c
===================================================================
RCS file: /vger/u4/cvs/linux/net/ipv4/tcp_input.c,v
retrieving revision 1.135
retrieving revision 1.136
diff -u -r1.135 -r1.136
--- tcp_input.c	1998/11/07 10:54:42	1.135
+++ tcp_input.c	1998/11/07 14:36:18	1.136
@@ -5,7 +5,7 @@
  *
  *		Implementation of the Transmission Control Protocol(TCP).
  *
- * Version:	$Id: tcp_input.c,v 1.135 1998/11/07 10:54:42 davem Exp $
+ * Version:	$Id: tcp_input.c,v 1.136 1998/11/07 14:36:18 davem Exp $
  *
  * Authors:	Ross Biro, <bir7@leland.Stanford.Edu>
  *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
@@ -1517,7 +1517,7 @@
 	 *      - delay time <= 0.5 HZ
 	 *      - we don't have a window update to send
 	 *      - must send at least every 2 full sized packets
-	 *	- must send an ACK if we have any SACKs
+	 *	- must send an ACK if we have any out of order data
 	 *
 	 * With an extra heuristic to handle loss of packet
 	 * situations and also helping the sender leave slow
@@ -1530,8 +1530,8 @@
 	    tcp_raise_window(sk) ||
 	    /* We entered "quick ACK" mode or... */
 	    tcp_in_quickack_mode(tp) ||
-	    /* We have pending SACKs */
-	    (tp->sack_ok && tp->num_sacks)) {
+	    /* We have out of order data */
+	    (skb_peek(&tp->out_of_order_queue) != NULL)) {
 		/* Then ack it now */
 		tcp_send_ack(sk);
 	} else {

Enjoy.  BTW, I never noticed this because most of the time when I'm
working on loss recovery on 2.1.x Linux both ends speak SACK.  The fix
here for 2.1.x just turns the old SACK test into a more general test.
I'm hoping 2.2.x gets released soon so SACK can finally be widely
deployed.

Thanks a lot for pointing out this problem.

Later,
David S. Miller
davem@dm.cobaltmicro.com


From owner-tcp-impl@lerc.nasa.gov  Mon Nov  9 18:18:22 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id SAA05728
	for <tcpimpl-archive@lists.ietf.org>; Mon, 9 Nov 1998 18:18:22 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id PAA18678; Mon, 9 Nov 1998 15:35:16 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from mailhub.Stanford.EDU (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id PAA18674; Mon, 9 Nov 1998 15:35:13 -0500 (EST)
Received: from stanford.edu (colorado.Stanford.EDU [171.64.74.157])
	by mailhub.Stanford.EDU (8.8.8/8.8.8/L) with ESMTP id MAA16660
	for <tcp-impl@lerc.nasa.gov>; Mon, 9 Nov 1998 12:35:12 -0800 (PST)
Message-ID: <364751FF.F8308EE8@stanford.edu>
Date: Mon, 09 Nov 1998 12:35:11 -0800
From: "Amr A. Awadallah" <aaa@stanford.edu>
X-Mailer: Mozilla 4.06 [en] (X11; U; HP-UX B.10.20 9000/780)
MIME-Version: 1.0
To: tcp-impl@lerc.nasa.gov
Subject: Quick question on delayed-ACKs and updating the congestion window.
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


Hi,

  Just wondering what is the "correct" way of increasing the congestion
window during congestion avoidance when delayed-ACKs are used, is it:

(1) Increase cwnd by 1 segment once cwnd segments are ACKed,

or

(2) Increase cwnd by 1 segment once cwnd ACKs are received.

They are both equivalent if delayed ACKs are not used. But if an ACK is
sent for every other received packet, then (2) will increase cwnd at half
the rate of (1).

Which is correct, (1) or (2) ?

Thanks,

-- Amr




From owner-tcp-impl@lerc.nasa.gov  Mon Nov  9 18:18:23 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id SAA05732
	for <tcpimpl-archive@lists.ietf.org>; Mon, 9 Nov 1998 18:18:23 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id QAA22755; Mon, 9 Nov 1998 16:03:25 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from guns.lerc.nasa.gov (guns.lerc.nasa.gov [139.88.44.160]) by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id QAA22751; Mon, 9 Nov 1998 16:03:23 -0500 (EST)
Received: from guns.lerc.nasa.gov by guns.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id QAA07164; Mon, 9 Nov 1998 16:03:23 -0500 (EST)
Message-Id: <199811092103.QAA07164@guns.lerc.nasa.gov>
To: "Amr A. Awadallah" <aaa@stanford.edu>
From: Mark Allman <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Quick question on delayed-ACKs and updating the congestion window. 
Organization: Late Night Hackers, NASA LeRC, Cleveland, Ohio
Song-of-the-Day: Radio Gaga
Date: Mon, 09 Nov 1998 16:03:23 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> (1) Increase cwnd by 1 segment once cwnd segments are ACKed,
> 
> or
> 
> (2) Increase cwnd by 1 segment once cwnd ACKs are received.

According to 2001.bis (we just submitted version -01, so it should
be in the archive soon), either of these is allowed.  That is, the
spirit of the congestion avoidance algorithm is that it should
increase cwnd by one segment per RTT.  You can do this using the
traditional method, which is basically adding 1/cwnd for every ACK.
This method adds about 1 segment to cwnd every RTT if delayed ACKs
are not used and roughly 1 segment for every 2 RTTs if delayed ACKs
are used.  Or, you can use another method (such as method (1)
outlined above).  There are two or three methods outlined in
2001.bis, but the basic idea is that you are allowed to somehow add
1 segment to cwnd every RTT.

However, a TCP that is more conservative and only increases the
window by 1 segment every other RTT is also specifically allowed by
2001.bis (i.e., if a TCP implementer thinks any of the algorithms is
too aggressive, throttling the window to something less than what
the standard algorithms would use is allowed).

allman
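
The difference between the two methods can be checked with a small
simulation. The sketch below is a toy model under stated assumptions:
cwnd is counted in segments, the window is full every RTT, nothing is
lost, and the receiver ACKs every segs_per_ack segments. The function
names are made up for this illustration.

```c
#include <assert.h>

/*
 * Method (2): add 1/cwnd segments for every ACK that arrives.
 * segs_per_ack is 1 without delayed ACKs and 2 with them.
 */
static double grow_per_ack(double cwnd, int rtts, int segs_per_ack)
{
    assert(cwnd >= 1.0 && rtts >= 0 && segs_per_ack >= 1);
    for (int r = 0; r < rtts; r++) {
        int acks = (int)cwnd / segs_per_ack;  /* ACKs seen this RTT */
        for (int a = 0; a < acks; a++)
            cwnd += 1.0 / cwnd;
    }
    return cwnd;
}

/*
 * Method (1): add one segment once cwnd segments have been ACKed,
 * no matter how many ACK packets carried that information.
 */
static double grow_per_segment(double cwnd, int rtts, int segs_per_ack)
{
    assert(cwnd >= 1.0 && rtts >= 0 && segs_per_ack >= 1);
    double acked = 0.0;  /* segments ACKed since the last increase */
    for (int r = 0; r < rtts; r++) {
        int acks = (int)cwnd / segs_per_ack;
        for (int a = 0; a < acks; a++) {
            acked += segs_per_ack;
            if (acked >= cwnd) {
                acked -= cwnd;
                cwnd += 1.0;
            }
        }
    }
    return cwnd;
}
```

Starting from cwnd = 10 and running 10 RTTs, grow_per_ack() gains a
little under one segment per RTT with segs_per_ack = 1 and about half
that with segs_per_ack = 2, while grow_per_segment() stays close to one
segment per RTT in both cases.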


From owner-tcp-impl@lerc.nasa.gov  Mon Nov  9 20:31:33 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id UAA07345
	for <tcpimpl-archive@lists.ietf.org>; Mon, 9 Nov 1998 20:31:32 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id SAA08862; Mon, 9 Nov 1998 18:46:28 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from mailhub.Stanford.EDU (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id SAA08855; Mon, 9 Nov 1998 18:46:25 -0500 (EST)
Received: from Stanford.Edu (amr.Stanford.EDU [36.191.0.155])
	by mailhub.Stanford.EDU (8.8.8/8.8.8/L) with ESMTP id PAA29658;
	Mon, 9 Nov 1998 15:46:22 -0800 (PST)
Message-ID: <36477ECD.90EF02FA@Stanford.Edu>
Date: Mon, 09 Nov 1998 15:46:21 -0800
From: Amr A Awadallah <aaa@Stanford.Edu>
Organization: Stanford University
X-Mailer: Mozilla 4.07 [en] (Win98; U)
MIME-Version: 1.0
To: mallman@lerc.nasa.gov, tcp-impl@lerc.nasa.gov
Subject: Re: Quick question on delayed-ACKs and updating the congestion window.
References: <199811092103.QAA07164@guns.lerc.nasa.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> > (1) Increase cwnd by 1 segment once cwnd segments are ACKed,
> >
> > or
> >
> > (2) Increase cwnd by 1 segment once cwnd ACKs are received.
> 
> According to 2001.bis (we just submitted version -01, so it should
> be in the archive soon), either of these is allowed.  That is, the

  So which is more popular in current implementations ?

Thanks in advance,

-- Amr


From owner-tcp-impl@lerc.nasa.gov  Mon Nov  9 23:06:31 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id XAA16742
	for <tcpimpl-archive@lists.ietf.org>; Mon, 9 Nov 1998 23:06:30 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id VAA18943; Mon, 9 Nov 1998 21:20:05 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from sake.cs.washington.edu (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id VAA18938; Mon, 9 Nov 1998 21:20:02 -0500 (EST)
Received: from localhost (cardwell@localhost) by sake.cs.washington.edu (8.8.8+CS/7.2ws+) with SMTP id SAA04477; Mon, 9 Nov 1998 18:20:00 -0800
Date: Mon, 9 Nov 1998 18:20:00 -0800 (PST)
From: Neal Cardwell <cardwell@cs.washington.edu>
To: Amr A Awadallah <aaa@Stanford.Edu>
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Quick question on delayed-ACKs and updating the congestion
 window.
In-Reply-To: <36477ECD.90EF02FA@Stanford.Edu>
Message-ID: <Pine.LNX.4.02A.9811091755530.2800-100000@sake.cs.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> > > (1) Increase cwnd by 1 segment once cwnd segments are ACKed,
> > >
> > > or
> > >
> > > (2) Increase cwnd by 1 segment once cwnd ACKs are received.
> > 
> > According to 2001.bis (we just submitted version -01, so it should
> > be in the archive soon), either of these is allowed.  That is, the
> 
>   So which is more popular in current implementations ?

Linux (2.0,2.1), (Free)BSD and Win95/NT use approach (2), i believe.

neal



From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 00:30:27 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id AAA18781
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 00:30:27 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id XAA26299; Mon, 9 Nov 1998 23:10:28 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from tuvok.lerc.nasa.gov (ppp17.lerc.nasa.gov [139.88.123.27]) by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id XAA26295; Mon, 9 Nov 1998 23:10:24 -0500 (EST)
Received: from tuvok.lerc.nasa.gov (localhost [127.0.0.1])
	by tuvok.lerc.nasa.gov (8.8.5/8.8.5) with ESMTP id WAA00620;
	Mon, 9 Nov 1998 22:58:44 -0500 (EST)
Message-Id: <199811100358.WAA00620@tuvok.lerc.nasa.gov>
To: Amr A Awadallah <aaa@stanford.edu>
From: Mark Allman <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Quick question on delayed-ACKs and updating the congestion window. 
Organization: Late Night Hackers, NASA LeRC, Cleveland, Ohio
Song-of-the-Day: Radio Gaga
Date: Mon, 09 Nov 1998 22:58:44 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> > > (1) Increase cwnd by 1 segment once cwnd segments are ACKed,
> > >
> > > or
> > >
> > > (2) Increase cwnd by 1 segment once cwnd ACKs are received.
> 
>   So which is more popular in current implementations ?

I haven't surveyed implementations, but I think that most TCPs use
the MSS*MSS/cwnd formula, which effectively yields option (2).
Certainly the BSD varieties (and descendants) that I play with often
do.

allman
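
For reference, the MSS*MSS/cwnd update can be written in byte units as
below. This is a minimal sketch, not code from any particular stack:
the function name is hypothetical, and the round-up-to-one-byte guard
and the saturation at UINT32_MAX are assumptions added so the example
is safe to run.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Congestion avoidance step applied once per incoming ACK.
 * cwnd and mss are in bytes.  A full window of ACKs (cwnd/mss of them)
 * adds roughly one MSS in total, i.e. about one segment per RTT when
 * every segment is ACKed and about half that with delayed ACKs.
 */
static uint32_t cong_avoid_on_ack(uint32_t cwnd, uint32_t mss)
{
    assert(cwnd > 0 && mss > 0);
    uint64_t incr = (uint64_t)mss * mss / cwnd;  /* 64-bit to avoid overflow */
    if (incr == 0)
        incr = 1;  /* assumption: never let a huge cwnd stall growth */
    uint64_t next = (uint64_t)cwnd + incr;
    return next > UINT32_MAX ? UINT32_MAX : (uint32_t)next;
}
```

With mss = 1460 and cwnd = 14600 (ten segments), one ACK adds
1460*1460/14600 = 146 bytes, so ten ACKs add about one MSS.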




 


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 13:32:37 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA01658
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 13:32:36 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id LAA03523; Tue, 10 Nov 1998 11:38:31 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from ietf.org (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id LAA03509; Tue, 10 Nov 1998 11:38:27 -0500 (EST)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id LAA27084;
	Tue, 10 Nov 1998 11:38:24 -0500 (EST)
Message-Id: <199811101638.LAA27084@ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;;@ns.cnri.reston.va.us
Cc: tcp-impl@lerc.nasa.gov
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-prob-05.txt
Date: Tue, 10 Nov 1998 11:38:23 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Known TCP Implementation Problems
	Author(s)	: V. Paxson, M. Allman, S. Dawson, B. Fenner, 
                          J. Griner, I. Heavens, K. Lahey, J. Semke, B. Volz
	Filename	: draft-ietf-tcpimpl-prob-05.txt
	Pages		: 63
	Date		: 09-Nov-98
	
   This memo catalogs a number of known TCP implementation problems.
   The goal in doing so is to improve conditions in the existing
   Internet by enhancing the quality of current TCP/IP implementations.
   It is hoped that both performance and correctness issues can be
   resolved by making implementors aware of the problems and their
   solutions.  In the long term, it is hoped that this will provide a
   reduction in unnecessary traffic on the network, the rate of
   connection failures due to protocol errors, and load on network
   servers due to time spent processing both unsuccessful connections
   and retransmitted data.  This will help to ensure the stability of
   the global Internet.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-prob-05.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-prob-05.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nic.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ftp.ietf.org
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-prob-05.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19981109133033.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-prob-05.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-prob-05.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19981109133033.I-D@ietf.org>

--OtherAccess--

--NextPart--




From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 13:38:29 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA01993
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 13:38:28 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id LAA04224; Tue, 10 Nov 1998 11:43:18 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from ietf.org (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id LAA04219; Tue, 10 Nov 1998 11:43:15 -0500 (EST)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id LAA27311;
	Tue, 10 Nov 1998 11:43:11 -0500 (EST)
Message-Id: <199811101643.LAA27311@ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;;@ns.cnri.reston.va.us
Cc: tcp-impl@lerc.nasa.gov
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-cong-control-01.txt
Date: Tue, 10 Nov 1998 11:43:11 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: TCP Congestion Control
	Author(s)	: M. Allman,  V. Paxson, W. Stevens
	Filename	: draft-ietf-tcpimpl-cong-control-01.txt
	Pages		: 11
	Date		: 09-Nov-98
	
    This document defines TCP's four intertwined congestion control
    algorithms: slow start, congestion avoidance, fast retransmit, and
    fast recovery.  In addition, the document specifies how TCP should
    begin transmission after a relatively long idle period, as well as
    discussing various acknowledgment generation methods.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-cong-control-01.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-cong-control-01.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nic.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ftp.ietf.org
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-cong-control-01.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19981109160022.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-cong-control-01.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-cong-control-01.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19981109160022.I-D@ietf.org>

--OtherAccess--

--NextPart--




From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 14:16:52 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA04178
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 14:16:51 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11329; Tue, 10 Nov 1998 12:45:11 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11316; Tue, 10 Nov 1998 12:45:09 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id JAA18130;
	Tue, 10 Nov 1998 09:45:07 -0800 (PST)
Message-Id: <199811101745.JAA18130@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Subject: Known Problems security consideration section
Date: Tue, 10 Nov 1998 09:45:07 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

4. Security Considerations

   This memo does not discuss any specific security-related TCP
   implementation problems, as the working group decided to pursue
   documenting those in a separate document.  Some of the implementation
   problems discussed here, however, can be used for denial-of-service
   attacks.  Those classified as congestion control present
   opportunities to subvert TCPs used for legitimate data transfer into
   excessively loading network elements.  Those classified as
   "performance", "reliability" and "resource management" may be
   exploitable for launching surreptitious denial-of-service attacks
   against the user of the TCP.  Both of these types of attacks can be
   extremely difficult to detect because in most respects they look
   identical to legitimate network traffic.


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 14:18:52 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA04284
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 14:18:52 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11176; Tue, 10 Nov 1998 12:43:19 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11158; Tue, 10 Nov 1998 12:43:14 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id JAA18107;
	Tue, 10 Nov 1998 09:43:13 -0800 (PST)
Message-Id: <199811101743.JAA18107@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Cc: mallman@lerc.nasa.gov, sob@harvard.edu
Subject: Last Call for Known Problems I-D
Date: Tue, 10 Nov 1998 09:43:12 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

The -05 version of the Known TCP Implementation Problems I-D has now come
out.  Mark & I feel it's ready for publication, so we're holding a two-week
last call for any final comments, with the last call ending on Nov 24.

The main differences between this version and the -04 version are:

	- A new problem description, "Window probe deadlock", contributed
	  by Bill Fenner
	- Security Considerations section added
	- Discussion of the experimental larger initial window added
	  to the "No initial slow start" problem
	- Table of contents added

There have also been a lot of minor editing tweaks, so I'm not going to
send diffs for those to the list.  (If you'd like diffs generated by wdiff,
a word-oriented diff tool, send me mail.)

The next two messages are the new problem description and the security
considerations section, so they can be discussed in their own threads
if need be.

		Vern


ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-prob-05.txt


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 14:20:22 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA04386
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 14:20:22 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11259; Tue, 10 Nov 1998 12:44:25 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA11249; Tue, 10 Nov 1998 12:44:19 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id JAA18119;
	Tue, 10 Nov 1998 09:44:18 -0800 (PST)
Message-Id: <199811101744.JAA18119@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Subject: Window probe deadlock writeup
Date: Tue, 10 Nov 1998 09:44:18 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

3.12.

Name of Problem
     Window probe deadlock


Classification
     Reliability


Description
     When an application reads a single byte from a full window, the
     window should not be updated, in order to avoid Silly Window
     Syndrome (SWS; see [RFC813]).  If the remote peer uses a single
     byte of data to probe the window, that byte can be accepted into
     the buffer.  In some implementations, at this point a negative
     argument to a signed comparison causes all further new data to be
     considered outside the window; consequently, it is discarded (after
     sending an ACK to resynchronize).  These discards include the ACKs
     for the data packets sent by the local TCP, so the TCP will
     consider the data unacknowledged.

     Consequently, the application may be unable to complete sending new
     data to the remote peer, because it has exhausted the transmit
     buffer available to its local TCP, and buffer space is never being
     freed because incoming ACKs that would do so are being discarded.
     If the application does not read any more data, which may happen
     due to its failure to complete such sends, then deadlock results.
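
     The failure mode described above can be illustrated with a small
     C sketch.  It is a hypothetical model, not code from any affected
     implementation: the struct, the function names and the exact form
     of the acceptability test are assumptions made for illustration.
     The sequence numbers in the usage note are borrowed from the trace
     later in this writeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side state. */
struct rcv_state {
    uint32_t rcv_nxt;  /* next sequence number expected */
    uint32_t rcv_adv;  /* right edge of the last advertised window */
};

/*
 * Buggy variant: the window is the signed distance to the advertised
 * edge.  Once a probe byte beyond that edge has been accepted, rcv_nxt
 * passes rcv_adv and the result is negative.  (The unsigned-to-signed
 * conversion is implementation-defined in C; common compilers wrap.)
 */
static int32_t window_buggy(const struct rcv_state *s)
{
    assert(s != 0);
    return (int32_t)(s->rcv_adv - s->rcv_nxt);
}

/* Repaired variant: never report less than a zero window. */
static int32_t window_clamped(const struct rcv_state *s)
{
    int32_t w = window_buggy(s);
    return w < 0 ? 0 : w;
}

/*
 * Acceptability test: a segment covering [seq, seq+len) is accepted
 * only if it does not extend past rcv_nxt + wnd.  With wnd < 0 even a
 * zero-length ACK at seq == rcv_nxt is judged to lie outside the window.
 */
static bool segment_acceptable(const struct rcv_state *s,
                               uint32_t seq, uint32_t len, int32_t wnd)
{
    assert(s != 0);
    int32_t overrun = (int32_t)((seq + len) - (s->rcv_nxt + (uint32_t)wnd));
    return overrun <= 0;
}
```

     With rcv_adv = 57345 and rcv_nxt = 57346 (one probe byte accepted
     past a zero window), window_buggy() returns -1 and a dataless ACK
     at rcv_nxt is rejected, while window_clamped() returns 0 and the
     same ACK is accepted.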


Significance
     It's relatively rare for applications to use TCP in a manner that
     can exercise this problem.  Most applications only transmit bulk
     data if they know the other end is prepared to receive the data.
     However, if a client fails to consume data, putting the server in



Paxson, Editor                                                 [Page 38]





ID                 Known TCP Implementation Problems       November 1998


     persist mode, and then consumes a small amount of data, it can
     mistakenly compute a negative window.  At this point the client
     will discard all further packets from the server, including ACKs of
     the client's own data, since they are not inside the (impossibly-
     sized) window.  If subsequently the client consumes enough data to
     then send a window update to the server, the situation will be
     rectified.  That is, this situation can only happen if the client
     consumes 1 < N < MSS bytes, so as not to cause a window update, and
     then starts its own transmission towards the server of more than a
     window's worth of data.


Implications
     TCP connections will hang and eventually time out.


Relevant RFCs
     RFC 793 describes zero window probing.  RFC 813 describes Silly
     Window Syndrome.


Trace file demonstrating it
     Trace made from a version of tcpdump modified to print out the
     sequence number attached to an ACK even if it's dataless.  An
     unmodified tcpdump would not print seq:seq(0); however, for this
     bug, the sequence number in the ACK is important for unambiguously
     determining how the TCP is behaving.

     [ Normal connection startup and data transmission from B to A.
       Options, including MSS of 16344 in both directions, omitted
       for clarity. ]
     16:07:32.327616 A > B: S 65360807:65360807(0) win 8192
     16:07:32.327304 B > A: S 65488807:65488807(0) ack 65360808 win 57344
     16:07:32.327425 A > B: . 1:1(0) ack 1 win 57344
     16:07:32.345732 B > A: P 1:2049(2048) ack 1 win 57344
     16:07:32.347013 B > A: P 2049:16385(14336) ack 1 win 57344
     16:07:32.347550 B > A: P 16385:30721(14336) ack 1 win 57344
     16:07:32.348683 B > A: P 30721:45057(14336) ack 1 win 57344
     16:07:32.467286 A > B: . 1:1(0) ack 45057 win 12288
     16:07:32.467854 B > A: P 45057:57345(12288) ack 1 win 57344

     [ B fills up A's offered window ]
     16:07:32.667276 A > B: . 1:1(0) ack 57345 win 0

     [ B probes A's window with a single byte ]
     16:07:37.467438 B > A: . 57345:57346(1) ack 1 win 57344

     [ A resynchronizes without accepting the byte ]
     16:07:37.467678 A > B: . 1:1(0) ack 57345 win 0

     [ B probes A's window again ]
     16:07:45.467438 B > A: . 57345:57346(1) ack 1 win 57344

     [ A resynchronizes and accepts the byte (per the ack field) ]
     16:07:45.667250 A > B: . 1:1(0) ack 57346 win 0

     [ The application on A has started generating data.  The first
       packet A sends is small due to a memory allocation bug. ]
     16:07:51.358459 A > B: P 1:2049(2048) ack 57346 win 0

     [ B acks A's first packet ]
     16:07:51.467239 B > A: . 57346:57346(0) ack 2049 win 57344

     [ This looks as though A accepted B's ACK and is sending
       another packet in response to it.  In fact, A is trying
       to resynchronize with B, and happens to have data to send
       and can send it because the first small packet didn't use
       up cwnd. ]
     16:07:51.467698 A > B: . 2049:14337(12288) ack 57346 win 0

     [ B acks all of the data that A has sent ]
     16:07:51.667283 B > A: . 57346:57346(0) ack 14337 win 57344

     [ A tries to resynchronize.  Notice that by the packets
       seen on the network, A and B *are* in fact synchronized;
       A only thinks that they aren't. ]
     16:07:51.667477 A > B: . 14337:14337(0) ack 57346 win 0

     [ A's retransmit timer fires, and B acks all of the data.
       A once again tries to resynchronize. ]
     16:07:52.467682 A > B: . 1:14337(14336) ack 57346 win 0
     16:07:52.468166 B > A: . 57346:57346(0) ack 14337 win 57344
     16:07:52.468248 A > B: . 14337:14337(0) ack 57346 win 0

     [ A's retransmit timer fires again, and B acks all of the data.
       A once again tries to resynchronize. ]
     16:07:55.467684 A > B: . 1:14337(14336) ack 57346 win 0
     16:07:55.468172 B > A: . 57346:57346(0) ack 14337 win 57344
     16:07:55.468254 A > B: . 14337:14337(0) ack 57346 win 0


Trace file demonstrating correct behavior
     Made between the same two hosts after applying the bug fix
     mentioned below (and using the same modified tcpdump).

     [ Connection starts up with data transmission from B to A.
       Note that due to a separate bug (the fact that A and B
       are communicating over a loopback driver), B erroneously
       skips slow start. ]
     17:38:09.510854 A > B: S 3110066585:3110066585(0) win 16384
     17:38:09.510926 B > A: S 3110174850:3110174850(0) ack 3110066586 win 57344
     17:38:09.510953 A > B: . 1:1(0) ack 1 win 57344
     17:38:09.512956 B > A: P 1:2049(2048) ack 1 win 57344
     17:38:09.513222 B > A: P 2049:16385(14336) ack 1 win 57344
     17:38:09.513428 B > A: P 16385:30721(14336) ack 1 win 57344
     17:38:09.513638 B > A: P 30721:45057(14336) ack 1 win 57344
     17:38:09.519531 A > B: . 1:1(0) ack 45057 win 12288
     17:38:09.519638 B > A: P 45057:57345(12288) ack 1 win 57344

     [ B fills up A's offered window ]
     17:38:09.719526 A > B: . 1:1(0) ack 57345 win 0

     [ B probes A's window with a single byte.  A resynchronizes
       without accepting the byte ]
     17:38:14.499661 B > A: . 57345:57346(1) ack 1 win 57344
     17:38:14.499724 A > B: . 1:1(0) ack 57345 win 0

     [ B probes A's window again.  A resynchronizes and accepts
       the byte, as indicated by the ack field ]
     17:38:19.499764 B > A: . 57345:57346(1) ack 1 win 57344
     17:38:19.519731 A > B: . 1:1(0) ack 57346 win 0

     [ B probes A's window with a single byte.  A resynchronizes
       without accepting the byte ]
     17:38:24.499865 B > A: . 57346:57347(1) ack 1 win 57344
     17:38:24.499934 A > B: . 1:1(0) ack 57346 win 0

     [ The application on A has started generating data.
       B acks A's data and A accepts the ACKs and the
       data transfer continues ]
     17:38:28.530265 A > B: P 1:2049(2048) ack 57346 win 0
     17:38:28.719914 B > A: . 57346:57346(0) ack 2049 win 57344

     17:38:28.720023 A > B: . 2049:16385(14336) ack 57346 win 0
     17:38:28.720089 A > B: . 16385:30721(14336) ack 57346 win 0
     17:38:28.720370 B > A: . 57346:57346(0) ack 30721 win 57344

     17:38:28.720462 A > B: . 30721:45057(14336) ack 57346 win 0
     17:38:28.720526 A > B: P 45057:59393(14336) ack 57346 win 0
     17:38:28.720824 A > B: P 59393:73729(14336) ack 57346 win 0
     17:38:28.721124 B > A: . 57346:57346(0) ack 73729 win 47104

     17:38:28.721198 A > B: P 73729:88065(14336) ack 57346 win 0
     17:38:28.721379 A > B: P 88065:102401(14336) ack 57346 win 0
     17:38:28.721557 A > B: P 102401:116737(14336) ack 57346 win 0
     17:38:28.721863 B > A: . 57346:57346(0) ack 116737 win 36864


References
     None known.


How to detect
     Initiate a connection from a client to a server.  Have the server
     continuously send data until its buffers have been full for long
     enough to exhaust the window.  Next, have the client read 1 byte
     and then delay for long enough that the server TCP sends a window
     probe.  Now have the client start sending data.  At this point, if
     it ignores the server's ACKs, then the client's TCP suffers from
     the problem.


How to fix
     In one implementation known to exhibit the problem (derived from
     4.3-Reno), the problem was introduced when the macro MAX() was
     replaced by the function call max() for computing the amount of
     space in the receive window:

         tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt));

     When data has been received into a window beyond what has been
     advertised to the other side, rcv_nxt > rcv_adv, making this
     difference negative.  It's clear from the (int) cast that a signed
     comparison is intended, but max() takes unsigned arguments, so the
     negative number is converted to a very large unsigned value and
     compares as "larger".  The fix is to change max() to imax():

         tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

     4.3-Tahoe and earlier releases did not have this bug, since they
     used the macro MAX() for this calculation.
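
     The difference between the two calls can be reproduced outside the
     kernel.  The following is only a minimal sketch, assuming a 32-bit
     int and 32-bit sequence numbers; max_unsigned(), imax_signed(),
     rcv_wnd_buggy() and rcv_wnd_fixed() are hypothetical stand-ins
     written for this illustration, not the 4.3-Reno routines, and the
     sequence numbers in the self-check are borrowed from the trace above
     purely as sample values.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;   /* 32-bit sequence number (assumed for this sketch) */

/* Hypothetical stand-in for an unsigned max(): both arguments are
 * converted to unsigned before they are compared. */
static unsigned int max_unsigned(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* Hypothetical stand-in for imax(): a plain signed comparison. */
static int imax_signed(int a, int b)
{
    return a > b ? a : b;
}

/* Buggy form: the negative difference is implicitly converted to a very
 * large unsigned value on its way into max_unsigned(), so it always wins.
 * (The uint32_t-to-int cast yields -1 here on two's-complement targets.) */
static uint32_t rcv_wnd_buggy(int win, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    return max_unsigned(win, (int)(rcv_adv - rcv_nxt));
}

/* Fixed form: the comparison stays signed, so a negative difference
 * loses to any non-negative win. */
static uint32_t rcv_wnd_fixed(int win, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    return (uint32_t)imax_signed(win, (int)(rcv_adv - rcv_nxt));
}

/* Sample case: a window of 0 was advertised at 57345 and the one-byte
 * probe was then accepted, so rcv_nxt is one past rcv_adv. */
static void window_sketch_selfcheck(void)
{
    assert(rcv_wnd_buggy(0, 57345u, 57346u) == UINT32_MAX);
    assert(rcv_wnd_fixed(0, 57345u, 57346u) == 0u);
}
```

     Calling window_sketch_selfcheck() from a small main() exercises both
     forms: the unsigned comparison yields an impossibly large window,
     while the signed comparison leaves the window at zero.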


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 14:59:23 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA06511
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 14:59:22 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA16434; Tue, 10 Nov 1998 13:27:27 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA16430; Tue, 10 Nov 1998 13:27:24 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id KAA18281;
	Tue, 10 Nov 1998 10:27:23 -0800 (PST)
Message-Id: <199811101827.KAA18281@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Cc: mallman@lerc.nasa.gov, sob@harvard.edu
Subject: Last Call for TCP Congestion Control I-D
Date: Tue, 10 Nov 1998 10:27:23 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

The -01 version of the TCP Congestion Control I-D has now come out.
Mark & I feel it's ready for publication as Proposed Standard, so we're
holding a two-week last call for final comments, with the last call ending
on Nov 24.

The main differences between this version and the -00 version are:

	- Changed acking-at-least-every-second-segment from a MUST to
	  a SHOULD.  This reflects the recent discussion on the mailing
	  list that presented arguments why it might indeed sometimes
	  make sense to generate ACKs at a lower rate.  Added a strong
	  caution that this should not be done lightly.

	- Clarified that the receiver doesn't know the true MSS and
	  so the ack-at-least-every-second-full-MSS-segment requirement
	  is expressed in terms of its offered MSS.

	- Added that a TCP receiver SHOULD generate an immediate ACK
	  when an incoming segment fills a gap in the sequence space.
	  This reflects the recent discussion of how failure to do so
	  can lead to miserable performance.

	- Added discussion of "FlightSize", the amount of outstanding
	  data at the time of a loss.  On loss, ssthresh is set to
	  FlightSize/2 (or 2*MSS, whichever's larger).  This used
	  to be "min(cwnd,rwnd)" but it was pointed out that you might
	  have less data in flight than either of those.

	- Removed proposed algorithm for recovering from multiple losses
	  in a single flight, and replaced it with a general discussion
	  of congestion control requirements for loss recovery mechanisms.

	- Added a pointer to draft-ietf-tcpimpl-newreno-00.txt as one
	  example of Reno modifications for dealing with recovering from
	  multiple losses.  This I-D has been sent to the Internet Drafts
	  editor and should be posted shortly.
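
The FlightSize change in the list above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: ssthresh_after_loss()
is a hypothetical helper, not code from the draft or from any stack, and it
returns the upper bound max(FlightSize/2, 2*MSS); an implementation may set
ssthresh lower.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper bound on ssthresh after a loss, computed
 * from the data actually outstanding (flight_size) rather than from
 * min(cwnd, rwnd).  All quantities are in bytes. */
static uint32_t ssthresh_after_loss(uint32_t flight_size, uint32_t mss)
{
    assert(mss > 0);

    uint32_t half_flight = flight_size / 2;
    uint32_t lower_bound = 2 * mss;   /* never drop below two segments */

    return half_flight > lower_bound ? half_flight : lower_bound;
}
```

For example, with 20000 bytes in flight and an MSS of 1460 the bound is 10000
bytes, regardless of how far cwnd or rwnd exceed the data actually sent; with
only 3000 bytes in flight the 2*MSS floor of 2920 bytes applies.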

The next message contains context diffs between -00 and -01.

		Vern


ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-cong-control-01.txt


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 15:47:58 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id PAA08568
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 15:47:57 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id NAA16471; Tue, 10 Nov 1998 13:28:08 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id NAA16467; Tue, 10 Nov 1998 13:28:05 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id KAA18295;
	Tue, 10 Nov 1998 10:28:03 -0800 (PST)
Message-Id: <199811101828.KAA18295@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D
In-reply-to: Your message of Tue, 10 Nov 1998 10:27:23 PST.
Date: Tue, 10 Nov 1998 10:28:03 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--- draft-ietf-tcpimpl-cong-control-00.txt	Mon Aug 10 09:50:28 1998
+++ draft-ietf-tcpimpl-cong-control-01.txt	Tue Nov 10 09:46:25 1998
@@ -1,11 +1,10 @@
-
-TCP Implementation Working Group                              W. Stevens
-INTERNET DRAFT                                                Consultant
-File: draft-ietf-tcpimpl-cong-control-00.txt                   M. Allman
-                                            NASA Lewis/Sterling Software
-                                                               V. Paxson
+TCP Implementation Working Group                               M. Allman
+INTERNET DRAFT                              NASA Lewis/Sterling Software
+File: draft-ietf-tcpimpl-cong-control-01.txt                   V. Paxson
                                                                     LBNL
-                                                            August, 1998
+                                                              W. Stevens
+                                                              Consultant
+                                                          November, 1998
 
     
 			TCP Congestion Control
@@ -55,9 +54,9 @@
     implementation of these algorithms.
 
 
-Expires: February, 1999                                           [Page 1]
+Expires: June, 1999                                             [Page 1]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
     This document is organized as follows.  Section 2 provides various
     definitions which will be used throughout the paper.  Section 3
@@ -106,18 +105,21 @@
         The restart window is the size of the congestion window after a
         TCP restarts transmission after an idle period.
 
+    FLIGHT SIZE:
+	The amount of data that has been sent but not yet acknowledged.
+
 3  Congestion Control Algorithms    
     
     This section defines the four congestion control algorithms: slow
     start, congestion avoidance, fast retransmit and fast recovery,
-    developed in [Jac88] and [Jac90].  In some situations it may be
-    beneficial for a TCP sender to be more conservative than the
-    algorithms allow, however a TCP MUST NOT be more aggressive than the
 
-Expires: February, 1999                                           [Page 2]
+Expires: June, 1999                                             [Page 2]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
+    developed in [Jac88] and [Jac90].  In some situations it may be
+    beneficial for a TCP sender to be more conservative than the
+    algorithms allow, however a TCP MUST NOT be more aggressive than the
     following algorithms allow (that is, MUST NOT send data when the
     value of cwnd computed by the following algorithms would not allow
     the data to be sent).
@@ -170,13 +172,13 @@
     algorithm is used when cwnd > ssthresh.  When cwnd and ssthresh are
     equal the sender may use either slow start or congestion avoidance.
 
-    During slow start, a TCP increments cwnd by at most MSS bytes for
-    each ACK received that acknowledges new data.  Slow start ends when
-
-Expires: February, 1999                                           [Page 3]
+Expires: June, 1999                                             [Page 3]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
+
+    During slow start, a TCP increments cwnd by at most MSS bytes for
+    each ACK received that acknowledges new data.  Slow start ends when
     cwnd exceeds ssthresh (or, optionally, when it reaches it, as noted
     above); or when cwnd reaches rwnd; or when congestion is observed.
 
@@ -188,7 +190,8 @@
 
                           cwnd += MSS*MSS/cwnd                       (2)
 
-    This provides an acceptable approximation to the underlying
+    This adjustment is executed on every incoming non-duplicate ACK.
+    Equation (2) provides an acceptable approximation to the underlying
     principle of increasing cwnd by 1 full-sized segment per RTT.  (Note
     that for a connection in which the receiver acknowledges every data
     segment, (2) proves slightly more aggressive than 1 segment per RTT,
@@ -223,19 +226,22 @@
     timer, the value of ssthresh MUST be set to no more than the value
     given in equation 3:
 
-              ssthresh = max (min (cwnd, rwnd) / 2, 2*MSS)           (3)
+		   ssthresh = max (FlightSize / 2, 2*MSS)             (3)
 	    
-    Implementation Note: an easy mistake to make is to forget the inner
-    min() operation and simply use cwnd, which in some implementations
-    may incidentally increase well beyond rwnd.
+    As discussed above, FlightSize is the amount of outstanding data in
+    the network.
 
-    Furthermore, upon a timeout cwnd MUST be set to no more than the
-    loss window, LW, which equals 1 full-sized segment (regardless of
-
-Expires: February, 1999                                           [Page 4]
+Expires: June, 1999                                             [Page 4]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
+
+    Implementation Note: an easy mistake to make is to simply use cwnd,
+    rather than FlightSize, which in some implementations may
+    incidentally increase well beyond rwnd.
+
+    Furthermore, upon a timeout cwnd MUST be set to no more than the
+    loss window, LW, which equals 1 full-sized segment (regardless of
     the value of IW).  Therefore, after retransmitting the dropped
     segment the TCP sender uses the slow start algorithm to increase the
     window from 1 full-sized segment to the new value of ssthresh, at
@@ -252,31 +258,43 @@
     First, they can be caused by dropped segments.  In this case, all
     segments after the dropped segment will trigger duplicate ACKs.
     Second, duplicate ACKs can be caused by the re-ordering of data
-    segments by the network (not a rare event along some network paths).
-    Finally, duplicate ACKs can be caused by replication of ACK or data
-    segments by the network.
+    segments by the network (not a rare event along some network paths
+    [Pax97]).  Finally, duplicate ACKs can be caused by replication of
+    ACK or data segments by the network.  In addition, a TCP receiver
+    SHOULD send an immediate ACK when the incoming segment fills in all
+    or part of a gap in the sequence space.  This will generate more
+    timely information for a sender recovering from a loss through a
+    retransmission timeout, a fast retransmit, or an experimental loss
+    recovery algorithm, such as NewReno [FH98].
 
     The TCP sender SHOULD use the "fast retransmit" algorithm to detect
     and repair loss, based on incoming duplicate ACKs.  The fast
-    retransmit algorithm uses the arrival of 3 duplicate ACKs (i.e., 4
-    identical ACKs) as an indication that a segment has been lost.
-    After receiving 3 duplicate ACKs, TCP performs a retransmission of
-    what appears to be the missing segment, without waiting for the
-    retransmission timer to expire.
+    retransmit algorithm uses the arrival of 3 duplicate ACKs (4
+    identical ACKs without the arrival of any other intervening packets)
+    as an indication that a segment has been lost.  After receiving 3
+    duplicate ACKs, TCP performs a retransmission of what appears to be
+    the missing segment, without waiting for the retransmission timer to
+    expire.
 
     After the fast retransmit sends what appears to be the missing
     segment, the "fast recovery" algorithm governs the transmission of
     new data until a non-duplicate ACK arrives.  The reason for not
     performing slow start is that the receipt of the duplicate ACKs not
-    only tells the TCP that a segment has been lost, but also that
-    segments are leaving the network.  In other words, since the
-    receiver can only generate a duplicate ACK when a segment has
-    arrived, that segment has left the network and is in the receiver's
-    buffer, so we know it is no longer consuming network resources.
-    Furthermore, since the ACK "clock" [Jac88] is preserved, the TCP
-    sender can continue to transmit new segments (although transmission
-    must continue using a reduced cwnd).
+    only indicates that a segment has been lost, but also that segments
+    are most likely leaving the network (although a massive segment
+    duplication by the network can invalidate this conclusion).  In
+    other words, since the receiver can only generate a duplicate ACK
+    when a segment has arrived, that segment has left the network and is
+    in the receiver's buffer, so we know it is no longer consuming
+    network resources.  Furthermore, since the ACK "clock" [Jac88] is
+    preserved, the TCP sender can continue to transmit new segments
+    (although transmission must continue using a reduced cwnd).
 
+Expires: June, 1999                                             [Page 5]
+
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
+
+
     The fast retransmit and fast recovery algorithms are usually
     implemented together as follows.
 
@@ -290,11 +308,6 @@
 
     3.  For each additional duplicate ACK received, increment cwnd by
         MSS.  This artificially inflates the congestion window in order
-
-Expires: February, 1999                                           [Page 5]
-
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
-
         to reflect the additional segment that has left the network.
 
     4.  Transmit a segment, if allowed by the new value of cwnd and the
@@ -310,30 +323,13 @@
         out-of-order delivery of data segments at the receiver).
         Additionally, this ACK should acknowledge all the intermediate
         segments sent between the lost segment and the receipt of the
-        first duplicate ACK, if none of these were lost.
-    
-    Implementing fast retransmit/fast recovery in this manner can lead
-    to a phenomenon which allows the fast retransmit algorithm to repair
-    multiple dropped segments from a single window of data [Flo94].
-    However, in doing so, the size of cwnd is also reduced multiple
-    times, which reduces performance.  The following steps MAY be taken
-    to reduce the impact of successive fast retransmits on performance.
+        third duplicate ACK, if none of these were lost.
 
-    A.  After the third duplicate ACK is received follow step 1 above,
-        but also record the highest sequence number transmitted
-        (send_high).
+    Note: This algorithm is known to generally not recover very
+    efficiently from multiple losses in a single flight of packets.  One
+    proposed set of modifications to it to address this problem can be
+    found in [FH98].
 
-    B.  Instead of reducing cwnd to ssthresh upon receipt of the first
-        non-duplicate ACK received (step 5), the sender should wait
-        until an ACK covering send_high is received.  In addition, each
-        duplicate ACK received should continue to artificially inflate
-        cwnd by 1 MSS.
-
-    C.  A non-duplicate ACK that does not cover send_high, followed by 3
-        duplicate ACKs should not reduce ssthresh or cwnd but SHOULD
-        trigger a retransmission.  A non-duplicate ACK that does not
-        cover send_high SHOULD NOT cause any adjustment in cwnd.
-
 4   Additional Considerations
 
 4.1 Re-starting Idle Connections
@@ -349,14 +345,14 @@
 
     [Jac88] recommends that a TCP use slow start to restart transmission
     after a relatively long idle period.  Slow start serves to restart
-
-Expires: February, 1999                                           [Page 6]
-
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
-
     the ACK clock, just as it does at the beginning of a transfer.  This
     mechanism has been widely deployed in the following manner.  When
     TCP has not received a segment for more than one retransmission
+
+Expires: June, 1999                                             [Page 6]
+
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
+
     timeout, cwnd is reduced to the value of the restart window (RW)
     before transmission begins.
 
@@ -379,43 +375,82 @@
     beginning transmission if the TCP has not sent data in an interval
     exceeding the retransmission timeout.
 
-4.2 Acknowledgment Mechanisms
+4.2 Generating Acknowledgments
 
     The delayed ACK algorithm specified in [Bra89] SHOULD be used by a
     TCP receiver.  When used, a TCP receiver MUST NOT excessively delay
-    acknowledgments.  Specifically, an ACK MUST be generated for every
-    second full-sized segment.  (This "MUST" is listed in [Bra89] in one
-    place as a SHOULD and another as a MUST; here we unambiguously state
-    it is a MUST.)  Furthermore, an ACK SHOULD be generated for every
-    second segment regardless of size.  Finally, an ACK MUST NOT be
-    delayed for more than 500 ms waiting on a second full-sized segment
-    to arrive.  Out-of-order data segments SHOULD be acknowledged
-    immediately, in order to trigger the fast retransmit algorithm.
+    acknowledgments.  Specifically, an ACK SHOULD be generated for at
+    least every second full-sized segment, and MUST be generated within
+    500 ms of the arrival of the first unacknowledged packet.
 
-    A TCP receiver MUST NOT generate more than one ACK for every
-    incoming segment.
+    The requirement that an ACK "SHOULD" be generated for at least every
+    second full-sized segment is listed in [Bra89] in one place as a
+    SHOULD and another as a MUST.  Here we unambiguously state it is a
+    SHOULD.  We also emphasize that this is a "strong" SHOULD, meaning
+    that an implementor should indeed only deviate from this requirement
+    after careful consideration of the implications.  See the discussion
+    of "Stretch ACK violation" in [PAD+98] and the references therein
+    for a discussion of the possible performance problems with
+    generating ACKs less frequently than every second full-sized
+    segment.
 
-    TCP implementations that implement the selective acknowledgment
-    (SACK) option [MMFR96] are able to determine which segments have not
-    arrived at the receiver.  Consequently, they can retransmit the lost
-    segments more quickly than TCPs without SACKs.  This allows a TCP
-    sender to repair multiple losses in roughly one RTT after detecting
-    loss [FF96,MM96a,MM96b].  While no specific SACK-based recovery
-    algorithm has yet been standardized, any SACK-based algorithm should
-    follow the general principles embodied by the above algorithms.
-    First, as soon as loss is detected, ssthresh should be decreased per
-    equation (3).  Second, in the RTT following loss detection, the
-    number of segments sent should be no more than half the number
-    transmitted in the previous RTT (i.e., before loss occurred).
-    Third, after the recovery period is finished, cwnd should be set to
+    In some cases, the sender and receiver may not agree on what
+    constitutes a full-sized segment.  An implementation is deemed to
+    comply with this requirement if it sends at least one acknowledgment
+    every time it receives 2*MSS bytes of new data from the sender,
+    where MSS is the Maximum Segment Size specified by the receiver to
+    the sender (or the default value of 536 bytes, per [Bra89], if the
+    receiver does not specify an MSS option during connection
+    establishment).  Finally, we repeat that an ACK MUST NOT be delayed
+    for more than 500 ms waiting on a second full-sized segment to
+    arrive.  Out-of-order data segments SHOULD be acknowledged
+    immediately, in order to accelerate loss recovery.  To trigger the
+    fast retransmit algorithm, the receiver SHOULD send an immediate
+    duplicate ACK when it receives a data segment above a gap in the
 
-Expires: February, 1999                                           [Page 7]
+Expires: June, 1999                                             [Page 7]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
-    the reduced value of ssthresh.  The SACK-based algorithms outlined
-    in [FF96,MM96a,MM96b] adhere to these guidelines.
+    sequence space.  To provide feedback to senders recovering from
+    losses, the receiver SHOULD send an immediate ACK when it receives a
+    data segment that fills in all or part of a gap in the sequence
+    space.
 
+    A TCP receiver MUST NOT generate more than one ACK for every
+    incoming segment, other than to update the offered window as the
+    receiving application consumes new data.
+
+4.4 Loss Recovery Mechanisms
+    
+    A number of loss recovery algorithms that augment fast retransmit
+    and fast recovery have been suggested by TCP researchers.  While
+    some of these algorithms are based on the TCP selective
+    acknowledgment (SACK) option [MMFR96], such as [FF96,MM96a,MM96b],
+    others do not require SACKs [Hoe96,FF96,FH98].  The non-SACK
+    algorithms use ``partial acknowledgments'' (ACKs which cover new
+    data, but not all the data outstanding when loss was detected) to
+    trigger retransmissions.  While this document does not standardize
+    any of the specific algorithms that may improve fast retransmit/fast
+    recovery, these enhanced algorithms are implicitly allowed, as long
+    as they follow the general principles of the basic four algorithms
+    outlined above.
+
+    Therefore, when the first loss in a window of data is detected,
+    ssthresh MUST be set to no more than the value given by equation
+    (3).  Second, until all lost segments in the window of data in
+    question are repaired, the number of segments transmitted in each
+    RTT MUST be no more than half the number of outstanding segments
+    when the loss was detected.  Finally, after all loss in the given
+    window of segments has been successfully retransmitted, cwnd MUST be
+    set to no more than ssthresh and congestion avoidance MUST be used
+    to further increase cwnd.  Loss in two successive windows of data,
+    or the loss of a retransmission, should be taken as two indications
+    of congestion and, therefore, cwnd (and ssthresh) MUST be lowered
+    twice in this case.  The algorithms outlined in
+    [Hoe96,FF96,MM96a,MM96b] follow the principles of the basic four
+    congestion control algorithms outlined in this document.
+
 5.  Security Considerations
 
     This document requires a TCP to diminish its sending rate in the
@@ -431,6 +466,11 @@
     The Internet to a considerable degree relies on the correct
     implementation of these algorithms in order to preserve network
     stability and avoid congestion collapse.  An attacker could cause
+
+Expires: June, 1999                                             [Page 8]
+
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
+
     TCP endpoints to respond more aggressively in the face of congestion
     by forging excessive duplicate acknowledgments or excessive
     acknowledgments for new data.  Conceivably, such an attack could
@@ -448,16 +488,13 @@
     (Addison-Wesley, 1995).  This material is used with the permission
     of Addison-Wesley.
 
-    Sally Floyd devised the algorithm presented in section 3.3 for
-    avoiding multiple cwnd cutbacks in the presence of multiple packets
-    lost from the same flight.  Craig Partridge and Joe Touch
+    Neal Cardwell, Sally Floyd, Craig Partridge and Joe Touch
     contributed a number of helpful suggestions.
-    
+
 References
 
     [AFP98] M. Allman, S. Floyd, C. Partridge, Increasing TCP's Initial
-        Window Size, Internet-Draft draft-floyd-incr-init-win-03.txt.
-        May, 1998.  (Work in progress).
+        Window Size, September 1998.  RFC 2414.
 
     [Bra89] B. Braden, ed., "Requirements for Internet Hosts --
         Communication Layers," RFC 1122, Oct. 1989.
@@ -465,27 +502,34 @@
     [Bra97] S. Bradner, "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.
 
-    
-    
+    [FF96] K. Fall, S. Floyd.  Simulation-based Comparisons of Tahoe,
+        Reno and SACK TCP.  Computer Communication Review, July 1996.
+        ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
 
-Expires: February, 1999                                           [Page 8]
-
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+    [FH98] S. Floyd, T. Henderson.  The NewReno Modification to TCP's
+        Fast Recovery Algorithm.  Internet-Draft
+        draft-ietf-tcpimpl-newreno-00.txt, November 1998.  (Work in
+        progress).
 
-    [FF96] Kevin Fall and Sally Floyd.  Simulation-based Comparisons of
-        Tahoe, Reno and SACK TCP.  Computer Communication Review, July
-        1996.  ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
-    
     [Flo94] S. Floyd, TCP and Successive Fast Retransmits. Technical
         report, October 1994.
-        ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
-    
-    [HTH98] Amy Hughes, Joe Touch, John Heidemann.  Internet-Draft
+	ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
+
+    [Hoe96] J. Hoe, Improving the Start-up Behavior of a Congestion
+        Control Scheme for TCP.  In ACM SIGCOMM, August 1996.
+
+    [HTH98] A. Hughes, J. Touch, J. Heidemann.  Issues in TCP Slow-Start
+        Restart After Idle.  Internet-Draft
         draft-ietf-tcpimpl-restart-00.txt, March 1998.  (Work in
         progress).
 
     [Jac88] V. Jacobson, "Congestion Avoidance and Control," Computer
         Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
+
+Expires: June, 1999                                             [Page 9]
+
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
+
         ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
 
     [Jac90] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm,"
@@ -504,11 +548,14 @@
     [MMFR96] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective
         Acknowledgement Options", RFC 2018, October 1996.
     
-    [PAD+98] V. Paxson, M. Allman, S. Dawson, J. Griner, I. Heavens,
-        K. Lahey, J. Semke, B. Volz.  Internet-Draft
-        draft-ietf-tcpimpl-prob-04.txt, August 1998.  (Work in
+    [PAD+98] V. Paxson, M. Allman, S. Dawson, W. Fenner, J. Griner,
+	I. Heavens, K. Lahey, J. Semke, B. Volz.  Internet-Draft
+        draft-ietf-tcpimpl-prob-05.txt, October 1998.  (Work in
         progress).
 
+    [Pax97] V. Paxson, "End-to-End Internet Packet Dynamics,"
+        Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.
+    
     [Pos81] J. Postel, Transmission Control Protocol, September 1981.
         RFC 793.
 
@@ -527,19 +574,23 @@
 
 
 
-Expires: February, 1999                                           [Page 9]
+
+
+
+
+
+
+
+
+
+
+
+Expires: June, 1999                                            [Page 10]
 
-draft-ietf-tcpimpl-cong-control-00.txt                       August 1998
+draft-ietf-tcpimpl-cong-control-01.txt                     November 1998
 
 Author's  Address:
 
-    W. Richard Stevens
-    1202 E. Paseo del Zorro
-    Tucson, AZ  85718
-    520-297-9416
-    rstevens@kohala.com
-    http://www.kohala.com/~rstevens
-
     Mark Allman
     NASA Lewis Research Center/Sterling Software
     21000 Brookpark Rd.  MS 54-2
@@ -556,6 +607,12 @@
     510-486-7504
     vern@ee.lbl.gov
 
+    W. Richard Stevens
+    1202 E. Paseo del Zorro
+    Tucson, AZ  85718
+    520-297-9416
+    rstevens@kohala.com
+    http://www.kohala.com/~rstevens
 
 
 
@@ -586,5 +643,6 @@
 
 
 
-Expires: February, 1999                                          [Page 10]
+
+Expires: June, 1999                                            [Page 11]
 


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 16:03:36 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA08954
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 16:03:35 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA25271; Tue, 10 Nov 1998 14:31:09 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from gecko.nas.nasa.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA25267; Tue, 10 Nov 1998 14:31:07 -0500 (EST)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.9.1a/NAS8.8.7n) with ESMTP id LAA08892;
	Tue, 10 Nov 1998 11:31:05 -0800 (PST)
Message-Id: <199811101931.LAA08892@gecko.nas.nasa.gov>
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D 
In-reply-to: Your message of "Tue, 10 Nov 1998 10:27:23 PST."
             <199811101827.KAA18281@daffy.ee.lbl.gov> 
Date: Tue, 10 Nov 1998 11:31:05 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

In message <199811101827.KAA18281@daffy.ee.lbl.gov> Vern Paxson writes:
>The -01 version of the TCP Congestion Control I-D has now come out.
>Mark & I feel it's ready for publication as Proposed Standard, so we're
>holding a two-week last call for final comments, with the last call ending
>on Nov 24.

>	- Clarified that the receiver doesn't know the true MSS and
>	  so the ack-at-least-every-second-full-MSS-segment requirement
>	  means in terms of its offered MSS.

Is there any way that we can tweak MSS to take path MTU discovery into
account?  It seems like the current definition would result in an
unnaturally small number of ACKs on a path with a smaller discovered
MTU.  For instance, if two ATM-equipped hosts wind up communicating
across an Ethernet, won't we wind up with only a quarter as many ACKs
as we'd get without PMTU?

Since increases in slow-start are defined in terms of MSS rather than
the number of segments, I guess the PMTU-equipped stacks wouldn't
pay a penalty versus non-PMTU-equipped stacks, but wouldn't they
be dumping large numbers of packets into the network?  Wouldn't
it be smoother to have more frequent ACKs?

I guess implementations are free to send more frequent ACKs to 
avoid this, and PMTU-equipped stacks could just be smarter about it.
Could this result in an unnaturally fast increase in the congestion
window when the receiver ACKed every two packets (not every two 
MSS-sized packets) and the sender always increased by MSS (not
the appropriate PMTU)?

Thanks,

Kevin
kml@nas.nasa.gov


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 20:04:37 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id UAA13858
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 20:04:36 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id SAA21431; Tue, 10 Nov 1998 18:13:05 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from gecko.nas.nasa.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id SAA21425; Tue, 10 Nov 1998 18:13:03 -0500 (EST)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.9.1a/NAS8.8.7n) with ESMTP id PAA10710;
	Tue, 10 Nov 1998 15:13:01 -0800 (PST)
Message-Id: <199811102313.PAA10710@gecko.nas.nasa.gov>
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D 
In-reply-to: Your message of "Tue, 10 Nov 1998 14:39:43 PST."
             <199811102239.OAA22618@daffy.ee.lbl.gov> 
Date: Tue, 10 Nov 1998 15:13:01 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

In message <199811102239.OAA22618@daffy.ee.lbl.gov> Vern Paxson writes:
>One of the likely agenda topics for Orlando is whether there are enough
>PMTU issues to consider doing a document on them.  Mark & I think the
>WG is pretty much ready to go dormant, except for that possibility.

I had three PMTU issues:  one bug, one possible problem, and 
a need for more research.

* Several stacks will mistakenly use a PMTU value as an MSS.
  The problem is that the sender correctly opens a connection to
  a receiver, and then determines a PMTU for the connection.
  If another connection is opened to that receiver, the MSS
  advertised is actually the previously determined PMTU.
  This means that the second connection can never try to increase
  that PMTU value as suggested in RFC1191.

* At least one vendor keeps PMTU values on a per-connection basis.
  This means that each new TCP connection starts over at the
  default MTU, and has to go through the PMTU discovery process.
  This seems like a lose for HTTP-style connections, but it is
  certainly allowed by RFC1191.

* Almost all PMTU discovery implementations are easily confused
  by routers that do not forward ICMP messages or otherwise 
  silently drop large packets.  Black hole detection has been
  discussed on this mailing list, but it doesn't seem to be
  getting (correctly) implemented.  I made a survey of PMTU 
  discovery algorithms about six months ago, and found that only
  Windows NT got it totally right.

  It sure would be nice to get some more experience with this and
  to hear from recent implementors (it sounds like the Linux 
  folks have done some good stuff).  It'd be nice to get closer to
  a standard for delayed acknowledgements with path MTU discovery...
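
The first two items above can be sketched roughly as follows.  This is
only an illustration, assuming a per-destination cache shared by all
connections; the class name, method names and the 40-byte header
allowance are hypothetical and not taken from any of the stacks
surveyed.

```python
class PathMtuCache:
    """Per-destination path MTU cache shared by all connections (a sketch)."""

    IP_TCP_HEADERS = 40  # assumption: 20-byte IPv4 + 20-byte TCP, no options

    def __init__(self, link_mtu):
        self.link_mtu = link_mtu
        self._pmtu = {}  # destination -> last discovered path MTU

    def record_pmtu(self, dest, pmtu):
        self._pmtu[dest] = pmtu

    def advertised_mss(self):
        # The MSS we advertise is derived from our own link MTU only.
        # It is never derived from a cached PMTU (the bug in item one).
        return self.link_mtu - self.IP_TCP_HEADERS

    def initial_send_size(self, dest, peer_mss):
        # A new connection starts from the cached PMTU rather than
        # rediscovering it (item two), bounded by the peer's offered MSS.
        pmtu = self._pmtu.get(dest, self.link_mtu)
        return min(peer_mss, pmtu - self.IP_TCP_HEADERS)


cache = PathMtuCache(link_mtu=9180)       # assumed ATM-style local MTU
cache.record_pmtu("192.0.2.1", 1500)      # assumed Ethernet hop on the path
print(cache.advertised_mss())
print(cache.initial_send_size("192.0.2.1", peer_mss=9140))
print(cache.initial_send_size("192.0.2.2", peer_mss=9140))
```

Keeping the advertised MSS and the cached PMTU as separate quantities
is the point of the sketch: the cached value only limits what is sent
on a new connection, so a later probe can still raise it.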

Thanks,

Kevin
kml@nas.nasa.gov


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 20:05:15 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id UAA13873
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 20:05:15 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id SAA22464; Tue, 10 Nov 1998 18:26:25 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id SAA22456; Tue, 10 Nov 1998 18:26:22 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id OAA22618;
	Tue, 10 Nov 1998 14:39:43 -0800 (PST)
Message-Id: <199811102239.OAA22618@daffy.ee.lbl.gov>
To: "Kevin M. Lahey" <kml@nas.nasa.gov>
Cc: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D 
In-reply-to: Your message of Tue, 10 Nov 1998 13:52:34 PST.
Date: Tue, 10 Nov 1998 14:39:43 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> My worry here is that we could see a sender that does path MTU
> discovery and a naive receiver that does not.  Say the MSS is four
> times the PMTU.  The receiver will ACK after receiving 2 * MSS bytes,
> but this'll have been perhaps 8 segments.
> 
> The sender will get the ACK, and increase the congestion window by MSS,
> resulting in 4 new segments, plus the 8 segments acknowledged, for a
> total of 12 new segments.  Ouch!

My assumption is that MSS means the current maximum segment size, which
may reflect the value offered by the receiver during start up, or may
reflect a subsequent (smaller) value discovered by PMTU discovery.  So
the above would lead to 9 new segments, the 8 ack'd plus one more for
slow-start.
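
The two readings of MSS can be checked with a little arithmetic.  The
sketch below is illustrative only; the function name and the byte
sizes are assumptions, chosen so that the offered MSS is four times the
PMTU-limited segment size as in the quoted example.

```python
def segments_released(offered_mss, sending_mss, cwnd_increment):
    """Segments the sender may emit on one delayed ACK during slow start.

    The receiver ACKs after 2 * offered_mss bytes; the sender actually
    transmits segments of sending_mss bytes (limited by the path MTU).
    """
    acked_bytes = 2 * offered_mss
    # The ACK slides the window forward by acked_bytes, and slow start
    # opens it by cwnd_increment more bytes.
    return (acked_bytes + cwnd_increment) // sending_mss


SENDING_MSS = 1460              # assumed PMTU-limited segment size
OFFERED_MSS = 4 * SENDING_MSS   # assumed MSS offered at connection setup

# cwnd grows by the offered MSS: 8 ACKed + 4 new = 12 segments.
print(segments_released(OFFERED_MSS, SENDING_MSS, OFFERED_MSS))
# cwnd grows by the current sending MSS: 8 ACKed + 1 new = 9 segments.
print(segments_released(OFFERED_MSS, SENDING_MSS, SENDING_MSS))
```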

I would hope that receivers are a bit more intelligent about deciding
how much data to ack, e.g., every two segments of any decent size.  But
I don't think 2001.bis should mandate this, because we don't have a
standard algorithm to mandate.

> OTOH, if the sender only increases the congestion window by the PMTU
> value (rather than MSS), then its slow-start would be really slow when
> coupled with the reduced number of ACKs...

Yes, though I suspect that, in general, receivers that offer large initial
MSS's care enough about performance to also ack fairly often in order to
open the window quickly.

> Ouch, that's where I'm obviously slipping up.  I'd sure like to see
> these standards updated a bit to take PMTU and other advances into
> account, but I guess we're pretty much just trying to get folks to get
> the original stuff right...

Right, that's what 2001.bis is trying to do.  We can also consider writing
Experimental RFCs on tweaks to 2001.bis (such as the New Reno I-D).

One of the likely agenda topics for Orlando is whether there are enough
PMTU issues to consider doing a document on them.  Mark & I think the
WG is pretty much ready to go dormant, except for that possibility.

		Vern


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 20:51:38 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id UAA14179
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 20:51:37 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id PAA06240; Tue, 10 Nov 1998 15:52:41 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id PAA06236; Tue, 10 Nov 1998 15:52:39 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id MAA18983;
	Tue, 10 Nov 1998 12:52:34 -0800 (PST)
Message-Id: <199811102052.MAA18983@daffy.ee.lbl.gov>
To: "Kevin M. Lahey" <kml@nas.nasa.gov>
Cc: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D 
In-reply-to: Your message of Tue, 10 Nov 1998 11:31:05 PST.
Date: Tue, 10 Nov 1998 12:52:34 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Is there any way that we can tweak MSS to take path MTU discovery into
> account?  It seems like the current definition would result in an
> unnaturally small number of ACKs on a path with a smaller discovered
> MTU.

Note that the current definition is for the *upper bound* on how often
a TCP acks.  A TCP can certainly ack more often, and in particular if
it has a different notion of the sender's MSS, that's fine, as long as
it's less than the original MSS it told the sender.  We're not trying to
*change* how (most) current TCPs ack, just clarify the requirement in
the spec.
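
As a rough sketch of that bound (not text from the draft): the receiver
measures unacknowledged data against the MSS it offered, since it does
not know the sender's true segment size.  The function name and byte
counts below are assumptions, and the delayed-ACK timer is left out.

```python
def ack_required(unacked_bytes, offered_mss):
    """True once two full offered-MSS segments' worth of data is unACKed.

    This is only the limit on how long an ACK may be withheld; a
    receiver is free to ACK earlier than this.
    """
    return unacked_bytes >= 2 * offered_mss


OFFERED_MSS = 9140  # assumed MSS offered by the receiver

# Two 1460-byte segments have arrived: the bound does not force an ACK.
print(ack_required(2 * 1460, OFFERED_MSS))
# Two offered-MSS-sized segments' worth has arrived: an ACK is due.
print(ack_required(2 * 9140, OFFERED_MSS))
```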

> Since increases in slow-start are defined in terms of MSS rather than
> the number of segments, I guess the PMTU-equipped stacks wouldn't
> pay a penalty versus non-PMTU-equipped stacks, but wouldn't they
> be dumping large numbers of packets into the network?

They will be dumping the same number of packets.  The packets may be
larger or smaller, depending on the path MTU.  We hope that they are
larger.  This is overall a Good Thing, we want to encourage use of
the largest non-fragmenting segment size.

> I guess implementations are free to send more frequent ACKs to 
> avoid this ...

Right, per the above.

> Could this result in an unnaturally fast increase in the congestion
> window when the receiver ACKed every two packets (not every two 
> MSS-sized packets) and the sender always increased by MSS (not
> the appropriate PMTU)?

MSS for the sender means the maximum segment size it's using to send,
and includes whatever modifications it's made to its original size
based on PMTU discovery.

Hmmmm, it looks like this is the key point that you're stressing.
We need to add this to the definition of MSS.

		Vern


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 20:53:14 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id UAA14194
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 20:53:13 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id TAA28408; Tue, 10 Nov 1998 19:31:15 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from palrel3.hp.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id TAA28404; Tue, 10 Nov 1998 19:31:13 -0500 (EST)
Received: from loiter.cup.hp.com (root@loiter.cup.hp.com [15.13.104.252])
	by palrel3.hp.com (8.8.6 (PHNE_14041)/8.8.5tis) with ESMTP id QAA16688
	for <tcp-impl@lerc.nasa.gov>; Tue, 10 Nov 1998 16:31:36 -0800 (PST)
Received: from cup.hp.com (raj@loiter [15.13.104.252]) by loiter.cup.hp.com with ESMTP (8.8.6/8.7.3 TIS Messaging 5.0) id QAA12406 for <tcp-impl@lerc.nasa.gov>; Tue, 10 Nov 1998 16:31:10 -0800 (PST)
Message-ID: <3648DACE.3A7F59E7@cup.hp.com>
Date: Tue, 10 Nov 1998 16:31:10 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: SNSL
X-Mailer: Mozilla 4.08 [en] (X11; I; HP-UX B.10.20 9000/735)
MIME-Version: 1.0
To: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D
References: <199811102313.PAA10710@gecko.nas.nasa.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

My favorite issue with PMTU is that the base RFC states that after the
PMTU timer expires, the PMTU route should have the MTU reset to the
local link MTU. 

If you do this on the great big Internet, you end-up slowly (or quickly)
accumulating PMTU routes.

So, if we want to add clarifications to PMTU, stating that it would be a
good idea to have some means to cull old PMTU routes from the routing
tables would be a good idea.

rick jones

-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, or post, but please do not do both...
my email address is raj in the cup.hp.com domain...


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 21:45:24 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id VAA14624
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 21:45:23 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id QAA14008; Tue, 10 Nov 1998 16:52:38 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from gecko.nas.nasa.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id QAA13998; Tue, 10 Nov 1998 16:52:36 -0500 (EST)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.9.1a/NAS8.8.7n) with ESMTP id NAA10021;
	Tue, 10 Nov 1998 13:52:35 -0800 (PST)
Message-Id: <199811102152.NAA10021@gecko.nas.nasa.gov>
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@lerc.nasa.gov
Subject: Re: Last Call for TCP Congestion Control I-D 
In-reply-to: Your message of "Tue, 10 Nov 1998 12:52:34 PST."
             <199811102052.MAA18983@daffy.ee.lbl.gov> 
Date: Tue, 10 Nov 1998 13:52:34 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

In message <199811102052.MAA18983@daffy.ee.lbl.gov> Vern Paxson writes:
>> Since increases in slow-start are defined in terms of MSS rather than
>> the number of segments, I guess the PMTU-equipped stacks wouldn't
>> pay a penalty versus non-PMTU-equipped stacks, but wouldn't they
>> be dumping large numbers of packets into the network?
>
>They will be dumping the same number of packets.  The packets may be
>larger or smaller, depending on the path MTU.  We hope that they are
>larger.  This is overall a Good Thing, we want to encourage use of
>the largest non-fragmenting segment size.

My worry here is that we could see a sender that does path MTU
discovery and a naive receiver that does not.  Say the MSS is four
times the PMTU.  The receiver will ACK after receiving 2 * MSS bytes,
but this'll have been perhaps 8 segments.

The sender will get the ACK, and increase the congestion window by MSS,
resulting in 4 new segments, plus the 8 segments acknowledged, for a
total of 12 new segments.  Ouch!

OTOH, if the sender only increases the congestion window by the PMTU
value (rather than MSS), then its slow-start would be really slow when
coupled with the reduced number of ACKs...

>We're not trying to *change* how (most) current TCPs ack, just clarify
>the requirement in the spec.

Ouch, that's where I'm obviously slipping up.  I'd sure like to see
these standards updated a bit to take PMTU and other advances into
account, but I guess we're pretty much just trying to get folks to get
the original stuff right...

Naively,

Kevin 
kml@nas.nasa.gov


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 10 22:40:36 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id WAA16788
	for <tcpimpl-archive@lists.ietf.org>; Tue, 10 Nov 1998 22:40:36 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id VAA05799; Tue, 10 Nov 1998 21:20:12 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from shasta-pc.shastanets.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id VAA05795; Tue, 10 Nov 1998 21:20:09 -0500 (EST)
Received: from stevea-pc (stevea-pc.shastanets.com [209.31.29.163]) by shasta-pc.shastanets.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.1960.3)
	id V69YYWNR; Tue, 10 Nov 1998 18:19:03 -0800
Reply-To: <stevea@shastanets.com>
From: "Steve Alexander" <stevea@shastanets.com>
To: "'Rick Jones'" <raj@cup.hp.com>, <tcp-impl@lerc.nasa.gov>
Subject: RE: Last Call for TCP Congestion Control I-D
Date: Tue, 10 Nov 1998 18:20:45 -0800
Message-ID: <000001be0d19$e3e39940$a31d1fd1@stevea-pc.shastanets.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2232.26
Importance: Normal
In-Reply-To: <3648DACE.3A7F59E7@cup.hp.com>
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk



> So, if we want to add clarifications to PMTU, stating that it would be a
> good idea to have some means to cull old PMTU routes from the routing
> tables would be a good idea.
 
I found that deleting the PMTU route was semantically indistinguishable
from leaving it around with the link's MTU.  I guess I must be a
scofflaw, but I still believe in the "running code" idea.
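
That equivalence can be sketched as below.  This is a minimal
illustration, not any vendor's code; the class name, the explicit
`now` argument and the 600-second lifetime are assumptions made for
the example.

```python
import time


class AgingPmtuTable:
    """Path MTU entries that are deleted, not reset, when they age out."""

    def __init__(self, link_mtu, max_age_seconds):
        self.link_mtu = link_mtu
        self.max_age = max_age_seconds
        self._entries = {}  # destination -> (pmtu, time recorded)

    def record(self, dest, pmtu, now=None):
        now = time.monotonic() if now is None else now
        self._entries[dest] = (pmtu, now)

    def lookup(self, dest, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(dest)
        if entry is not None and now - entry[1] >= self.max_age:
            # Expired: drop the entry instead of rewriting it.
            del self._entries[dest]
            entry = None
        # A missing entry behaves exactly like one holding the link MTU.
        return self.link_mtu if entry is None else entry[0]

    def __len__(self):
        return len(self._entries)


table = AgingPmtuTable(link_mtu=1500, max_age_seconds=600)
table.record("192.0.2.1", 576, now=0)
print(table.lookup("192.0.2.1", now=10))    # entry is still fresh
print(table.lookup("192.0.2.1", now=600))   # entry has aged out
print(len(table))                           # nothing left in the table
```

Note that this sketch only deletes an entry when it is looked up again;
a real table would also need a periodic sweep to cull destinations that
are never revisited.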

-- Steve


From owner-tcp-impl@lerc.nasa.gov  Wed Nov 11 11:40:05 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id LAA07042
	for <tcpimpl-archive@lists.ietf.org>; Wed, 11 Nov 1998 11:40:03 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id JAA26246; Wed, 11 Nov 1998 09:47:59 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from ietf.org (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id JAA26239; Wed, 11 Nov 1998 09:47:57 -0500 (EST)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id JAA00949;
	Wed, 11 Nov 1998 09:47:53 -0500 (EST)
Message-Id: <199811111447.JAA00949@ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;;@ns.cnri.reston.va.us
Cc: tcp-impl@lerc.nasa.gov
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-newreno-00.txt
Date: Wed, 11 Nov 1998 09:47:53 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: The NewReno Modification to TCP's 
                          Fast Recovery Algorithm
	Author(s)	: S. Floyd, T. Henderson
	Filename	: draft-ietf-tcpimpl-newreno-00.txt
	Pages		: 11
	Date		: 10-Nov-98
	
   RFC 2001 [RFC2001] documents the following four intertwined TCP
   congestion control algorithms: Slow Start, Congestion Avoidance, Fast
   Retransmit, and Fast Recovery.  RFC 2001-bis [RFC2001-bis] explicitly
   allows certain modifications of these algorithms, including
   modifications that use the TCP Selective Acknowledgement (SACK)
   option [MMFR96], and modifications that respond to ``partial
   acknowledgments'' (ACKs which cover new data, but not all the data
   outstanding when loss was detected) in the absence of SACK.  This
   document describes a specific algorithm for responding to partial
   acknowledgments, referred to as NewReno.  This response to partial
   acknowledgments was first proposed by Janey Hoe in [Hoe95].


Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-newreno-00.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-newreno-00.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nic.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ftp.ietf.org
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-newreno-00.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19981110165404.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-newreno-00.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-newreno-00.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19981110165404.I-D@ietf.org>

--OtherAccess--

--NextPart--




From owner-tcp-impl@lerc.nasa.gov  Tue Nov 17 14:34:57 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id OAA16985
	for <tcpimpl-archive@lists.ietf.org>; Tue, 17 Nov 1998 14:34:56 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id MAA12637; Tue, 17 Nov 1998 12:35:10 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from guns.lerc.nasa.gov (guns.lerc.nasa.gov [139.88.44.160]) by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA12623; Tue, 17 Nov 1998 12:35:06 -0500 (EST)
Received: by guns.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-local)
        id MAA22291; Tue, 17 Nov 1998 12:35:04 -0500 (EST)
Date: Tue, 17 Nov 1998 12:35:04 -0500 (EST)
From: mallman@guns.lerc.nasa.gov (Mark Allman)
Message-Id: <199811171735.MAA22291@guns.lerc.nasa.gov>
To: tcp-impl@lerc.nasa.gov
Subject: PILC BOF -- Orlando IETF
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


[I apologize if you receive more than one copy of this.]

A quick note to announce a BOF that will be held at the IETF meeting
in Orlando.  This BOF will be to gauge the interest in (and scope
of) possibly creating an IETF working group to document performance
implications of various less-than-ideal link characteristics.
Below is the BOF description.  At the bottom are instructions for
getting on the PILC mailing list.  Please direct all discussions of
the upcoming BOF to the PILC list.  Thanks!

allman




BOF Description:

Name: Performance Implications of Link Characteristics (PILC)
Chairs: Aaron Falk and Mark Allman 
        adalk@mail.hac.com, mallman@lerc.nasa.gov


Description:

The Internet network-layer and transport-layer protocols are
designed to accommodate a very wide range of networking technologies
and characteristics.  Nevertheless, experience has shown that the
particular properties of different network links can have a
significant impact on the performance of Internet protocols
operating over those links, and on the performance of connections
along paths that include such links.  Some examples of possibly
problematic characteristics:

	- Long delay links.  Affects timer estimation, congestion
	  adaptation. 

	- Links with high bandwidth-delay product.  Difficult to
	  fully utilize and still respond to congestion.

	- Links with variable bandwidth-delay products (such as
	  multi-channel ISDN, or LEOs).  Difficult to track how much
	  throughput a connection should try to attain.

	- Links with varying delay. Delays due to link level
	  signalling may affect the accuracy of RTO estimation.

	- Links with link layer flow control.  May have adverse
	  interactions with congestion control.

	- Links with asymmetric bandwidth.  Can lead to throughput
	  limitations such as ACK starvation.

	- Links with unusually high error rates.  Can lead to loss
	  of performance due to inappropriate perception of
	  congestion.  Can impair protocols that do not protect
	  their data with strong checksums.

	- Links with inconsistent error rates.  Can defeat attempts
	  to distinguish between congestive loss and corruption
	  loss. 

        - Links with significant monetary, resource-scarcity, or
	  delay cost for establishing connections and/or keeping
	  connections open.  These encourage not only
	  bandwidth-efficient protocols, but also protocols that
	  cluster their packet exchanges together rather than
	  spacing them out in time, and can increase the cost of
	  using standard soft-state mechanisms such as periodic
	  state refresh messages.

	- Very low bandwidth links.  Creates pressure for
	  mixing together otherwise separate elements (such as
	  transport and network layers, or multiple transport
	  connections) in an effort to reduce bandwidth requirements
	  via compression.

	- Unidirectional links.  Can make link-layer request/response
	  exchanges, such as ARP, impossible.  (UDLR already addresses
	  the routing elements of this.)

	- Non-transitive reachability (A can reach B, and B can reach C,
	  but A can't reach C).  Breaks assumptions about subnet
	  properties.

	- Shared-channel broadcast to huge numbers of receivers.  Such
	  links stress the notion of using them for unicast traffic
	  and the viability of protocols that require a backchannel
	  such as IGMP.  Such links also incur a danger of major
	  broadcast reply implosions.

	- Links that reorder packets.  Can erroneously trigger TCP fast
	  retransmission.  Impairs protocols that adapt based on
	  timing analysis.  Can stress reassembly code if not well
	  tuned.

	- Multipathing.  Can complicate protocol adaptation since there
	  is no longer a single path property to estimate, but more
	  than one.  Diminishes the efficiency of header compression
	  algorithms.

	- Links with intermittent outages.  Timers may fail to adapt
	  to outage and either falsely signal lost connectivity, or
	  back off so far that response once the outage resolves is
	  highly delayed.

	- Small MTU links.  Can lead to "black holes" if PMTU
          discovery mechanism fails.
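
As one illustration of the list above, the reordering item can be
sketched in a few lines of Python.  This is a minimal model, not any
particular stack's behavior: it assumes the receiver sends one
cumulative ACK per arriving segment, that no ACKs are lost, and that
the sender fast-retransmits on the third duplicate ACK.  The function
names and segment numbering are hypothetical.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


def cumulative_acks(arrivals):
    """Return the cumulative ACK sent after each arriving segment.

    Segments are numbered 0, 1, 2, ...; an ACK of n means
    "the next segment I expect is n".
    """
    received = set()
    next_expected = 0
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks, threshold=DUPACK_THRESHOLD):
    """Return the segments the sender would fast-retransmit."""
    retransmitted = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmitted.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return retransmitted


# Segment 1 is delayed behind three later segments; nothing is lost.
reordered = [0, 2, 3, 4, 1]
print(cumulative_acks(reordered))
print(fast_retransmits(cumulative_acks(reordered)))
```

In this model, the arrival order 0, 2, 3, 4, 1 produces three
duplicate ACKs for segment 1, so the sender retransmits a segment
that was only late.  A milder reordering such as 0, 2, 1, 3, 4
produces a single duplicate ACK and no retransmission.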

This BOF will explore the size of this problem space: which link
characteristics prove problematic, and which Internet protocols are
adversely affected?

One goal is to assess the utility of forming a working group to
produce informational document(s) detailing how link characteristics
interact with different IETF protocols - what well-established and
less-well-established steps can be taken to ameliorate these
problems.

Another goal is to assess the utility of forming working group(s) to
develop modifications or extensions to IETF protocols to address the
most pressing performance issues.

Note that the BOF focusses on link *characteristics* and not on link
layer *technologies* such as Frame Relay, ATM or Ethernet.
Discussing particular characteristics present in some of these
(e.g., capture effect) is in scope; discussing aspects of the
technology as a whole (e.g., CIR) is out of scope.

A mailing list for discussing the issues related to this BOF has
been set up.  Notes to the list should be sent to
pilc@lerc.nasa.gov.  To subscribe to the mailing list, send a note
to majordomo@lerc.nasa.gov with the words "subscribe pilc" in the
body of the note.  The mailing list archive is available at
http://pilc.lerc.nasa.gov/pilc/.


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 17 16:27:37 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA24395
	for <tcpimpl-archive@lists.ietf.org>; Tue, 17 Nov 1998 16:27:36 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id OAA28060; Tue, 17 Nov 1998 14:45:30 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from mail3.microsoft.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id OAA28050; Tue, 17 Nov 1998 14:45:28 -0500 (EST)
Received: by mail3.microsoft.com with Internet Mail Service (5.5.2232.9)
	id <WYYRD1HV>; Tue, 17 Nov 1998 11:45:27 -0800
Message-ID: <416245726D64D211808100805F3198EE0132F65D@RED-MSG-46>
From: Peter Ford <peterf@microsoft.com>
To: "'Vern Paxson'" <vern@ee.lbl.gov>, tcp-impl@lerc.nasa.gov
Cc: mallman@lerc.nasa.gov, sob@harvard.edu
Subject: RE: Last Call for TCP Congestion Control I-D (IW>1)
Date: Tue, 17 Nov 1998 11:45:19 -0800
X-Mailer: Internet Mail Service (5.5.2232.9)
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Mark, Vern and William,

Given the following text in RFC 2414:

   This upper bound for the initial window size represents a change from
   RFC 2001 [S97], which specifies that the congestion window be
   initialized to one segment.  If implementation experience proves
   successful, then the intent is for this change to be incorporated
   into a revision to RFC 2001.

isn't the language in the -01 draft a little out of sync?

    IW, the initial value of cwnd, MUST be less than or equal to MSS
    bytes.

    We note that a non-standard, experimental TCP extension allows that
    a TCP MAY use a larger initial window (IW), as defined in equation 1
    [AFP98]:

               IW = min (4*MSS, max (2*MSS, 4380 bytes))             (1)

    With this extension, a TCP sender MAY use a 2 segment initial
    window, regardless of the segment size, and 3 and 4 segment initial
    windows MAY be used, provided the combined size of the segments does
    not exceed 4380 bytes.  We do NOT allow this change as part of the
    standard defined by this document.  However, we include discussion
    of (1) in the remainder of this document as a guideline for those
    experimenting with the change, rather than conforming to the present
    standards for TCP congestion control. 
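
For reference, equation (1) in the text quoted above can be evaluated
directly.  The sketch below is only an illustration: the function
name and the sample MSS values are chosen here, not taken from the
draft, and all sizes are in bytes.

```python
def initial_window(mss):
    """Equation (1): IW = min(4*MSS, max(2*MSS, 4380 bytes))."""
    return min(4 * mss, max(2 * mss, 4380))


# Sample MSS values are illustrative assumptions.
for mss in (536, 1460, 4312):
    iw = initial_window(mss)
    print(f"MSS={mss:5d}  IW={iw:5d} bytes  ({iw // mss} segments)")
```

For these values the formula gives 4 segments at MSS 536, 3 segments
at MSS 1460, and 2 segments at MSS 4312.  That matches the quoted
description: 2 segments are always allowed, and 3 or 4 only while
their combined size stays within 4380 bytes.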

   
I believe the language in the draft should be modified to reflect the intent
and direction the Internet community is taking.  This is clearly documented
in RFC 2414.  May I propose that the paragraph from RFC 2414 be lifted and
used to replace the sentences in the last paragraph  starting with: "We do
NOT allow this change as part of the standard defined by this document.
However, ..."

The draft on TCP Congestion Control  should reflect the knowledge we have
gained since 1988, and indicate direction and rationale to those who read
the spec.

cheers, peter





 


From owner-tcp-impl@lerc.nasa.gov  Tue Nov 17 16:55:57 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id QAA26415
	for <tcpimpl-archive@lists.ietf.org>; Tue, 17 Nov 1998 16:55:56 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id PAA01887; Tue, 17 Nov 1998 15:19:26 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from guns.lerc.nasa.gov (guns.lerc.nasa.gov [139.88.44.160]) by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id PAA01882; Tue, 17 Nov 1998 15:19:24 -0500 (EST)
Received: from guns.lerc.nasa.gov by guns.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id PAA23405; Tue, 17 Nov 1998 15:19:24 -0500 (EST)
Message-Id: <199811172019.PAA23405@guns.lerc.nasa.gov>
To: Peter Ford <peterf@MICROSOFT.com>
From: Mark Allman <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
cc: "'Vern Paxson'" <vern@ee.lbl.gov>, tcp-impl@lerc.nasa.gov, sob@harvard.edu
Subject: Re: Last Call for TCP Congestion Control I-D (IW>1) 
Organization: Late Night Hackers, NASA LeRC, Cleveland, Ohio
Song-of-the-Day: In the City
Date: Tue, 17 Nov 1998 15:19:23 -0500
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


>    This upper bound for the initial window size represents a
>    change from RFC 2001 [S97], which specifies that the congestion
>    window be initialized to one segment.  If implementation
>    experience proves successful, then the intent is for this
>    change to be incorporated into a revision to RFC 2001.

Peter-

RFC 2414 was published as an experimental RFC.  It shows promise as
a Good Thing, but being that it has been out only a couple of months
I don't think we have all that much implementation experience with
it right now (at least not much more than when it was published as
an experimental RFC).  I think that the larger initial window will
be appropriate to standardize in **A** version of 2001.  And, I hope
that implementation experience proves me right.  But, I don't think
the experience has shown larger initial windows to be a Good Thing
or a Bad Thing at this point.  And, therefore, I think that the
wording in the current version of the document is correct.

allman


From owner-tcp-impl@lerc.nasa.gov  Sat Nov 28 13:02:32 1998
Received: from assateague.lerc.nasa.gov (assateague-fi.lerc.nasa.gov [139.88.112.23])
	by ietf.org (8.8.5/8.8.7a) with ESMTP id NAA15179
	for <tcpimpl-archive@lists.ietf.org>; Sat, 28 Nov 1998 13:02:31 -0500 (EST)
Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)
        id DAA19266; Sat, 28 Nov 1998 03:29:36 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-tcp-impl@lerc.nasa.gov using -f
Received: from daffy.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id DAA19262; Sat, 28 Nov 1998 03:29:32 -0500 (EST)
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.9.1/8.9.1) id AAA26174;
	Sat, 28 Nov 1998 00:29:32 -0800 (PST)
Message-Id: <199811280829.AAA26174@daffy.ee.lbl.gov>
To: tcp-impl@lerc.nasa.gov
Subject: draft agenda for Orlando tcpimpl meeting
Cc: mallman@lerc.nasa.gov, sob@harvard.edu
Date: Sat, 28 Nov 1998 00:29:31 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Here's the draft agenda Mark & I have put together.  Comments/additions
welcome, please send them as soon as possible.  (A revised agenda will
be sent to the IETF agenda meister on Monday.)

	1.  Agenda bashing (5 minutes)
	2.  Status of WG drafts:
	      - Known Problems, TCP Congestion Control
	      - NewReno I-D
	3.  Report on Usefulness of Documents -- Jamshid Mahdavi
	4.  PMTU Issues

This last is meant to be an effort to sketch out different issues with
implementing PMTU discovery, as we're wondering if there are enough of
these (and sufficient WG energy) to merit writing a document for them.

Also, note that this is likely to be the last tcpimpl WG meeting, with
the group going on hiatus (though the mailing list will stay active)
once we get the pending documents out.

		Vern


