From nobody Wed Aug  2 08:54:18 2017
Return-Path: <ietf@bobbriscoe.net>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 66A7B131BBF for <tcpm@ietfa.amsl.com>; Wed,  2 Aug 2017 08:54:17 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level: 
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=bobbriscoe.net
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ZS9gTAG5twLe for <tcpm@ietfa.amsl.com>; Wed,  2 Aug 2017 08:54:14 -0700 (PDT)
Received: from server.dnsblock1.com (server.dnsblock1.com [85.13.236.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7B14C131935 for <tcpm@ietf.org>; Wed,  2 Aug 2017 08:54:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=bobbriscoe.net; s=default; h=Content-Type:In-Reply-To:MIME-Version:Date: Message-ID:References:Cc:To:Subject:From:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=VL7gsOyeLSFMZ2UJaz45XCzw+1StF8vaPqWyauBaHAQ=; b=J8rq4Icp332Ih65mNq2PUPGdq oPZdK/OZKNUw4YfMFoNxAhzHItwl9PBN8mBKSg8FtZ9o+Puq2gcnHi7rvEtfs9HCqnTI/lpcV4yXD 73YokmN/+iRK4i5yXlTw8tYva2aAUepzWpbY2nLt4sq21h6XqLCboeTLm8B6iXWjo46DYfQAlgYe6 3aF4eeJK60D72d7MiS26NfjZt2NbYcsSdgwC7WK3pkmWDea53UmESi6yCqMrkM1ZMSATJ34OFTSdR Si51z1MjfX7YWkoe4R+ZbtNgdHjcwQ9MC70lWpjPf4oi5MSGfoLMZ7c8pqOFr7o2U0dJ7XXqeDZq/ ULT6qLQsw==;
Received: from 52.139.199.146.dyn.plus.net ([146.199.139.52]:33722 helo=[192.168.1.2]) by server.dnsblock1.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89) (envelope-from <ietf@bobbriscoe.net>) id 1dcvyO-0005ot-9f; Wed, 02 Aug 2017 16:54:12 +0100
From: Bob Briscoe <ietf@bobbriscoe.net>
To: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Wei Wang <weiwan@google.com>, Neal Cardwell <ncardwell@google.com>
Cc: tcpm IETF list <tcpm@ietf.org>
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net>
Message-ID: <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net>
Date: Wed, 2 Aug 2017 16:54:11 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net>
Content-Type: multipart/alternative; boundary="------------3828722705E8406C00D6F410"
Content-Language: en-GB
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.dnsblock1.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - bobbriscoe.net
X-Get-Message-Sender-Via: server.dnsblock1.com: authenticated_id: in@bobbriscoe.net
X-Authenticated-Sender: server.dnsblock1.com: in@bobbriscoe.net
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/pPg8FK3a3kEeBcp3_YAmJAWQpN8>
Subject: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Aug 2017 15:54:17 -0000

This is a multi-part message in MIME format.
--------------3828722705E8406C00D6F410
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Wei, Yuchung, Neal and Eric, as authors of 
draft-wang-tcpm-low-latency-opt-00,

I promised a review. It questions the technical logic behind the draft, 
so I haven't bothered to give a detailed review of the wording of the 
draft, because that might be irrelevant if you agree with my arguments.

*1/ MAD by configuration?**
*

    o  If the user does not specify a MAD value, then the implementation
       SHOULD NOT specify a MAD value in the Low Latency option.

That sentence triggered my "anti-human-intervention" reflex. My train of 
thought went as follows:

* Let's consider what advice we would give on what MAD value ought to be 
configured.
* You say that MAD can be smaller in DCs. So I assume your advice would 
be that MAD should depend on RTT {Note 1} and clock granularity {Note 2}.
* So why configure one value of MAD for all RTTs? That only makes sense 
in DC environments where the range of RTTs is small.
* However, for the range of RTTs on the public Internet, why not 
calculate MAD from RTT and granularity, then standardize the calculation 
so that both ends arrive at the same result when starting from the same 
RTT and granularity parameters? (The sender and receiver might measure 
different smoothed (SRTT) values, but they will converge as the flow 
progresses.)

Then the receiver only needs to communicate its clock granularity to the 
sender, and the fact that it is driving MAD off its SRTT. Then the 
sender can use a formula for RTO derived from the value of MAD that it 
calculates the receiver will be using. Then its RTO will be completely 
tailored to the RTT of the flow.

Note: There are two different uses for the min RTO that need to be 
separated:
     a) Before an initial RTT value has been measured, to determine the 
RTO during the 3WHS.
     b) Once either end has measured the RTT for a connection.
(a) needs to cope with the whole range of possible RTTs, whereas (b) is 
the subject of this email, because it can be tailored for the measured RTT.

*2/ The problem, and its prevalence**
*
With gradual removal of bufferbloat and more prevalent usage of CDNs, 
typical base RTTs on the public Internet now make the value of minRTO 
and of MAD look silly.

As can be seen above, the problem is indeed that each end only has 
partial knowledge of the config of the other end.
However, the problem is not just that MAD needs to be communicated to 
the other end so it can be hard-coded to a lower value.
The problem is that MAD is hard-coded in the first place.

The draft needs to say how prevalent the problem is (on the public 
Internet) where the sender has to wait for the receiver's delayed ACK 
timer at the end of a flow or between the end of a volley of packets and 
the start of the next.

The draft also needs to say what tradeoff is considered acceptable 
between a residual level of spurious retransmissions and lower timeout 
delay. Eliminating all spurious retransmissions is not the goal.

The draft also needs to say that introducing a new TCP Option is itself 
a problem (on the public Internet), because of middleboxes particularly 
proxies. Therefore a solution that does not need a new TCP Option would 
be preferable....

Perhaps the solution for communicating timestamp resolution in 
draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites 
draft-trammell-tcpm-timestamp-interval-01) could be modified to also 
communicate:
* TCP's clock granularity (closely related to TCP timestamp resolution),
*  and the fact that the host is calculating MAD as a function of RTT 
and granularity.
Then the existing timestamp option could be repurposed, which should 
drastically reduce deployment problems.

*3/ Only DC?**
*
All the related work references are solely in the context of a DC. Pls 
include refs about this problem in a public Internet context. You will 
find there is a pretty good search engine at www.google.com.

The only non-DC ref I can find about minRTO is [Psaras07], which is 
mainly about a proposal to apply minRTO if the sender expects the next 
ACK to be delayed. Nonetheless, the simulation experiment in Section 5.1 
provides good evidence for how RTO latency is dependent on uncertainty 
about the MAD that the other end is using.

[Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO 
Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and Sensor 
Networks, Wireless Networks, Next Generation Internet NETWORKING'07 
pp.981-991 Springer-Verlag (2007)
https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited

*4/ Status**
*
Normally, I wouldn't want to hold up a draft that has been proven over 
years of practice, such as the technique in low-latency-opt, which has 
been proven in Google's DCs over the last few years. Whereas, my ideas 
are just that: ideas, not proven. However, the technique in 
low-latency-opt has only been proven in DC environments where the range 
of RTTs is limited. So, now that you are proposing to transplant it onto 
the public Internet, it also only has the status of an unproven idea.

To be clear, as it stands, I do not think low-latency-opt is applicable 
to the public Internet.


*5/ Nits**
*These nits depart from my promise not comment on details that could 
become irrelevant if you agree with my idea. Hey, whatever,...

S.3.5:

	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)

My immediate reaction to this was that G should not appear twice. 
However, perhaps you meant them to be G_s and G_r (sender and receiver) 
respectively. {Note 2}

S.3.5 & S.5. It seems unnecessary to prohibit values of MAD greater than 
the default (given some companies are already investing in commercial 
public space flight programmes, so TCP could need to routinely support 
RTTs that are longer than typical not just shorter).


Cheers


Bob

*
**{Note 1}*: On average, if not app-limited, the time between ACKs will 
be d_r*R_r/W_s where:
    R is SRTT
    d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
    W is the window in units of segments
    subscripts X_r or X_s denote receiver or sender for the half-connection.

So as long as the receiver can estimate the varying value of W at the 
sender, the receiver's MAD could be
     MAD_r = max(k*d_r*R_r / W_s, G_r),
The factor k (lower case) allows for some bunching of packets e.g. due 
to link layer aggregation or the residual effects of slow-start, which 
leaves some bunching even if SS uses pacing. Let's say k=2, but it would 
need to be checked empirically.

For example, take R=100us, d=2, W=8 and G = 1us.
Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might need 
to be greater, but there would certainly be no need for MAD to be 5ms, 
which is perhaps 100 times greater than necessary.
*
**{Note 2}*: Why is there no field in the Low Latency option to 
communicate receiver clock granularity to the sender?


Bob

-- 
________________________________________________________________
Bob Briscoehttp://bobbriscoe.net/


--------------3828722705E8406C00D6F410
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Wei, Yuchung, Neal and Eric, as authors of
    draft-wang-tcpm-low-latency-opt-00,<br>
    <br>
    I promised a review. It questions the technical logic behind the
    draft, so I haven't bothered to give a detailed review of the
    wording of the draft, because that might be irrelevant if you agree
    with my arguments.<br>
    <br>
    <b>1/ MAD by configuration?</b><b><br>
    </b>
    <pre class="newpage">   o  If the user does not specify a MAD value, then the implementation
      SHOULD NOT specify a MAD value in the Low Latency option.
</pre>
    That sentence triggered my "anti-human-intervention" reflex. My
    train of thought went as follows:<br>
    <br>
    * Let's consider what advice we would give on what MAD value ought
    to be configured.<br>
    * You say that MAD can be smaller in DCs. So I assume your advice
    would be that MAD should depend on RTT {Note 1} and clock
    granularity {Note 2}.<br>
    * So why configure one value of MAD for all RTTs? That only makes
    sense in DC environments where the range of RTTs is small. <br>
    * However, for the range of RTTs on the public Internet, why not
    calculate MAD from RTT and granularity, then standardize the
    calculation so that both ends arrive at the same result when
    starting from the same RTT and granularity parameters? (The sender
    and receiver might measure different smoothed (SRTT) values, but
    they will converge as the flow progresses.)<br>
    <br>
    Then the receiver only needs to communicate its clock granularity to
    the sender, and the fact that it is driving MAD off its SRTT. Then
    the sender can use a formula for RTO derived from the value of MAD
    that it calculates the receiver will be using. Then its RTO will be
    completely tailored to the RTT of the flow. <br>
    <br>
    Note: There are two different uses for the min RTO that need to be
    separated:<br>
        a) Before an initial RTT value has been measured, to determine
    the RTO during the 3WHS.<br>
        b) Once either end has measured the RTT for a connection.<br>
    (a) needs to cope with the whole range of possible RTTs, whereas (b)
    is the subject of this email, because it can be tailored for the
    measured RTT.<br>
    <br>
    <b>2/ The problem, and its prevalence</b><b><br>
    </b><br>
    With gradual removal of bufferbloat and more prevalent usage of
    CDNs, typical base RTTs on the public Internet now make the value of
    minRTO and of MAD look silly.<br>
    <br>
    As can be seen above, the problem is indeed that each end only has
    partial knowledge of the config of the other end.<br>
    However, the problem is not just that MAD needs to be communicated
    to the other end so it can be hard-coded to a lower value.<br>
    The problem is that MAD is hard-coded in the first place.<br>
    <br>
    The draft needs to say how prevalent the problem is (on the public
    Internet) where the sender has to wait for the receiver's delayed
    ACK timer at the end of a flow or between the end of a volley of
    packets and the start of the next. <br>
    <br>
    The draft also needs to say what tradeoff is considered acceptable
    between a residual level of spurious retransmissions and lower
    timeout delay. Eliminating all spurious retransmissions is not the
    goal.<br>
    <br>
    The draft also needs to say that introducing a new TCP Option is
    itself a problem (on the public Internet), because of middleboxes
    particularly proxies. Therefore a solution that does not need a new
    TCP Option would be preferable....<br>
    <br>
    Perhaps the solution for communicating timestamp resolution in
    draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites
    draft-trammell-tcpm-timestamp-interval-01) could be modified to also
    communicate:<br>
    * TCP's clock granularity (closely related to TCP timestamp
    resolution), <br>
    *  and the fact that the host is calculating MAD as a function of
    RTT and granularity. <br>
    Then the existing timestamp option could be repurposed, which should
    drastically reduce deployment problems.<br>
    <br>
    <b>3/ Only DC?</b><b><br>
    </b><br>
    All the related work references are solely in the context of a DC.
    Pls include refs about this problem in a public Internet context.
    You will find there is a pretty good search engine at
    <a class="moz-txt-link-abbreviated" href="http://www.google.com">www.google.com</a>.<br>
    <br>
    The only non-DC ref I can find about minRTO is [Psaras07], which is
    mainly about a proposal to apply minRTO if the sender expects the
    next ACK to be delayed. Nonetheless, the simulation experiment in
    Section 5.1 provides good evidence for how RTO latency is dependent
    on uncertainty about the MAD that the other end is using.<br>
    <br>
    [Psaras07] Psaras, I. &amp; Tsaoussidis, V., "The TCP Minimum RTO
    Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and
    Sensor Networks, Wireless Networks, Next Generation Internet
    NETWORKING'07 pp.981-991 Springer-Verlag (2007)<br>
<a class="moz-txt-link-freetext" href="https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited">https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited</a><br>
    <br>
    <b>4/ Status</b><b><br>
    </b><br>
    Normally, I wouldn't want to hold up a draft that has been proven
    over years of practice, such as the technique in low-latency-opt,
    which has been proven in Google's DCs over the last few years.
    Whereas, my ideas are just that: ideas, not proven. However, the
    technique in low-latency-opt has only been proven in DC environments
    where the range of RTTs is limited. So, now that you are proposing
    to transplant it onto the public Internet, it also only has the
    status of an unproven idea.<br>
    <br>
    To be clear, as it stands, I do not think low-latency-opt is
    applicable to the public Internet.<br>
    <br>
    <br>
    <b>5/ Nits</b><b><br>
    </b>These nits depart from my promise not comment on details that
    could become irrelevant if you agree with my idea. Hey, whatever,...
    <br>
    <br>
    S.3.5:<br>
    <pre class="newpage">	RTO &lt;- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)</pre>
    My immediate reaction to this was that G should not appear twice.
    However, perhaps you meant them to be G_s and G_r (sender and
    receiver) respectively. {Note 2}<br>
    <br>
    S.3.5 &amp; S.5. It seems unnecessary to prohibit values of MAD
    greater than the default (given some companies are already investing
    in commercial public space flight programmes, so TCP could need to
    routinely support RTTs that are longer than typical not just
    shorter).<br>
    <br>
    <br>
    Cheers<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    <b><br>
    </b><b>{Note 1}</b>: On average, if not app-limited, the time
    between ACKs will be d_r*R_r/W_s where:<br>
       R is SRTT<br>
       d is the delayed ACK factor, e.g. d=2 for ACKing every other
    packet<br>
       W is the window in units of segments<br>
       subscripts X_r or X_s denote receiver or sender for the
    half-connection.<br>
    <br>
    So as long as the receiver can estimate the varying value of W at
    the sender, the receiver's MAD could be <br>
        MAD_r = max(k*d_r*R_r / W_s, G_r), <br>
    The factor k (lower case) allows for some bunching of packets e.g.
    due to link layer aggregation or the residual effects of slow-start,
    which leaves some bunching even if SS uses pacing. Let's say k=2,
    but it would need to be checked empirically.<br>
    <br>
    For example, take R=100us, d=2, W=8 and G = 1us.<br>
    Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might
    need to be greater, but there would certainly be no need for MAD to
    be 5ms, which is perhaps 100 times greater than necessary.<br>
    <b><br>
    </b><b>{Note 2}</b>: Why is there no field in the Low Latency option
    to communicate receiver clock granularity to the sender?<br>
    <br>
    <br>
    Bob<br>
    <br>
    <pre class="moz-signature" cols="72">-- 
________________________________________________________________
Bob Briscoe                               <a class="moz-txt-link-freetext" href="http://bobbriscoe.net/">http://bobbriscoe.net/</a></pre>
  </body>
</html>

--------------3828722705E8406C00D6F410--


From nobody Fri Aug  4 09:55:28 2017
Return-Path: <weiwan@google.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0832A1321EA for <tcpm@ietfa.amsl.com>; Fri,  4 Aug 2017 09:55:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.7
X-Spam-Level: 
X-Spam-Status: No, score=-2.7 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EpUEPCZ0Yupk for <tcpm@ietfa.amsl.com>; Fri,  4 Aug 2017 09:55:22 -0700 (PDT)
Received: from mail-vk0-x22e.google.com (mail-vk0-x22e.google.com [IPv6:2607:f8b0:400c:c05::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id BF981131F23 for <tcpm@ietf.org>; Fri,  4 Aug 2017 09:55:21 -0700 (PDT)
Received: by mail-vk0-x22e.google.com with SMTP id u133so8023008vke.3 for <tcpm@ietf.org>; Fri, 04 Aug 2017 09:55:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=SWiJu6vu5N8mMHFarDe9NF9gOc1XnczGvCNMFywLc7U=; b=Hw8ju3V6Gil3N1lsNULVMU0ZoqdS2DppXFqa7lSv5uCtn4e6ARG5iO9cbXFTCE14W1 Dz4dAJJGmLLXO98vyujaglf1ATy/FiCwyWtvhxs9YYjokolWEmsUSjZRrLzZeGyW3l1s jn/zUyzqOaZozx5XDqOoNZyY5FF2IQfpWM33QHvI6cU6K8Fpu41ysBmcRns2mb1bHRgF T/lmBhsCjVi2R9tv6kzkxHfMnXbTeUjbwX8TYjWZRyvU2jkbWpTLb3wcStgbewRAglXy 9b3eHUoJicmbKPmk1mR8FxAK2LzO/ME0uvhqUqqY480ajAoMT+C1cnueJF9KMP9UWJ+A dxTg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=SWiJu6vu5N8mMHFarDe9NF9gOc1XnczGvCNMFywLc7U=; b=AT65j+RtyLOEBBaQsD+eME3yFgtbcgfBPCDIor5dFZptRck9ATi0jUo5fEyU8iQgkK UVad3tMCeTU81zkfOb+U8kOdc4fmuDImLTM/YXXlmVA0MNuj74BdFdSnjtD1CadQvc2V ULGeoIJseXmDfq5i+qFxqpXai66UV7468TetV+LAom9CWy0Z+BtvKJxbVG8y0ufdmHTc YIOe8pKmgt2soo7yDDGkSByG2rl4APwXBfrYR/uQZHJdT0vAxhP5rZd/9vr1a+dcZVMp n9tPaCPBP5z2KzvDZzS/tWLbr4ESvalD4g5562HoIa56CsHPUKPuszDVpP6s2mAtYWVf hAVA==
X-Gm-Message-State: AHYfb5i5vIdIb1Wyyao5/O/fjZfTuEn5ncSnyIZiEei+pX2AqxkHTt+b WWuEZoifGfkveyrZC7ukrBhhnvB12l/7
X-Received: by 10.31.129.78 with SMTP id c75mr1656516vkd.16.1501865720656; Fri, 04 Aug 2017 09:55:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.31.70.7 with HTTP; Fri, 4 Aug 2017 09:55:20 -0700 (PDT)
In-Reply-To: <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net>
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net>
From: Wei Wang <weiwan@google.com>
Date: Fri, 4 Aug 2017 09:55:20 -0700
Message-ID: <CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>,  Neal Cardwell <ncardwell@google.com>, tcpm IETF list <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="001a11411bcec82dc20555f05c98"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/-a8QLrwjzva1_CPN3FkvyAztkn4>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Aug 2017 16:55:26 -0000

--001a11411bcec82dc20555f05c98
Content-Type: text/plain; charset="UTF-8"

Hi Bob,

Thanks a lot for your review and detailed feedback on the draft.
Please see my comments inline below:

On Wed, Aug 2, 2017 at 8:54 AM, Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Wei, Yuchung, Neal and Eric, as authors of draft-wang-tcpm-low-latency-
> opt-00,
>
> I promised a review. It questions the technical logic behind the draft, so
> I haven't bothered to give a detailed review of the wording of the draft,
> because that might be irrelevant if you agree with my arguments.
>
> *1/ MAD by configuration?*
>
>    o  If the user does not specify a MAD value, then the implementation
>       SHOULD NOT specify a MAD value in the Low Latency option.
>
> That sentence triggered my "anti-human-intervention" reflex. My train of
> thought went as follows:
>
> * Let's consider what advice we would give on what MAD value ought to be
> configured.
> * You say that MAD can be smaller in DCs. So I assume your advice would be
> that MAD should depend on RTT {Note 1} and clock granularity {Note 2}.
> * So why configure one value of MAD for all RTTs? That only makes sense in
> DC environments where the range of RTTs is small.
> * However, for the range of RTTs on the public Internet, why not calculate
> MAD from RTT and granularity, then standardize the calculation so that both
> ends arrive at the same result when starting from the same RTT and
> granularity parameters? (The sender and receiver might measure different
> smoothed (SRTT) values, but they will converge as the flow progresses.)
>
> Then the receiver only needs to communicate its clock granularity to the
> sender, and the fact that it is driving MAD off its SRTT. Then the sender
> can use a formula for RTO derived from the value of MAD that it calculates
> the receiver will be using. Then its RTO will be completely tailored to the
> RTT of the flow.
>

First of all, we recommend that operating system should have a per-route
MAD configuration API and a per-connection MAD configuration API. So
different connections could have different MAD values configured. It is not
one value for all.

And in my opinion, what MAD value should be set to is not only depending on
RTT and clock granularity. It also depends on how the application wants the
delayed ack behavior to be. Some application might only send data say every
1ms, so it will delay its ack up to 2ms so that it can always piggy back
the ack to the data.
That is why a per-connection MAD configuration makes sense for the
application to fine tune MAD according to its own demand.

And when user tries to set a new MAD value, we do boundary check to make
sure it is less than the current default MAD value. This is a safety check
to make sure user does not configure something that is worse than current
default value.

About your question in {Note 2} that why receiver does not communicate its
clock granularity to the sender, I don't really see a reason why receiver
side clock granularity is needed. Because the MAD value sent by receiver is
already a value that is rounded to the clock granularity. Say if a user
wants to set MAD to 1ms, and the clock granularity is 10ms, receiver will
send MAD value as 10ms. In the draft, we specify that:

      If specified, then the MAD value in the Low Latency option MUST be
      set, as close as possible, to the implementation's actual delayed
      ACK timeout for the connection.  Note that the actual maximum
      delayed ACK timeout of the connection may be larger than the
      actual user specified value because of implementation constraints
             (e.g. timer granularity limitations).


> Note: There are two different uses for the min RTO that need to be
> separated:
>     a) Before an initial RTT value has been measured, to determine the RTO
> during the 3WHS.
>     b) Once either end has measured the RTT for a connection.
> (a) needs to cope with the whole range of possible RTTs, whereas (b) is
> the subject of this email, because it can be tailored for the measured RTT.
>

Again, we don't think MAD value is only a function of RTT and clock
granularity.


>
> *2/ The problem, and its prevalence*
>
> With gradual removal of bufferbloat and more prevalent usage of CDNs,
> typical base RTTs on the public Internet now make the value of minRTO and
> of MAD look silly.
>
> As can be seen above, the problem is indeed that each end only has partial
> knowledge of the config of the other end.
> However, the problem is not just that MAD needs to be communicated to the
> other end so it can be hard-coded to a lower value.
> The problem is that MAD is hard-coded in the first place.
>
> The draft needs to say how prevalent the problem is (on the public
> Internet) where the sender has to wait for the receiver's delayed ACK timer
> at the end of a flow or between the end of a volley of packets and the
> start of the next.
>

Noted. We will add more contexts on how delayed ack works and why long
delayed ack time is hurting performance. We are also planning on adding
some history about why delayed ack was configured as a constant in the
first place and why the current constant value was chosen.


>
> The draft also needs to say what tradeoff is considered acceptable between
> a residual level of spurious retransmissions and lower timeout delay.
> Eliminating all spurious retransmissions is not the goal.
>

Noted.


>
> The draft also needs to say that introducing a new TCP Option is itself a
> problem (on the public Internet), because of middleboxes particularly
> proxies. Therefore a solution that does not need a new TCP Option would be
> preferable....
>
>
There is already a section in the draft that states the middle box issue:
        5. Middlebox Considerations
Is that portion a good enough explanation on this?


> Perhaps the solution for communicating timestamp resolution in
> draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites
> draft-trammell-tcpm-timestamp-interval-01) could be modified to also
> communicate:
> * TCP's clock granularity (closely related to TCP timestamp resolution),
> *  and the fact that the host is calculating MAD as a function of RTT and
> granularity.
> Then the existing timestamp option could be repurposed, which should
> drastically reduce deployment problems.
>

I am not sure if this is doable but will look into it.


>
> *3/ Only DC?*
>
> All the related work references are solely in the context of a DC. Pls
> include refs about this problem in a public Internet context. You will find
> there is a pretty good search engine at www.google.com.
>
> The only non-DC ref I can find about minRTO is [Psaras07], which is mainly
> about a proposal to apply minRTO if the sender expects the next ACK to be
> delayed. Nonetheless, the simulation experiment in Section 5.1 provides
> good evidence for how RTO latency is dependent on uncertainty about the MAD
> that the other end is using.
>
> [Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO Revisited,"
> In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and Sensor Networks,
> Wireless Networks, Next Generation Internet NETWORKING'07 pp.981-991
> Springer-Verlag (2007)
> https://www.researchgate.net/publication/225442912_The_TCP_
> Minimum_RTO_Revisited
>

Noted. Thanks a lot for the pointers. Will look into them and add to the
draft.


>
>
> *4/ Status*
>
> Normally, I wouldn't want to hold up a draft that has been proven over
> years of practice, such as the technique in low-latency-opt, which has been
> proven in Google's DCs over the last few years. Whereas, my ideas are just
> that: ideas, not proven. However, the technique in low-latency-opt has only
> been proven in DC environments where the range of RTTs is limited. So, now
> that you are proposing to transplant it onto the public Internet, it also
> only has the status of an unproven idea.
>
> To be clear, as it stands, I do not think low-latency-opt is applicable to
> the public Internet.
>


Hmm... I think overall, this approach should not do any harm to the
network. It provides an additional feature to let the user configure the
MAD if the user cares about it. If not, they can leave it as the default
behavior as it is right now.
To your concerns about the RTT variation in the internet, first, as I
explained, this MAD value will be set per connection or per route.
Secondly, I would think it is doable to do some bound check or error
correction on the MAD value set by the user if we find that it is way below
RTT and does not make sense. But again, we don't think MAD value is only a
function of RTT. User should be able to configure it to a value suitable
for his/her need.
We want to make it as a standard so that all operating systems could
implement this in the same way so that they could understand each other.
One use case is that in a cloud environment where different operating
systems are running in the same DC, they should be able to interpret this
option with no issue.


>
> *5/ Nits*
> These nits depart from my promise not comment on details that could become
> irrelevant if you agree with my idea. Hey, whatever,...
>
> S.3.5:
>
> 	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)
>
> My immediate reaction to this was that G should not appear twice. However,
> perhaps you meant them to be G_s and G_r (sender and receiver)
> respectively. {Note 2}
>
>
As explained earlier, clock granularity of the receiver is already being
considered in the MAD value itself. In the above formula, both G are the
clock granularity on the sender side.


> S.3.5 & S.5. It seems unnecessary to prohibit values of MAD greater than
> the default (given some companies are already investing in commercial
> public space flight programmes, so TCP could need to routinely support RTTs
> that are longer than typical not just shorter).
>
>

Noted. Will take consideration of this.


>
>
Cheers
>
>
>
> Bob
>
>
> *{Note 1}*: On average, if not app-limited, the time between ACKs will be
> d_r*R_r/W_s where:
>    R is SRTT
>    d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
>    W is the window in units of segments
>    subscripts X_r or X_s denote receiver or sender for the half-connection.
>
> So as long as the receiver can estimate the varying value of W at the
> sender, the receiver's MAD could be
>     MAD_r = max(k*d_r*R_r / W_s, G_r),
> The factor k (lower case) allows for some bunching of packets e.g. due to
> link layer aggregation or the residual effects of slow-start, which leaves
> some bunching even if SS uses pacing. Let's say k=2, but it would need to
> be checked empirically.
>
> For example, take R=100us, d=2, W=8 and G = 1us.
> Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might need to
> be greater, but there would certainly be no need for MAD to be 5ms, which
> is perhaps 100 times greater than necessary.
>
> *{Note 2}*: Why is there no field in the Low Latency option to
> communicate receiver clock granularity to the sender?
>
>
> Bob
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>

--001a11411bcec82dc20555f05c98
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi Bob,<div><br></div><div>Thanks a lot for your review an=
d detailed feedback on the draft.</div><div>Please see my comments inline b=
elow:</div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Wed=
, Aug 2, 2017 at 8:54 AM, Bob Briscoe <span dir=3D"ltr">&lt;<a href=3D"mail=
to:ietf@bobbriscoe.net" target=3D"_blank">ietf@bobbriscoe.net</a>&gt;</span=
> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0=
.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
 =20
   =20
 =20
  <div bgcolor=3D"#FFFFFF">
    Wei, Yuchung, Neal and Eric, as authors of
    draft-wang-tcpm-low-latency-<wbr>opt-00,<br>
    <br>
    I promised a review. It questions the technical logic behind the
    draft, so I haven&#39;t bothered to give a detailed review of the
    wording of the draft, because that might be irrelevant if you agree
    with my arguments.<br>
    <br>
    <b>1/ MAD by configuration?</b><b><br>
    </b>
    <pre class=3D"gmail-m_3610993658251518594newpage">   o  If the user doe=
s not specify a MAD value, then the implementation
      SHOULD NOT specify a MAD value in the Low Latency option.
</pre>
    That sentence triggered my &quot;anti-human-intervention&quot; reflex. =
My
    train of thought went as follows:<br>
    <br>
    * Let&#39;s consider what advice we would give on what MAD value ought
    to be configured.<br>
    * You say that MAD can be smaller in DCs. So I assume your advice
    would be that MAD should depend on RTT {Note 1} and clock
    granularity {Note 2}.<br>
    * So why configure one value of MAD for all RTTs? That only makes
    sense in DC environments where the range of RTTs is small. <br>
    * However, for the range of RTTs on the public Internet, why not
    calculate MAD from RTT and granularity, then standardize the
    calculation so that both ends arrive at the same result when
    starting from the same RTT and granularity parameters? (The sender
    and receiver might measure different smoothed (SRTT) values, but
    they will converge as the flow progresses.)<br>
    <br>
    Then the receiver only needs to communicate its clock granularity to
    the sender, and the fact that it is driving MAD off its SRTT. Then
    the sender can use a formula for RTO derived from the value of MAD
    that it calculates the receiver will be using. Then its RTO will be
    completely tailored to the RTT of the flow. <br></div></blockquote><div=
><br></div><div>First of all, we recommend that operating system should hav=
e a per-route MAD configuration API and a per-connection MAD configuration =
API. So different connections could have different MAD values configured. I=
t is not one value for all.</div><div><br></div><div>And in my opinion, wha=
t MAD value should be set to is not only depending on RTT and clock granula=
rity. It also depends on how the application wants the delayed ack behavior=
 to be. Some application might only send data say every 1ms, so it will del=
ay its ack up to 2ms so that it can always piggy back the ack to the data.<=
/div><div>That is why a per-connection MAD configuration makes sense for th=
e application to fine tune MAD according to its own demand.</div><div><br><=
/div><div>And when user tries to set a new MAD value, we do boundary check =
to make sure it is less than the current default MAD value. This is a safet=
y check to make sure user does not configure something that is worse than c=
urrent default value.</div><div><br></div><div>About your question in {Note=
 2} that why receiver does not communicate its clock granularity to the sen=
der, I don&#39;t really see a reason why receiver side clock granularity is=
 needed. Because the MAD value sent by receiver is already a value that is =
rounded to the clock granularity. Say if a user wants to set MAD to 1ms, an=
d the clock granularity is 10ms, receiver will send MAD value as 10ms. In t=
he draft, we specify that:</div><div><br></div><div>=C2=A0 =C2=A0 =C2=A0 If=
 specified, then the MAD value in the Low Latency option MUST be</div><div>=
=C2=A0 =C2=A0 =C2=A0 set, as close as possible, to the implementation&#39;s=
 actual delayed</div><div>=C2=A0 =C2=A0 =C2=A0 ACK timeout for the connecti=
on.=C2=A0 Note that the actual maximum</div><div>=C2=A0 =C2=A0 =C2=A0 delay=
ed ACK timeout of the connection may be larger than the</div><div>=C2=A0 =
=C2=A0 =C2=A0 actual user specified value because of implementation constra=
ints=C2=A0</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(e.g. =
timer granularity limitations). =C2=A0</div><div><br></div><div><br></div><=
blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-l=
eft:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF">
    <br>
    Note: There are two different uses for the min RTO that need to be
    separated:<br>
    =C2=A0=C2=A0=C2=A0 a) Before an initial RTT value has been measured, to=
 determine
    the RTO during the 3WHS.<br>
    =C2=A0=C2=A0=C2=A0 b) Once either end has measured the RTT for a connec=
tion.<br>
    (a) needs to cope with the whole range of possible RTTs, whereas (b)
    is the subject of this email, because it can be tailored for the
    measured RTT.<br></div></blockquote><div><br></div><div>Again, we don&#=
39;t think MAD value is only a function of RTT and clock granularity.</div>=
<div><br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"=
margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-lef=
t:1ex"><div bgcolor=3D"#FFFFFF">
    <br>
    <b>2/ The problem, and its prevalence</b><b><br>
    </b><br>
    With gradual removal of bufferbloat and more prevalent usage of
    CDNs, typical base RTTs on the public Internet now make the value of
    minRTO and of MAD look silly.<br>
    <br>
    As can be seen above, the problem is indeed that each end only has
    partial knowledge of the config of the other end.<br>
    However, the problem is not just that MAD needs to be communicated
    to the other end so it can be hard-coded to a lower value.<br>
    The problem is that MAD is hard-coded in the first place.<br>
    <br>
    The draft needs to say how prevalent the problem is (on the public
    Internet) where the sender has to wait for the receiver&#39;s delayed
    ACK timer at the end of a flow or between the end of a volley of
    packets and the start of the next. <br></div></blockquote><div><br></di=
v><div>Noted. We will add more contexts on how delayed ack works and why lo=
ng delayed ack time is hurting performance. We are also planning on adding =
some history about why delayed ack was configured as a constant in the firs=
t place and why the current constant value was chosen.</div><div>=C2=A0</di=
v><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;borde=
r-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF=
">
    <br>
    The draft also needs to say what tradeoff is considered acceptable
    between a residual level of spurious retransmissions and lower
    timeout delay. Eliminating all spurious retransmissions is not the
    goal.<br></div></blockquote><div><br></div><div>Noted.</div><div>=C2=A0=
<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8=
ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D=
"#FFFFFF">
    <br>
    The draft also needs to say that introducing a new TCP Option is
    itself a problem (on the public Internet), because of middleboxes
    particularly proxies. Therefore a solution that does not need a new
    TCP Option would be preferable....<br>
    <br></div></blockquote><div><br></div><div>There is already a section i=
n the draft that states the middle box issue:</div><div>=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 5. Middlebox Considerations=C2=A0</div><div>Is that portion a go=
od enough explanation on this?</div><div>=C2=A0</div><blockquote class=3D"g=
mail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204=
,204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF">
    Perhaps the solution for communicating timestamp resolution in
    draft-scheffenegger-tcpm-<wbr>timestamp-negotiation-05 (which cites
    draft-trammell-tcpm-timestamp-<wbr>interval-01) could be modified to al=
so
    communicate:<br>
    * TCP&#39;s clock granularity (closely related to TCP timestamp
    resolution), <br>
    *=C2=A0 and the fact that the host is calculating MAD as a function of
    RTT and granularity. <br>
    Then the existing timestamp option could be repurposed, which should
    drastically reduce deployment problems.<br></div></blockquote><div><br>=
</div><div>I am not sure if this is doable but will look into it.</div><div=
><br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"marg=
in:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1e=
x"><div bgcolor=3D"#FFFFFF">
    <br>
    <b>3/ Only DC?</b><b><br>
    </b><br>
    All the related work references are solely in the context of a DC.
    Pls include refs about this problem in a public Internet context.
    You will find there is a pretty good search engine at
    <a class=3D"gmail-m_3610993658251518594moz-txt-link-abbreviated" href=
=3D"http://www.google.com" target=3D"_blank">www.google.com</a>.<br>
    <br>
    The only non-DC ref I can find about minRTO is [Psaras07], which is
    mainly about a proposal to apply minRTO if the sender expects the
    next ACK to be delayed. Nonetheless, the simulation experiment in
    Section 5.1 provides good evidence for how RTO latency is dependent
    on uncertainty about the MAD that the other end is using.<br>
    <br>
    [Psaras07] Psaras, I. &amp; Tsaoussidis, V., &quot;The TCP Minimum RTO
    Revisited,&quot; In: Proc. 6th Int&#39;l IFIP-TC6 Conference on Ad Hoc =
and
    Sensor Networks, Wireless Networks, Next Generation Internet
    NETWORKING&#39;07 pp.981-991 Springer-Verlag (2007)<br>
<a class=3D"gmail-m_3610993658251518594moz-txt-link-freetext" href=3D"https=
://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited=
" target=3D"_blank">https://www.researchgate.net/<wbr>publication/225442912=
_The_TCP_<wbr>Minimum_RTO_Revisited</a></div></blockquote><div><br></div><d=
iv>Noted. Thanks a lot for the pointers. Will look into them and add to the=
 draft.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"ma=
rgin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:=
1ex"><div bgcolor=3D"#FFFFFF"><br>
    <br>
    <b>4/ Status</b><b><br>
    </b><br>
    Normally, I wouldn&#39;t want to hold up a draft that has been proven
    over years of practice, such as the technique in low-latency-opt,
    which has been proven in Google&#39;s DCs over the last few years.
    Whereas, my ideas are just that: ideas, not proven. However, the
    technique in low-latency-opt has only been proven in DC environments
    where the range of RTTs is limited. So, now that you are proposing
    to transplant it onto the public Internet, it also only has the
    status of an unproven idea.<br>
    <br>
    To be clear, as it stands, I do not think low-latency-opt is
    applicable to the public Internet.<br></div></blockquote><div><br></div=
><div><br></div><div>Hmm... I think overall, this approach should not do an=
y harm to the network. It provides an additional feature to let the user co=
nfigure the MAD if the user cares about it. If not, they can leave it as th=
e default behavior as it is right now.</div><div>To your concerns about the=
 RTT variation in the internet, first, as I explained, this MAD value will =
be set per connection or per route. Secondly, I would think it is doable to=
 do some bound check or error correction on the MAD value set by the user i=
f we find that it is way below RTT and does not make sense. But again, we d=
on&#39;t think MAD value is only a function of RTT. User should be able to =
configure it to a value suitable for his/her need.</div><div>We want to mak=
e it as a standard so that all operating systems could implement this in th=
e same way so that they could understand each other. One use case is that i=
n a cloud environment where different operating systems are running in the =
same DC, they should be able to interpret this option with no issue.</div><=
div><br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"m=
argin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left=
:1ex"><div bgcolor=3D"#FFFFFF">
    <br>
    <b>5/ Nits</b><b><br>
    </b>These nits depart from my promise not comment on details that
    could become irrelevant if you agree with my idea. Hey, whatever,...
    <br>
    <br>
    S.3.5:<br>
    <pre class=3D"gmail-m_3610993658251518594newpage">	RTO &lt;- SRTT + max=
(G, K*RTTVAR) + max(G, max_ACK_delay)</pre>
    My immediate reaction to this was that G should not appear twice.
    However, perhaps you meant them to be G_s and G_r (sender and
    receiver) respectively. {Note 2}<br>
    <br></div></blockquote><div><br></div><div>As explained earlier, clock =
granularity of the receiver is already being considered in the MAD value it=
self. In the above formula, both G are the clock granularity on the sender =
side.</div><div><br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote=
" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);=
padding-left:1ex"><div bgcolor=3D"#FFFFFF">
    S.3.5 &amp; S.5. It seems unnecessary to prohibit values of MAD
    greater than the default (given some companies are already investing
    in commercial public space flight programmes, so TCP could need to
    routinely support RTTs that are longer than typical not just
    shorter).<br>=C2=A0 =C2=A0 </div></blockquote><div><br></div><div>Noted=
. Will take consideration of this.</div><div>=C2=A0</div><blockquote class=
=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rg=
b(204,204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF">=C2=A0<br></div><=
/blockquote><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0=
.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=
=3D"#FFFFFF">
    Cheers<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    <b><br>
    </b><b>{Note 1}</b>: On average, if not app-limited, the time
    between ACKs will be d_r*R_r/W_s where:<br>
    =C2=A0=C2=A0 R is SRTT<br>
    =C2=A0=C2=A0 d is the delayed ACK factor, e.g. d=3D2 for ACKing every o=
ther
    packet<br>
    =C2=A0=C2=A0 W is the window in units of segments<br>
    =C2=A0=C2=A0 subscripts X_r or X_s denote receiver or sender for the
    half-connection.<br>
    <br>
    So as long as the receiver can estimate the varying value of W at
    the sender, the receiver&#39;s MAD could be <br>
    =C2=A0=C2=A0=C2=A0 MAD_r =3D max(k*d_r*R_r / W_s, G_r), <br>
    The factor k (lower case) allows for some bunching of packets e.g.
    due to link layer aggregation or the residual effects of slow-start,
    which leaves some bunching even if SS uses pacing. Let&#39;s say k=3D2,
    but it would need to be checked empirically.<br>
    <br>
    For example, take R=3D100us, d=3D2, W=3D8 and G =3D 1us.<br>
    Given d*R/W =3D 25us, MAD could be perhaps 50us (i.e. k=3D2). k might
    need to be greater, but there would certainly be no need for MAD to
    be 5ms, which is perhaps 100 times greater than necessary.<br>
    <b><br>
    </b><b>{Note 2}</b>: Why is there no field in the Low Latency option
    to communicate receiver clock granularity to the sender?<span class=3D"=
gmail-HOEnZb"><font color=3D"#888888"><br>
    <br>
    <br>
    Bob<br>
    <br>
    <pre class=3D"gmail-m_3610993658251518594moz-signature" cols=3D"72">--=
=20
______________________________<wbr>______________________________<wbr>____
Bob Briscoe                               <a class=3D"gmail-m_3610993658251=
518594moz-txt-link-freetext" href=3D"http://bobbriscoe.net/" target=3D"_bla=
nk">http://bobbriscoe.net/</a></pre>
  </font></span></div>

</blockquote></div><br></div></div>

--001a11411bcec82dc20555f05c98--


From nobody Fri Aug  4 15:21:02 2017
Return-Path: <ncardwell@google.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4DB541204DA for <tcpm@ietfa.amsl.com>; Fri,  4 Aug 2017 15:21:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level: 
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id z05apmDULXTE for <tcpm@ietfa.amsl.com>; Fri,  4 Aug 2017 15:20:56 -0700 (PDT)
Received: from mail-qt0-x229.google.com (mail-qt0-x229.google.com [IPv6:2607:f8b0:400d:c0d::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F36B012783A for <tcpm@ietf.org>; Fri,  4 Aug 2017 15:20:55 -0700 (PDT)
Received: by mail-qt0-x229.google.com with SMTP id p3so16663941qtg.2 for <tcpm@ietf.org>; Fri, 04 Aug 2017 15:20:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=8foj1PUuzFunimwO4VxRM4ypRagLNleKFRa0a96dSt4=; b=XUFBKi/04MEoORTx5pidZ6oXPWmcVmmEBHx8/UFodBUwCNXQik6IlhZgGOd1n4Z6gV 3AwrEjI7aGGmpl4knZFtaTY7RtE/IfxvXaYSgU9sE3TPBY3+oghCsyihIwkQ/yZ+DI4Y rC9WoisMdnZLF0p2es2go7guNn4o0UxU0NPeIj7kbWx7voKrX+fW22VCqoubtN2K6X1Z 2f7tPap0N/d7VDphh40gyk0WgpUZ4ODbuZsVyyLCyFnDBBwCRBRIn6Aj+8O2gCfHNOV8 6Yjg+yU8u6XCJ6wBkb7BEmSRb0+r4Xec1MKrgecoencuiw2LRJ/GUNJfDT8BfcYONbtv LXTg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=8foj1PUuzFunimwO4VxRM4ypRagLNleKFRa0a96dSt4=; b=D2U42NWZZkDGKaxyg7VcgbgBero86p4vLzPR0cnFYdcDyulCThVXDNpMbKfS3MvWw+ t/27ik/RbAgtiRMKFPj00DQkf6sjrue5awTGjr1DDPTbmpb6Piqx9Owo/CP+sWuVO/UO l5A0xkAkU5YOMSPkePf2dYWZAqhdgfMV7sUHTxukIZsrZMBcxjhIPjAuphCjHCcB53Eo UxvtjjJSDmaXcgkaQMcytvyJRhHg7nZNq+1s6/+QNpWKqSoXtAQAP9QyTyJZzIzUSCm5 schml2ocOWf8qLmt7vaxiZGf2TlopmDaAf7lKwCSiqPvKJVnMt4Wc0IFXsNeRjfAO0Fg GpFA==
X-Gm-Message-State: AHYfb5iGEFtt+FRDx+iKkibllzIC+PZIVxbUiXdAAaMg8mC2pT6geJIZ IRDeHq+FnxcsQd6qOOdOIAe+KpDmj1eM
X-Received: by 10.200.8.106 with SMTP id x39mr5586856qth.309.1501885254693; Fri, 04 Aug 2017 15:20:54 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.12.176.119 with HTTP; Fri, 4 Aug 2017 15:20:24 -0700 (PDT)
In-Reply-To: <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net>
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net>
From: Neal Cardwell <ncardwell@google.com>
Date: Fri, 4 Aug 2017 18:20:24 -0400
Message-ID: <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Wei Wang <weiwan@google.com>, tcpm IETF list <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="001a113fcc3019e8c40555f4e903"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/FmwyMH68zabbCRge9fimKvysgW4>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Aug 2017 22:21:00 -0000

--001a113fcc3019e8c40555f4e903
Content-Type: text/plain; charset="UTF-8"

Thanks, Bob, for your detailed and thoughtful review! This is very
insightful and useful.

Sorry I'm coming to this discussion a little late. I wanted to add a few
points, beyond what Wei has already noted.

On Wed, Aug 2, 2017 at 11:54 AM, Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Wei, Yuchung, Neal and Eric, as authors of draft-wang-tcpm-low-latency-op
> t-00,
>
> I promised a review. It questions the technical logic behind the draft, so
> I haven't bothered to give a detailed review of the wording of the draft,
> because that might be irrelevant if you agree with my arguments.
>
> *1/ MAD by configuration?*
>
>    o  If the user does not specify a MAD value, then the implementation
>       SHOULD NOT specify a MAD value in the Low Latency option.
>
> That sentence triggered my "anti-human-intervention" reflex. My train of
> thought went as follows:
>

Bob's remark about his "anti-human-intervention" reflex being
triggered got me thinking.

I, too, would like to minimize the amount of human (application)
intervention this proposal involves (to avoid errors, maintenance,
etc).

It occurs to me that actually at Google our experience has shown that
indeed apps have repeatedly made mistakes with this value, and we have
found it convenient to progressively narrow their freedom in tuning
this knob. To the point where actually in our deployment there is very
little freedom left. Because in reality the OS and TCP stack
developers know the timer granularity considerations, and the apps
don't (and tend to use values 5 years out of date). So we've found it
useful to have the OS tightly clamp the app's request for a MAD value.

So in the interests of simplicity and avoiding human intervention,
what if we do not have the MAD value as part of the API, but rather
just allow the API to express a single "please use MAD" bit? And then
the transport implementation uses the smallest value that it can
support on this end host.

Can we go further, and make MAD an automatic feature of the TCP
implementation (so the transport implementation hard-wires MAD to "on"
or "off")? My sense is that we don't want to go that far, and that
instead we want to still allow apps to decide whether to use the
"please use MAD" bit. Why? There may be middlebox or remote host
compatibility issues with MAD. So we want apps (like browsers) to be
able to do A/B experiments to validate that sending the MAD option on
SYNs does not cause problems. We don't want to turn on MAD in Linux
and then find compatibility issues, and have to wait for a client OS
upgrade to everyone's cell phone to turn off MAD; instead we want to
only have to wait for an app update.

So... suppose an app decides it is latency-sensitive and wants to
reduce ACK delays and negotiate a MAD value. And furthermore, the app
is either (a) doing A/B experiments, or (b) has already convinced
itself that MAD will work on this path.

Then the app could enable MAD with a simple API like:
   int mad = 1; // enable
   err = setsockopt(fd, SOL_TCP, TCP_MAD, &mad, sizeof(mad));

For better or for worse, that makes the TCP_MAD option much like the
TCP_NODELAY option. Both in the sense that latency sensitive apps
should remember to set this bit if they want low-latency behavior. And
in the sense that the APIs would look very similar. And TCP_NODELAY
and TCP_MAD would be sort of complimentary: TCP_NODELAY is the app
saying "I want low latency for my sends" and TCP_MAD is the app
saying "I want low latency for my ACKs". My guess is that most
low-latency apps will want both.

For the MAD API, I think this might be the "as simple as possible, but
no simpler point".

That said, that's an API issue. And I think for TCPM we should focus
more on the wire protocol issues.


> * Let's consider what advice we would give on what MAD value ought to be
> configured.
>

I would suggest that the advice be that when an app requests TCP_MAD,
then transport implementors would have the transport implementation
use the lowest feasible value based on the end host hardware/OS/app
capabilities and workloads. Our sense from our deployment at Google
is that for many current technologies and workloads this is probably
currently in the range of 5ms - 10ms.

But I don't think we should get bogged down in a discussion of what this
configured value ought to be. I think we should focus on the simplest
protocol mechanism that can convey to the remote host the minimum
info needed for the remote transport endpoint to achieve excellent
performance.

Here I think of the MSS option as a good analogy (and that's why we
suggested the name "MAD").

For MSS, the point is not to spend time discussing what MSS should be
used, or to come up with complicated formulas to derive MSS. The point
is to have a simple but general mechanism so that, no matter what the
MSS value is (or the underlying hardware constraints are), there is a
simple option that can convey a hint to the remote host. Then the
remote host can use that hint to tune its sending behavior to achieve
good performance.

Now substitute "MAD" in the place of "MSS" in the preceding paragraph. :-)


> * You say that MAD can be smaller in DCs. So I assume your advice would be
> that MAD should depend on RTT {Note 1} and clock granularity {Note 2}.
>

Personally I do not think that MAD should depend on RTT. And I don't think
the draft says that it should (though let me know if there is some spot I
didn't notice).

I'd vote for keeping MAD as simple as possible, which means keeping RTT out
of it. :-)

* So why configure one value of MAD for all RTTs? That only makes sense in
> DC environments where the range of RTTs is small.
>

I'd recommend one value of MAD for all RTTs for the sake of simplicity. If
we keep MAD as simple as possible, then it stays just about the practical
delay limitations of the end host (OS timers, CPU power, CPU load, app
behavior, end host queuing delays, etc). That is what we have found makes
sense in our deployment. And note that our deployment of a MAD-like option
covers RTTs that span quite a range, from <1 ms up to hundreds of ms.

Most OSes I know already have a constant that defines the maximum interval
over which they can delay their ACKs. We are basically just suggesting a
simple wire format for transport endpoints to advertise this existing value
as a hint.


> * However, for the range of RTTs on the public Internet, why not calculate
> MAD from RTT and granularity, then standardize the calculation so that both
> ends arrive at the same result when starting from the same RTT and
> granularity parameters? (The sender and receiver might measure different
> smoothed (SRTT) values, but they will converge as the flow progresses.)
>
> Then the receiver only needs to communicate its clock granularity to the
> sender, and the fact that it is driving MAD off its SRTT. Then the sender
> can use a formula for RTO derived from the value of MAD that it calculates
> the receiver will be using. Then its RTO will be completely tailored to the
> RTT of the flow.
>

A couple questions here:

- Why  should we add the complexity of making MAD dependent on RTT? I'm not
clear on what the argument would be for the benefit of introducing this
complexity.

- Even if the receiver only communicates its clock granularity to the
sender, and the fact that it is driving MAD off its SRTT, then there's a
the question of *how* it is deriving MAD. Presumably this could change, as
we come up with better ideas. So then we would want a version number field
to indicate which calculation is being used. It seems much simpler to me to
allow the end point to just communicate a numerical delay value, rather
than negotiate a version number of a formula that can take a clock
granularity and RTT as input and produce a delay as output.

- Introducing RTT as a dependence also introduces the question of what to
do when there is no RTT estimate (because all packets so far have been
retransmitted, with no timestamps). And as we discussed in Prague and you
mention here, the two sides often have slightly different RTT estimates.
There are probably other wrinkles as well.


>
> Note: There are two different uses for the min RTO that need to be
> separated:
>     a) Before an initial RTT value has been measured, to determine the RTO
> during the 3WHS.
>     b) Once either end has measured the RTT for a connection.
> (a) needs to cope with the whole range of possible RTTs, whereas (b) is
> the subject of this email, because it can be tailored for the measured RTT.
>
> *2/ The problem, and its prevalence*
>
> With gradual removal of bufferbloat and more prevalent usage of CDNs,
> typical base RTTs on the public Internet now make the value of minRTO and
> of MAD look silly.
>
> As can be seen above, the problem is indeed that each end only has partial
> knowledge of the config of the other end.
> However, the problem is not just that MAD needs to be communicated to the
> other end so it can be hard-coded to a lower value.
> The problem is that MAD is hard-coded in the first place.
>
> The draft needs to say how prevalent the problem is (on the public
> Internet) where the sender has to wait for the receiver's delayed ACK timer
> at the end of a flow or between the end of a volley of packets and the
> start of the next.
>
> The draft also needs to say what tradeoff is considered acceptable between
> a residual level of spurious retransmissions and lower timeout delay.
> Eliminating all spurious retransmissions is not the goal.
>
> The draft also needs to say that introducing a new TCP Option is itself a
> problem (on the public Internet), because of middleboxes particularly
> proxies. Therefore a solution that does not need a new TCP Option would be
> preferable....
>
> Perhaps the solution for communicating timestamp resolution in
> draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites
> draft-trammell-tcpm-timestamp-interval-01) could be modified to also
> communicate:
> * TCP's clock granularity (closely related to TCP timestamp resolution),
> *  and the fact that the host is calculating MAD as a function of RTT and
> granularity.
> Then the existing timestamp option could be repurposed, which should
> drastically reduce deployment problems.
>
> *3/ Only DC?*
>
> All the related work references are solely in the context of a DC. Pls
> include refs about this problem in a public Internet context. You will find
> there is a pretty good search engine at www.google.com.
>
> The only non-DC ref I can find about minRTO is [Psaras07], which is mainly
> about a proposal to apply minRTO if the sender expects the next ACK to be
> delayed. Nonetheless, the simulation experiment in Section 5.1 provides
> good evidence for how RTO latency is dependent on uncertainty about the MAD
> that the other end is using.
>
> [Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO Revisited,"
> In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and Sensor Networks,
> Wireless Networks, Next Generation Internet NETWORKING'07 pp.981-991
> Springer-Verlag (2007)
> https://www.researchgate.net/publication/225442912_The_TCP_M
> inimum_RTO_Revisited
>

All great points. Thanks!


>
> *4/ Status*
>
> Normally, I wouldn't want to hold up a draft that has been proven over
> years of practice, such as the technique in low-latency-opt, which has been
> proven in Google's DCs over the last few years. Whereas, my ideas are just
> that: ideas, not proven. However, the technique in low-latency-opt has only
> been proven in DC environments where the range of RTTs is limited. So, now
> that you are proposing to transplant it onto the public Internet, it also
> only has the status of an unproven idea.
>
> To be clear, as it stands, I do not think low-latency-opt is applicable to
> the public Internet.
>

Can you please elaborate on this? Is this because you think there ought to
be a dependence on RTT?


>
>
> *5/ Nits*
> These nits depart from my promise not comment on details that could become
> irrelevant if you agree with my idea. Hey, whatever,...
>
> S.3.5:
>
> 	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)
>
> My immediate reaction to this was that G should not appear twice. However,
> perhaps you meant them to be G_s and G_r (sender and receiver)
> respectively. {Note 2}
>
> S.3.5 & S.5. It seems unnecessary to prohibit values of MAD greater than
> the default (given some companies are already investing in commercial
> public space flight programmes, so TCP could need to routinely support RTTs
> that are longer than typical not just shorter).
>
>
> Cheers
>
>
>
> Bob
>
>
> *{Note 1}*: On average, if not app-limited, the time between ACKs will be
> d_r*R_r/W_s where:
>    R is SRTT
>    d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
>    W is the window in units of segments
>    subscripts X_r or X_s denote receiver or sender for the half-connection.
>
> So as long as the receiver can estimate the varying value of W at the
> sender, the receiver's MAD could be
>     MAD_r = max(k*d_r*R_r / W_s, G_r),
> The factor k (lower case) allows for some bunching of packets e.g. due to
> link layer aggregation or the residual effects of slow-start, which leaves
> some bunching even if SS uses pacing. Let's say k=2, but it would need to
> be checked empirically.
>
> For example, take R=100us, d=2, W=8 and G = 1us.
> Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might need to
> be greater, but there would certainly be no need for MAD to be 5ms, which
> is perhaps 100 times greater than necessary.
>

With currently popular OS implementations I'm aware of, 50us for a delayed
ACK timer is infeasible. Most have a minimum granularity of 1ms, or 10ms,
or even larger, for delayed ACKs. And part of the point of delayed ACKs is
to wait for applications to respond, so that data can be combined with the
ACK. And 50us does not give the app much time to respond.

Again, IMHO the MAD needs to incorporate hardware, software, and workload
constraints on the receiving end host.


>
> *{Note 2}*: Why is there no field in the Low Latency option to
> communicate receiver clock granularity to the sender?
>
>
The idea is that the MAD value is a function of many parameters on the end
host. The clock granularity is only one of them. The simplest way to convey
on the wire a MAD parameter that is a function of many other parameters is
just to convey the MAD value itself.

Bob, thanks again for your detailed and insightful feedback!

neal

--001a113fcc3019e8c40555f4e903
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Thanks, Bob, for your detailed and thoughtful review! This=
 is very insightful and useful.<div><br></div><div>Sorry I&#39;m coming to =
this discussion a little late. I wanted to add a few points, beyond what We=
i has already noted.<br><div class=3D"gmail_extra"><br><div class=3D"gmail_=
quote">On Wed, Aug 2, 2017 at 11:54 AM, Bob Briscoe <span dir=3D"ltr">&lt;<=
a href=3D"mailto:ietf@bobbriscoe.net" target=3D"_blank">ietf@bobbriscoe.net=
</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin=
:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"=
>
 =20
   =20
 =20
  <div bgcolor=3D"#FFFFFF">
    Wei, Yuchung, Neal and Eric, as authors of
    draft-wang-tcpm-low-latency-op<wbr>t-00,<br>
    <br>
    I promised a review. It questions the technical logic behind the
    draft, so I haven&#39;t bothered to give a detailed review of the
    wording of the draft, because that might be irrelevant if you agree
    with my arguments.<br>
    <br>
    <b>1/ MAD by configuration?</b><b><br>
    </b>
    <pre class=3D"gmail-m_4924700057452149308gmail-m_-7387466908820014094ne=
wpage">   o  If the user does not specify a MAD value, then the implementat=
ion
      SHOULD NOT specify a MAD value in the Low Latency option.
</pre>
    That sentence triggered my &quot;anti-human-intervention&quot; reflex. =
My
    train of thought went as follows:<br></div></blockquote><div><br></div>=
<div><div>Bob&#39;s remark about his &quot;anti-human-intervention&quot; re=
flex being</div><div>triggered got me thinking.</div><div><br></div><div>I,=
 too, would like to minimize the amount of human (application)</div><div>in=
tervention this proposal involves (to avoid errors, maintenance,</div><div>=
etc).</div><div><br></div><div>It occurs to me that actually at Google our =
experience has shown that</div><div>indeed apps have repeatedly made mistak=
es with this value, and we have</div><div>found it convenient to progressiv=
ely narrow their freedom in tuning</div><div>this knob. To the point where =
actually in our deployment there is very</div><div>little freedom left. Bec=
ause in reality the OS and TCP stack</div><div>developers know the timer gr=
anularity considerations, and the apps</div><div>don&#39;t (and tend to use=
 values 5 years out of date). So we&#39;ve found it</div><div>useful to hav=
e the OS tightly clamp the app&#39;s request for a MAD value.</div><div><br=
></div><div>So in the interests of simplicity and avoiding human interventi=
on,</div><div>what if we do not have the MAD value as part of the API, but =
rather</div><div>just allow the API to express a single &quot;please use MA=
D&quot; bit? And then</div><div>the transport implementation uses the small=
est value that it can</div><div>support on this end host.</div><div><br></d=
iv><div>Can we go further, and make MAD an automatic feature of the TCP</di=
v><div>implementation (so the transport implementation hard-wires MAD to &q=
uot;on&quot;</div><div>or &quot;off&quot;)? My sense is that we don&#39;t w=
ant to go that far, and that</div><div>instead we want to still allow apps =
to decide whether to use the</div><div>&quot;please use MAD&quot; bit. Why?=
 There may be middlebox or remote host</div><div>compatibility issues with =
MAD. So we want apps (like browsers) to be</div><div>able to do A/B experim=
ents to validate that sending the MAD option on</div><div>SYNs does not cau=
se problems. We don&#39;t want to turn on MAD in Linux</div><div>and then f=
ind compatibility issues, and have to wait for a client OS</div><div>upgrad=
e to everyone&#39;s cell phone to turn off MAD; instead we want to</div><di=
v>only have to wait for an app update.</div><div><br></div><div>So... suppo=
se an app decides it is latency-sensitive and wants to</div><div>reduce ACK=
 delays and negotiate a MAD value. And furthermore, the app</div><div>is ei=
ther (a) doing A/B experiments, or (b) has already convinced</div><div>itse=
lf that MAD will work on this path.</div><div><br></div><div>Then the app c=
ould enable MAD with a simple API like:</div><div>=C2=A0 =C2=A0int mad =3D =
1; // enable</div><div>=C2=A0 =C2=A0err =3D setsockopt(fd, SOL_TCP, TCP_MAD=
, &amp;mad, sizeof(mad));</div><div><br></div><div>For better or for worse,=
 that makes the TCP_MAD option much like the</div><div>TCP_NODELAY option. =
Both in the sense that latency sensitive apps</div><div>should remember to =
set this bit if they want low-latency behavior. And</div><div>in the sense =
that the APIs would look very similar. And TCP_NODELAY</div><div>and TCP_MA=
D would be sort of complimentary: TCP_NODELAY is the app</div><div>saying &=
quot;I want low latency for my sends&quot; and TCP_MAD is the app</div><div=
>saying &quot;I want low latency for my ACKs&quot;. My guess is that most</=
div><div>low-latency apps will want both.</div><div><br></div><div>For the =
MAD API, I think this might be the &quot;as simple as possible, but</div><d=
iv>no simpler point&quot;.</div></div><div><br></div><div>That said, that&#=
39;s an API issue. And I think for TCPM we should focus</div><div>more on t=
he wire protocol issues.</div><div>=C2=A0</div><blockquote class=3D"gmail_q=
uote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,2=
04);padding-left:1ex"><div bgcolor=3D"#FFFFFF">
    * Let&#39;s consider what advice we would give on what MAD value ought
    to be configured.<br></div></blockquote><div><br></div><div>I would sug=
gest that the advice be that when an app requests TCP_MAD,</div><div>then t=
ransport implementors would have the transport implementation</div><div>use=
 the lowest feasible value based on the end host hardware/OS/app</div><div>=
capabilities and workloads. Our sense from our deployment at Google</div><d=
iv>is that for many current technologies and workloads this is probably</di=
v><div>currently in the range of 5ms - 10ms.</div><div><br></div><div>But I=
 don&#39;t think we should get bogged down in a discussion of what this</di=
v><div>configured value ought to be. I think we should focus on the simples=
t</div><div>protocol mechanism that can convey to the remote host the minim=
um</div><div>info needed for the remote transport endpoint to achieve excel=
lent</div><div>performance.</div><div><br></div><div>Here I think of the MS=
S option as a good analogy (and that&#39;s why we</div><div>suggested the n=
ame &quot;MAD&quot;).</div><div><br></div><div>For MSS, the point is not to=
 spend time discussing what MSS should be</div><div>used, or to come up wit=
h complicated formulas to derive MSS. The point</div><div>is to have a simp=
le but general mechanism so that, no matter what the</div><div>MSS value is=
 (or the underlying hardware constraints are), there is a</div><div>simple =
option that can convey a hint to the remote host. Then the</div><div>remote=
 host can use that hint to tune its sending behavior to achieve</div><div>g=
ood performance.</div><div><br></div><div>Now substitute &quot;MAD&quot; in=
 the place of &quot;MSS&quot; in the preceding paragraph. :-)</div><div>=C2=
=A0<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px =
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=
=3D"#FFFFFF">
    * You say that MAD can be smaller in DCs. So I assume your advice
    would be that MAD should depend on RTT {Note 1} and clock
    granularity {Note 2}.<br></div></blockquote><div><br></div><div>Persona=
lly I do not think that MAD should depend on RTT. And I don&#39;t think the=
 draft says that it should (though let me know if there is some spot I didn=
&#39;t notice).</div><div><br></div><div>I&#39;d vote for keeping MAD as si=
mple as possible, which means keeping RTT out of it. :-)</div><div><br></di=
v><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;borde=
r-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF=
">
    * So why configure one value of MAD for all RTTs? That only makes
    sense in DC environments where the range of RTTs is small. <br></div></=
blockquote><div><br></div><div>I&#39;d recommend one value of MAD for all R=
TTs for the sake of simplicity. If we keep MAD as simple as possible, then =
it stays just about the practical delay limitations of the end host (OS tim=
ers, CPU power, CPU load, app behavior, end host queuing delays, etc). That=
 is what we have found makes sense in our deployment. And note that our dep=
loyment of a MAD-like option covers RTTs that span quite a range, from &lt;=
1 ms up to hundreds of ms.</div><div><br></div><div>Most OSes I know alread=
y have a constant that defines the maximum interval over which they can del=
ay their ACKs. We are basically just suggesting a simple wire format for tr=
ansport endpoints to advertise this existing value as a hint.</div><div>=C2=
=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8e=
x;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D"=
#FFFFFF">
    * However, for the range of RTTs on the public Internet, why not
    calculate MAD from RTT and granularity, then standardize the
    calculation so that both ends arrive at the same result when
    starting from the same RTT and granularity parameters? (The sender
    and receiver might measure different smoothed (SRTT) values, but
    they will converge as the flow progresses.)<br>
    <br>
    Then the receiver only needs to communicate its clock granularity to
    the sender, and the fact that it is driving MAD off its SRTT. Then
    the sender can use a formula for RTO derived from the value of MAD
    that it calculates the receiver will be using. Then its RTO will be
    completely tailored to the RTT of the flow. <br></div></blockquote><div=
><br></div><div>A couple questions here:</div><div><br></div><div>- Why =C2=
=A0should we add the complexity of making MAD dependent on RTT? I&#39;m not=
 clear on what the argument would be for the benefit of introducing this co=
mplexity.</div><div><br></div><div>- Even if the receiver only communicates=
 its clock granularity to the sender, and the fact that it is driving MAD o=
ff its SRTT, then there&#39;s a the question of *how* it is deriving MAD. P=
resumably this could change, as we come up with better ideas. So then we wo=
uld want a version number field to indicate which calculation is being used=
. It seems much simpler to me to allow the end point to just communicate a =
numerical delay value, rather than negotiate a version number of a formula =
that can take a clock granularity and RTT as input and produce a delay as o=
utput.</div><div><br></div><div>- Introducing RTT as a dependence also intr=
oduces the question of what to do when there is no RTT estimate (because al=
l packets so far have been retransmitted, with no timestamps). And as we di=
scussed in Prague and you mention here, the two sides often have slightly d=
ifferent RTT estimates. There are probably other wrinkles as well.</div><di=
v>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px=
 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolo=
r=3D"#FFFFFF">
    <br>
    Note: There are two different uses for the min RTO that need to be
    separated:<br>
    =C2=A0=C2=A0=C2=A0 a) Before an initial RTT value has been measured, to=
 determine
    the RTO during the 3WHS.<br>
    =C2=A0=C2=A0=C2=A0 b) Once either end has measured the RTT for a connec=
tion.<br>
    (a) needs to cope with the whole range of possible RTTs, whereas (b)
    is the subject of this email, because it can be tailored for the
    measured RTT.<br>
    <br>
    <b>2/ The problem, and its prevalence</b><b><br>
    </b><br>
    With gradual removal of bufferbloat and more prevalent usage of
    CDNs, typical base RTTs on the public Internet now make the value of
    minRTO and of MAD look silly.<br>
    <br>
    As can be seen above, the problem is indeed that each end only has
    partial knowledge of the config of the other end.<br>
    However, the problem is not just that MAD needs to be communicated
    to the other end so it can be hard-coded to a lower value.<br>
    The problem is that MAD is hard-coded in the first place.<br>
    <br>
    The draft needs to say how prevalent the problem is (on the public
    Internet) where the sender has to wait for the receiver&#39;s delayed
    ACK timer at the end of a flow or between the end of a volley of
    packets and the start of the next. <br>
    <br>
    The draft also needs to say what tradeoff is considered acceptable
    between a residual level of spurious retransmissions and lower
    timeout delay. Eliminating all spurious retransmissions is not the
    goal.<br>
    <br>
    The draft also needs to say that introducing a new TCP Option is
    itself a problem (on the public Internet), because of middleboxes
    particularly proxies. Therefore a solution that does not need a new
    TCP Option would be preferable....<br>
    <br>
    Perhaps the solution for communicating timestamp resolution in
    draft-scheffenegger-tcpm-times<wbr>tamp-negotiation-05 (which cites
    draft-trammell-tcpm-timestamp-<wbr>interval-01) could be modified to al=
so
    communicate:<br>
    * TCP&#39;s clock granularity (closely related to TCP timestamp
    resolution), <br>
    *=C2=A0 and the fact that the host is calculating MAD as a function of
    RTT and granularity. <br>
    Then the existing timestamp option could be repurposed, which should
    drastically reduce deployment problems.<br>
    <br>
    <b>3/ Only DC?</b><b><br>
    </b><br>
    All the related work references are solely in the context of a DC.
    Pls include refs about this problem in a public Internet context.
    You will find there is a pretty good search engine at
    <a class=3D"gmail-m_4924700057452149308gmail-m_-7387466908820014094moz-=
txt-link-abbreviated" href=3D"http://www.google.com" target=3D"_blank">www.=
google.com</a>.<br>
    <br>
    The only non-DC ref I can find about minRTO is [Psaras07], which is
    mainly about a proposal to apply minRTO if the sender expects the
    next ACK to be delayed. Nonetheless, the simulation experiment in
    Section 5.1 provides good evidence for how RTO latency is dependent
    on uncertainty about the MAD that the other end is using.<br>
    <br>
    [Psaras07] Psaras, I. &amp; Tsaoussidis, V., &quot;The TCP Minimum RTO
    Revisited,&quot; In: Proc. 6th Int&#39;l IFIP-TC6 Conference on Ad Hoc =
and
    Sensor Networks, Wireless Networks, Next Generation Internet
    NETWORKING&#39;07 pp.981-991 Springer-Verlag (2007)<br>
<a class=3D"gmail-m_4924700057452149308gmail-m_-7387466908820014094moz-txt-=
link-freetext" href=3D"https://www.researchgate.net/publication/225442912_T=
he_TCP_Minimum_RTO_Revisited" target=3D"_blank">https://www.researchgate.ne=
t/p<wbr>ublication/225442912_The_TCP_M<wbr>inimum_RTO_Revisited</a></div></=
blockquote><div><br></div><div>All great points. Thanks!</div><div>=C2=A0</=
div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bor=
der-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor=3D"#FFFF=
FF">
    <br>
    <b>4/ Status</b><b><br>
    </b><br>
    Normally, I wouldn&#39;t want to hold up a draft that has been proven
    over years of practice, such as the technique in low-latency-opt,
    which has been proven in Google&#39;s DCs over the last few years.
    Whereas, my ideas are just that: ideas, not proven. However, the
    technique in low-latency-opt has only been proven in DC environments
    where the range of RTTs is limited. So, now that you are proposing
    to transplant it onto the public Internet, it also only has the
    status of an unproven idea.<br>
    <br>
    To be clear, as it stands, I do not think low-latency-opt is
    applicable to the public Internet.<br></div></blockquote><div><br></div=
><div>Can you please elaborate on this? Is this because you think there oug=
ht to be a dependence on RTT?</div><div>=C2=A0</div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,=
204,204);padding-left:1ex"><div bgcolor=3D"#FFFFFF">
    <br>
    <br>
    <b>5/ Nits</b><b><br>
    </b>These nits depart from my promise not comment on details that
    could become irrelevant if you agree with my idea. Hey, whatever,...
    <br>
    <br>
    S.3.5:<br>
    <pre class=3D"gmail-m_4924700057452149308gmail-m_-7387466908820014094ne=
wpage">	RTO &lt;- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)</pre>
    My immediate reaction to this was that G should not appear twice.
    However, perhaps you meant them to be G_s and G_r (sender and
    receiver) respectively. {Note 2}<br>
    <br>
    S.3.5 &amp; S.5. It seems unnecessary to prohibit values of MAD
    greater than the default (given some companies are already investing
    in commercial public space flight programmes, so TCP could need to
    routinely support RTTs that are longer than typical not just
    shorter).<br>
    <br>
    <br>
    Cheers<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    <b><br>
    </b><b>{Note 1}</b>: On average, if not app-limited, the time
    between ACKs will be d_r*R_r/W_s where:<br>
    =C2=A0=C2=A0 R is SRTT<br>
    =C2=A0=C2=A0 d is the delayed ACK factor, e.g. d=3D2 for ACKing every o=
ther
    packet<br>
    =C2=A0=C2=A0 W is the window in units of segments<br>
    =C2=A0=C2=A0 subscripts X_r or X_s denote receiver or sender for the
    half-connection.<br>
    <br>
    So as long as the receiver can estimate the varying value of W at
    the sender, the receiver&#39;s MAD could be <br>
    =C2=A0=C2=A0=C2=A0 MAD_r =3D max(k*d_r*R_r / W_s, G_r), <br>
    The factor k (lower case) allows for some bunching of packets e.g.
    due to link layer aggregation or the residual effects of slow-start,
    which leaves some bunching even if SS uses pacing. Let&#39;s say k=3D2,
    but it would need to be checked empirically.<br>
    <br>
    For example, take R=3D100us, d=3D2, W=3D8 and G =3D 1us.<br>
    Given d*R/W =3D 25us, MAD could be perhaps 50us (i.e. k=3D2). k might
    need to be greater, but there would certainly be no need for MAD to
    be 5ms, which is perhaps 100 times greater than necessary.<br></div></b=
lockquote><div><br></div><div>With currently popular OS implementations I&#=
39;m aware of, 50us for a delayed ACK timer is infeasible. Most have a mini=
mum granularity of 1ms, or 10ms, or even larger, for delayed ACKs. And part=
 of the point of delayed ACKs is to wait for applications to respond, so th=
at data can be combined with the ACK. And 50us does not give the app much t=
ime to respond.</div><div><br></div><div>Again, IMHO the MAD needs to incor=
porate hardware, software, and workload constraints on the receiving end ho=
st.=C2=A0</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"=
margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-lef=
t:1ex"><div bgcolor=3D"#FFFFFF">
    <b><br>
    </b><b>{Note 2}</b>: Why is there no field in the Low Latency option
    to communicate receiver clock granularity to the sender?<span class=3D"=
gmail-m_4924700057452149308gmail-HOEnZb"><font color=3D"#888888"><br><br>
  </font></span></div>

</blockquote></div><br></div></div><div class=3D"gmail_extra">The idea is t=
hat the MAD value is a function of many parameters on the end host. The clo=
ck granularity is only one of them. The simplest way to convey on the wire =
a MAD parameter that is a function of many other parameters is just to conv=
ey the MAD value itself.</div><div class=3D"gmail_extra"><br></div><div cla=
ss=3D"gmail_extra">Bob, thanks again for your detailed and insightful feedb=
ack!</div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra">n=
eal</div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"><b=
r></div></div>

--001a113fcc3019e8c40555f4e903--


From nobody Sun Aug  6 04:27:15 2017
Return-Path: <ietf@bobbriscoe.net>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 01505131C95 for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 04:27:14 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level: 
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=bobbriscoe.net
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zEfENpiAbKlw for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 04:27:09 -0700 (PDT)
Received: from server.dnsblock1.com (server.dnsblock1.com [85.13.236.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 966F3131CD7 for <tcpm@ietf.org>; Sun,  6 Aug 2017 04:27:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=bobbriscoe.net; s=default; h=Content-Type:In-Reply-To:MIME-Version:Date: Message-ID:From:References:Cc:To:Subject:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=G9ie926oWCtMdi71cErh49BCFGn+dz3RwONy+SizZ3k=; b=pt45lta6BenP1W0gHhrJLC49F RVs1KKyGXI376AAIDX9ZbyrLlZcrrY/j0RZxuUm0NOqD4V2R+aYTfsudOny6Z0bvkrC5/Gjg/RvI5 ZB7W77UBFAIdi7v93l0IEoncVhCUIq4uYitEAq4bCqMSKBCVuALbjVsC1KFHHV6ovtJlzM35h2MKd M0m73bnClDpBosSXqyIFGxZhz+zlyVYPTwedcHls43N97wUvtQ6Uzg3FXuhtuJMltGAmDib9aE9mM p0FgMRbbd0PiSgiTM3A9j2h9nw+dNgm9Ft7N+OUYza0cU8Ol1Os4/kpBSub2iPJhC00jX1Bz4XAzl IR6MuRRFg==;
Received: from 6.136.199.146.dyn.plus.net ([146.199.136.6]:45162 helo=[192.168.0.2]) by server.dnsblock1.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89) (envelope-from <ietf@bobbriscoe.net>) id 1deJi5-00061n-GY; Sun, 06 Aug 2017 12:27:06 +0100
To: Wei Wang <weiwan@google.com>
Cc: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Neal Cardwell <ncardwell@google.com>, tcpm IETF list <tcpm@ietf.org>
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net> <CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <dbd19f9e-cd80-4ac8-8e88-1c56577d8ad6@bobbriscoe.net>
Date: Sun, 6 Aug 2017 12:27:04 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------D0BD57B3215DFFC78BA84A3C"
Content-Language: en-GB
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.dnsblock1.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - bobbriscoe.net
X-Get-Message-Sender-Via: server.dnsblock1.com: authenticated_id: in@bobbriscoe.net
X-Authenticated-Sender: server.dnsblock1.com: in@bobbriscoe.net
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/UFzOi06RH_qHoRD_dNCb4Fnwf6I>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Aug 2017 11:27:14 -0000

This is a multi-part message in MIME format.
--------------D0BD57B3215DFFC78BA84A3C
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Wei,

On 04/08/17 17:55, Wei Wang wrote:
> Hi Bob,
>
> Thanks a lot for your review and detailed feedback on the draft.
> Please see my comments inline below:
>
> On Wed, Aug 2, 2017 at 8:54 AM, Bob Briscoe <ietf@bobbriscoe.net 
> <mailto:ietf@bobbriscoe.net>> wrote:
>
>     Wei, Yuchung, Neal and Eric, as authors of
>     draft-wang-tcpm-low-latency-opt-00,
>
>     I promised a review. It questions the technical logic behind the
>     draft, so I haven't bothered to give a detailed review of the
>     wording of the draft, because that might be irrelevant if you
>     agree with my arguments.
>
>     *1/ MAD by configuration?**
>     *
>
>         o  If the user does not specify a MAD value, then the implementation
>            SHOULD NOT specify a MAD value in the Low Latency option.
>
>     That sentence triggered my "anti-human-intervention" reflex. My
>     train of thought went as follows:
>
>     * Let's consider what advice we would give on what MAD value ought
>     to be configured.
>     * You say that MAD can be smaller in DCs. So I assume your advice
>     would be that MAD should depend on RTT {Note 1} and clock
>     granularity {Note 2}.
>     * So why configure one value of MAD for all RTTs? That only makes
>     sense in DC environments where the range of RTTs is small.
>     * However, for the range of RTTs on the public Internet, why not
>     calculate MAD from RTT and granularity, then standardize the
>     calculation so that both ends arrive at the same result when
>     starting from the same RTT and granularity parameters? (The sender
>     and receiver might measure different smoothed (SRTT) values, but
>     they will converge as the flow progresses.)
>
>     Then the receiver only needs to communicate its clock granularity
>     to the sender, and the fact that it is driving MAD off its SRTT.
>     Then the sender can use a formula for RTO derived from the value
>     of MAD that it calculates the receiver will be using. Then its RTO
>     will be completely tailored to the RTT of the flow.
>
>
> First of all, we recommend that operating system should have a 
> per-route MAD configuration API and a per-connection MAD configuration 
> API. So different connections could have different MAD values 
> configured. It is not one value for all.
[BB]: I prefer Neal's subsequent response agreeing that a MAD API is 
fraught with human-intervention problems, and preferring a binary API 
(use MAD or not).

I saw that pre route config is already possible when I checked the Linux 
code. However, per route config just makes the likelihood of errors 
greater. Particularly cos IETF standardization is primarily for the 
Internet, not just DCs. And on the Internet, a large proportion of 
clients are not controlled by a management system.

>
> And in my opinion, what MAD value should be set to is not only 
> depending on RTT and clock granularity. It also depends on how the 
> application wants the delayed ack behavior to be. Some application 
> might only send data say every 1ms, so it will delay its ack up to 2ms 
> so that it can always piggy back the ack to the data.
> That is why a per-connection MAD configuration makes sense for the 
> application to fine tune MAD according to its own demand.
[BB]: This has to be automated for the Internet.

An app only cares if MAD is too long. An app doesn't care if the ACK 
delay is too short. But 'the network' cares if there are too many 
unnecessary ACKs (and this knocks-on to every other app including the 
original app). So on the public Internet, the stack, not the app, is an 
appropriate place to determine MAD. The app can only be trusted to do 
this in a managed environment.

See response to Neal for further thoughts.


>
> And when user tries to set a new MAD value, we do boundary check to 
> make sure it is less than the current default MAD value. This is a 
> safety check to make sure user does not configure something that is 
> worse than current default value.
[BB]: That warrants a warning on the UI, not prohibition and certainly 
not silently ignoring the input (see point I already made below about 
large RTT environments).

>
> About your question in {Note 2} that why receiver does not communicate 
> its clock granularity to the sender, I don't really see a reason why 
> receiver side clock granularity is needed. Because the MAD value sent 
> by receiver is already a value that is rounded to the clock 
> granularity. Say if a user wants to set MAD to 1ms, and the clock 
> granularity is 10ms, receiver will send MAD value as 10ms. In the 
> draft, we specify that:
>
>       If specified, then the MAD value in the Low Latency option MUST be
>       set, as close as possible, to the implementation's actual delayed
>       ACK timeout for the connection.  Note that the actual maximum
>       delayed ACK timeout of the connection may be larger than the
>       actual user specified value because of implementation constraints
>              (e.g. timer granularity limitations).
[BB]: Understood. I should have made clear that my question was only 
relevant if you accepted my argument that the sender would calculate 
what the receiver would use for MAD (from RTT and granularity).

See my response to Neal for further thoughts.

>
>
>
>     Note: There are two different uses for the min RTO that need to be
>     separated:
>         a) Before an initial RTT value has been measured, to determine
>     the RTO during the 3WHS.
>         b) Once either end has measured the RTT for a connection.
>     (a) needs to cope with the whole range of possible RTTs, whereas
>     (b) is the subject of this email, because it can be tailored for
>     the measured RTT.
>
>
> Again, we don't think MAD value is only a function of RTT and clock 
> granularity.
>
>
>     *2/ The problem, and its prevalence**
>     *
>     With gradual removal of bufferbloat and more prevalent usage of
>     CDNs, typical base RTTs on the public Internet now make the value
>     of minRTO and of MAD look silly.
>
>     As can be seen above, the problem is indeed that each end only has
>     partial knowledge of the config of the other end.
>     However, the problem is not just that MAD needs to be communicated
>     to the other end so it can be hard-coded to a lower value.
>     The problem is that MAD is hard-coded in the first place.
>
>     The draft needs to say how prevalent the problem is (on the public
>     Internet) where the sender has to wait for the receiver's delayed
>     ACK timer at the end of a flow or between the end of a volley of
>     packets and the start of the next.
>
>
> Noted. We will add more contexts on how delayed ack works and why long 
> delayed ack time is hurting performance. We are also planning on 
> adding some history about why delayed ack was configured as a constant 
> in the first place and why the current constant value was chosen.
>
>
>     The draft also needs to say what tradeoff is considered acceptable
>     between a residual level of spurious retransmissions and lower
>     timeout delay. Eliminating all spurious retransmissions is not the
>     goal.
>
>
> Noted.
>
>
>     The draft also needs to say that introducing a new TCP Option is
>     itself a problem (on the public Internet), because of middleboxes
>     particularly proxies. Therefore a solution that does not need a
>     new TCP Option would be preferable....
>
>
> There is already a section in the draft that states the middle box issue:
>         5. Middlebox Considerations
> Is that portion a good enough explanation on this?
[BB]: I'm afraid not.

1/ The likelihood that the option is stripped (e.g. by proxies) is not 
mentioned. It only mentions the likelihood the whole SYN is discarded 
because of the option. That was why I pointed out it may be possible to 
redesign this without a new TCP option, by repurposing the timestamp 
option in a similar way to tcpm-timestamp-negotiation (note: 'similar' 
means using the ideas, not necessarily the exact same scheme).

2/ The first bullet relies on data about middleboxes that Michio 
gathered 6 years ago. Google has the ability to verify the current position.

3/ The second bullet would be irrelevant if you accept my point that the 
option needs to support larger RTTs not just smaller. Nonetheless, there 
is little evidence that middleboxes alter the fields  in unknown options.

>     Perhaps the solution for communicating timestamp resolution in
>     draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites
>     draft-trammell-tcpm-timestamp-interval-01) could be modified to
>     also communicate:
>     * TCP's clock granularity (closely related to TCP timestamp
>     resolution),
>     *  and the fact that the host is calculating MAD as a function of
>     RTT and granularity.
>     Then the existing timestamp option could be repurposed, which
>     should drastically reduce deployment problems.
>
>
> I am not sure if this is doable but will look into it.
>
>
>     *3/ Only DC?**
>     *
>     All the related work references are solely in the context of a DC.
>     Pls include refs about this problem in a public Internet context.
>     You will find there is a pretty good search engine at
>     www.google.com <http://www.google.com>.
>
>     The only non-DC ref I can find about minRTO is [Psaras07], which
>     is mainly about a proposal to apply minRTO if the sender expects
>     the next ACK to be delayed. Nonetheless, the simulation experiment
>     in Section 5.1 provides good evidence for how RTO latency is
>     dependent on uncertainty about the MAD that the other end is using.
>
>     [Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO
>     Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and
>     Sensor Networks, Wireless Networks, Next Generation Internet
>     NETWORKING'07 pp.981-991 Springer-Verlag (2007)
>     https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited
>     <https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited>
>
>
> Noted. Thanks a lot for the pointers. Will look into them and add to 
> the draft.
>
>
>
>     *4/ Status**
>     *
>     Normally, I wouldn't want to hold up a draft that has been proven
>     over years of practice, such as the technique in low-latency-opt,
>     which has been proven in Google's DCs over the last few years.
>     Whereas, my ideas are just that: ideas, not proven. However, the
>     technique in low-latency-opt has only been proven in DC
>     environments where the range of RTTs is limited. So, now that you
>     are proposing to transplant it onto the public Internet, it also
>     only has the status of an unproven idea.
>
>     To be clear, as it stands, I do not think low-latency-opt is
>     applicable to the public Internet.
>
>
>
> Hmm... I think overall, this approach should not do any harm to the 
> network. It provides an additional feature to let the user configure 
> the MAD if the user cares about it. If not, they can leave it as the 
> default behavior as it is right now.
> To your concerns about the RTT variation in the internet, first, as I 
> explained, this MAD value will be set per connection or per route. 
> Secondly, I would think it is doable to do some bound check or error 
> correction on the MAD value set by the user if we find that it is way 
> below RTT and does not make sense. But again, we don't think MAD value 
> is only a function of RTT. User should be able to configure it to a 
> value suitable for his/her need.
> We want to make it as a standard so that all operating systems could 
> implement this in the same way so that they could understand each 
> other. One use case is that in a cloud environment where different 
> operating systems are running in the same DC, they should be able to 
> interpret this option with no issue.
[BB]: Yes, I guessed that this was probably what Google was really 
wanting to standardize this for. With the current config constraint 
text, it is limited to managed environments, which would make it 
uninteresting to many at the IETF.

Fortunately, I think the line of thinking between Neal & me is already 
widening applicability to unmanaged environments.

>
>
>     *5/ Nits**
>     *These nits depart from my promise not comment on details that
>     could become irrelevant if you agree with my idea. Hey, whatever,...
>
>     S.3.5:
>
>     	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)
>
>     My immediate reaction to this was that G should not appear twice.
>     However, perhaps you meant them to be G_s and G_r (sender and
>     receiver) respectively. {Note 2}
>
>
> As explained earlier, clock granularity of the receiver is already 
> being considered in the MAD value itself. In the above formula, both G 
> are the clock granularity on the sender side.
[BB]: Then it should not be necessary to round up 2 terms to the same 
granularity. Would it not be correct to use:

	RTO <- SRTT + max(G, (K*RTTVAR + max_ACK_delay) )


>
>     S.3.5 & S.5. It seems unnecessary to prohibit values of MAD
>     greater than the default (given some companies are already
>     investing in commercial public space flight programmes, so TCP
>     could need to routinely support RTTs that are longer than typical
>     not just shorter).
>
>
> Noted. Will take consideration of this.

Regards


Bob
>
>
>     Cheers
>
>
>
>     Bob
>
>     *
>     **{Note 1}*: On average, if not app-limited, the time between ACKs
>     will be d_r*R_r/W_s where:
>        R is SRTT
>        d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
>        W is the window in units of segments
>        subscripts X_r or X_s denote receiver or sender for the
>     half-connection.
>
>     So as long as the receiver can estimate the varying value of W at
>     the sender, the receiver's MAD could be
>         MAD_r = max(k*d_r*R_r / W_s, G_r),
>     The factor k (lower case) allows for some bunching of packets e.g.
>     due to link layer aggregation or the residual effects of
>     slow-start, which leaves some bunching even if SS uses pacing.
>     Let's say k=2, but it would need to be checked empirically.
>
>     For example, take R=100us, d=2, W=8 and G = 1us.
>     Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might
>     need to be greater, but there would certainly be no need for MAD
>     to be 5ms, which is perhaps 100 times greater than necessary.
>     *
>     **{Note 2}*: Why is there no field in the Low Latency option to
>     communicate receiver clock granularity to the sender?
>
>
>     Bob
>
>     -- 
>     ________________________________________________________________
>     Bob Briscoehttp://bobbriscoe.net/
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


--------------D0BD57B3215DFFC78BA84A3C
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Wei,<br>
    <br>
    <div class="moz-cite-prefix">On 04/08/17 17:55, Wei Wang wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">Hi Bob,
        <div><br>
        </div>
        <div>Thanks a lot for your review and detailed feedback on the
          draft.</div>
        <div>Please see my comments inline below:</div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Wed, Aug 2, 2017 at 8:54 AM, Bob
            Briscoe <span dir="ltr">&lt;<a
                href="mailto:ietf@bobbriscoe.net" target="_blank"
                moz-do-not-send="true">ietf@bobbriscoe.net</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> Wei, Yuchung, Neal and Eric, as
                authors of draft-wang-tcpm-low-latency-<wbr>opt-00,<br>
                <br>
                I promised a review. It questions the technical logic
                behind the draft, so I haven't bothered to give a
                detailed review of the wording of the draft, because
                that might be irrelevant if you agree with my arguments.<br>
                <br>
                <b>1/ MAD by configuration?</b><b><br>
                </b>
                <pre class="gmail-m_3610993658251518594newpage">   o  If the user does not specify a MAD value, then the implementation
      SHOULD NOT specify a MAD value in the Low Latency option.
</pre>
                That sentence triggered my "anti-human-intervention"
                reflex. My train of thought went as follows:<br>
                <br>
                * Let's consider what advice we would give on what MAD
                value ought to be configured.<br>
                * You say that MAD can be smaller in DCs. So I assume
                your advice would be that MAD should depend on RTT {Note
                1} and clock granularity {Note 2}.<br>
                * So why configure one value of MAD for all RTTs? That
                only makes sense in DC environments where the range of
                RTTs is small. <br>
                * However, for the range of RTTs on the public Internet,
                why not calculate MAD from RTT and granularity, then
                standardize the calculation so that both ends arrive at
                the same result when starting from the same RTT and
                granularity parameters? (The sender and receiver might
                measure different smoothed (SRTT) values, but they will
                converge as the flow progresses.)<br>
                <br>
                Then the receiver only needs to communicate its clock
                granularity to the sender, and the fact that it is
                driving MAD off its SRTT. Then the sender can use a
                formula for RTO derived from the value of MAD that it
                calculates the receiver will be using. Then its RTO will
                be completely tailored to the RTT of the flow. <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>First of all, we recommend that operating system should
              have a per-route MAD configuration API and a
              per-connection MAD configuration API. So different
              connections could have different MAD values configured. It
              is not one value for all.</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: I prefer Neal's subsequent response agreeing that a MAD API is
    fraught with human-intervention problems, and preferring a binary
    API (use MAD or not).<br>
    <br>
    I saw that pre route config is already possible when I checked the
    Linux code. However, per route config just makes the likelihood of
    errors greater. Particularly cos IETF standardization is primarily
    for the Internet, not just DCs. And on the Internet, a large
    proportion of clients are not controlled by a management system.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>And in my opinion, what MAD value should be set to is
              not only depending on RTT and clock granularity. It also
              depends on how the application wants the delayed ack
              behavior to be. Some application might only send data say
              every 1ms, so it will delay its ack up to 2ms so that it
              can always piggy back the ack to the data.</div>
            <div>That is why a per-connection MAD configuration makes
              sense for the application to fine tune MAD according to
              its own demand.</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: This has to be automated for the Internet. <br>
    <br>
    An app only cares if MAD is too long. An app doesn't care if the ACK
    delay is too short. But 'the network' cares if there are too many
    unnecessary ACKs (and this knocks-on to every other app including
    the original app). So on the public Internet, the stack, not the
    app, is an appropriate place to determine MAD. The app can only be
    trusted to do this in a managed environment.<br>
    <br>
    See response to Neal for further thoughts.<br>
    <br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>And when user tries to set a new MAD value, we do
              boundary check to make sure it is less than the current
              default MAD value. This is a safety check to make sure
              user does not configure something that is worse than
              current default value.</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: That warrants a warning on the UI, not prohibition and
    certainly not silently ignoring the input (see point I already made
    below about large RTT environments).<br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>About your question in {Note 2} that why receiver does
              not communicate its clock granularity to the sender, I
              don't really see a reason why receiver side clock
              granularity is needed. Because the MAD value sent by
              receiver is already a value that is rounded to the clock
              granularity. Say if a user wants to set MAD to 1ms, and
              the clock granularity is 10ms, receiver will send MAD
              value as 10ms. In the draft, we specify that:</div>
            <div><br>
            </div>
            <div>      If specified, then the MAD value in the Low
              Latency option MUST be</div>
            <div>      set, as close as possible, to the
              implementation's actual delayed</div>
            <div>      ACK timeout for the connection.  Note that the
              actual maximum</div>
            <div>      delayed ACK timeout of the connection may be
              larger than the</div>
            <div>      actual user specified value because of
              implementation constraints </div>
            <div>             (e.g. timer granularity limitations).  <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Understood. I should have made clear that my question was only
    relevant if you accepted my argument that the sender would calculate
    what the receiver would use for MAD (from RTT and granularity). <br>
    <br>
    See my response to Neal for further thoughts.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                Note: There are two different uses for the min RTO that
                need to be separated:<br>
                    a) Before an initial RTT value has been measured, to
                determine the RTO during the 3WHS.<br>
                    b) Once either end has measured the RTT for a
                connection.<br>
                (a) needs to cope with the whole range of possible RTTs,
                whereas (b) is the subject of this email, because it can
                be tailored for the measured RTT.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Again, we don't think MAD value is only a function of
              RTT and clock granularity.</div>
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                <b>2/ The problem, and its prevalence</b><b><br>
                </b><br>
                With gradual removal of bufferbloat and more prevalent
                usage of CDNs, typical base RTTs on the public Internet
                now make the value of minRTO and of MAD look silly.<br>
                <br>
                As can be seen above, the problem is indeed that each
                end only has partial knowledge of the config of the
                other end.<br>
                However, the problem is not just that MAD needs to be
                communicated to the other end so it can be hard-coded to
                a lower value.<br>
                The problem is that MAD is hard-coded in the first
                place.<br>
                <br>
                The draft needs to say how prevalent the problem is (on
                the public Internet) where the sender has to wait for
                the receiver's delayed ACK timer at the end of a flow or
                between the end of a volley of packets and the start of
                the next. <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Noted. We will add more contexts on how delayed ack
              works and why long delayed ack time is hurting
              performance. We are also planning on adding some history
              about why delayed ack was configured as a constant in the
              first place and why the current constant value was chosen.</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                The draft also needs to say what tradeoff is considered
                acceptable between a residual level of spurious
                retransmissions and lower timeout delay. Eliminating all
                spurious retransmissions is not the goal.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Noted.</div>
            <div> <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                The draft also needs to say that introducing a new TCP
                Option is itself a problem (on the public Internet),
                because of middleboxes particularly proxies. Therefore a
                solution that does not need a new TCP Option would be
                preferable....<br>
                <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>There is already a section in the draft that states the
              middle box issue:</div>
            <div>        5. Middlebox Considerations </div>
            <div>Is that portion a good enough explanation on this?</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: I'm afraid not. <br>
    <br>
    1/ The likelihood that the option is stripped (e.g. by proxies) is
    not mentioned. It only mentions the likelihood the whole SYN is
    discarded because of the option. That was why I pointed out it may
    be possible to redesign this without a new TCP option, by
    repurposing the timestamp option in a similar way to
    tcpm-timestamp-negotiation (note: 'similar' means using the ideas,
    not necessarily the exact same scheme). <br>
    <br>
    2/ The first bullet relies on data about middleboxes that Michio
    gathered 6 years ago. Google has the ability to verify the current
    position.<br>
    <br>
    3/ The second bullet would be irrelevant if you accept my point that
    the option needs to support larger RTTs not just smaller.
    Nonetheless, there is little evidence that middleboxes alter the
    fields  in unknown options.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> Perhaps the solution for
                communicating timestamp resolution in
                draft-scheffenegger-tcpm-<wbr>timestamp-negotiation-05
                (which cites draft-trammell-tcpm-timestamp-<wbr>interval-01)
                could be modified to also communicate:<br>
                * TCP's clock granularity (closely related to TCP
                timestamp resolution), <br>
                *  and the fact that the host is calculating MAD as a
                function of RTT and granularity. <br>
                Then the existing timestamp option could be repurposed,
                which should drastically reduce deployment problems.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I am not sure if this is doable but will look into it.</div>
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                <b>3/ Only DC?</b><b><br>
                </b><br>
                All the related work references are solely in the
                context of a DC. Pls include refs about this problem in
                a public Internet context. You will find there is a
                pretty good search engine at <a
                  class="gmail-m_3610993658251518594moz-txt-link-abbreviated"
                  href="http://www.google.com" target="_blank"
                  moz-do-not-send="true">www.google.com</a>.<br>
                <br>
                The only non-DC ref I can find about minRTO is
                [Psaras07], which is mainly about a proposal to apply
                minRTO if the sender expects the next ACK to be delayed.
                Nonetheless, the simulation experiment in Section 5.1
                provides good evidence for how RTO latency is dependent
                on uncertainty about the MAD that the other end is
                using.<br>
                <br>
                [Psaras07] Psaras, I. &amp; Tsaoussidis, V., "The TCP
                Minimum RTO Revisited," In: Proc. 6th Int'l IFIP-TC6
                Conference on Ad Hoc and Sensor Networks, Wireless
                Networks, Next Generation Internet NETWORKING'07
                pp.981-991 Springer-Verlag (2007)<br>
                <a
                  class="gmail-m_3610993658251518594moz-txt-link-freetext"
href="https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited"
                  target="_blank" moz-do-not-send="true">https://www.researchgate.net/<wbr>publication/225442912_The_TCP_<wbr>Minimum_RTO_Revisited</a></div>
            </blockquote>
            <div><br>
            </div>
            <div>Noted. Thanks a lot for the pointers. Will look into
              them and add to the draft.</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><br>
                <br>
                <b>4/ Status</b><b><br>
                </b><br>
                Normally, I wouldn't want to hold up a draft that has
                been proven over years of practice, such as the
                technique in low-latency-opt, which has been proven in
                Google's DCs over the last few years. Whereas, my ideas
                are just that: ideas, not proven. However, the technique
                in low-latency-opt has only been proven in DC
                environments where the range of RTTs is limited. So, now
                that you are proposing to transplant it onto the public
                Internet, it also only has the status of an unproven
                idea.<br>
                <br>
                To be clear, as it stands, I do not think
                low-latency-opt is applicable to the public Internet.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div><br>
            </div>
            <div>Hmm... I think overall, this approach should not do any
              harm to the network. It provides an additional feature to
              let the user configure the MAD if the user cares about it.
              If not, they can leave it as the default behavior as it is
              right now.</div>
            <div>To your concerns about the RTT variation in the
              internet, first, as I explained, this MAD value will be
              set per connection or per route. Secondly, I would think
              it is doable to do some bound check or error correction on
              the MAD value set by the user if we find that it is way
              below RTT and does not make sense. But again, we don't
              think MAD value is only a function of RTT. User should be
              able to configure it to a value suitable for his/her need.</div>
            <div>We want to make it as a standard so that all operating
              systems could implement this in the same way so that they
              could understand each other. One use case is that in a
              cloud environment where different operating systems are
              running in the same DC, they should be able to interpret
              this option with no issue.</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Yes, I guessed that this was probably what Google was really
    wanting to standardize this for. With the current config constraint
    text, it is limited to managed environments, which would make it
    uninteresting to many at the IETF.<br>
    <br>
    Fortunately, I think the line of thinking between Neal &amp; me is
    already widening applicability to unmanaged environments.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                <b>5/ Nits</b><b><br>
                </b>These nits depart from my promise not comment on
                details that could become irrelevant if you agree with
                my idea. Hey, whatever,... <br>
                <br>
                S.3.5:<br>
                <pre class="gmail-m_3610993658251518594newpage">	RTO &lt;- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)</pre>
                My immediate reaction to this was that G should not
                appear twice. However, perhaps you meant them to be G_s
                and G_r (sender and receiver) respectively. {Note 2}<br>
                <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>As explained earlier, clock granularity of the receiver
              is already being considered in the MAD value itself. In
              the above formula, both G are the clock granularity on the
              sender side.</div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Then it should not be necessary to round up 2 terms to the
    same granularity. Would it not be correct to use:<br>
    <pre class="gmail-m_3610993658251518594newpage">	RTO &lt;- SRTT + max(G, (K*RTTVAR + max_ACK_delay) )</pre>
    <br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> S.3.5 &amp; S.5. It seems
                unnecessary to prohibit values of MAD greater than the
                default (given some companies are already investing in
                commercial public space flight programmes, so TCP could
                need to routinely support RTTs that are longer than
                typical not just shorter).<br>
                    </div>
            </blockquote>
            <div><br>
            </div>
            <div>Noted. Will take consideration of this.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Regards<br>
    <br>
    <br>
    <br>
    Bob<br>
    <blockquote type="cite"
cite="mid:CAEA6p_CN+w6XH-A=zNEc3SL9gnRF-oH5jKD4Kvkxb3=p_PTBUg@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
              </div>
            </blockquote>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> Cheers<br>
                <br>
                <br>
                <br>
                Bob<br>
                <br>
                <b><br>
                </b><b>{Note 1}</b>: On average, if not app-limited, the
                time between ACKs will be d_r*R_r/W_s where:<br>
                   R is SRTT<br>
                   d is the delayed ACK factor, e.g. d=2 for ACKing
                every other packet<br>
                   W is the window in units of segments<br>
                   subscripts X_r or X_s denote receiver or sender for
                the half-connection.<br>
                <br>
                So as long as the receiver can estimate the varying
                value of W at the sender, the receiver's MAD could be <br>
                    MAD_r = max(k*d_r*R_r / W_s, G_r), <br>
                The factor k (lower case) allows for some bunching of
                packets e.g. due to link layer aggregation or the
                residual effects of slow-start, which leaves some
                bunching even if SS uses pacing. Let's say k=2, but it
                would need to be checked empirically.<br>
                <br>
                For example, take R=100us, d=2, W=8 and G = 1us.<br>
                Given d*R/W = 25us, MAD could be perhaps 50us (i.e.
                k=2). k might need to be greater, but there would
                certainly be no need for MAD to be 5ms, which is perhaps
                100 times greater than necessary.<br>
                <b><br>
                </b><b>{Note 2}</b>: Why is there no field in the Low
                Latency option to communicate receiver clock granularity
                to the sender?<span class="gmail-HOEnZb"><font
                    color="#888888"><br>
                    <br>
                    <br>
                    Bob<br>
                    <br>
                    <pre class="gmail-m_3610993658251518594moz-signature" cols="72">-- 
______________________________<wbr>______________________________<wbr>____
Bob Briscoe                               <a class="gmail-m_3610993658251518594moz-txt-link-freetext" href="http://bobbriscoe.net/" target="_blank" moz-do-not-send="true">http://bobbriscoe.net/</a></pre>
                  </font></span></div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
________________________________________________________________
Bob Briscoe                               <a class="moz-txt-link-freetext" href="http://bobbriscoe.net/">http://bobbriscoe.net/</a></pre>
  </body>
</html>

--------------D0BD57B3215DFFC78BA84A3C--


From nobody Sun Aug  6 10:39:30 2017
Return-Path: <ietf@bobbriscoe.net>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1A9D8131CDC for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 10:39:28 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.999
X-Spam-Level: 
X-Spam-Status: No, score=-1.999 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=bobbriscoe.net
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id T9XgdrAvGH2c for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 10:39:22 -0700 (PDT)
Received: from server.dnsblock1.com (server.dnsblock1.com [85.13.236.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 379EE131CEB for <tcpm@ietf.org>; Sun,  6 Aug 2017 10:39:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=bobbriscoe.net; s=default; h=Content-Type:In-Reply-To:MIME-Version:Date: Message-ID:From:References:Cc:To:Subject:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=OhmUkmmjtM+b4/5gMly/ypaEX++oUdi8z+1WlbPAAZ8=; b=0Xh5CcO0O5ZaBri4rfkIEvCPJ ZcejagRH18ISsdv3KvPTrMk9SgdCpve/ak0ovH/DnfwGMYUZ5V5F3AgZ/3qorUQu6yHScwHecYl7C iT18Nm53Zvxct5jp3UW9J/ELRuTv+Z672+HbN864LymvMOR1/wvVbQyJjIwXdbJKLP6qLwC5bYhHm YnUzQhO3WftfKlGdrc2SmMUVbsS5WqhYvPW7ddRMTVe3cxxaqKtWwamnh1CcZHtsbmmKmQcWJY0Zc rgjTYAA8mL1V0BAqUiJo7fYIa5m4B11XTNSn6LjMIBiGFIcBKYlD6Wo2iE6k9EEfAhFc6AYaba6cq eADzOPJ+g==;
Received: from 6.136.199.146.dyn.plus.net ([146.199.136.6]:46196 helo=[192.168.0.2]) by server.dnsblock1.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89) (envelope-from <ietf@bobbriscoe.net>) id 1dePWH-0005yb-Al; Sun, 06 Aug 2017 18:39:18 +0100
To: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Wei Wang <weiwan@google.com>, tcpm IETF list <tcpm@ietf.org>
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net> <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <edfd5337-307c-2395-0bb1-83267d52088c@bobbriscoe.net>
Date: Sun, 6 Aug 2017 18:39:16 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------4DAD03512A616B65595DB32D"
Content-Language: en-GB
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.dnsblock1.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - bobbriscoe.net
X-Get-Message-Sender-Via: server.dnsblock1.com: authenticated_id: in@bobbriscoe.net
X-Authenticated-Sender: server.dnsblock1.com: in@bobbriscoe.net
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/ruDlMq2QdejaAvFMt8CtOMEgqvM>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Aug 2017 17:39:28 -0000

This is a multi-part message in MIME format.
--------------4DAD03512A616B65595DB32D
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Neal,

On 04/08/17 23:20, Neal Cardwell wrote:
> Thanks, Bob, for your detailed and thoughtful review! This is very 
> insightful and useful.
>
> Sorry I'm coming to this discussion a little late. I wanted to add a 
> few points, beyond what Wei has already noted.
>
> On Wed, Aug 2, 2017 at 11:54 AM, Bob Briscoe <ietf@bobbriscoe.net 
> <mailto:ietf@bobbriscoe.net>> wrote:
>
>     Wei, Yuchung, Neal and Eric, as authors of
>     draft-wang-tcpm-low-latency-opt-00,
>
>     I promised a review. It questions the technical logic behind the
>     draft, so I haven't bothered to give a detailed review of the
>     wording of the draft, because that might be irrelevant if you
>     agree with my arguments.
>
>     *1/ MAD by configuration?**
>     *
>
>         o  If the user does not specify a MAD value, then the implementation
>            SHOULD NOT specify a MAD value in the Low Latency option.
>
>     That sentence triggered my "anti-human-intervention" reflex. My
>     train of thought went as follows:
>
>
> Bob's remark about his "anti-human-intervention" reflex being
> triggered got me thinking.
>
> I, too, would like to minimize the amount of human (application)
> intervention this proposal involves (to avoid errors, maintenance,
> etc).
>
> It occurs to me that actually at Google our experience has shown that
> indeed apps have repeatedly made mistakes with this value, and we have
> found it convenient to progressively narrow their freedom in tuning
> this knob. To the point where actually in our deployment there is very
> little freedom left. Because in reality the OS and TCP stack
> developers know the timer granularity considerations, and the apps
> don't (and tend to use values 5 years out of date). So we've found it
> useful to have the OS tightly clamp the app's request for a MAD value.
>
> So in the interests of simplicity and avoiding human intervention,
> what if we do not have the MAD value as part of the API, but rather
> just allow the API to express a single "please use MAD" bit? And then
> the transport implementation uses the smallest value that it can
> support on this end host.
>
> Can we go further, and make MAD an automatic feature of the TCP
> implementation (so the transport implementation hard-wires MAD to "on"
> or "off")? My sense is that we don't want to go that far, and that
> instead we want to still allow apps to decide whether to use the
> "please use MAD" bit. Why? There may be middlebox or remote host
> compatibility issues with MAD. So we want apps (like browsers) to be
> able to do A/B experiments to validate that sending the MAD option on
> SYNs does not cause problems. We don't want to turn on MAD in Linux
> and then find compatibility issues, and have to wait for a client OS
> upgrade to everyone's cell phone to turn off MAD; instead we want to
> only have to wait for an app update.
[BB]: If there are problems, they will be per path, not per app. So 
there could be a cache to record per-path black-holing of packets 
carrying the option (no need to record stripping the option, which would 
be benign). Then no API at all would be needed.

As a fail-safe, you would want a system-wide sysctl to turn on MAD. I 
guess switching that would require an OS upgrade.

Whatever, as you say below, these are not really interop standardization 
issues (but it's still worth airing the possibilities).

>
> So... suppose an app decides it is latency-sensitive and wants to
> reduce ACK delays and negotiate a MAD value. And furthermore, the app
> is either (a) doing A/B experiments, or (b) has already convinced
> itself that MAD will work on this path.
>
> Then the app could enable MAD with a simple API like:
>    int mad = 1; // enable
>    err = setsockopt(fd, SOL_TCP, TCP_MAD, &mad, sizeof(mad));
>
> For better or for worse, that makes the TCP_MAD option much like the
> TCP_NODELAY option. Both in the sense that latency sensitive apps
> should remember to set this bit if they want low-latency behavior. And
> in the sense that the APIs would look very similar. And TCP_NODELAY
> and TCP_MAD would be sort of complimentary: TCP_NODELAY is the app
> saying "I want low latency for my sends" and TCP_MAD is the app
> saying "I want low latency for my ACKs". My guess is that most
> low-latency apps will want both.
>
> For the MAD API, I think this might be the "as simple as possible, but
> no simpler point".
[BB]: Is there an app that wants high delay loss recovery?
There is no tradeoff here, so pls keep it simple and just enable low 
latency for all connections.

>
> That said, that's an API issue. And I think for TCPM we should focus
> more on the wire protocol issues.
>
>     * Let's consider what advice we would give on what MAD value ought
>     to be configured.
>
>
> I would suggest that the advice be that when an app requests TCP_MAD,
> then transport implementors would have the transport implementation
> use the lowest feasible value based on the end host hardware/OS/app
> capabilities and workloads.
> Our sense from our deployment at Google
> is that for many current technologies and workloads this is probably
> currently in the range of 5ms - 10ms.
>
> But I don't think we should get bogged down in a discussion of what this
> configured value ought to be.
[BB]: Sry, perhaps I wasn't clear. I wrote that sentence to ask:
* not "what specific MAD value ought to be configured"
* but rather "what a good MAD value ought to depend on". I pick up on 
this question later...

> I think we should focus on the simplest
> protocol mechanism that can convey to the remote host the minimum
> info needed for the remote transport endpoint to achieve excellent
> performance.
>
> Here I think of the MSS option as a good analogy (and that's why we
> suggested the name "MAD").
>
> For MSS, the point is not to spend time discussing what MSS should be
> used, or to come up with complicated formulas to derive MSS. The point
> is to have a simple but general mechanism so that, no matter what the
> MSS value is (or the underlying hardware constraints are), there is a
> simple option that can convey a hint to the remote host. Then the
> remote host can use that hint to tune its sending behavior to achieve
> good performance.
>
> Now substitute "MAD" in the place of "MSS" in the preceding paragraph. :-)
>
>     * You say that MAD can be smaller in DCs. So I assume your advice
>     would be that MAD should depend on RTT {Note 1} and clock
>     granularity {Note 2}.
>
>
> Personally I do not think that MAD should depend on RTT. And I don't 
> think the draft says that it should (though let me know if there is 
> some spot I didn't notice).
>
> I'd vote for keeping MAD as simple as possible, which means keeping 
> RTT out of it. :-)
>
>     * So why configure one value of MAD for all RTTs? That only makes
>     sense in DC environments where the range of RTTs is small.
>
>
> I'd recommend one value of MAD for all RTTs for the sake of 
> simplicity. If we keep MAD as simple as possible, then it stays just 
> about the practical delay limitations of the end host (OS timers, CPU 
> power, CPU load, app behavior, end host queuing delays, etc). That is 
> what we have found makes sense in our deployment. And note that our 
> deployment of a MAD-like option covers RTTs that span quite a range, 
> from <1 ms up to hundreds of ms.
>
> Most OSes I know already have a constant that defines the maximum 
> interval over which they can delay their ACKs. We are basically just 
> suggesting a simple wire format for transport endpoints to advertise 
> this existing value as a hint.
>
>     * However, for the range of RTTs on the public Internet, why not
>     calculate MAD from RTT and granularity, then standardize the
>     calculation so that both ends arrive at the same result when
>     starting from the same RTT and granularity parameters? (The sender
>     and receiver might measure different smoothed (SRTT) values, but
>     they will converge as the flow progresses.)
>
>     Then the receiver only needs to communicate its clock granularity
>     to the sender, and the fact that it is driving MAD off its SRTT.
>     Then the sender can use a formula for RTO derived from the value
>     of MAD that it calculates the receiver will be using. Then its RTO
>     will be completely tailored to the RTT of the flow.
>
>
> A couple questions here:
>
> - Why  should we add the complexity of making MAD dependent on RTT? 
> I'm not clear on what the argument would be for the benefit of 
> introducing this complexity.
>
> - Even if the receiver only communicates its clock granularity to the 
> sender, and the fact that it is driving MAD off its SRTT, then there's 
> a the question of *how* it is deriving MAD. Presumably this could 
> change, as we come up with better ideas. So then we would want a 
> version number field to indicate which calculation is being used. It 
> seems much simpler to me to allow the end point to just communicate a 
> numerical delay value, rather than negotiate a version number of a 
> formula that can take a clock granularity and RTT as input and produce 
> a delay as output.
[BB]: Good point.
>
> - Introducing RTT as a dependence also introduces the question of what 
> to do when there is no RTT estimate (because all packets so far have 
> been retransmitted, with no timestamps). And as we discussed in Prague 
> and you mention here, the two sides often have slightly different RTT 
> estimates. There are probably other wrinkles as well.
[BB]: OK. I understood that the pretext for this draft was that the max 
ACK delay is too long for the low RTTs that are often in use these days. 
So I hadn't appreciated that you would advise that MAD would not depend 
on RTT.

Fair enough. I'll go along with this advice for now (but see later). 
However, let's just check that your proposal makes sense in other respects.

Q1. Is there not a risk that a value of MAD solely dependent on the 
receiver's OS parameters will be lower than the typical inter-packet 
arrival time for some flows? E.g.
     If data packets arrive every 7 ms {Note 3} then, even with a 
del_ack factor of 2, a receiver with MAD = 5 ms will ACK every packet. 
In fact, I think it will immediately ACK the first packet, then delay 
the ACK of every subsequent packet by 5ms. {Note 4}

I guess you are saying that would be OK from the point of view of the 
receiver's workload (otherwise it would not have set MAD=5ms). However, 
delayed ACKs are also intended to reduce network workload. {Note 5}.

{Note 3}: With 1500B packets that implies 1.7Mb/s, which is more than 3x 
my own ADSL uplink (I live in the developed world, but in a rural part 
of  it, where  such rates are common and the only alternative is 3G, 
which offers an even slower uplink :(

{Note 4}: I don't know what Implementations do, but RFC5681 implies that 
a receiver delays the next ACK whenever it sent the previous ACK, even 
if it delayed the previous one. The words are: "MUST be generated within 
<MAD> of the arrival of the first unacknowledged packet,"

{Note 5}: Not to mention that delaying every ACK makes it hard for the 
sender to use the ACKs to monitor queuing delay. However, this might be 
fixed by separate introduction of way to measure one-way delay using 
timestamps.

>
>     Note: There are two different uses for the min RTO that need to be
>     separated:
>         a) Before an initial RTT value has been measured, to determine
>     the RTO during the 3WHS.
>         b) Once either end has measured the RTT for a connection.
>     (a) needs to cope with the whole range of possible RTTs, whereas
>     (b) is the subject of this email, because it can be tailored for
>     the measured RTT.
>
>     *2/ The problem, and its prevalence**
>     *
>     With gradual removal of bufferbloat and more prevalent usage of
>     CDNs, typical base RTTs on the public Internet now make the value
>     of minRTO and of MAD look silly.
>
>     As can be seen above, the problem is indeed that each end only has
>     partial knowledge of the config of the other end.
>     However, the problem is not just that MAD needs to be communicated
>     to the other end so it can be hard-coded to a lower value.
>     The problem is that MAD is hard-coded in the first place.
>
>     The draft needs to say how prevalent the problem is (on the public
>     Internet) where the sender has to wait for the receiver's delayed
>     ACK timer at the end of a flow or between the end of a volley of
>     packets and the start of the next.
>
>     The draft also needs to say what tradeoff is considered acceptable
>     between a residual level of spurious retransmissions and lower
>     timeout delay. Eliminating all spurious retransmissions is not the
>     goal.
>
>     The draft also needs to say that introducing a new TCP Option is
>     itself a problem (on the public Internet), because of middleboxes
>     particularly proxies. Therefore a solution that does not need a
>     new TCP Option would be preferable....
>
>     Perhaps the solution for communicating timestamp resolution in
>     draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites
>     draft-trammell-tcpm-timestamp-interval-01) could be modified to
>     also communicate:
>     * TCP's clock granularity (closely related to TCP timestamp
>     resolution),
>     *  and the fact that the host is calculating MAD as a function of
>     RTT and granularity.
>     Then the existing timestamp option could be repurposed, which
>     should drastically reduce deployment problems.
>
>     *3/ Only DC?**
>     *
>     All the related work references are solely in the context of a DC.
>     Pls include refs about this problem in a public Internet context.
>     You will find there is a pretty good search engine at
>     www.google.com <http://www.google.com>.
>
>     The only non-DC ref I can find about minRTO is [Psaras07], which
>     is mainly about a proposal to apply minRTO if the sender expects
>     the next ACK to be delayed. Nonetheless, the simulation experiment
>     in Section 5.1 provides good evidence for how RTO latency is
>     dependent on uncertainty about the MAD that the other end is using.
>
>     [Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO
>     Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and
>     Sensor Networks, Wireless Networks, Next Generation Internet
>     NETWORKING'07 pp.981-991 Springer-Verlag (2007)
>     https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited
>     <https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited>
>
>
> All great points. Thanks!
>
>
>     *4/ Status**
>     *
>     Normally, I wouldn't want to hold up a draft that has been proven
>     over years of practice, such as the technique in low-latency-opt,
>     which has been proven in Google's DCs over the last few years.
>     Whereas, my ideas are just that: ideas, not proven. However, the
>     technique in low-latency-opt has only been proven in DC
>     environments where the range of RTTs is limited. So, now that you
>     are proposing to transplant it onto the public Internet, it also
>     only has the status of an unproven idea.
>
>     To be clear, as it stands, I do not think low-latency-opt is
>     applicable to the public Internet.
>
>
> Can you please elaborate on this? Is this because you think there 
> ought to be a dependence on RTT?
[BB]: I was trying to judge whether this is a straightforward 
standardization of tried and tested technology, or experimental.

The opinion about inapplicability to the Internet was based on the way 
the config requirements were written, which limited the draft to 
environments covered by a configuration management system, which is not 
typical for the public Internet.

I'm happier now that the focus is moving towards auto-tuning. However, 
this makes my first point about unproven territory even more 
applicable..., so Google's previous experience becomes less relevant, 
and makes this more experimental/researchy. For instance, the case I 
pointed out above for my own uplink would double the ACK rate, which 
might lead to knock-on problems - perhaps an increase in server 
processing load, or even processor overload on intermediate network 
equipment. We are also likely to discover interactions with ACK-thinning 
middleboxes.

more...
>
>
>
>     *5/ Nits**
>     *These nits depart from my promise not comment on details that
>     could become irrelevant if you agree with my idea. Hey, whatever,...
>
>     S.3.5:
>
>     	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)
>
>     My immediate reaction to this was that G should not appear twice.
>     However, perhaps you meant them to be G_s and G_r (sender and
>     receiver) respectively. {Note 2}
>
>     S.3.5 & S.5. It seems unnecessary to prohibit values of MAD
>     greater than the default (given some companies are already
>     investing in commercial public space flight programmes, so TCP
>     could need to routinely support RTTs that are longer than typical
>     not just shorter).
>
>
>     Cheers
>
>
>
>     Bob
>
>     *
>     **{Note 1}*: On average, if not app-limited, the time between ACKs
>     will be d_r*R_r/W_s where:
>        R is SRTT
>        d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
>        W is the window in units of segments
>        subscripts X_r or X_s denote receiver or sender for the
>     half-connection.
>
>     So as long as the receiver can estimate the varying value of W at
>     the sender, the receiver's MAD could be
>         MAD_r = max(k*d_r*R_r / W_s, G_r),
>     The factor k (lower case) allows for some bunching of packets e.g.
>     due to link layer aggregation or the residual effects of
>     slow-start, which leaves some bunching even if SS uses pacing.
>     Let's say k=2, but it would need to be checked empirically.
>
>     For example, take R=100us, d=2, W=8 and G = 1us.
>     Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might
>     need to be greater, but there would certainly be no need for MAD
>     to be 5ms, which is perhaps 100 times greater than necessary.
>
>
> With currently popular OS implementations I'm aware of, 50us for a 
> delayed ACK timer is infeasible. Most have a minimum granularity of 
> 1ms, or 10ms, or even larger, for delayed ACKs. And part of the point 
> of delayed ACKs is to wait for applications to respond, so that data 
> can be combined with the ACK. And 50us does not give the app much time 
> to respond.
[BB]: A modern processor can to do as much in 50us as a processor from 
the 1990s could do in about 10mins.

The min clock interrupt period has not changed much from the typical 
value of 10ms in 1990 [Dovrolis00]. This minimum is meant to maintain 
performance by keeping a healthy ratio between real work and context 
switching. However, the number of ops that can be processed in this 
duration has increased by about 10^7 over the same period.

I am (genuinely) interested to know what is the underlying factor that 
limits ACK delay to no less than 1-10ms? Is it Wirth's Law of software 
bloat (that the same task takes just as long, because increases in 
processing speed are absorbed by increases in code complexity)?

Nonetheless, whatever the clock granularity on a particular OS/machine, 
a stack should still be able to calculate what MAD ought to be. Then the 
final step in the calculation would round MAD up to the interrupt clock 
granularity. At least then code would perform better for OSs that reduce 
their clock granularity.


[Dovrolis00] Dovrolis, C. & Ramanathan, P., "Increasing the Clock 
Interrupt Frequency for Better Support of Real-Time Applications," Uni 
Wisconsin-Madison, Dept. Electrical & Computer Engineering 
http://www.cc.gatech.edu/~dovrolis/Papers/timers.ps (March 2000).


This returns us to the question:
*What ought MAD depend on?**
***
I prefer to start by analyzing what the best function and dependencies 
should be, then try to approximate for simplicity. I don't think we 
should start from the other direction ("let's think up a simple way to 
do it, and see if it works"). The former gives insight. The latter risks 
a random stumble across a territory of countless unforeseen problems.

The thought experiment I was conducting in my 'Note 1' above started 
from the idea that MAD ought to be somehow related to the average 
inter-arrival time in a flow. The (unstated) reasoning went like this:
* the retransmission delay after a pause/stop in the data stream ought 
to be similar to the retransmission delay without a pause/stop.
* so the ack delay after a pause/stop in the data stream ought to be 
similar to the ack delay without a pause/stop.

Put more succinctly:
     R + MAD ~= R + d * t_i            (1)
Then by definition:
     t_i = R/W
Which is how I got to
     MAD ~= d * R / W

where the notation is as earlier, plus
     t_i = avg inter-arrival time

Now, moving on to how to simplify this...

I accept your point that MAD has to be above "the practical delay 
limitations of the end host (OS timers, CPU power, CPU load, app 
behavior, end host queuing delays, etc)". Let's wrap that all into a 
variable we'll call g_r (which is itself lower bounded by clock 
granularity G_r).

Eqn (1) shows that the approximation for MAD only has to be within the 
order of the RTT.

Also, setting a lower bound for MAD of the order of an RTT would help to 
prevent the case I raised earlier where MAD is less than the average 
inter-arrival time (because it is abnormal to have <1 packet per RTT). 
Even for ultra-low RTTs, this also protects the network, because network 
processing should be sized to cope with ~1 ACK per RTT.

So, in summary, this is my current preferred approximation (but I'm open 
to others):

     MAD ~= max(c*R_r, g_r)

c is a constant factor determined empirically. I'm not sure whether it 
will be less or greater than 1, so let's assume nominally c=1.

To be clear, I'm accepting your argument that it is simplest for the 
receiver to communicate MAD in the TCP option. So (for now) I'm no 
longer proposing that the sender bases MAD on the RTT it measures 
itself. I.e. the receiver calculates MAD based on it's initial estimate 
of the RTT of the connection and other local parameters, then 
communicates it to the sender. It's not important how accurate the RTT 
estimate is. This is just to get a lower bound of roughly the right 
order of magnitude.

You are right that the receiver might not have a good RTT estimate if 
packets within the 3WHS were retransmitted. But, let me assume (for now) 
that we are using TCP timestamps, so a host can get a good RTT estimate 
even with retransmissions (...because I believe it will be easiest to 
deploy this MAD option by repurposing the TCP timestamp, similar to 
draft-scheffenegger-tcpm-timestamp-negotiation-05).

>
> Again, IMHO the MAD needs to incorporate hardware, software, and 
> workload constraints on the receiving end host.

[BB]: As above, delayed ACKs are also about reducing processing load in 
network equipment. If we do not take this into account, we risk networks 
deploying boxes to take this into account for themselves (e.g. ACK 
thinning).


>     *
>     **{Note 2}*: Why is there no field in the Low Latency option to
>     communicate receiver clock granularity to the sender?
>
>
> The idea is that the MAD value is a function of many parameters on the 
> end host. The clock granularity is only one of them. The simplest way 
> to convey on the wire a MAD parameter that is a function of many other 
> parameters is just to convey the MAD value itself.
>
> Bob, thanks again for your detailed and insightful feedback!
[BB]: I think we're getting somewhere.

Cheers


Bob

>
> neal
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


--------------4DAD03512A616B65595DB32D
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Neal,<br>
    <br>
    <div class="moz-cite-prefix">On 04/08/17 23:20, Neal Cardwell wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">Thanks, Bob, for your detailed and thoughtful
        review! This is very insightful and useful.
        <div><br>
        </div>
        <div>Sorry I'm coming to this discussion a little late. I wanted
          to add a few points, beyond what Wei has already noted.<br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Wed, Aug 2, 2017 at 11:54 AM,
              Bob Briscoe <span dir="ltr">&lt;<a
                  href="mailto:ietf@bobbriscoe.net" target="_blank"
                  moz-do-not-send="true">ietf@bobbriscoe.net</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> Wei, Yuchung, Neal and Eric, as
                  authors of draft-wang-tcpm-low-latency-op<wbr>t-00,<br>
                  <br>
                  I promised a review. It questions the technical logic
                  behind the draft, so I haven't bothered to give a
                  detailed review of the wording of the draft, because
                  that might be irrelevant if you agree with my
                  arguments.<br>
                  <br>
                  <b>1/ MAD by configuration?</b><b><br>
                  </b>
                  <pre class="gmail-m_4924700057452149308gmail-m_-7387466908820014094newpage">   o  If the user does not specify a MAD value, then the implementation
      SHOULD NOT specify a MAD value in the Low Latency option.
</pre>
                  That sentence triggered my "anti-human-intervention"
                  reflex. My train of thought went as follows:<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>
                <div>Bob's remark about his "anti-human-intervention"
                  reflex being</div>
                <div>triggered got me thinking.</div>
                <div><br>
                </div>
                <div>I, too, would like to minimize the amount of human
                  (application)</div>
                <div>intervention this proposal involves (to avoid
                  errors, maintenance,</div>
                <div>etc).</div>
                <div><br>
                </div>
                <div>It occurs to me that actually at Google our
                  experience has shown that</div>
                <div>indeed apps have repeatedly made mistakes with this
                  value, and we have</div>
                <div>found it convenient to progressively narrow their
                  freedom in tuning</div>
                <div>this knob. To the point where actually in our
                  deployment there is very</div>
                <div>little freedom left. Because in reality the OS and
                  TCP stack</div>
                <div>developers know the timer granularity
                  considerations, and the apps</div>
                <div>don't (and tend to use values 5 years out of date).
                  So we've found it</div>
                <div>useful to have the OS tightly clamp the app's
                  request for a MAD value.</div>
                <div><br>
                </div>
                <div>So in the interests of simplicity and avoiding
                  human intervention,</div>
                <div>what if we do not have the MAD value as part of the
                  API, but rather</div>
                <div>just allow the API to express a single "please use
                  MAD" bit? And then</div>
                <div>the transport implementation uses the smallest
                  value that it can</div>
                <div>support on this end host.</div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                <div><br>
                </div>
                <div>Can we go further, and make MAD an automatic
                  feature of the TCP</div>
                <div>implementation (so the transport implementation
                  hard-wires MAD to "on"</div>
                <div>or "off")? My sense is that we don't want to go
                  that far, and that</div>
                <div>instead we want to still allow apps to decide
                  whether to use the</div>
                <div>"please use MAD" bit. Why? There may be middlebox
                  or remote host</div>
                <div>compatibility issues with MAD. So we want apps
                  (like browsers) to be</div>
                <div>able to do A/B experiments to validate that sending
                  the MAD option on</div>
                <div>SYNs does not cause problems. We don't want to turn
                  on MAD in Linux</div>
                <div>and then find compatibility issues, and have to
                  wait for a client OS</div>
                <div>upgrade to everyone's cell phone to turn off MAD;
                  instead we want to</div>
                <div>only have to wait for an app update.</div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: If there are problems, they will be per path, not per app. So
    there could be a cache to record per-path black-holing of packets
    carrying the option (no need to record stripping the option, which
    would be benign). Then no API at all would be needed.<br>
    <br>
    As a fail-safe, you would want a system-wide sysctl to turn on MAD.
    I guess switching that would require an OS upgrade.<br>
    <br>
    Whatever, as you say below, these are not really interop
    standardization issues (but it's still worth airing the
    possibilities).<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                <div><br>
                </div>
                <div>So... suppose an app decides it is
                  latency-sensitive and wants to</div>
                <div>reduce ACK delays and negotiate a MAD value. And
                  furthermore, the app</div>
                <div>is either (a) doing A/B experiments, or (b) has
                  already convinced</div>
                <div>itself that MAD will work on this path.</div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                <div><br>
                </div>
                <div>Then the app could enable MAD with a simple API
                  like:</div>
                <div>   int mad = 1; // enable</div>
                <div>   err = setsockopt(fd, SOL_TCP, TCP_MAD, &amp;mad,
                  sizeof(mad));</div>
                <div><br>
                </div>
                <div>For better or for worse, that makes the TCP_MAD
                  option much like the</div>
                <div>TCP_NODELAY option. Both in the sense that latency
                  sensitive apps</div>
                <div>should remember to set this bit if they want
                  low-latency behavior. And</div>
                <div>in the sense that the APIs would look very similar.
                  And TCP_NODELAY</div>
                <div>and TCP_MAD would be sort of complimentary:
                  TCP_NODELAY is the app</div>
                <div>saying "I want low latency for my sends" and
                  TCP_MAD is the app</div>
                <div>saying "I want low latency for my ACKs". My guess
                  is that most</div>
                <div>low-latency apps will want both.</div>
                <div><br>
                </div>
                <div>For the MAD API, I think this might be the "as
                  simple as possible, but</div>
                <div>no simpler point".</div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Is there an app that wants high delay loss recovery? <br>
    There is no tradeoff here, so pls keep it simple and just enable low
    latency for all connections.<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>That said, that's an API issue. And I think for TCPM
                we should focus</div>
              <div>more on the wire protocol issues.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> * Let's consider what advice we
                  would give on what MAD value ought to be configured.<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>I would suggest that the advice be that when an app
                requests TCP_MAD,</div>
              <div>then transport implementors would have the transport
                implementation</div>
              <div>use the lowest feasible value based on the end host
                hardware/OS/app</div>
              <div>capabilities and workloads. </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>Our sense from our deployment at Google</div>
              <div>is that for many current technologies and workloads
                this is probably</div>
              <div>currently in the range of 5ms - 10ms.</div>
              <div><br>
              </div>
              <div>But I don't think we should get bogged down in a
                discussion of what this</div>
              <div>configured value ought to be. </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Sry, perhaps I wasn't clear. I wrote that sentence to ask:<br>
    * not "what specific MAD value ought to be configured" <br>
    * but rather "what a good MAD value ought to depend on". I pick up
    on this question later...<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>I think we should focus on the simplest</div>
              <div>protocol mechanism that can convey to the remote host
                the minimum</div>
              <div>info needed for the remote transport endpoint to
                achieve excellent</div>
              <div>performance.</div>
              <div><br>
              </div>
              <div>Here I think of the MSS option as a good analogy (and
                that's why we</div>
              <div>suggested the name "MAD").</div>
              <div><br>
              </div>
              <div>For MSS, the point is not to spend time discussing
                what MSS should be</div>
              <div>used, or to come up with complicated formulas to
                derive MSS. The point</div>
              <div>is to have a simple but general mechanism so that, no
                matter what the</div>
              <div>MSS value is (or the underlying hardware constraints
                are), there is a</div>
              <div>simple option that can convey a hint to the remote
                host. Then the</div>
              <div>remote host can use that hint to tune its sending
                behavior to achieve</div>
              <div>good performance.</div>
              <div><br>
              </div>
              <div>Now substitute "MAD" in the place of "MSS" in the
                preceding paragraph. :-)</div>
              <div> <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> * You say that MAD can be
                  smaller in DCs. So I assume your advice would be that
                  MAD should depend on RTT {Note 1} and clock
                  granularity {Note 2}.<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>Personally I do not think that MAD should depend on
                RTT. And I don't think the draft says that it should
                (though let me know if there is some spot I didn't
                notice).</div>
              <div><br>
              </div>
              <div>I'd vote for keeping MAD as simple as possible, which
                means keeping RTT out of it. :-)</div>
              <div><br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> * So why configure one value of
                  MAD for all RTTs? That only makes sense in DC
                  environments where the range of RTTs is small. <br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>I'd recommend one value of MAD for all RTTs for the
                sake of simplicity. If we keep MAD as simple as
                possible, then it stays just about the practical delay
                limitations of the end host (OS timers, CPU power, CPU
                load, app behavior, end host queuing delays, etc). That
                is what we have found makes sense in our deployment. And
                note that our deployment of a MAD-like option covers
                RTTs that span quite a range, from &lt;1 ms up to
                hundreds of ms.</div>
              <div><br>
              </div>
              <div>Most OSes I know already have a constant that defines
                the maximum interval over which they can delay their
                ACKs. We are basically just suggesting a simple wire
                format for transport endpoints to advertise this
                existing value as a hint.</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> * However, for the range of RTTs
                  on the public Internet, why not calculate MAD from RTT
                  and granularity, then standardize the calculation so
                  that both ends arrive at the same result when starting
                  from the same RTT and granularity parameters? (The
                  sender and receiver might measure different smoothed
                  (SRTT) values, but they will converge as the flow
                  progresses.)<br>
                  <br>
                  Then the receiver only needs to communicate its clock
                  granularity to the sender, and the fact that it is
                  driving MAD off its SRTT. Then the sender can use a
                  formula for RTO derived from the value of MAD that it
                  calculates the receiver will be using. Then its RTO
                  will be completely tailored to the RTT of the flow. <br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>A couple questions here:</div>
              <div><br>
              </div>
              <div>- Why  should we add the complexity of making MAD
                dependent on RTT? I'm not clear on what the argument
                would be for the benefit of introducing this complexity.</div>
              <div><br>
              </div>
              <div>- Even if the receiver only communicates its clock
                granularity to the sender, and the fact that it is
                driving MAD off its SRTT, then there's a the question of
                *how* it is deriving MAD. Presumably this could change,
                as we come up with better ideas. So then we would want a
                version number field to indicate which calculation is
                being used. It seems much simpler to me to allow the end
                point to just communicate a numerical delay value,
                rather than negotiate a version number of a formula that
                can take a clock granularity and RTT as input and
                produce a delay as output.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: Good point.<br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>- Introducing RTT as a dependence also introduces the
                question of what to do when there is no RTT estimate
                (because all packets so far have been retransmitted,
                with no timestamps). And as we discussed in Prague and
                you mention here, the two sides often have slightly
                different RTT estimates. There are probably other
                wrinkles as well.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: OK. I understood that the pretext for this draft was that the
    max ACK delay is too long for the low RTTs that are often in use
    these days. So I hadn't appreciated that you would advise that MAD
    would not depend on RTT.<br>
    <br>
    Fair enough. I'll go along with this advice for now (but see later).
    However, let's just check that your proposal makes sense in other
    respects. <br>
    <br>
    Q1. Is there not a risk that a value of MAD solely dependent on the
    receiver's OS parameters will be lower than the typical inter-packet
    arrival time for some flows? E.g.<br>
        If data packets arrive every 7 ms {Note 3} then, even with a
    del_ack factor of 2, a receiver with MAD = 5 ms will ACK every
    packet. In fact, I think it will immediately ACK the first packet,
    then delay the ACK of every subsequent packet by 5ms. {Note 4}<br>
    <br>
    I guess you are saying that would be OK from the point of view of
    the receiver's workload (otherwise it would not have set MAD=5ms).
    However, delayed ACKs are also intended to reduce network workload.
    {Note 5}.<br>
    <br>
    {Note 3}: With 1500B packets that implies 1.7Mb/s, which is more
    than 3x my own ADSL uplink (I live in the developed world, but in a
    rural part of  it, where  such rates are common and the only
    alternative is 3G, which offers an even slower uplink :(<br>
    <br>
    {Note 4}: I don't know what Implementations do, but RFC5681 implies
    that a receiver delays the next ACK whenever it sent the previous
    ACK, even if it delayed the previous one. The words are: "MUST be
    generated within &lt;MAD&gt; of the arrival of the first
    unacknowledged packet,"<br>
    <br>
    {Note 5}: Not to mention that delaying every ACK makes it hard for
    the sender to use the ACKs to monitor queuing delay. However, this
    might be fixed by separate introduction of way to measure one-way
    delay using timestamps.<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> <br>
                  Note: There are two different uses for the min RTO
                  that need to be separated:<br>
                      a) Before an initial RTT value has been measured,
                  to determine the RTO during the 3WHS.<br>
                      b) Once either end has measured the RTT for a
                  connection.<br>
                  (a) needs to cope with the whole range of possible
                  RTTs, whereas (b) is the subject of this email,
                  because it can be tailored for the measured RTT.<br>
                  <br>
                  <b>2/ The problem, and its prevalence</b><b><br>
                  </b><br>
                  With gradual removal of bufferbloat and more prevalent
                  usage of CDNs, typical base RTTs on the public
                  Internet now make the value of minRTO and of MAD look
                  silly.<br>
                  <br>
                  As can be seen above, the problem is indeed that each
                  end only has partial knowledge of the config of the
                  other end.<br>
                  However, the problem is not just that MAD needs to be
                  communicated to the other end so it can be hard-coded
                  to a lower value.<br>
                  The problem is that MAD is hard-coded in the first
                  place.<br>
                  <br>
                  The draft needs to say how prevalent the problem is
                  (on the public Internet) where the sender has to wait
                  for the receiver's delayed ACK timer at the end of a
                  flow or between the end of a volley of packets and the
                  start of the next. <br>
                  <br>
                  The draft also needs to say what tradeoff is
                  considered acceptable between a residual level of
                  spurious retransmissions and lower timeout delay.
                  Eliminating all spurious retransmissions is not the
                  goal.<br>
                  <br>
                  The draft also needs to say that introducing a new TCP
                  Option is itself a problem (on the public Internet),
                  because of middleboxes particularly proxies. Therefore
                  a solution that does not need a new TCP Option would
                  be preferable....<br>
                  <br>
                  Perhaps the solution for communicating timestamp
                  resolution in draft-scheffenegger-tcpm-times<wbr>tamp-negotiation-05
                  (which cites draft-trammell-tcpm-timestamp-<wbr>interval-01)
                  could be modified to also communicate:<br>
                  * TCP's clock granularity (closely related to TCP
                  timestamp resolution), <br>
                  *  and the fact that the host is calculating MAD as a
                  function of RTT and granularity. <br>
                  Then the existing timestamp option could be
                  repurposed, which should drastically reduce deployment
                  problems.<br>
                  <br>
                  <b>3/ Only DC?</b><b><br>
                  </b><br>
                  All the related work references are solely in the
                  context of a DC. Pls include refs about this problem
                  in a public Internet context. You will find there is a
                  pretty good search engine at <a
class="gmail-m_4924700057452149308gmail-m_-7387466908820014094moz-txt-link-abbreviated"
                    href="http://www.google.com" target="_blank"
                    moz-do-not-send="true">www.google.com</a>.<br>
                  <br>
                  The only non-DC ref I can find about minRTO is
                  [Psaras07], which is mainly about a proposal to apply
                  minRTO if the sender expects the next ACK to be
                  delayed. Nonetheless, the simulation experiment in
                  Section 5.1 provides good evidence for how RTO latency
                  is dependent on uncertainty about the MAD that the
                  other end is using.<br>
                  <br>
                  [Psaras07] Psaras, I. &amp; Tsaoussidis, V., "The TCP
                  Minimum RTO Revisited," In: Proc. 6th Int'l IFIP-TC6
                  Conference on Ad Hoc and Sensor Networks, Wireless
                  Networks, Next Generation Internet NETWORKING'07
                  pp.981-991 Springer-Verlag (2007)<br>
                  <a
class="gmail-m_4924700057452149308gmail-m_-7387466908820014094moz-txt-link-freetext"
href="https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited"
                    target="_blank" moz-do-not-send="true">https://www.researchgate.net/p<wbr>ublication/225442912_The_TCP_M<wbr>inimum_RTO_Revisited</a></div>
              </blockquote>
              <div><br>
              </div>
              <div>All great points. Thanks!</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> <br>
                  <b>4/ Status</b><b><br>
                  </b><br>
                  Normally, I wouldn't want to hold up a draft that has
                  been proven over years of practice, such as the
                  technique in low-latency-opt, which has been proven in
                  Google's DCs over the last few years. Whereas, my
                  ideas are just that: ideas, not proven. However, the
                  technique in low-latency-opt has only been proven in
                  DC environments where the range of RTTs is limited.
                  So, now that you are proposing to transplant it onto
                  the public Internet, it also only has the status of an
                  unproven idea.<br>
                  <br>
                  To be clear, as it stands, I do not think
                  low-latency-opt is applicable to the public Internet.<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>Can you please elaborate on this? Is this because you
                think there ought to be a dependence on RTT?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: I was trying to judge whether this is a straightforward
    standardization of tried and tested technology, or experimental.<br>
    <br>
    The opinion about inapplicability to the Internet was based on the
    way the config requirements were written, which limited the draft to
    environments covered by a configuration management system, which is
    not typical for the public Internet.<br>
    <br>
    I'm happier now that the focus is moving towards auto-tuning.
    However, this makes my first point about unproven territory even
    more applicable..., so Google's previous experience becomes less
    relevant, and makes this more experimental/researchy. For instance,
    the case I pointed out above for my own uplink would double the ACK
    rate, which might lead to knock-on problems - perhaps an increase in
    server processing load, or even processor overload on intermediate
    network equipment. We are also likely to discover interactions with
    ACK-thinning middleboxes.<br>
    <br>
    more...<br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> <br>
                  <br>
                  <b>5/ Nits</b><b><br>
                  </b>These nits depart from my promise not comment on
                  details that could become irrelevant if you agree with
                  my idea. Hey, whatever,... <br>
                  <br>
                  S.3.5:<br>
                  <pre class="gmail-m_4924700057452149308gmail-m_-7387466908820014094newpage">	RTO &lt;- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)</pre>
                  My immediate reaction to this was that G should not
                  appear twice. However, perhaps you meant them to be
                  G_s and G_r (sender and receiver) respectively. {Note
                  2}<br>
                  <br>
                  S.3.5 &amp; S.5. It seems unnecessary to prohibit
                  values of MAD greater than the default (given some
                  companies are already investing in commercial public
                  space flight programmes, so TCP could need to
                  routinely support RTTs that are longer than typical
                  not just shorter).<br>
                  <br>
                  <br>
                  Cheers<br>
                  <br>
                  <br>
                  <br>
                  Bob<br>
                  <br>
                  <b><br>
                  </b><b>{Note 1}</b>: On average, if not app-limited,
                  the time between ACKs will be d_r*R_r/W_s where:<br>
                     R is SRTT<br>
                     d is the delayed ACK factor, e.g. d=2 for ACKing
                  every other packet<br>
                     W is the window in units of segments<br>
                     subscripts X_r or X_s denote receiver or sender for
                  the half-connection.<br>
                  <br>
                  So as long as the receiver can estimate the varying
                  value of W at the sender, the receiver's MAD could be
                  <br>
                      MAD_r = max(k*d_r*R_r / W_s, G_r), <br>
                  The factor k (lower case) allows for some bunching of
                  packets e.g. due to link layer aggregation or the
                  residual effects of slow-start, which leaves some
                  bunching even if SS uses pacing. Let's say k=2, but it
                  would need to be checked empirically.<br>
                  <br>
                  For example, take R=100us, d=2, W=8 and G = 1us.<br>
                  Given d*R/W = 25us, MAD could be perhaps 50us (i.e.
                  k=2). k might need to be greater, but there would
                  certainly be no need for MAD to be 5ms, which is
                  perhaps 100 times greater than necessary.<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>With currently popular OS implementations I'm aware
                of, 50us for a delayed ACK timer is infeasible. Most
                have a minimum granularity of 1ms, or 10ms, or even
                larger, for delayed ACKs. And part of the point of
                delayed ACKs is to wait for applications to respond, so
                that data can be combined with the ACK. And 50us does
                not give the app much time to respond.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    [BB]: A modern processor can to do as much in 50us as a processor
    from the 1990s could do in about 10mins.<br>
    <br>
    The min clock interrupt period has not changed much from the typical
    value of 10ms in 1990 [Dovrolis00]. This minimum is meant to
    maintain performance by keeping a healthy ratio between real work
    and context switching. However, the number of ops that can be
    processed in this duration has increased by about 10^7 over the same
    period. <br>
    <br>
    I am (genuinely) interested to know what is the underlying factor
    that limits ACK delay to no less than 1-10ms? Is it Wirth's Law of
    software bloat (that the same task takes just as long, because
    increases in processing speed are absorbed by increases in code
    complexity)?<br>
    <br>
    Nonetheless, whatever the clock granularity on a particular
    OS/machine, a stack should still be able to calculate what MAD ought
    to be. Then the final step in the calculation would round MAD up to
    the interrupt clock granularity. At least then code would perform
    better for OSs that reduce their clock granularity.<br>
    <br>
    <br>
    [Dovrolis00] Dovrolis, C. &amp; Ramanathan, P., "Increasing the
    Clock Interrupt Frequency for Better Support of Real-Time
    Applications," Uni Wisconsin-Madison, Dept. Electrical &amp;
    Computer Engineering
    <a class="moz-txt-link-freetext" href="http://www.cc.gatech.edu/~dovrolis/Papers/timers.ps">http://www.cc.gatech.edu/~dovrolis/Papers/timers.ps</a> (March 2000).<br>
    <br>
    <br>
    This returns us to the question:<br>
    <b>What ought MAD depend on?</b><b><br>
    </b><b>
    </b><br>
    I prefer to start by analyzing what the best function and
    dependencies should be, then try to approximate for simplicity. I
    don't think we should start from the other direction ("let's think
    up a simple way to do it, and see if it works"). The former gives
    insight. The latter risks a random stumble across a territory of
    countless unforeseen problems.<br>
    <br>
    The thought experiment I was conducting in my 'Note 1' above started
    from the idea that MAD ought to be somehow related to the average
    inter-arrival time in a flow. The (unstated) reasoning went like
    this:<br>
    * the retransmission delay after a pause/stop in the data stream
    ought to be similar to the retransmission delay without a
    pause/stop.<br>
    * so the ack delay after a pause/stop in the data stream ought to be
    similar to the ack delay without a pause/stop. <br>
    <br>
    Put more succinctly:<br>
        R + MAD ~= R + d * t_i            (1)<br>
    Then by definition:<br>
        t_i = R/W<br>
    Which is how I got to<br>
        MAD ~= d * R / W<br>
    <br>
    where the notation is as earlier, plus<br>
        t_i = avg inter-arrival time<br>
    <br>
    Now, moving on to how to simplify this... <br>
    <br>
    I accept your point that MAD has to be above "the practical delay
    limitations of the end host (OS timers, CPU power, CPU load, app
    behavior, end host queuing delays, etc)". Let's wrap that all into a
    variable we'll call g_r (which is itself lower bounded by clock
    granularity G_r).<br>
    <br>
    Eqn (1) shows that the approximation for MAD only has to be within
    the order of the RTT. <br>
    <br>
    Also, setting a lower bound for MAD of the order of an RTT would
    help to prevent the case I raised earlier where MAD is less than the
    average inter-arrival time (because it is abnormal to have &lt;1
    packet per RTT). Even for ultra-low RTTs, this also protects the
    network, because network processing should be sized to cope with ~1
    ACK per RTT.<br>
    <br>
    So, in summary, this is my current preferred approximation (but I'm
    open to others):<br>
    <br>
        MAD ~= max(c*R_r, g_r)<br>
    <br>
    c is a constant factor determined empirically. I'm not sure whether
    it will be less or greater than 1, so let's assume nominally c=1.<br>
    <br>
    To be clear, I'm accepting your argument that it is simplest for the
    receiver to communicate MAD in the TCP option. So (for now) I'm no
    longer proposing that the sender bases MAD on the RTT it measures
    itself. I.e. the receiver calculates MAD based on it's initial
    estimate of the RTT of the connection and other local parameters,
    then communicates it to the sender. It's not important how accurate
    the RTT estimate is. This is just to get a lower bound of roughly
    the right order of magnitude.<br>
    <br>
    You are right that the receiver might not have a good RTT estimate
    if packets within the 3WHS were retransmitted. But, let me assume
    (for now) that we are using TCP timestamps, so a host can get a good
    RTT estimate even with retransmissions (...because I believe it will
    be easiest to deploy this MAD option by repurposing the TCP
    timestamp, similar to
    draft-scheffenegger-tcpm-timestamp-negotiation-05).<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>Again, IMHO the MAD needs to incorporate hardware,
                software, and workload constraints on the receiving end
                host. <br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    [BB]: As above, delayed ACKs are also about reducing processing load
    in network equipment. If we do not take this into account, we risk
    networks deploying boxes to take this into account for themselves
    (e.g. ACK thinning).<br>
    <br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> <b><br>
                  </b><b>{Note 2}</b>: Why is there no field in the Low
                  Latency option to communicate receiver clock
                  granularity to the sender?<span
                    class="gmail-m_4924700057452149308gmail-HOEnZb"><font
                      color="#888888"><br>
                      <br>
                    </font></span></div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
        <div class="gmail_extra">The idea is that the MAD value is a
          function of many parameters on the end host. The clock
          granularity is only one of them. The simplest way to convey on
          the wire a MAD parameter that is a function of many other
          parameters is just to convey the MAD value itself.</div>
        <div class="gmail_extra"><br>
        </div>
        <div class="gmail_extra">Bob, thanks again for your detailed and
          insightful feedback!</div>
      </div>
    </blockquote>
    [BB]: I think we're getting somewhere.<br>
    <br>
    Cheers<br>
    <br>
    <br>
    Bob<br>
    <br>
    <blockquote type="cite"
cite="mid:CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra"><br>
        </div>
        <div class="gmail_extra">neal</div>
        <div class="gmail_extra"><br>
        </div>
        <div class="gmail_extra"><br>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
________________________________________________________________
Bob Briscoe                               <a class="moz-txt-link-freetext" href="http://bobbriscoe.net/">http://bobbriscoe.net/</a></pre>
  </body>
</html>

--------------4DAD03512A616B65595DB32D--


From nobody Sun Aug  6 11:17:21 2017
Return-Path: <jgh@wizmail.org>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id DD1BD129B2A for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 11:17:19 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id njFUnZzg2w2o for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 11:17:18 -0700 (PDT)
Received: from wizmail.org (wizmail.org [IPv6:2a00:1940:107::2:0:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 15629132043 for <tcpm@ietf.org>; Sun,  6 Aug 2017 11:17:18 -0700 (PDT)
Received: from [2a00:b900:109e:0:c5d6:c61b:f5e0:b51f] (helo=lap.dom.ain) by wizmail.org with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89.113) id 1deQ71-0003pZ-4q for tcpm@ietf.org (return-path <jgh@wizmail.org>); Sun, 06 Aug 2017 18:17:15 +0000
To: tcpm@ietf.org
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net> <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com>
From: Jeremy Harris <jgh@wizmail.org>
Message-ID: <81bd3c24-20d7-7de8-45c6-98daa7b95d18@wizmail.org>
Date: Sun, 6 Aug 2017 19:17:13 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Pcms-Received-Sender: [2a00:b900:109e:0:c5d6:c61b:f5e0:b51f] (helo=lap.dom.ain) with esmtpsa
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/GjJo92b-Zr_Wy3GNaafq-fz_88M>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Aug 2017 18:17:20 -0000

The draft proposes a SYN-time option to notify a MAD value
for the connection.  Would it not be preferable to use
a data-time option, permitting an implementation to track
(from the TCP endpoint's view) the application response time,
adjusting the delayed-ACK timer to suit - and notifying the
peer occasionally?
-- 
Cheers,
  Jeremy


From nobody Sun Aug  6 11:22:38 2017
Return-Path: <jgh@wizmail.org>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3E850131DF7 for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 11:22:36 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XxwsDoaFG_uJ for <tcpm@ietfa.amsl.com>; Sun,  6 Aug 2017 11:22:35 -0700 (PDT)
Received: from wizmail.org (wizmail.org [IPv6:2a00:1940:107::2:0:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id CBC96131D6D for <tcpm@ietf.org>; Sun,  6 Aug 2017 11:22:34 -0700 (PDT)
Received: from [2a00:b900:109e:0:c5d6:c61b:f5e0:b51f] (helo=lap.dom.ain) by wizmail.org with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89.113) id 1deQC9-0003zr-28 for tcpm@ietf.org (return-path <jgh@wizmail.org>); Sun, 06 Aug 2017 18:22:33 +0000
To: tcpm@ietf.org
References: <8abadc4d-4165-a5bc-23bb-e4f9258c695b@bobbriscoe.net> <CAK6E8=c4D0QTzMobMQXLZMU5JiBRXXPdYJ0KTqvg08t+G0VDxQ@mail.gmail.com> <CANn89iL+TC6sh=e+keb4Psxz+E6oHV3Mcvsay6UYL2qEKUT6bw@mail.gmail.com> <2131135f-b123-70f0-d464-dac6640d6cd2@bobbriscoe.net> <d2570431-8c01-d7fc-5aa3-581d69836923@bobbriscoe.net> <CADVnQykz_pUqQLRmzpUd+E0R0iLWeZ3fZN=_K9Roee0zuz1x6A@mail.gmail.com> <edfd5337-307c-2395-0bb1-83267d52088c@bobbriscoe.net>
From: Jeremy Harris <jgh@wizmail.org>
Message-ID: <fe058621-dc71-749e-ff51-e3a6416cedc8@wizmail.org>
Date: Sun, 6 Aug 2017 19:22:31 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <edfd5337-307c-2395-0bb1-83267d52088c@bobbriscoe.net>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Pcms-Received-Sender: [2a00:b900:109e:0:c5d6:c61b:f5e0:b51f] (helo=lap.dom.ain) with esmtpsa
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/JYQl00LJmt9-2dWnYgQKzsjFVqo>
Subject: Re: [tcpm] Review of draft-wang-tcpm-low-latency-opt-00
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Aug 2017 18:22:36 -0000

On 06/08/17 18:39, Bob Briscoe wrote:
> [BB]: Is there an app that wants high delay loss recovery?
> There is no tradeoff here, so pls keep it simple and just enable low
> latency for all connections.

The tradeoff is against potentially mistaken retransmissions, which
could matter on a data-costly channel or for an energy-constrained
endpoint.
-- 
Cheers,
  Jeremy


From nobody Mon Aug 28 23:51:48 2017
Return-Path: <internet-drafts@ietf.org>
X-Original-To: tcpm@ietf.org
Delivered-To: tcpm@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 851C0132386; Mon, 28 Aug 2017 23:51:46 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: internet-drafts@ietf.org
To: <i-d-announce@ietf.org>
Cc: tcpm@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 6.59.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <150398950648.13143.4092402418089275384@ietfa.amsl.com>
Date: Mon, 28 Aug 2017 23:51:46 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/Cn0MIIJDlB2ZmE6y-DW9beR0eho>
Subject: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Aug 2017 06:51:47 -0000

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Maintenance and Minor Extensions WG of the IETF.

        Title           : Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters
        Authors         : Stephen Bensley
                          Dave Thaler
                          Praveen Balasubramanian
                          Lars Eggert
                          Glenn Judd
	Filename        : draft-ietf-tcpm-dctcp-10.txt
	Pages           : 16
	Date            : 2017-08-28

Abstract:
   This informational memo describes Datacenter TCP (DCTCP), a TCP
   congestion control scheme for datacenter traffic.  DCTCP extends the
   Explicit Congestion Notification (ECN) processing to estimate the
   fraction of bytes that encounter congestion, rather than simply
   detecting that some congestion has occurred.  DCTCP then scales the
   TCP congestion window based on this estimate.  This method achieves
   high burst tolerance, low latency, and high throughput with shallow-
   buffered switches.  This memo also discusses deployment issues
   related to the coexistence of DCTCP and conventional TCP, the lack of
   a negotiating mechanism between sender and receiver, and presents
   some possible mitigations.  This memo documents DCTCP as currently
   implemented by several major operating systems.  DCTCP as described
   in this draft is applicable to deployments in controlled environments
   like datacenters but it must not be deployed over the public Internet
   without additional measures.


The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-tcpm-dctcp/

There are also htmlized versions available at:
https://tools.ietf.org/html/draft-ietf-tcpm-dctcp-10
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-dctcp-10

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-dctcp-10


Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/


From nobody Mon Aug 28 23:58:37 2017
Return-Path: <lars@netapp.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AA5071320BB for <tcpm@ietfa.amsl.com>; Mon, 28 Aug 2017 23:58:35 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=netapp.onmicrosoft.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id THExaVSYJbSC for <tcpm@ietfa.amsl.com>; Mon, 28 Aug 2017 23:58:33 -0700 (PDT)
Received: from mx142.netapp.com (mx142.netapp.com [IPv6:2620:10a:4005:8000:2306::b]) (using TLSv1.2 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D130C132705 for <tcpm@ietf.org>; Mon, 28 Aug 2017 23:58:33 -0700 (PDT)
X-IronPort-AV: E=Sophos;i="5.41,444,1498546800";  d="asc'?scan'208";a="208519173"
Received: from hioexcmbx06-prd.hq.netapp.com ([10.122.105.39]) by mx142-out.netapp.com with ESMTP; 28 Aug 2017 23:34:07 -0700
Received: from VMWEXCCAS09-PRD.hq.netapp.com (10.122.105.27) by hioexcmbx06-prd.hq.netapp.com (10.122.105.39) with Microsoft SMTP Server (TLS) id 15.0.1263.5; Mon, 28 Aug 2017 23:58:30 -0700
Received: from NAM01-BN3-obe.outbound.protection.outlook.com (10.120.60.153) by VMWEXCCAS09-PRD.hq.netapp.com (10.122.105.27) with Microsoft SMTP Server (TLS) id 15.0.1263.5 via Frontend Transport; Mon, 28 Aug 2017 23:58:30 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netapp.onmicrosoft.com; s=selector1-netapp-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=uVgeWAm6dA2JEfbIl/b9723//9+xBIg6RDv+0m0lsMI=; b=fzeQgQMb1p2wL98mRj3hyzWYFnersQwsJW27US5MaMqOPE4EQNYx1ppjBmmm15U6jtZusmZNdFGQi3djCgFnxnOktVlGoZ7ghR+GBIOBHc7CtnqXjZQ2e6d9ZLvtbgoXfuNGfuzoODyhd/dp4X69yyKPzEpggsODepzpP3RoLHo=
Received: from BLUPR06MB1764.namprd06.prod.outlook.com (10.162.224.150) by BLUPR06MB612.namprd06.prod.outlook.com (10.141.207.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.1.1385.9; Tue, 29 Aug 2017 06:58:31 +0000
Received: from BLUPR06MB1764.namprd06.prod.outlook.com ([10.162.224.150]) by BLUPR06MB1764.namprd06.prod.outlook.com ([10.162.224.150]) with mapi id 15.01.1385.014; Tue, 29 Aug 2017 06:58:31 +0000
From: "Eggert, Lars" <lars@netapp.com>
To: Neal Cardwell <ncardwell@google.com>, Praveen Balasubramanian <pravb@microsoft.com>
CC: "tcpm@ietf.org" <tcpm@ietf.org>
Thread-Topic: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
Thread-Index: AQHTIJNUP3KBuBLtAUSgdrL0xwUiz6Ka50oA
Date: Tue, 29 Aug 2017 06:58:31 +0000
Message-ID: <784ACCE6-78AB-4359-A9E6-F0CE0625BA77@netapp.com>
References: <150398950648.13143.4092402418089275384@ietfa.amsl.com>
In-Reply-To: <150398950648.13143.4092402418089275384@ietfa.amsl.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
x-mailer: Apple Mail (2.3273)
authentication-results: spf=none (sender IP is ) smtp.mailfrom=lars@netapp.com; 
x-originating-ip: [2001:a61:31dc:2801:ac53:acb7:99ae:401f]
x-ms-publictraffictype: Email
x-microsoft-exchange-diagnostics: 1; BLUPR06MB612; 6:dqc5g9z0QFJ4QWcGwt1KZTIMOAu/5EOPOesJfeTDWHrpTI+BlKEKrhWGR7VhUG2Bm/KuFLVtSPzwYbkI6HFHe8a6x4WeR1J0k8PJib3X/YRru97GFYUoF/M+N0tutCgyl733qQt5yi9/8SRyeU+e4XpUvtHlY6Alz853i6OSlB71IJZKW+Cdck/49IExv4qhJENCcUeuQWCogoLDuGVZXOvyVAXvQfllFrWLMI6qDZHgxL88nbL8Qg+W4NZK13M5zXK4y7Mg65Wjqj54GWQbIiBkWF6j1Gz7dAzCbHsbO3DtIUBAPK8YcN+3jCaNeuOvrFfmcyxLsBsLtNvT3xRoBA==; 5:YprCQ9wqYM04YQcthQZttG+9qPsUWCVMwOnslyOE35jeASUnWIuUucwmUQuA+bdyFOrbVkJxbUWdVBH7g6NmLR6j9eFYYbrIGtnG75WiWsQwuVJTovidovY6W6M4TqfHhP1cwZzk2GMSXU7t5mhRpA==; 24:247R5v4oL9mhzif8Pmdb2MoLaNRJGkSnemasiNxX4yR4okC/S7FWs+PhGoRsdEQJtXDnAUPO4lYjAS9z4Cen77r7CVCV7Bc4nDmzz/pQWKQ=; 7:3zo7UuCLJ6xvZQB3dVoZRso0LCeYxNzVmszJAWgMS3MEzwi4ybrkFUGvYo7kd/JYwMrJ+y0MUCRONrUkxC9XzUpexonRQrv0vI/QOYbz/pLrUN36T9d62PfhtMYZYuMAZA7MjfQbf1XfCWd39O67V04GbZuhLQlv2A9THl4iTzKAiKH8iQz/LIiMZVx8mU6gk7GY4eUdpgcnE1RHV8skyIrqPQPaqHShQqKa/3ozxl4=
x-ms-exchange-antispam-srfa-diagnostics: SSOS;
x-ms-office365-filtering-correlation-id: ace0f437-78c7-4003-c3f1-08d4eeab5bdf
x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(22001)(300000502095)(300135100095)(2017030254152)(300000503095)(300135400095)(2017052603199)(49563074)(201703131423075)(201703031133081)(201702281549075)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095); SRVR:BLUPR06MB612; 
x-ms-traffictypediagnostic: BLUPR06MB612:
x-exchange-antispam-report-test: UriScan:(120809045254105);
x-microsoft-antispam-prvs: <BLUPR06MB612997EBE39222F21DD709CA79F0@BLUPR06MB612.namprd06.prod.outlook.com>
x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(102415395)(6040450)(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046)(93006095)(93001095)(100000703101)(100105400095)(6055026)(6041248)(20161123560025)(20161123562025)(20161123555025)(20161123564025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123558100)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:BLUPR06MB612; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:BLUPR06MB612; 
x-forefront-prvs: 0414DF926F
x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(979002)(6009001)(199003)(24454002)(377424004)(189002)(3280700002)(7736002)(102836003)(97736004)(3660700001)(8936002)(6506006)(76176999)(2421001)(6512007)(2950100002)(1511001)(2900100001)(478600001)(36756003)(50986999)(6306002)(8666007)(82746002)(77096006)(6486002)(53546010)(6116002)(83716003)(5660300001)(53936002)(57306001)(99286003)(50226002)(305945005)(6246003)(86362001)(99936001)(189998001)(4326008)(68736007)(6436002)(2906002)(2561002)(4001150100001)(14454004)(106356001)(105586002)(81156014)(230783001)(966005)(81166006)(101416001)(229853002)(8676002)(33656002)(25786009)(969003)(989001)(999001)(1009001)(1019001); DIR:OUT; SFP:1101; SCL:1; SRVR:BLUPR06MB612; H:BLUPR06MB1764.namprd06.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords;  A:1; MX:1; LANG:en; 
received-spf: None (protection.outlook.com: netapp.com does not designate permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: multipart/signed; boundary="Apple-Mail=_7EE11216-3E59-43F5-8689-40119721487D"; protocol="application/pgp-signature"; micalg=pgp-sha512
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Aug 2017 06:58:31.2260 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 4b0911a0-929b-4715-944b-c03745165b3a
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BLUPR06MB612
X-OriginatorOrg: netapp.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/tarCOuQ1DMpI4Keex9_UgiG0n84>
Subject: Re: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Aug 2017 06:58:35 -0000

--Apple-Mail=_7EE11216-3E59-43F5-8689-40119721487D
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

This hopefully addresses Neal's comments

> On 2017-8-29, at 8:51, internet-drafts@ietf.org wrote:
>=20
>=20
> A New Internet-Draft is available from the on-line Internet-Drafts =
directories.
> This draft is a work item of the TCP Maintenance and Minor Extensions =
WG of the IETF.
>=20
>        Title           : Datacenter TCP (DCTCP): TCP Congestion =
Control for Datacenters
>        Authors         : Stephen Bensley
>                          Dave Thaler
>                          Praveen Balasubramanian
>                          Lars Eggert
>                          Glenn Judd
> 	Filename        : draft-ietf-tcpm-dctcp-10.txt
> 	Pages           : 16
> 	Date            : 2017-08-28
>=20
> Abstract:
>   This informational memo describes Datacenter TCP (DCTCP), a TCP
>   congestion control scheme for datacenter traffic.  DCTCP extends the
>   Explicit Congestion Notification (ECN) processing to estimate the
>   fraction of bytes that encounter congestion, rather than simply
>   detecting that some congestion has occurred.  DCTCP then scales the
>   TCP congestion window based on this estimate.  This method achieves
>   high burst tolerance, low latency, and high throughput with shallow-
>   buffered switches.  This memo also discusses deployment issues
>   related to the coexistence of DCTCP and conventional TCP, the lack =
of
>   a negotiating mechanism between sender and receiver, and presents
>   some possible mitigations.  This memo documents DCTCP as currently
>   implemented by several major operating systems.  DCTCP as described
>   in this draft is applicable to deployments in controlled =
environments
>   like datacenters but it must not be deployed over the public =
Internet
>   without additional measures.
>=20
>=20
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-tcpm-dctcp/
>=20
> There are also htmlized versions available at:
> https://tools.ietf.org/html/draft-ietf-tcpm-dctcp-10
> https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-dctcp-10
>=20
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=3Ddraft-ietf-tcpm-dctcp-10
>=20
>=20
> Please note that it may take a couple of minutes from the time of =
submission
> until the htmlized version and diff are available at tools.ietf.org.
>=20
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
>=20
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm


--Apple-Mail=_7EE11216-3E59-43F5-8689-40119721487D
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="signature.asc"
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJZpRCWAAoJEFS1wwm/cMFXDjwP/1wlxsdj0iut4BWRWS8+IHo/
ALSJ37tWrMdsHqLv/vorFJrWh/fcmad2RT3WN/EHd7xZE1OMndXcLhaSy+W2GT6/
hJT37LomT15T/BaiGYsdPyfCf/JX9ZrrGbJEswTuFJSBLHAvpoBGV01kbqHiWblj
HMeNv4nb7DYPKLdY4ABxFVZL0cgD73jez32OUUjxIYdc0z1yd3mUAMtA9f1TryDU
h01dvghiqsZq7T3cgMUyNDfGsbTziKc7/0/Dx9/eoPFwF+6elMXphkPuz9ECO++O
I/rRYt7Z6TfaFEv6m1+J4KS558Tb3M/58AnA/zNLzw11SF6mz3nRDmGazO6jSHHS
599DkNCBKi6blE7h4GnmqW7YhZTA8f++fNtjvfgI5aoFXBOgjQuUI7wZnGs3Zguu
SvPyzlKNq9nmxzqoxZi78KwuXkXJ1hXJXt5zehhjkRi+1dn89PyGIEd1jOiRT0YB
rXXLQyqK1AlZDWR3ikENJkM0tG2iI4j2cWDEzb425CtnKYlVH4YojfNNTOI2QZLQ
gvpJGY+i595ewuflSoJ/n4S9G6nqEUHtmNYwS0lT5adsLyzFzWBbTgJN3QczV80H
85DjU2jC1QiSGoSr8vKlmmJD6aS7TAlIMNOnVUCHkuxotQ54k0arPiXxWmmm2JeQ
8Uzvb+82F4Pvy69QIl6L
=59dg
-----END PGP SIGNATURE-----

--Apple-Mail=_7EE11216-3E59-43F5-8689-40119721487D--


From nobody Tue Aug 29 04:59:49 2017
Return-Path: <session-request@ietf.org>
X-Original-To: tcpm@ietf.org
Delivered-To: tcpm@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id A9AD81321B7; Tue, 29 Aug 2017 04:59:47 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: IETF Meeting Session Request Tool <session-request@ietf.org>
To: <session-request@ietf.org>
Cc: michael.scharf@nokia.com, tcpm@ietf.org, ietf@kuehlewind.net, tcpm-chairs@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 6.59.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <150400798759.13230.9636168137771677148.idtracker@ietfa.amsl.com>
Date: Tue, 29 Aug 2017 04:59:47 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/l9BqhLBYj7T5SL-ynH4ErwxUs_E>
Subject: [tcpm] tcpm - New Meeting Session Request for IETF 100
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Aug 2017 11:59:48 -0000

A new meeting session request has just been submitted by Michael Scharf, a Chair of the tcpm working group.


---------------------------------------------------------
Working Group Name: TCP Maintenance and Minor Extensions
Area Name: Transport Area
Session Requester: Michael Scharf

Number of Sessions: 1
Length of Session(s):  2.5 Hours
Number of Attendees: 80
Conflicts to Avoid: 
 First Priority: iccrg tcpinc mptcp taps tsvarea tsvwg quic
 Second Priority: httpbis lwig maprg rtcweb rmcat teas


People who must be present:
  Yoshifumi Nishida
  Michael Tuexen
  Michael Scharf
  Mirja Kuehlewind

Resources Requested:

Special Requests:
  
---------------------------------------------------------


From nobody Tue Aug 29 07:22:36 2017
Return-Path: <ncardwell@google.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id F37EA132CF2 for <tcpm@ietfa.amsl.com>; Tue, 29 Aug 2017 07:22:34 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.7
X-Spam-Level: 
X-Spam-Status: No, score=-2.7 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4uuUdhmYM8G8 for <tcpm@ietfa.amsl.com>; Tue, 29 Aug 2017 07:22:33 -0700 (PDT)
Received: from mail-qk0-x235.google.com (mail-qk0-x235.google.com [IPv6:2607:f8b0:400d:c09::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3A16B132C4C for <tcpm@ietf.org>; Tue, 29 Aug 2017 07:22:33 -0700 (PDT)
Received: by mail-qk0-x235.google.com with SMTP id k126so15703454qkb.4 for <tcpm@ietf.org>; Tue, 29 Aug 2017 07:22:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=owaBKEwvRr2f7lMVBQXucLDR7p/Oy7oar1rlWBmMWZc=; b=Hy8ksL6nk3D7ZbGTg9XNSihkd2TiT9TO6JZFESWuMMpWUQZ8A0BDIfVd8Km8d26hIa rpqxkwwDhryLUEwbY1bYqAUum7qSUu/mqz1PsE6zhaenvmXLe9uvljYQ/2Dn7iJ4ebSz KqEU7wYpvq8iwsdJ7/XsYW/r9oNE8iZgaeNCQnHIZcwwfT1GdZ+CkSHUBoRDwb2mE9l2 i11Z4oRAGd1Msn66Q/K9w42ayV9I6TDan7wTRB4nS3C3g/JqweXu8TU6DUVrageJthpA 4ko1JWlAVRX4bkDHXu+82y/XIg7iGb0l+YEffl7Ze0PfP2X4d5OFXhxh3Zi3Sgef27wO RYVg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=owaBKEwvRr2f7lMVBQXucLDR7p/Oy7oar1rlWBmMWZc=; b=oczaqC1Q21stEFdvsol8ubtz+VsTzLGNEAQW1pc1ueZ6ZI37CL0EsAiJNFHM5a7A+3 sw0FGqhfrn+WGIV1kAlfc1GSW3jYDZWIHVlxVZbxuUUDElzr5ClOmwaBFaxDQeicDsJx PaI0EAaw7v56Mpc/KYnmlsCp/1dQCBkkqfCIXjSFTe0XV0Ve0dAh4kG2T9CJ6jpg01VZ Ck06RpPInDP0p8xekLu+XBEDDFRxxeAm7+JKLvFiNPJjO29679S9SgfsPYgUHV5QOLBx Hns+6GMFRqyE0vT2Y19IKANfv2Y5R+c+Uqa+mmnYNXuc2ofwTRdaNzoPrVZGNnGvj92Q I2Qg==
X-Gm-Message-State: AHYfb5i77dpekQFYGod9xmAEQLbn+uYRbwIaajm/O10a6OrHcUs0XyUP QQvZByaWlaRgesu1kJzT37eM2KT5j9iY
X-Google-Smtp-Source: ADKCNb7rZWF/mWfVw8/zDzjL3aD3mq3VLmHSs3LmRwPpYWYGrkinmKMjnciTd+nOmXX++Zo0Q8Biu+ROtrcHJN6ILz4=
X-Received: by 10.55.66.67 with SMTP id p64mr6204020qka.55.1504016552063; Tue, 29 Aug 2017 07:22:32 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.12.128.209 with HTTP; Tue, 29 Aug 2017 07:22:01 -0700 (PDT)
In-Reply-To: <784ACCE6-78AB-4359-A9E6-F0CE0625BA77@netapp.com>
References: <150398950648.13143.4092402418089275384@ietfa.amsl.com> <784ACCE6-78AB-4359-A9E6-F0CE0625BA77@netapp.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 29 Aug 2017 10:22:01 -0400
Message-ID: <CADVnQy=T=HxWzBTgfu_QK23Cn=ACepjwY7Z6xxeRbENjNMuAOw@mail.gmail.com>
To: "Eggert, Lars" <lars@netapp.com>
Cc: Praveen Balasubramanian <pravb@microsoft.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/qlQQaxja-oZCWOmwsdX4Kgn6hU0>
Subject: Re: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Aug 2017 14:22:35 -0000

On Tue, Aug 29, 2017 at 2:58 AM, Eggert, Lars <lars@netapp.com> wrote:
>
> This hopefully addresses Neal's comments

Yes, this ( https://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-dctcp-10
) looks great to me. Thanks, Lars!

neal


From nobody Tue Aug 29 08:50:39 2017
Return-Path: <lars@netapp.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 6045F132C39 for <tcpm@ietfa.amsl.com>; Tue, 29 Aug 2017 08:50:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=netapp.onmicrosoft.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gVVlJE3qR9LB for <tcpm@ietfa.amsl.com>; Tue, 29 Aug 2017 08:50:35 -0700 (PDT)
Received: from mx141.netapp.com (mx141.netapp.com [IPv6:2620:10a:4005:8000:2306::a]) (using TLSv1.2 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id ED2F1132D52 for <tcpm@ietf.org>; Tue, 29 Aug 2017 08:50:34 -0700 (PDT)
X-IronPort-AV: E=Sophos;i="5.41,445,1498546800";  d="asc'?scan'208";a="224578638"
Received: from hioexcmbx01-prd.hq.netapp.com ([10.122.105.34]) by mx141-out.netapp.com with ESMTP; 29 Aug 2017 08:28:16 -0700
Received: from VMWEXCCAS06-PRD.hq.netapp.com (10.122.105.22) by hioexcmbx01-prd.hq.netapp.com (10.122.105.34) with Microsoft SMTP Server (TLS) id 15.0.1263.5; Tue, 29 Aug 2017 08:50:31 -0700
Received: from NAM02-BL2-obe.outbound.protection.outlook.com (10.120.60.153) by VMWEXCCAS06-PRD.hq.netapp.com (10.122.105.22) with Microsoft SMTP Server (TLS) id 15.0.1263.5 via Frontend Transport; Tue, 29 Aug 2017 08:50:31 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netapp.onmicrosoft.com; s=selector1-netapp-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=uNH6TYZaGHlMfBZhaENB/Pq6elMt1McDcoXks7caMMU=; b=GJ8A6tB+Ly7r+0Qkz2ePDIzpfMR023JSnCYJs/+vUO02smN7ic/2pxiFwZ/MbWIIA8Z9y5Hze1qsFaAAmV6jQwdATGOx3NGb7+YhOgVlNUPjhBuTWTATYQGvpxO6BgAJSrOtRGVlowCwlGMJihqsMXWqc1uDp42Xryqwk2Muz/g=
Received: from BLUPR06MB1764.namprd06.prod.outlook.com (10.162.224.150) by BLUPR06MB244.namprd06.prod.outlook.com (10.242.191.153) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.13.10; Tue, 29 Aug 2017 15:50:32 +0000
Received: from BLUPR06MB1764.namprd06.prod.outlook.com ([10.162.224.150]) by BLUPR06MB1764.namprd06.prod.outlook.com ([10.162.224.150]) with mapi id 15.01.1385.014; Tue, 29 Aug 2017 15:50:32 +0000
From: "Eggert, Lars" <lars@netapp.com>
To: Neal Cardwell <ncardwell@google.com>
CC: Praveen Balasubramanian <pravb@microsoft.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Thread-Topic: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
Thread-Index: AQHTIJNUP3KBuBLtAUSgdrL0xwUiz6Ka50oAgAB764CAABi6gA==
Date: Tue, 29 Aug 2017 15:50:32 +0000
Message-ID: <BFC5AFBB-1620-444B-BCCC-50629F63DEF5@netapp.com>
References: <150398950648.13143.4092402418089275384@ietfa.amsl.com> <784ACCE6-78AB-4359-A9E6-F0CE0625BA77@netapp.com> <CADVnQy=T=HxWzBTgfu_QK23Cn=ACepjwY7Z6xxeRbENjNMuAOw@mail.gmail.com>
In-Reply-To: <CADVnQy=T=HxWzBTgfu_QK23Cn=ACepjwY7Z6xxeRbENjNMuAOw@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
x-mailer: Apple Mail (2.3273)
x-originating-ip: [188.174.84.129]
x-ms-publictraffictype: Email
x-microsoft-exchange-diagnostics: 1; BLUPR06MB244; 6:S/JXpKDg1Jsm7NprdYfiCqyo2l74PG/he3z2VTKX//o/Unm8HyDEUGNtaSu8uzlbShvgOJwFqVLsJAe5dPZqavfxpgJ+qq7BL3kFRbE/1sT7SyJ2R2ylMuamQ9y5M8v3x1bPfHf3QZ3MByK8OcLLGhwN3bD2sazy6WGpIy1y3hjJExeCXXAgTdIZTe7wWVodtLr7M2CQtx25+XVveEmOSwUF7l4gmRFBrD/FJhbct9TZZ1hEKIvuvq5WQiAknsp58DF7Fm4J1xczZac0qsWAcsRQhVx5Fx+URJ6y4h5+u7QNmJIiYGVh1tC0eWljB+mSleSbLzoLfQBTx4KDpP5UxQ==; 5:VhI9tQQidC0i1fCUol3z5iMQO64vqaeDqpvLiBWuCDt68ge2ZOSPlYXegDreJIBPWFeqMGeNZnYsI5lvf7DgAIB8CEYVqcS6zg6ay841oe4Sw5b9iUHHRdTDE+4nIKFfOy6nV28MO3FTzX0EqUquHw==; 24:3qjGl8FbeyNAiEFO0Q4ERaUUv2UqWH6OqL1vX5vuVgWeVv5nozH5mI0QmgztiuByZc2zaQczMiQE8yyZNFKCTxHAlXW9cqcqGW2LmHD8zzk=; 7:M+FaDOuJL88QIAQ+wNx8DXsGQb0OD4wJaGHpt0K3+e0/kZRvVJap8gthjES0Pl8jXZ//tlE4w/GgULI9YGlsZFOdg+4HJrp2He1hQjIm7VmgCGTg6dttB/z8e7I+LY82fBaNHHT++Slo/HzZnB1U5X9/mIrGDaGRznQWl0qp+peGYcyzjceKbIO+RQrz1Vj9gZWzG3LWwZDKCxNyqj4YrtHSA7bxcPshTf+790/nFqk=
x-ms-exchange-antispam-srfa-diagnostics: SSOS;
x-ms-office365-filtering-correlation-id: c8a515fe-cb3a-4c19-a573-08d4eef5ae64
x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(22001)(300000502095)(300135100095)(2017030254152)(300000503095)(300135400095)(2017052603199)(49563074)(201703131423075)(201703031133081)(201702281549075)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095); SRVR:BLUPR06MB244; 
x-ms-traffictypediagnostic: BLUPR06MB244:
authentication-results: spf=none (sender IP is ) smtp.mailfrom=lars@netapp.com; 
x-exchange-antispam-report-test: UriScan:(211936372134217)(153496737603132);
x-microsoft-antispam-prvs: <BLUPR06MB2448DFE947C74BF3E0BC4F6A79F0@BLUPR06MB244.namprd06.prod.outlook.com>
x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(102415395)(6040450)(601004)(2401047)(5005006)(8121501046)(93006095)(93001095)(100000703101)(100105400095)(3002001)(10201501046)(920507026)(6055026)(6041248)(20161123555025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123560025)(20161123558100)(20161123562025)(20161123564025)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:BLUPR06MB244; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:BLUPR06MB244; 
x-forefront-prvs: 0414DF926F
x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(377424004)(24454002)(199003)(189002)(377454003)(230783001)(33656002)(2906002)(6506006)(305945005)(99286003)(6512007)(6306002)(82746002)(66066001)(77096006)(50986999)(76176999)(3280700002)(99936001)(3660700001)(53936002)(6486002)(189998001)(68736007)(478600001)(6246003)(110136004)(229853002)(54906002)(966005)(83716003)(2900100001)(14454004)(8666007)(57306001)(7736002)(34040400001)(6436002)(86362001)(81156014)(101416001)(105586002)(81166006)(25786009)(5660300001)(8676002)(53546010)(36756003)(4001150100001)(2950100002)(6916009)(106356001)(4326008)(102836003)(3846002)(8936002)(6116002)(97736004)(50226002); DIR:OUT; SFP:1101; SCL:1; SRVR:BLUPR06MB244; H:BLUPR06MB1764.namprd06.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords;  A:1; MX:1; LANG:en; 
received-spf: None (protection.outlook.com: netapp.com does not designate permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: multipart/signed; boundary="Apple-Mail=_8F923421-C521-462B-9DAC-F51212C815A8"; protocol="application/pgp-signature"; micalg=pgp-sha512
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Aug 2017 15:50:32.5999 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 4b0911a0-929b-4715-944b-c03745165b3a
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BLUPR06MB244
X-OriginatorOrg: netapp.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/XWiMiBWpGm3rgNsX6zfOEVnwpeI>
Subject: Re: [tcpm] I-D Action: draft-ietf-tcpm-dctcp-10.txt
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Aug 2017 15:50:37 -0000

--Apple-Mail=_8F923421-C521-462B-9DAC-F51212C815A8
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii

Praveen did the work, I just submitted :-)

> On 2017-8-29, at 16:22, Neal Cardwell <ncardwell@google.com> wrote:
> 
> On Tue, Aug 29, 2017 at 2:58 AM, Eggert, Lars <lars@netapp.com> wrote:
>> 
>> This hopefully addresses Neal's comments
> 
> Yes, this ( https://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-dctcp-10
> ) looks great to me. Thanks, Lars!
> 
> neal


--Apple-Mail=_8F923421-C521-462B-9DAC-F51212C815A8
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="signature.asc"
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJZpY1HAAoJEFS1wwm/cMFXdd8P/AwzxKZKGEIXAuh7GQ3uj+D+
sSb4uBIeekY49ilQOuVZvxuqGIKCf26T8tyox6ZXSJ/xSQAFLKeB/NnliEFjfS8D
BV8BqfxHBTBu1GKxvb7dZrh4sxnMYACytGOGuovbzt3vxCxJcT6b86dZ4SOTCuyN
Kmjxd+1isDtGg9sLsr5/C1PMy3rdHCHt/+onDNoDIA+xVxks7srlK/6fYM4wOiNn
wQWQ/Ov0WbkqhW28J3z6hfkO8rwcchM3L1QRVWAv6XhycGNaooLRyBgzUwyVEge7
gDJKRtKspyMaq6EUSzNUJZQAR7P+1JO2p+NeHg98O+x7TZ+vuulsyNvQUwODWLQx
PKM5Q2pT+EvwV6fLWzC/m8nxNMMhe8cMVM9lzmRq8yVxjR8TWin44MPOlS6YYtgJ
FXoGDbOz91sikoHmxdqCffJggF8e/6ceQdeNubeyPjOP5hOsszo9VnfhtAKUm7PA
LCRNznHVd+oNsgNmYUHa6eV0hZpK3yVUsrVRHV6GNDKo8bTbni5PBGGfCjzDUatC
wRDxZei0Z9twr/THPwI8wcU5XPX+lnTmUPb44MUjBzrseVaY3RefgV3MyLpakJYb
iXU8D05WQlEVXzPH2gfyvK2KCrGupII86cxIJGE/Gocf1CgAOpG3pCdVDGUxfAla
KO+zmItcVDuqnfOeDC6G
=/fye
-----END PGP SIGNATURE-----

--Apple-Mail=_8F923421-C521-462B-9DAC-F51212C815A8--