<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rfc [
<!ENTITY nbsp "&#160;">
<!ENTITY zwsp "&#8203;">
<!ENTITY nbhy "&#8209;">
<!ENTITY wj "&#8288;">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std" ipr="trust200902" docName="draft-ietf-bier-path-mtu-discovery-17" obsoletes="" updates="" submissionType="IETF" xml:lang="en" tocInclude="true" tocDepth="3" symRefs="true" sortRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.6.0 -->
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<front>
    <title abbrev="PMTUD for BIER">Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit Replication (BIER) Layer</title>
    <seriesInfo name="Internet-Draft" value="draft-ietf-bier-path-mtu-discovery-17"/>
    <author initials="G." surname="Mirsky" fullname="Greg Mirsky">
      <organization>Ericsson</organization>
      <address>
        <email>gregimirsky@gmail.com</email>
      </address>
    </author>
    <author initials="T." surname="Przygienda" fullname="Tony Przygienda">
      <organization>Juniper Networks</organization>
      <address>
        <email>prz@juniper.net</email>
      </address>
    </author>
    <author initials="A." surname="Dolganow" fullname="Andrew Dolganow">
      <organization>Nokia</organization>
      <address>
        <email>andrew.dolganow@nokia.com</email>
      </address>
    </author>
    
    <date year="2024"/>
    
    <area>Routing</area>
    <workgroup>BIER Working Group</workgroup>
    <keyword>Internet-Draft</keyword>
    <keyword>BIER</keyword>
    <keyword>OAM</keyword>
    <abstract>
      <t>
        This document describes Path Maximum Transmission Unit Discovery (PMTUD) in the Bit Index Explicit Replication (BIER) layer.
      </t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>
        In packet-switched networks, when a host seeks to transmit data to a
        target destination, the data is transmitted as a set of packets. In many cases, it is more efficient to
        use the largest packets that do not exceed the smallest
        Maximum Transmission Unit (MTU) of any forwarding device
        along the routed path to the IP destination of these packets.
        This smallest MTU is known as the Path MTU (PMTU).
        Fragmentation or packet drop, silent or not, may occur on hops along the path where an MTU is smaller
        than the size of the datagram. To avoid these behaviors, the packet source must find
        the value of the smallest MTU, i.e., the PMTU, that will be encountered along the path that
        a set of packets will follow to reach the given set of destinations.
        Determining the MTU along a specific path in this way is referred to as Path MTU Discovery (PMTUD).
      </t>
      <t>
   <xref target="RFC8279" format="default"/> introduces and explains the Bit Index Explicit Replication (BIER)
   architecture and how it supports the forwarding of multicast data packets.
<xref target="I-D.ietf-bier-ping" format="default"/> introduces BIER Ping as a transport-independent
OAM mechanism to detect and localize failures in the BIER data plane. This document specifies
how BIER Ping can be used to perform efficient PMTUD in the BIER domain.
      </t>
      
      <section numbered="true" toc="default">
        <name>Conventions used in this document</name>
        <section numbered="true" toc="default">
                <name>Terminology</name>
  <t>This document uses terminology defined in <xref target="RFC8279"/>. Familiarity with that specification and its terminology
is expected.</t>
</section>
        <section numbered="true" toc="default">
          <name>Requirements Language</name>
          <t>
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
   "MAY", and "OPTIONAL" in this document are to be interpreted as
   described in BCP 14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/> 
   when, and only when, they appear in all capitals, as shown here.
          </t>
        </section>
      </section>
    </section>
    <section anchor="problem-state" numbered="true" toc="default">
      <name>Problem Statement</name>
      <t>
<xref target="I-D.ietf-bier-oam-requirements"/> sets forth the requirement to define a PMTUD
protocol for the BIER domain. This document describes an extension to
<xref target="I-D.ietf-bier-ping" format="default"/> for use in the BIER PMTUD solution.
</t>
      <t>
Current PMTUD mechanisms (<xref target="RFC1191" format="default"/>, <xref target="RFC8201" format="default"/>,
and <xref target="RFC4821" format="default"/>) are primarily designed
to work on point-to-point, i.e., unicast, paths. These mechanisms control
fragmentation by disabling fragmentation of the probe packet. As a result,
a transit node that cannot forward a probe packet that is bigger than its
link MTU sends an error notification to the packet's source. Otherwise, the packet
destination may respond with a positive acknowledgment. Thus,
possibly through a series of iterations that vary the size of the probe packet,
the packet source discovers the PMTU of the particular path.
</t>
      <t>
  Applying such existing PMTUD solutions is inefficient for point-to-multipoint paths
  constructed for multicast traffic. Probe packets must be flooded through the whole set of
  multicast distribution paths repeatedly until the very last egress responds with a
  positive acknowledgment. Consider the multicast network presented in <xref target="mcast-network" format="default"/>,
  where the MTU is the same on all links but one, (B, D). If the MTU on the link (B, D) is
  smaller than the MTU on the other links, then, with an
  existing PMTUD mechanism, the second and subsequent probes will be unnecessarily flooded to leaf nodes E, F, and G,
  and positive responses will
  be generated and received by root A repeatedly.
      </t>
      <figure anchor="mcast-network">
        <name>Multicast network</name>
        <artwork name="" type="" align="left" alt=""><![CDATA[
 
                        -----
                      --| D |
              -----  /  -----
            --| B |--
           /  -----  \  -----
          /           --| E |
-----    /              -----
| A |---                -----
-----    \            --| F |         
          \  -----   /  -----
           --| C |--
             -----   \  -----
                      --| G |   
                        -----     

]]></artwork>
      </figure>
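      <t>
The cost of such flooding can be illustrated with a minimal Python sketch. The
tree mirrors <xref target="mcast-network" format="default"/>; the MTU values (1500 and 1400 octets),
the probe sizes, and all names in the code are assumptions made only for
illustration. The sketch models a unicast-style procedure applied to a multicast tree,
not any mechanism defined in this document.
      </t>
      <sourcecode type="python"><![CDATA[
# Hypothetical model of the example tree: parent -> children.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
# Assumed link MTUs in octets; only link (B, D) is smaller.
LINK_MTU = {
    ("A", "B"): 1500, ("A", "C"): 1500,
    ("B", "D"): 1400, ("B", "E"): 1500,
    ("C", "F"): 1500, ("C", "G"): 1500,
}
LEAVES = ["D", "E", "F", "G"]


def leaves_reached(size, node="A"):
    """Return the leaves that a flooded probe of this size reaches."""
    children = TREE.get(node)
    if children is None:
        return [node]  # a leaf answers with a positive acknowledgment
    reached = []
    for child in children:
        # A probe larger than the link MTU is not forwarded.
        if size <= LINK_MTU[(node, child)]:
            reached += leaves_reached(size, child)
    return reached


def flooded_pmtud(sizes):
    """Flood each probe to the whole tree until every leaf answers."""
    deliveries = {leaf: 0 for leaf in LEAVES}
    for size in sizes:
        reached = leaves_reached(size)
        for leaf in reached:
            deliveries[leaf] += 1
        if set(reached) == set(LEAVES):
            return size, deliveries
    return None, deliveries


pmtu, deliveries = flooded_pmtud([1500, 1400])
print(pmtu, deliveries)
]]></sourcecode>
      <t>
Under these assumed values, the second probe reaches E, F, and G again even though
each of them already acknowledged the first probe; only D needed it.
      </t>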
    </section>
    <section anchor="pmtud-bier-solution" numbered="true" toc="default">
      <name>PMTUD Mechanism for BIER</name>
      <t>
A multicast distribution tree connects a BFIR with a set of BFERs via procedures
explained in <xref target="RFC8279"/>. The BFIR determines the MTU of this
multicast distribution tree by transmitting a series of probe packets
from the BFIR to the set of BFERs. In the case of ECMP, the BFIR MAY
   test each path by varying the value in the Entropy field. The
   critical step in the process of Path MTU discovery is notifying the BFIR that an intermediate BFR failed to
  forward the probe packet toward the subset of targeted downstream BFERs. That is achieved by the BFR
   responding to the originating BFIR with an error notification that carries a partial bitmask
   (compared to the one it received in the request).
   That allows the next probe, with a smaller size, to be addressed only
   to the smaller set of BFERs downstream from the BFR that failed to forward the probe, instead
   of all BFERs within the multicast distribution tree.
   In the scenario discussed in <xref target="problem-state"/>,
the second and all following (if needed) probes will be
   sent only to node D because of the smaller MTU of the link (B, D).
Since the MTU discovery for E, F, and G has been completed by the first probe,
the second and any following probes will not be forwarded to these leaves.
</t>

      <t>
Consider the network displayed in <xref target="mcast-network" format="default"/> to be a representation of a BIER domain
in which all nodes are BFRs. To discover the MTU over the BIER domain to BFERs D, E, F, and G, BFIR A will use
BIER Ping with the Data TLV, defined in <xref target="data-tlv" format="default"/>. The size of the first probe is set to M_max,
determined as the minimal MTU value of the BFIR's links to the BIER domain.
 As has been assumed in <xref target="problem-state" format="default"/>, the MTUs of all links but the link (B, D) are the same.
Thus, BFERs E, F, and G will receive the BIER Echo Request and will send their
respective replies to BFIR A. BFR B may pass the packet, which is too large to
forward over the egress link (B, D), to the appropriate network layer for error processing,
where it would be recognized as a BIER Echo Request packet.
BFR B MUST send a BIER Echo Reply to BFIR A and MUST include the Downstream Mapping TLV,
defined in <xref target="I-D.ietf-bier-ping" format="default"/>, setting its fields in the following fashion:
</t>
      <ul spacing="normal">
        <li>MTU SHOULD be set to the minimal MTU value among all BIER-enabled egress interfaces
toward downstream BFRs that could be used to reach B's downstream BFERs.</li>
        <li>Address Type MAY be set to any value defined in Section 3.3.4 of <xref target="I-D.ietf-bier-ping"/>.</li>
        <li>I flag MUST be cleared to direct the responding BFR not to include the Incoming SI-BitString TLV in the BIER Echo Response.</li>
        <li>Downstream Interface Address field MUST be zeroed.</li>
        <li>List of Sub-TLVs MUST include the Egress Bitstring sub-TLV with the list of all BFERs that cannot be reached because
the egress MTU turned out to be too small.</li>
      </ul>
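      <t>
The field settings above can be sketched as follows. This is a non-normative
illustration under stated assumptions: the Egress class, the negative_reply function,
the dictionary keys, the use of Python integers as bit strings, and the bit positions
and MTU values are all hypothetical. The Address Type field and the on-the-wire TLV
encoding defined in <xref target="I-D.ietf-bier-ping" format="default"/> are not modeled.
      </t>
      <sourcecode type="python"><![CDATA[
from dataclasses import dataclass


@dataclass
class Egress:
    mtu: int        # MTU of a BIER-enabled egress interface, octets
    bfer_bits: int  # bit string of BFERs reachable through it


def negative_reply(probe_size, probe_bits, egresses):
    """Return reply fields, or None if every copy can be forwarded."""
    # Interfaces that serve at least one BFER addressed by the probe.
    used = [e for e in egresses if e.bfer_bits & probe_bits]
    unreached = 0
    for e in used:
        if probe_size > e.mtu:
            unreached |= e.bfer_bits & probe_bits
    if not unreached:
        return None
    return {
        # Minimal MTU among the egress interfaces toward the BFERs.
        "mtu": min(e.mtu for e in used),
        "i_flag": 0,                        # I flag cleared
        "downstream_interface_address": 0,  # field zeroed
        "egress_bitstring": unreached,      # BFERs not reached
    }


# Hypothetical bit positions at BFR B: D = bit 0, E = bit 1.
bfr_b = [Egress(mtu=1400, bfer_bits=0b01),
         Egress(mtu=1500, bfer_bits=0b10)]
reply = negative_reply(1500, 0b1111, bfr_b)
print(reply)
]]></sourcecode>
      <t>
With these assumed values, a 1500-octet probe yields a reply that recommends 1400 octets
and lists only D as unreached, while a 1400-octet probe produces no negative reply.
      </t>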
      <t>
The BFIR will receive one of two types of packets:
</t>
      <ul spacing="normal">
        <li>a positive Echo Reply from one of the BFERs to which the probe has been sent. In this
case, the bit corresponding to the BFER MUST be cleared from the bitmask string (BMS);
</li>
        <li>
a negative Echo Reply with a bit string listing the unreached BFERs and the recommended MTU
value MTU'. The BFIR MUST add the bit string
to its BMS and set the size of the next probe as min(MTU, MTU').
</li>
      </ul>
      <t>
If a negative Echo Reply is received, the BFIR MUST wait for the expiration of the Echo Request timer before
transmitting the updated Echo Request.
If the BFIR has not received any Echo Reply by the time the Echo Request timer expires,
then the size of the probe SHOULD be decreased. There are scenarios
   in which an implementation of PMTUD would not decrease the size of the probe:
 when no Echo Reply arrives before the Echo Request timer expires,
 the BFIR MAY continue to retransmit the probe using
 the initial size and MAY apply probe delay retransmission procedures. The algorithm used to
  delay retransmission on the BFIR is outside the scope of this specification.
<!--The BFIR
 MUST continue sending probes using BMS until the bit string is clear or the discovery is
declared unsuccessful. --> 
The BFIR sends probes using the BMS and locally defined retransmission
procedures, but not more frequently than once per Echo Request timer interval,
until either the bit string is clear, i.e., contains no set bits, or the BFIR's
retransmission procedure terminates and PMTU discovery is declared unsuccessful.
If the procedure converges, the size of the last probe indicates the PMTU
that can be used for all BFERs in the initial BMS without incurring fragmentation.
</t>
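      <t>
The BFIR procedure can be approximated by the following simulation sketch. It is
an illustration only, not a normative algorithm: the bit positions, the per-BFER link MTUs,
the function names, and the round limit are hypothetical. The sketch assumes that "MTU" in
min(MTU, MTU') is the size of the current probe, that every addressed BFER yields either a
positive or a negative reply within one timer interval, and, as a simplification, that one
negative reply is generated per unreached BFER rather than per BFR.
      </t>
      <sourcecode type="python"><![CDATA[
# Hypothetical bit positions for the BFERs of the example network.
BIT = {"D": 0b0001, "E": 0b0010, "F": 0b0100, "G": 0b1000}
# Assumed link MTUs (octets) on the path from BFIR A to each BFER.
PATH_MTUS = {"D": [1500, 1400], "E": [1500, 1500],
             "F": [1500, 1500], "G": [1500, 1500]}


def send_probe(size, bms):
    """Model one probe; return (positive_bits, negative_replies).

    Each negative reply is a pair (unreached_bits, recommended_mtu).
    """
    positive, negatives = 0, []
    for name, bit in BIT.items():
        if not bms & bit:
            continue  # BFER is not addressed by this probe
        blocking = [m for m in PATH_MTUS[name] if m < size]
        if blocking:
            # The first BFR that cannot forward the probe replies.
            negatives.append((bit, blocking[0]))
        else:
            positive |= bit
    return positive, negatives


def discover_pmtu(initial_size, bms, max_rounds=8):
    """Return (pmtu, probes_delivered) or (None, probes_delivered)."""
    size, delivered = initial_size, 0
    for _ in range(max_rounds):  # one round per timer interval
        positive, negatives = send_probe(size, bms)
        delivered += bin(positive).count("1")
        bms &= ~positive          # clear bits of BFERs that replied
        next_size = size
        for bits, mtu in negatives:
            bms |= bits           # keep unreached BFERs in the BMS
            next_size = min(next_size, mtu)
        if bms == 0:
            return size, delivered  # size of the last probe
        size = next_size
    return None, delivered        # discovery declared unsuccessful


print(discover_pmtu(1500, 0b1111))
]]></sourcecode>
      <t>
Under these assumptions, the first probe is delivered to E, F, and G, and the second,
smaller probe is delivered only to D.
      </t>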
      <t>
Thus we conclude that in order to comply with the requirement in <xref target="I-D.ietf-bier-oam-requirements" format="default"/>:
</t>
      <ul spacing="normal">
        <li>a BFR SHOULD support PMTUD;</li>
        <li>a BFR MAY use a per-BIER-sub-domain MTU value, if one is defined, as the initial MTU
value for discovery, or use it as the MTU for this BIER sub-domain to reach BFERs;</li>
        <li>a BFIR MUST have a locally defined PMTUD probe retransmission procedure.</li>
      </ul>
      <section anchor="data-tlv" numbered="true" toc="default">
        <name>Data TLV for BIER Ping</name>
        <t>
Supporting BIER PMTUD requires control over the probe size. The Data TLV format
is presented in <xref target="data-tlv-fig" format="default"/>. The Data TLV MAY be included in a BIER Echo
Request or Echo Reply message as defined in <xref target="I-D.ietf-bier-ping"/>.

        </t>
        <figure anchor="data-tlv-fig">
          <name>Data TLV format</name>
          <artwork name="" type="" align="left" alt=""><![CDATA[    
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |          Type  (TBA1)         |             Length            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                              Data                             |
 ~                                                               ~
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
        </figure>
        <ul spacing="normal">
          <li>Type: indicates the Data TLV; the value (TBA1) is to be allocated by IANA (<xref target="iana-considerations" format="default"/>).</li>
          <li>Length: the length of the Data field in octets.
</li>
          <li>Data: n octets (n = Length) of arbitrary data. The receiver SHOULD ignore it.</li>
        </ul>
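        <t>
The following Python sketch shows one way a sender might build a Data TLV of a chosen size.
It rests on assumptions: the Type value 0xFFFF is a placeholder because TBA1 has not been assigned,
network byte order is assumed for the 16-bit Type and Length fields, the Data field is zero-filled,
and the function names and the other_octets parameter (the size of everything in the probe
except the Data TLV) are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import struct

# Placeholder only: TBA1 has not been assigned by IANA.
DATA_TLV_TYPE = 0xFFFF
TLV_HEADER_LEN = 4  # 16-bit Type followed by 16-bit Length


def encode_data_tlv(data_len):
    """Build a Data TLV carrying data_len octets of zero filler."""
    if not 0 <= data_len <= 0xFFFF:
        raise ValueError("Length must fit in the 16-bit field")
    header = struct.pack("!HH", DATA_TLV_TYPE, data_len)
    return header + bytes(data_len)


def decode_data_tlv(buf):
    """Return (type, length); the Data octets are ignored."""
    tlv_type, length = struct.unpack_from("!HH", buf)
    if len(buf) < TLV_HEADER_LEN + length:
        raise ValueError("truncated Data TLV")
    return tlv_type, length


def data_len_for_probe(target_size, other_octets):
    """Data octets needed so the whole probe totals target_size."""
    return target_size - other_octets - TLV_HEADER_LEN


# Example: pad a probe to 1400 octets when the rest of the packet
# (an assumed 100 octets) is already accounted for.
tlv = encode_data_tlv(data_len_for_probe(1400, 100))
print(len(tlv), decode_data_tlv(tlv))
]]></sourcecode>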
      </section>
    </section>
    <section anchor="iana-considerations" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
  IANA is requested to assign a new Type value for the Data TLV from its registry of TLV and sub-TLV Types of BIER Ping
  as follows:
      </t>
      <table anchor="data-tlv-table" align="center">
        <name>Data TLV Type</name>
        <thead>
          <tr>
            <th align="left">Value</th>
            <th align="center">Description</th>
            <th align="left">Reference</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">TBA1</td>
            <td align="center">Data</td>
            <td align="left">This&nbsp;document</td>
          </tr>
        </tbody>
      </table>
    </section>
    <section anchor="security-considerations" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
     Routers that support PMTUD based on this document are subject to the same security considerations as those defined in
     <xref target="I-D.ietf-bier-ping" format="default"/>.
      </t>
    </section>
    <section anchor="ack" numbered="true" toc="default">
      <name>Acknowledgment</name>
      <t>
   The authors greatly appreciate the thorough review and detailed comments by Eric Gray.
      </t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1191.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8201.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4821.xml"/>
        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-ietf-bier-ping.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8279.xml"/>
      </references>
      <references>
        <name>Informative References</name>

        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-ietf-bier-oam-requirements.xml"/>
      </references>
    </references>
  </back>
</rfc>
