<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-model href="rfc7991bis.rnc"?>
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->

<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc
    xmlns:xi="http://www.w3.org/2001/XInclude"
    category="std"
    docName="draft-burdet-bess-evpn-fast-reroute-08"
    consensus="true"
    submissionType="IETF"
    ipr="trust200902"
    tocInclude="true"
    tocDepth="4"
    symRefs="true"
    sortRefs="true"
    version="3">

  <!-- ***** FRONT MATTER ***** -->
  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
        full title is longer than 39 characters -->
    <title abbrev="EVPN Fast Reroute">EVPN Fast Reroute</title>
    <seriesInfo name="Internet-Draft" value="draft-burdet-bess-evpn-fast-reroute-08"/>
    <author fullname="Luc Andre Burdet" initials="LA." surname="Burdet" role="editor">
      <organization>Cisco</organization>
      <address>
        <email>lburdet@cisco.com</email>
      </address>
    </author>
    <author fullname="Patrice Brissette" initials="P." surname="Brissette">
      <organization>Cisco</organization>
      <address>
        <email>pbrisset@cisco.com</email>
      </address>
    </author>
    <author fullname="Takuya Miyasaka" initials="T." surname="Miyasaka">
      <organization>KDDI Corporation</organization>
      <address>
        <email>ta-miyasaka@kddi.com</email>
      </address>
    </author>
    <author fullname="Jorge Rabadan" initials="J." surname="Rabadan">
     <organization>Nokia</organization>
     <address>
       <email>jorge.rabadan@nokia.com</email>
     </address>
    </author>
   

    <date year="2024"/>

    <!-- Meta-data Declarations -->
    <area>General</area>
    <workgroup>BESS Working Group</workgroup>
    <keyword>RFC7432</keyword>
    <keyword>EVPN</keyword>
    <keyword>Convergence</keyword>
    <abstract>
      <t>This document summarises EVPN convergence mechanisms and specifies
      procedures for EVPN networks to achieve fast and scale‑independent convergence.
      </t>
    </abstract>


   <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
      "MAY", and "OPTIONAL" in this document are to be interpreted as
      described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when,
      and only when, they appear in all capitals, as shown here. 
     </t>
   </note>

   </front>

<middle>
    <section anchor="intro">
      <name>Introduction</name>
      <t>EVPN convergence and recovery methods for different types of
        network failures are described in <relref target="I-D.ietf-bess-rfc7432bis" section="17"/>.
        Similarly, for EVPN‑VPWS, the end of <relref target="RFC8214" section="5"/> briefly mentions
        an egress link protection mechanism.</t>
      <t>The fundamentals of EVPN convergence rely on a mass‑withdraw technique
        of the Ethernet A-D per ES route to unresolve all the associated
        forwarding paths (<relref target="I-D.ietf-bess-rfc7432bis" section="9.2.2"/> 'Route Resolution').
        The mass‑withdraw grouping approach results in suitable EVPN convergence at lower scale, but is not
        sufficient to meet stricter convergence requirements, which are often sub-second.
        Other control-plane enhancements such as route‑prioritisation (<xref target="I-D.ietf-bess-rfc7432bis"/>) help further but still provide no
        guarantees.</t>
      <t>EVPN convergence using only control-plane approaches is constrained by BGP route propagation delays,
        route processing times in software, and hardware programming.
        These steps are additionally often performed sequentially and linearly, given the
        potentially large scale of EVPN routes present in the control plane.</t>
      <t>This document presents a mechanism for fast reroute to minimise packet loss in the case of a
        link failure using EVPN redirect labels (ERLs) with special forwarding behaviors.
        Multiple failures, where loops may occur, are addressed, as are cascading failures.
        A mechanism for distributing ERLs alongside EVPN service labels (ESLs) is
        shown.</t>
      <t>The main objective is to achieve fast convergence in
        EVPN networks without relying on control plane actions.
        The procedures in this document apply to the following EVPN services: EVPN <xref target="I-D.ietf-bess-rfc7432bis"/>,
        EVPN-VPWS  <xref target="RFC8214"/>, EVPN Inter-Subnet Forwarding <xref target="RFC9135"/>
        and EVPN IP-VRF-to-IP-VRF models as in <relref target="RFC9136" section="4.4"/>.
        All the EVPN Multi-Homing modes are included.
      </t>
    </section>
    <section anchor="terminology">
      <name>Terminology</name>
      <t>Some of the terminology in this document is borrowed from <xref target="RFC8679"/> for
      consistency across fast reroute frameworks.
      <br/>
      The term 'label', when used in this document and
      especially when referring to ERL and ESL (below), indicates an MPLS label, a VNI (VXLAN Network
      Identifier) or a Segment Routing IPv6 SID, depending on the transport being used.</t>
      <dl newline="false" spacing="normal" indent="3">
        <dt>CE:</dt>
        <dd>Customer Edge device, e.g., a host, router, or switch.</dd>
        <dt>PE:</dt>
        <dd>Provider Edge device.</dd>
        <dt>Ethernet Segment (ES):</dt>
        <dd>A set of Ethernet links connected to one or more PEs.</dd>
        <dt>Ethernet Segment Identifier (ESI):</dt>
        <dd>A unique non-zero
            identifier that identifies an Ethernet segment.</dd>
        <dt>Egress link:</dt>
        <dd>Specific Ethernet link connecting a given PE to a CE,
           which forms part of an Ethernet Segment.</dd>
        <dt>Single-Active Redundancy Mode:</dt>
        <dd>When only a single PE,
            among all the PEs attached to an Ethernet segment,
            is allowed to forward traffic to/from that Ethernet segment for
            a given VLAN, then the Ethernet segment is defined to be operating
            in Single-Active redundancy mode.</dd>
        <dt>All-Active Redundancy Mode:</dt>
        <dd>When all PEs attached to
            an Ethernet segment are allowed to forward known unicast traffic
            to/from that Ethernet segment for a given VLAN, then the Ethernet
            segment is defined to be operating in All-Active redundancy mode.</dd>
        <dt>Port-Active Redundancy Mode:</dt>
        <dd>When only a single PE,
            among all the PEs attached to an Ethernet segment,
            is allowed to forward traffic to/from that Ethernet segment for
            the entire interface (all VLANs),
            then the Ethernet segment is defined to be operating
            in Port-Active redundancy mode.</dd>
        <dt>Single-Flow-Active Redundancy Mode:</dt>
        <dd>When all PEs attached to
            an Ethernet segment are allowed to forward known unicast traffic
            to/from that Ethernet segment for a given VLAN,
            but only one does so for a given flow, based on receiving that traffic flow from the access for that VLAN,
            then the Ethernet segment is defined to be operating
            in Single-Flow-Active redundancy mode.</dd>
        <dt>DF-Election:</dt>
        <dd>Designated Forwarder election, as in <xref target="I-D.ietf-bess-rfc7432bis"/>
          and <xref target="RFC8584"/>.</dd>
        <dt>DF:</dt>
        <dd>Designated Forwarder.</dd>
        <dt>Backup-DF (BDF):</dt>
        <dd>Backup-Designated Forwarder.</dd>
        <dt>Non-DF (NDF):</dt>
        <dd>Non-Designated Forwarder.</dd>
        <dt>AC:</dt>
        <dd>Attachment Circuit.</dd>
        <dt>ERL:</dt>
        <dd>EVPN redirect label, as described in this document.</dd>
        <dt>ESL:</dt>
        <dd>EVPN service label, as in <xref target="I-D.ietf-bess-rfc7432bis"/>, <xref
        target="RFC8214"/>, <xref target="RFC9135"/> and <xref target="RFC9136"/>.</dd>
        <dt>FRR:</dt>
        <dd>Fast Re-Route.</dd>
      </dl>
    </section>
    <section anchor="requirements">
      <name>Requirements</name>
      <ol spacing="normal" type="1"><li>EVPN multihoming is often described with two peering PEs. The solution MUST be generic enough
        to apply to multiple peering PEs, with no artificial limit imposed on the number of peering PEs.</li>
        <li>The solution MUST apply to all EVPN load-balancing modes. </li>
        <li>The solution MUST be robust enough to tolerate failures of the same ES at multiple PEs.
        Simultaneous as well as cascading failures on the same ES must be addressed.</li>
        <li>The solution MUST support EVPN <xref target="I-D.ietf-bess-rfc7432bis"/>,
        EVPN-VPWS <xref target="RFC8214"/>, EVPN Inter-Subnet Forwarding <xref target="RFC9135"/>
        and EVPN IP-VRF-to-IP-VRF models as in <relref target="RFC9136" section="4.4"/>.</li>
        <li>An implementation of this document SHOULD support one or more of the above-listed services.</li>
        <li>The solution SHOULD meet stringent requirements for traffic
        loss of EVPN services.</li>
        <li>The solution MUST allow redirected traffic to bypass port blocking states resulting from
        DF-Election (BDF or NDF).</li>
        <li>The solution MUST be scale-independent and agnostic of EVPN route types, scale or choice
        of underlay.</li>
        <li>The solution MUST address egress link (PE-CE link) failures.</li>
        <li>The solution MUST be loop-free, and once-redirected traffic MUST never be repeatedly
        redirected.</li>
        <li>The solution MUST NOT rely on pushing an additional label onto the label stack, or on
        the definition of a special-purpose label (underlay-specific to MPLS).</li>
      </ol>
    </section>
    <section anchor="solution">
      <name>Solution</name>
      <t>Fast convergence in EVPN networks is achieved using a combined approach to
        minimising traffic loss:
      </t>
      <ul spacing="normal">
        <li>Local failure detection and restoration of traffic flows in minimal time using a
            pre-computed redirect path;</li>
        <li>Restoration of optimal traffic paths, and reconvergence of EVPN control plane
            with EVPN mass withdraw.</li>
      </ul>
      <t>
        The solution presented in this document addresses the local failure detection and
        restoration, without impeding or otherwise affecting existing EVPN control plane convergence
        mechanisms.
      </t>
      <t>Consider the following EVPN topology where PE1 and PE2 are multihoming PEs on a shared ES, ESI1.
        EVPN (known unicast) or EVPN‑VPWS traffic from CE1 to CE2 is sent to PE1 and PE2 using
        EVPN service labels ESL1 and/or ESL2 (depending on load-balancing mode of the ESI1 interfaces).
      </t>
 
      <figure anchor="evpn_mh_topology" title="EVPN Multihoming with service and redirect labels">
        <artwork><![CDATA[ 
                                    +------+
                                    |  PE1 |
                                    |      |
                   +-------+        | ESL1---DF----
                   |       |--------|      |       \
                   |       |        | ERL1--------> \
        +-----+    |       |        +------+         \
        |     |    |IP/MPLS|                          \
 CE1 ---| PE3 |----|Core   |                     ESI1  === CE2
        |     |    |Network|                          /
        +-----+    |       |        +------+         /
                   |       |        | ERL2--------> /
                   |       |--------|      |       /
                   +-------+        | ESL2---BDF--X
                                    |      |
                                    |  PE2 |
                                    +------+
                ]]></artwork>
      </figure>

      <t>Alongside the service labels ESL1 and ESL2, two redirect labels
        ERL1 and ERL2 are allocated with special forwarding behaviors, as detailed in <xref target="redirect_label_behaviors"/>.
        Fast reroute and the use of the ERLs are shown in <xref target="failures"/>.</t>
      <section anchor="backup_selection">
        <name>Pre-selection of Backup Path</name>
        <t>EVPN DF-Election lends itself well to the selection of a pre-computed path amongst any given
        number of peering PEs by providing a DF‑Elected and BDF‑Elected node at the &lt;EVI, ESI&gt;
        granularity (<xref target="RFC8584"/> and <xref target="I-D.ietf-bess-rfc7432bis"/>).</t>
        <t>In All-Active mode, all PEs in the Ethernet Segment are actively forwarding known unicast
        traffic to the CE. For All-Active services where DF&nbhy;Election is not strictly required
        (EVPN-VPWS), the DF-Election algorithm is run to determine the BDF-Elected PE for ERL selection
        purposes only, without impacting the service itself.<br/>
        In Single-Active and Port-Active modes, only a single PE in the Ethernet Segment is actively forwarding known unicast
        traffic to the CE: the DF-Elected PE. The BDF-Elected PE is next to be
        elected in the redundancy group and is already known.
        In Single-Flow-Active mode (<xref target="I-D.ietf-bess-evpn-l2gw-proto"/>), only a single PE in the Ethernet Segment is actively forwarding known
        unicast traffic to the CE for a given flow: the PE which initially
        received that flow from the Ethernet Segment. The backup PE is the multihoming peer in the
        redundancy group, referred to as "BDF" for consistency with other redundancy modes.</t>
        <t>For consistency across PEs and load-balancing modes, the backup path should be selected in the order
        {DF, BDF, NDF1, NDF2, ...}. The DF-Elected PE selects the next-best, BDF-Elected PE as backup, and
        all BDF- and NDF-Elected PEs select the DF-Elected PE for the protection of their
        egress links.
        </t>
        <ul spacing="normal">
          <li>PE1 (DF) selects PE2 as BDF,</li>
          <li>PE1&nbsp;&nbsp; (DF) uses the ERL2 label signaled by PE2 to redirect the traffic of
          its failed local AC connected to CE2,</li>
          <li>PE2&nbsp;&nbsp; (BDF) uses the ERL1 label signaled by PE1 to redirect the traffic of
          its failed local AC connected to CE2,</li>
          <li>PE..n (NDF) use the ERL1 label signaled by PE1 to redirect the traffic of
          their failed local AC connected to CE2.</li>
        </ul>
        <t>The use of PE2's ERL2 as redirect label applies to local failures in all load-balancing
        modes at PE1.</t>
        <t>The number of peering PEs is not limited by existing DF-Election algorithms. A solution
        based on DF-Election supports subsequent redirection upon multiple cascading failures, once
        a new DF-Election has occurred. Pre-selection of a backup path is supported by all current
        DF-Election algorithms, and more generally by all algorithms supporting BDF-Election,
        as recommended in <xref target="I-D.ietf-bess-rfc7432bis"/>.
        </t>
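        <t>The backup selection rule in this section can be sketched as follows. This is a minimal,
        non-normative illustration: the function name <tt>select_backup</tt>, the list representation
        of the DF-Election result, the ERL values and the extra peer "PE4" are assumptions made for
        this example only and are not defined by this document.</t>
        <sourcecode type="python" markers="false"><![CDATA[
from typing import Dict, List, Optional, Tuple


def select_backup(local_pe: str,
                  election_order: List[str],
                  erls: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return (backup PE, ERL to install) for local_pe, or None.

    election_order is the DF-Election result for one <EVI, ESI>,
    ordered as [DF, BDF, NDF1, NDF2, ...] (hypothetical representation).
    erls maps each peering PE to the redirect label it signaled.
    """
    if local_pe not in election_order or len(election_order) < 2:
        return None  # not attached to the ES, or no peer available
    df, bdf = election_order[0], election_order[1]
    # The DF is protected by the BDF; all other PEs are protected by the DF.
    backup = bdf if local_pe == df else df
    erl = erls.get(backup)
    if erl is None:
        return None  # the backup peer has not signaled a redirect label
    return backup, erl


order = ["PE1", "PE2", "PE4"]       # PE1 = DF, PE2 = BDF, PE4 = NDF
erls = {"PE1": 1001, "PE2": 1002}   # ERL1 and ERL2 (example values)
print(select_backup("PE1", order, erls))  # ('PE2', 1002)
print(select_backup("PE2", order, erls))  # ('PE1', 1001)
print(select_backup("PE4", order, erls))  # ('PE1', 1001)
]]></sourcecode>
        <t>Because the selection depends only on the DF-Election ordering, it can be recomputed
        unchanged after each new DF-Election, whatever the number of peering PEs.</t>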
      </section>
      <section anchor="failures">
        <name>Failure Detection and Traffic Restoration</name>

        <figure anchor="evpn_mh_failure" title="EVPN Multihoming failure scenario">
          <artwork><![CDATA[ 
                                        +------+
                                        |  PE1 |
                                        |      |
                       +-------+        | ESL1-----XX..
                       |       |--------|      |   *   .
                       |       |        | ERL1 |  *     .
            +-----+    |       |        +------+ *       .
            |     |    |IP/MPLS|                *         .
     CE1 ---| PE3 |----|Core   |               *     ESI1  *** CE2
            |     |    |Network|              *           *
            +-----+    |       |        +----*-+         *
                       |       |        | ERL2* * * * * *
                       |       |--------|      |       /
                       +-------+        | ESL2---BDF--X
                                        |      |
                                        |  PE2 |
                                        +------+

                ]]></artwork>
        </figure>
        
        <t>The procedures for forwarding known unicast packets received from a remote PE on the local
        redirect label follow <relref target="I-D.ietf-bess-rfc7432bis" section="13.2.2"/>
        for known unicast traffic. Since the CE next-hop forwarding information reflects the current
            BDF state of the AC, additional steps to bypass the blocking state
            and to prevent another redirection are applied, as described further in this document.</t>
        <t>Consider the EVPN multihoming topology in <xref target="evpn_mh_topology"/>, and a
        traffic flow from CE1 to CE2 which is currently using EVPN service label ESL1 and forwarded through the core
        arriving at PE1. When the local AC representing the &lt;EVI,ESI&gt; pair is protected using the fast-reroute solution,
        the pre-computed backup path's redirect label (i.e. ERL2 from BDF-Elected PE2) is installed against the AC.</t>
        <t>Under normal conditions, PE1 disposition using ESL1 will result in forwarding the packet
        to the CE by selecting the local AC associated with the EVPN service label
        (<xref target="RFC8214"/>, <xref target="I-D.ietf-bess-rfc7432bis"/>).
        When this local AC is in failed state, the fast-reroute solution at PE1 will begin rerouting packets
        using the BDF-Elected peer's
        next hop and ERL2. ERL2, rather than ESL2, is chosen for redirected traffic to
        prevent loops and to overcome DF-Election timing, as described in Sections <xref target="loop_free" format="counter"/> and <xref target="df_bypass" format="counter"/> respectively.</t>
        <section>
          <name>Simultaneous Failures in ES</name>
          <t>In EVPN multihoming where the CE connects to peering PEs
            through link aggregation (LAG), a single LAG failure at the CE may manifest as multiple ES failures
            at all peering PEs simultaneously.</t>
            
          <t>As all peering PEs would enable the fast-reroute mechanism simultaneously,
            redirected traffic would bounce between them, causing a traffic storm, permanently or until the TTL expires.</t>

          <t>To prevent this, once-redirected traffic cannot
            be redirected again, owing to the terminal nature of ERLs described in <xref target="loop_free"/>.</t>
        </section>
        <section>
          <name>Successive and Cascading Failures in ES</name>
          <t>Trying to support cascading failures by redirecting once-redirected traffic is
            substantially equivalent to simultaneous failures above.</t>
          <t>Once-redirected traffic cannot be redirected again, owing to the terminal nature
            of ERLs described in <xref target="loop_free"/>, and loss is to be expected until the EVPN
            control plane reconverges for double-failure scenarios.</t>
          <t>In a scenario with 3 peering PEs (PE1-DF, PE2-BDF, PE3-NDF) where PE1 fails, followed
            by a PE2 failure before control-plane reconvergence, there is no reroute of PE1's
            original traffic
            towards PE3 because the redirect label is terminal at PE2.</t>
          <t>In such rapid-succession failures, it is expected that the control plane must first correct for
            the initial failure and DF-Elect PE2 as new‑DF and PE3 as the new‑BDF.
            PE2 to PE3 redirection would then begin, unless the control plane is rapid enough to correct
            directly, and elect PE3 as the new‑DF.</t>
        </section>
      </section>
    </section>
    <section anchor="redirect_label_behaviors">
      <name>Redirect Labels: Forwarding Behaviors</name>
      <t>Each EVPN redirect label MUST be downstream-assigned and is directly associated with
        the &lt;EVI,ESI&gt; AC being egress protected. The special forwarding characteristics and use of an
        EVPN redirect label (ERL), described below, are a matter
        of local significance only to the advertising PE (which is also the disposition PE).</t>
      <t>The special behaviors of ERLs do not affect any other PEs or transit P nodes.
        No extra labels are appended to the label stack in the IP/MPLS network, and the ERL
        appears to label-switching transit nodes as would any other EVPN service label. Since they
        appear as EVPN service labels, ERLs do not have any impact on Flow-Label or
        Control-Word procedures in <xref target="I-D.ietf-bess-rfc7432bis"/>.

      </t>
      <ul spacing="normal">
        <li>Traffic redirection and use of reroute labels may create routing loops upon multiple
        failures. Such loops are detrimental to the network and may cause congestion between protected PEs.
        </li>
        <li>
        Local restoration and redirection are meant to occur much faster than control-plane
        operations, meaning redirected packets may arrive at the BDF PE long before a DF-Election
        operation unblocks the egress link.</li>
      </ul>
      <t>
        Two special forwarding characteristics and behaviors of EVPN redirect labels are described below to
        mitigate these issues.</t>
      <section anchor="df_bypass">
        <name>Bypassing DF-Election Behavior</name>
        <t>Local detection and restoration at DF-Elected PE1 will begin rapidly redirecting traffic onto the
        backup path selected (PE2).<br/>
        Redirected packets will arrive at the Backup-DF port much faster than control plane
        DF-Election at the Backup-DF peer is capable of unblocking its local egress link for the
        shared ES (ESI1). Without further measures, all redirected traffic would be dropped at the Backup-DF and
        no net reduction in traffic loss would be achieved.</t>

        <t>Traffic restoration remains dependent upon ES route or Ethernet A-D per ES/EVI routes
        withdrawal for a DF-Election operation and for PE2 to assume the traffic forwarding role.
        This is especially important in single-active load-balancing mode where known unicast
        traffic is blocked.
        </t>

        <t>To mitigate this, the redirect labels allocated must carry a special attribute in the
        local forwarding and decapsulation chain:
        for traffic received on the ERL when the AC is up, an override to the DF&nbhy;Election is applied
        and traffic from the ERL will bypass the local Backup-DF blocking state.
        Once EVPN control plane reconverges, traffic from the ERL will cease and the
        optimal forwarding path based on ESLs will resume.</t>
        
        <t>The EVPN redirect label MUST carry a context locally, such that from disposition to
        egress, redirected packets are allowed to bypass the Backup-DF blocking state that would otherwise
        drop them. Similarly, this may open the gate to the traffic in the reverse direction.<br/>
        In Port-Active mode, the Backup-DF interface may signal Out-of-Service but remain in Up/Backup
        state: to support EVPN Fast Reroute, the CE must be able to receive traffic from an OOS
        LAG link.</t>
      </section>
      <section anchor="loop_free">
        <name>Terminal Disposition Behavior</name>
        <t>The reroute scheme is susceptible to loops and persistent redirects
        between peering PEs which have set up FRR redirection.
        Consider the scenario where both CE-facing interfaces fail simultaneously: fast reroute
        will be activated at both PE1 and PE2, effectively bouncing a redirected packet
        between the two PEs indefinitely (or until the TTL expires) and causing a traffic storm.</t>
        <t>To prevent this, a distinction is made between 'regular' EVPN service labels for
        disposition (i.e. known unicast EVI label or EVPN-VPWS label) and reroute labels with terminal disposition.
</t>
        <t>At the redirecting PE2, consider the case of ESL2 vs. ERL2, where both are locally allocated and provided in EVPN routes (downstream
                allocation) to BGP peers:
        </t>
        <ol spacing="normal" type="1"><li>
            <t>EVPN Service label, ESL2:
            </t>
            <ul spacing="normal">
              <li>Regular MAC-lookup or traffic forwarding occurs towards the access AC. </li>
              <li>If the AC is up, traffic will exit the interface, subject to local blocking state on
             the AC from DF-Election.</li>
              <li>If the AC is down and fast-reroute procedures are enabled, traffic may be re-encapsulated using
             the backup peer's redirect label ERL1 (if received).</li>
              <li>In most implementations, MACs are flushed on PE2 upon AC failure. When
              fast-reroute procedures are enabled at PE2, it must keep all MAC addresses of CE2 programmed against the
              failed access AC for some time, so that the MAC lookup continues to resolve to
              the failed AC and triggers the redirection above.</li>
            </ul>
          </li>
          <li>
            <t>EVPN Reroute label, ERL2:
            </t>
            <ul spacing="normal">
              <li>Regular MAC-lookup or traffic forwarding occurs towards the access AC.</li>
              <li>If the AC is up, traffic will apply an override to DF-Election and bypass
             the local blocking state on the AC.</li>
              <li>If the AC is down, traffic is dropped. Once-rerouted traffic must not be rerouted
             again: redirecting towards the peer's redirect label ERL1 is explicitly prevented.</li>
            </ul>
          </li>
        </ol>
        <t>The ERL acts like a local cross-connect by providing a direct channel from disposition to the AC.
        ERLs are terminal-disposition labels and prevent once‑redirected packets from being redirected again.

        With this forwarding attribute on ERLs, known only locally to the downstream-allocating PE, redirection is achieved without growing the label stack
        with another special-purpose label.</t>
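        <t>The ESL and ERL disposition rules listed in this section can be summarised by the following
        non-normative sketch. The names <tt>Action</tt> and <tt>dispose</tt> and the boolean
        parameters are hypothetical and chosen for this example; a real forwarding plane would
        typically encode the same decisions in its label or FIB entries rather than evaluate them per packet.</t>
        <sourcecode type="python" markers="false"><![CDATA[
from enum import Enum


class Action(Enum):
    FORWARD = "forward to the local AC"
    DROP = "drop"
    REDIRECT = "re-encapsulate with the backup peer's ERL"


def dispose(label_is_erl: bool, ac_up: bool, ac_blocked: bool,
            frr_enabled: bool = True,
            peer_erl_received: bool = True) -> Action:
    """Decide what happens to a known unicast packet at disposition."""
    if label_is_erl:
        # ERL: override DF-Election blocking when the AC is up;
        # terminal disposition (never redirect again) when it is down.
        return Action.FORWARD if ac_up else Action.DROP
    # ESL: regular forwarding, subject to DF-Election blocking state.
    if ac_up:
        return Action.DROP if ac_blocked else Action.FORWARD
    if frr_enabled and peer_erl_received:
        return Action.REDIRECT
    return Action.DROP


# Simultaneous failure of both ACs on the shared ES:
# PE1 redirects once (ESL1 -> ERL2), then PE2 drops instead of looping.
at_pe1 = dispose(label_is_erl=False, ac_up=False, ac_blocked=False)
at_pe2 = dispose(label_is_erl=True, ac_up=False, ac_blocked=True)
print(at_pe1.name, at_pe2.name)  # REDIRECT DROP
]]></sourcecode>
        <t>In this sketch the ERL branch never returns a redirect action, which is the property
        that bounds the number of redirections to one.</t>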
      </section>

    </section>
    <section>
      <name>Controlled Recovery Sequence</name>
      <t>Fast reroute mechanisms such as the one described in this document
             generally provide a way to preserve traffic flows at failure time.
             Use of fast reroute in EVPN, however, permits setting up a controlled recovery sequence
             to shorten the period of loss between an interface coming up and the EVPN DF-Election
             procedures and default timers for peer discovery.</t>
      <t>The benefit of a controlled recovery sequence is amplified when used in conjunction
        with <xref target="I-D.ietf-bess-evpn-fast-df-recovery"/> (synchronised DF-Election).</t>
    </section>

    <section>
      <name>Transport Underlay</name>
        <t>The solution is agnostic to transport underlays: for instance, similar behavior is carried
        forward for NVO tunnels (VXLAN) and SRv6.</t>

        <section>
        <name>NVO Tunnels</name>
        <t>The rerouting procedures and behaviors in this document apply as well for <xref
        target="RFC8365"/> NVO tunnels: traffic destined to an Ethernet Segment link in a failed
        state should be re-encapsulated into an NVO redirect tunnel.</t>
        <t>For MPLS-based NVO tunnels, i.e. MPLSoGRE, MPLSoUDP, etc., no additional behaviors are required.</t>
        <t>For non-MPLS NVO tunnels, the labels are 24-bit VNIs, not downstream assigned and usually global, i.e. same
        value for all the PEs attached to the BD. In this case, the rerouting mechanisms described
        in this document would not work without some additional behaviors: the rerouting
        mechanism needs to avoid local-bias split-horizon filtering upon reception of the redirected packets.</t>
           
           <section anchor="ignore_local_bias">
        <name>Ignoring Local Bias Behavior</name>
        <t>Non-MPLS NVO tunnel encapsulations may use local-bias procedures instead of
        ES label-based split-horizon (for EVPN multihoming).<br/>            
        
        This means that, e.g. when PE1 sends redirected traffic to multihoming peer PE2, PE2 in the
        example above will drop the packets due to the filtering based on the tunnel source IP.

        To support non-MPLS NVO tunnels such as VXLAN, PE2 needs to bypass the
        source IP based filtering if the source IP identifies a local redirection instance.<br/>

        When rerouting traffic, PE1 uses a shared Anycast IP of the failed Ethernet Segment as the
        source IP address of the redirection NVO tunnel.
        PE2 applies the forwarding behaviors in <xref target="redirect_label_behaviors"/>
        when the source&nbsp;IP of received packets matches a local Anycast&nbsp;IP.
        Specifically, the Local Bias split-horizon
        filtering of NVO tunnels which implements "Non-DF blocking" based on the unicast
        source&nbsp;IP of the Ethernet Segment peer is bypassed at PE2 due to the use of an Anycast source IP in the
        redirection tunnel.</t>
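        <t>The source-IP check described above can be illustrated with the following non-normative
        sketch. The function name <tt>classify_source</tt>, the returned strings and the example
        addresses (taken from the documentation range 192.0.2.0/24) are assumptions made for this
        illustration only.</t>
        <sourcecode type="python" markers="false"><![CDATA[
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Set, Union

Address = Union[IPv4Address, IPv6Address]


def classify_source(src_ip: str,
                    es_peer_ips: Set[Address],
                    es_anycast_ips: Set[Address]) -> str:
    """Classify a received NVO packet by its tunnel source IP."""
    src = ip_address(src_ip)
    if src in es_anycast_ips:
        # Redirected by a multihoming peer: apply the redirect
        # behaviors and skip local-bias split-horizon filtering.
        return "redirected"
    if src in es_peer_ips:
        # Regular traffic from an ES peer: local-bias filtering applies.
        return "local-bias"
    return "regular"


peers = {ip_address("192.0.2.1")}       # unicast IP of PE1 (example)
anycast = {ip_address("192.0.2.100")}   # Anycast IP of the shared ES
print(classify_source("192.0.2.100", peers, anycast))  # redirected
print(classify_source("192.0.2.1", peers, anycast))    # local-bias
print(classify_source("192.0.2.3", peers, anycast))    # regular
]]></sourcecode>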
        </section>
        
        </section> 

        <section><name>Segment Routing v6</name>
        <t>Ethernet A-D per EVI routes are advertised along with the Service SID used for End.DX2 or End.DT2U behaviors <relref target="RFC9252" section="6.1.2"/>.
        These advertisements correspond to the ESL behavior in this document (EVPN Service SID).
        An additional EVPN Redirect SID is advertised in Ethernet A-D per EVI routes to enable EVPN
        Fast Reroute, with one of 2 new SRv6 Endpoint Behaviors at the disposition PE (PE2). At the redirecting PE1, the
        EVPN&nbsp;Redirect&nbsp;SID is used to implement ERL behaviors described in <xref target="failures"/>.</t>
        

        <section anchor="dt2u.reroute"><name>End.DT2U.Reroute : End.DT2U with Fast Reroute</name>

        <t>The "End.DT2U with Fast Reroute" behavior ("End.DT2U.Reroute" for short) is a variant of
        the End.DT2U behavior.</t>

        <t>The End.DT2U.Reroute behavior is defined for the fast-reroute application between two
        EVPN multi-homing peers, and extends the base End.DT2U behavior. This behavior takes an
        optional Fast Reroute argument: "Arg.FR2". This argument provides a local mapping to the
        Attachment Circuit (EVI/ESI) for the received traffic, which also implements the forwarding
        behaviors in <xref target="redirect_label_behaviors"/>.</t>

        <t>Any SID instance of this behavior may be used in two ways:</t>
        <ol>
          <li>by ingress PEs not performing any reroute (such as PE3 in <xref
          target="evpn_mh_topology"/>), by setting the Arg.FR2 argument to zero so that handling at the
          egress PE is the same as for End.DT2U;</li>

          <li>by peering PEs performing redirection (such as PE1 in <xref
          target="evpn_mh_failure"/>), by setting the Arg.FR2 argument to a non-zero value for the
          reroute handling in addition to the End.DT2U functionality.</li>
        </ol>

        <t>Thus, the SID entry for this behavior, when instantiated in the FIB, performs the disposition
        of both base L2 table traffic (i.e., the base End.DT2U behavior) and rerouted
        traffic (i.e., the End.DT2U+Arg.FR2 handling).
        End.DT2U processing is as in <relref target="RFC8986" section="4.11"/>.</t>

        <t>When processing the Upper-Layer header of a packet matching a FIB entry locally
        instantiated as an End.DT2U.Reroute SID, N does the following: </t>
        <sourcecode type="pseudocode" markers="false">
S01. If (Upper-Layer header type == 143(Ethernet) ) {
S02.    Remove the outer IPv6 header with all its extension headers
S03.    If (Arg.FR2 is 0) {
S04.       Process as per Section 4.11 of [RFC8986]  (End.DT2U)
S05.    } Else {
S06.       Lookup the egress interface L2 OIF I for Arg.FR2
S07.       If (L2 OIF interface I is down) {
S08.          Drop the Ethernet frame
S09.       } Else {
S10.          Forward the Ethernet frame to the OIF I
                 bypassing any EVPN DF-Election blocking state
S11.       }
S12.    }
S13. } Else {
S14.    Process as per Section 4.1.1 of [RFC8986]
S15. }
        </sourcecode>


        <t>To maintain backwards compatibility, both End.DT2U.Reroute and End.DT2U behavior SIDs
        MAY be advertised together, in which case legacy receivers ignore the SRv6 SID of the
        unknown End.DT2U.Reroute behavior.</t>

        <t>The SRv6 L2 Service TLV in this case will carry two SRv6 SID Information sub-TLVs:</t>
        <ul>
            <li>the first one with the base End.DT2U behavior and</li>
            <li>the second one with the End.DT2U.Reroute behavior variant.<br/>
            The second one will have a non-zero Arg length (AL) and convey Arg.FR2 embedded in the
            advertised SID</li>
        </ul>

        <t>When advertised alongside an End.DT2U EVPN Service SID, the End.DT2U.Reroute EVPN Redirect
        SID MUST be identical to the End.DT2U SID except for the inclusion of the argument Arg.FR2.
        Both SRv6 SIDs can use transposition since the function MUST be identical between the two SIDs.
        A receiver unable to validate the applicability of arguments for SRv6 Endpoint Behaviors
        that are unknown to it MUST ignore the End.DT2U.Reroute SID (<relref target="RFC9252"
        section="3.2.1"/>).</t>


        <t>The following is an example representation of the BGP Prefix-SID Attribute encoding in this
        case for a 16-bit argument Arg.FR2 (0xaaaa):</t>

        <figure anchor="dt2u_sid_format"><name>EVPN Route Type 1 with dual End.DT2U SIDs</name>
          <artwork><![CDATA[ 
BGP Prefix SID Attr:
   SRv6 L2 Service TLV:
      SRv6 SID Information sub-TLV:
         SID: 2001:db8:b:1:fbd1::
            Behavior: End.DT2U
            SRv6 SID Structure sub-sub-TLV:
               LBL: 48, LNL: 16, FL: 16, AL: 0, TPOS-L: 0, TPOS-O: 0
      SRv6 SID Information sub-TLV:
         SID: 2001:db8:b:1:fbd1:aaaa::
            Behavior: End.DT2U.Reroute
            SRv6 SID Structure sub-sub-TLV:
               LBL: 48, LNL: 16, FL: 16, AL: 16, TPOS-L: 0, TPOS-O: 0
]]></artwork>
        </figure>

        <t>When both End.DT2U.Reroute and End.DT2U are advertised, the ingress PE not performing
        reroute MUST use the End.DT2U as the EVPN Service SID.</t>
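
        <t>As a non-normative illustration, the SID selection rules above can be sketched from the
        sender's perspective as follows. The sketch assumes a sender S that has received the
        Ethernet A-D per EVI route of the egress PE; the name S and the ordering of the checks are
        assumptions made for illustration only. The same selection applies to End.DX2 and
        End.DX2.Reroute.</t>
        <sourcecode type="pseudocode" markers="false">
;; Illustrative sketch only; S is a hypothetical sender
S01. If (S is an ES peer performing redirection) {
S02.    Use the End.DT2U.Reroute SID with the non-zero Arg.FR2
           conveyed in the advertised SID
S03. } Else if (both End.DT2U.Reroute and End.DT2U SIDs are advertised) {
S04.    Use the End.DT2U SID as the EVPN Service SID
S05. } Else if (only an End.DT2U.Reroute SID is advertised) {
S06.    Use the End.DT2U.Reroute SID with Arg.FR2 set to zero
S07. }
        </sourcecode>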

        </section>

        <section anchor="dx2.reroute"><name>End.DX2.Reroute : End.DX2 with Fast Reroute</name>

        <t>The "End.DX2 with Fast Reroute" behavior ("End.DX2.Reroute" for short) is a variant of
        the End.DX2 behavior.</t>

        <t>The text in this section mirrors that of <xref target="dt2u.reroute"/> (End.DT2U.Reroute) and is included
        for completeness' sake.</t>

        <t>The End.DX2.Reroute behavior is defined for the fast-reroute application between two
        EVPN multi-homing peers, and extends the base End.DX2 behavior. This behavior takes an
        optional Fast Reroute argument: "Arg.FR2". This argument provides a local mapping to the
        Attachment Circuit (EVI/ESI) for the received traffic, which also implements the forwarding
        behaviors in <xref target="redirect_label_behaviors"/>.
        </t>

        <t>Any SID instance of this behavior may be used in two ways:</t>
        <ol>
          <li>by ingress PEs not performing any reroute (such as PE3 in <xref
          target="evpn_mh_topology"/>), by setting the Arg.FR2 argument to zero so that handling at
          the egress PE is the same as for End.DX2</li>
 
          <li>by peering PEs performing redirection (such as PE1 in <xref
          target="evpn_mh_failure"/>), by setting the Arg.FR2 argument to a non-zero value so that
          reroute handling is applied in addition to the End.DX2 functionality</li>
        </ol>

        <t>Thus, the SID entry for this behavior, when instantiated in the FIB, performs the disposition
        of both base L2 cross-connect traffic (i.e., the base End.DX2 behavior) and rerouted
        traffic (i.e., the End.DX2+Arg.FR2 handling).
        End.DX2 processing is as in <relref target="RFC8986" section="4.9"/>.</t>

        <t>When processing the Upper-Layer header of a packet matching a FIB entry locally
        instantiated as an End.DX2.Reroute SID, N does the following: </t>
        <sourcecode type="pseudocode" markers="false">
S01. If (Upper-Layer header type == 143(Ethernet) ) {
S02.    Remove the outer IPv6 header with all its extension headers
S03.    If (Arg.FR2 is 0) {
S04.       Process as per Section 4.9 of [RFC8986]  (End.DX2)
S05.    } Else {
S06.       Lookup the egress interface L2 OIF I for Arg.FR2
S07.       If (L2 OIF interface I is down) {
S08.          Drop the Ethernet frame
S09.       } Else {
S10.          Forward the Ethernet frame to the OIF I
                 bypassing any EVPN DF-Election blocking state
S11.       }
S12.    }
S13. } Else {
S14.    Process as per Section 4.1.1 of [RFC8986]
S15. }
        </sourcecode>


        <t>To maintain backwards compatibility, both End.DX2.Reroute and End.DX2 behavior SIDs
        MAY be advertised together. Receiving PEs SHOULD use the SRv6 SID from the first instance of
        the sub-TLV only (<relref target="RFC9252" section="3.1"/>), and ignore the SRv6 SID of the
        unknown End.DX2.Reroute behavior (<relref target="RFC9252" section="3.2.1"/>).</t>

        <t>The SRv6 L2 Service TLV in this case will carry two SRv6 SID Information sub-TLVs:</t>
        <ul>
            <li>the first one with the base End.DX2 behavior and</li>
            <li>the second one with the End.DX2.Reroute behavior variant.<br/>
            The second one will have a non-zero Arg length (AL) and convey Arg.FR2 embedded in the
            advertised SID</li>
        </ul>

        <t>When advertised alongside an End.DX2 EVPN Service SID, the End.DX2.Reroute EVPN Redirect
        SID MUST be identical to the End.DX2 SID except for the inclusion of the argument Arg.FR2.
        Both SRv6 SIDs can use transposition since the function MUST be identical between the two SIDs.
        A receiver unable to validate the applicability of arguments for SRv6 Endpoint Behaviors
        that are unknown to it MUST ignore the End.DX2.Reroute SID (<relref target="RFC9252"
        section="3.2.1"/>).</t>


        <t>The following is an example representation of the BGP Prefix-SID Attribute encoding in this
        case for a 16-bit argument Arg.FR2 (0xaaaa):</t>

        <figure anchor="dx2_sid_format"><name>EVPN Route Type 1 with dual End.DX2 SIDs</name>
          <artwork><![CDATA[ 
BGP Prefix SID Attr:
   SRv6 L2 Service TLV:
      SRv6 SID Information sub-TLV:
         SID: 2001:db8:b:1:fbd1::
            Behavior: End.DX2
            SRv6 SID Structure sub-sub-TLV:
               LBL: 48, LNL: 16, FL: 16, AL: 0, TPOS-L: 0, TPOS-O: 0
      SRv6 SID Information sub-TLV:
         SID: 2001:db8:b:1:fbd1:aaaa::
            Behavior: End.DX2.Reroute
            SRv6 SID Structure sub-sub-TLV:
               LBL: 48, LNL: 16, FL: 16, AL: 16, TPOS-L: 0, TPOS-O: 0
]]></artwork>
        </figure>

        <t>When both End.DX2.Reroute and End.DX2 are advertised, the ingress PE not performing
        reroute MUST use the End.DX2 as the EVPN Service SID.</t>

        </section>

        <section><name>Conflicting Endpoint Behaviors</name>
        <t>End.DT2U.Reroute and End.DX2.Reroute are variants of their respective base behaviors.
        When two SIDs are advertised together in an Ethernet A-D per EVI route, the variant
        advertised MUST correspond to the advertised base behavior.<br/>
        In other words, an End.DT2U.Reroute variant advertised alongside an End.DX2 base is
        unusable and SHALL be discarded by receivers; similarly, an End.DX2.Reroute variant
        advertised alongside an End.DT2U base SHALL be discarded by receivers.</t>
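
        <t>As a non-normative illustration, the consistency check described above can be sketched
        for a receiver of a route carrying one base SID together with one Reroute variant SID.
        The names "base" and "variant" are introduced here for illustration only and denote the
        Endpoint Behaviors of the two advertised SIDs.</t>
        <sourcecode type="pseudocode" markers="false">
;; Illustrative sketch only; "base" and "variant" are hypothetical names
S01. If (variant is End.DT2U.Reroute and base is End.DT2U) {
S02.    Accept the advertisement
S03. } Else if (variant is End.DX2.Reroute and base is End.DX2) {
S04.    Accept the advertisement
S05. } Else {
S06.    Discard the advertisement (mismatched base and variant)
S07. }
        </sourcecode>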
        </section>

        </section>

        <section><name>Inter-AS Option B</name>
        <t>EVPN multi-homing peers located in different ASes are the exception rather than the rule.
        In Inter-AS Option&nbsp;B or inter&nbhy;domain scenarios, the procedures at the ASBR/ABR, and
        at BGP route-reflectors performing nexthop-self, are extended as follows:</t>
        <ul>
        <li>Prior to this document, the ABR/ASBR receives the Ethernet A-D per EVI route, programs
        a label swap operation and redistributes the route with a newly allocated label in the NLRI's
        label field.</li>
        <li>To implement the procedures in this document, the ABR/ASBR needs to allocate
        two downstream labels for each Ethernet A-D per EVI route: one for the NLRI's label (ESL)
        and another one for the ESI Label Extended Community label (ERL). A label swap operation is
        programmed for both the ESL and the ERL.</li>
        </ul>
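        <t>As a non-normative illustration, the extended ABR/ASBR handling described above can be
        sketched as follows. The names R, L1 and L2 are hypothetical, and making the second
        allocation conditional on the presence of the ESI Label Extended Community is an
        assumption of this sketch.</t>
        <sourcecode type="pseudocode" markers="false">
;; Illustrative sketch only; R, L1 and L2 are hypothetical names
S01. Receive Ethernet A-D per EVI route R
S02. Allocate downstream label L1 for the label in the NLRI of R
S03. Program a label swap from L1 to the received NLRI label
S04. If (R carries an ESI Label Extended Community) {
S05.    Allocate downstream label L2 for the Extended Community label
S06.    Program a label swap from L2 to the received
           Extended Community label
S07.    Replace the Extended Community label with L2
S08. }
S09. Replace the NLRI label with L1
S10. Redistribute R with nexthop-self
        </sourcecode>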
        </section>
    </section>

    <section>
      <name>BGP Extensions</name>
      <t>While this document describes a new behavior, no new BGP extensions are required
      to advertise the redirect label(s) used for EVPN egress link protection.
      The ESI Label Extended Community defined in <relref target="I-D.ietf-bess-rfc7432bis" section="7.5"/> may be
      advertised along with Ethernet A-D routes:
      </t>
      <ul spacing="normal">
        <li>When advertised with an Ethernet A-D per ES route, it enables
        split-horizon procedures for multihomed sites as described in <relref target="I-D.ietf-bess-rfc7432bis"
        section="8.3"/>;</li>
        <li>When advertised with an Ethernet A-D per EVI route, it enables link
        protection and fast‑reroute procedures for multihomed sites as described in this
        document. The label value represents the per-&lt;EVI,ESI&gt; EVPN redirect label (ERL).
        The Flags field SHOULD NOT be set and MUST be ignored.</li>
      </ul>
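      <t>As a non-normative illustration, the two interpretations of the ESI Label Extended
      Community listed above can be sketched for a receiver as follows. The sketch only restates
      the two cases; it assumes the route type has already been determined.</t>
      <sourcecode type="pseudocode" markers="false">
;; Illustrative sketch only
S01. If (route is an Ethernet A-D per ES route) {
S02.    Use the label for split-horizon procedures as per
           Section 8.3 of [I-D.ietf-bess-rfc7432bis]
S03. } Else if (route is an Ethernet A-D per EVI route) {
S04.    Ignore the Flags field
S05.    Treat the label as the ERL for the EVI and ESI of the route
S06. }
      </sourcecode>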
      <t>Prior to this document, advertising the ESI Label Extended Community along with
      an Ethernet A-D per EVI route (Ethernet Tag ID other than MAX-ET) was undefined, and presumably ignored.
      </t>
      <t>Remote PEs SHOULD NOT use ERLs as a substitute for ESLs in route resolution. In particular,
        the ERL is not to be confused with the aliasing and backup path ESL described and used
        in <relref target="I-D.ietf-bess-rfc7432bis" section="8.4"/>.</t>
    </section>

    <section>
      <name>Security Considerations</name>
      <t>The mechanisms in this document use the EVPN control plane as defined
           in <xref target="I-D.ietf-bess-rfc7432bis"/> and <xref target="RFC8214"/>, and the security considerations
           described therein are equally applicable.
           Reroute labels advertised in the EVPN control plane
           are meant for consumption by the peering PEs in the same ES. They are, however,
           visible in the EVPN control plane to remote peers. Care is needed when
           installing reroute labels, since their use may result in bypassing
           DF-Election procedures and, if they are incorrectly installed, may lead to duplicate traffic at CEs.
      </t>
    </section>
    
   <section anchor="acknowledgements"><name>Acknowledgements</name>
      <t>The authors would like to thank Ketan Talaulikar for his review of the SRv6 procedures in this
      document.</t>
    </section>
    <section anchor="iana"><name>IANA Considerations</name>
      <t>This document introduces two new Endpoint Behaviors. This document requests that IANA assign
      two new values and update the "SRv6 Endpoint Behaviors" subregistry under the top-level
      "Segment Routing" registry as follows:</t>

<table>
<name>SRv6 Endpoint Behaviors Subregistry</name>
<thead><tr><th>Value</th><th>Hex</th><th>Endpoint Behavior</th><th>Reference</th></tr></thead>
<tbody>
<tr><td>TBD</td><td>TBD</td><td>End.DT2U.Reroute</td><td>This document</td></tr>
<tr><td>TBD</td><td>TBD</td><td>End.DX2.Reroute</td><td>This document</td></tr>
</tbody>
</table>


    </section>
  </middle>
  <!--  *****BACK MATTER ***** -->
  <back>
    <!-- References split into informative and normative -->
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>

        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
        <?rfc include="reference.I-D.draft-ietf-bess-rfc7432bis-10.xml"?>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8365.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8214.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8584.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8986.xml"/>
      </references>
      
      <references>
        <name>Informative References</name>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8679.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.9135.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.9136.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.9252.xml"/>

        <?rfc include="reference.I-D.draft-ietf-bess-evpn-fast-df-recovery-10.xml"?>
        <?rfc include="reference.I-D.draft-ietf-bess-evpn-l2gw-proto-02.xml"?>

      </references>
    </references>
  </back>
</rfc>
