<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC7432 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7432.xml">
<!ENTITY RFC8214 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8214.xml">
<!ENTITY RFC7623 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7623.xml">
<!ENTITY RFC8365 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8365.xml">
<!ENTITY RFC8584 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8584.xml">
<!ENTITY RFC2119 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC8174 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY I-D.ietf-bess-evpn-virtual-eth-segment SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-bess-evpn-virtual-eth-segment.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-bess-evpn-pref-df-13"
     ipr="trust200902" submissionType="IETF" updates="8584">
  <!--Generated by id2xml 1.5.0 on 2019-12-17T09:50:28Z -->

  <?rfc strict="yes"?>

  <?rfc compact="yes"?>

  <?rfc subcompact="no"?>

  <?rfc symrefs="yes"?>

  <?rfc sortrefs="no"?>

  <?rfc text-list-symbols="o-*+"?>

  <?rfc toc="yes"?>

  <front>
    <title>Preference-based EVPN DF Election</title>

    <author fullname="J. Rabadan" initials="J." role="editor"
            surname="Rabadan">
      <organization>Nokia</organization>

      <address>
        <postal>
          <street>520 Almanor Avenue</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94085</code>

          <country>USA</country>
        </postal>

        <email>jorge.rabadan@nokia.com</email>
      </address>
    </author>

    <author fullname="S. Sathappan" initials="S." surname="Sathappan">
      <organization>Nokia</organization>

      <address>
        <email>senthil.sathappan@nokia.com</email>
      </address>
    </author>

    <author fullname="W. Lin" initials="W." surname="Lin">
      <organization>Juniper Networks</organization>

      <address>
        <email>wlin@juniper.net</email>
      </address>
    </author>

    <author fullname="J. Drake" initials="J." surname="Drake">
      <organization>Independent</organization>

      <address>
        <email>je_drake@yahoo.com</email>
      </address>
    </author>

    <author fullname="A. Sajassi" initials="A." surname="Sajassi">
      <organization>Cisco Systems</organization>

      <address>
        <email>sajassi@cisco.com</email>
      </address>
    </author>

    <date day="9" month="October" year="2023"/>

    <workgroup>BESS Workgroup</workgroup>

    <abstract>
      <t>The Designated Forwarder (DF) in Ethernet Virtual Private Networks
      (EVPN) is defined as the Provider Edge (PE) router responsible for
      sending Broadcast, Unknown Unicast, and Multicast (BUM) traffic to a
      multi-homed device/network in the case of an all-active multi-homing
      Ethernet Segment (ES), or BUM and unicast traffic in the case of single-active
      multi-homing. The Designated Forwarder is selected out of a candidate
      list of PEs that advertise the same Ethernet Segment Identifier (ESI) to
      the EVPN network, according to the Default Designated Forwarder Election
      algorithm. While the Default Algorithm provides an efficient and
      automated way of selecting the Designated Forwarder across different
      Ethernet Tags in the Ethernet Segment, there are some use cases where a
      more 'deterministic' and user-controlled method is required. At the same
      time, Network Operators require an easy way to force an on-demand
      Designated Forwarder switchover in order to carry out some maintenance
      tasks on the existing Designated Forwarder or control whether a new
      active PE can preempt the existing Designated Forwarder PE.</t>

      <t>This document proposes a Designated Forwarder Election algorithm that
      meets the requirements of determinism and operation control. This
      document updates RFC 8584 by modifying the definition of the DF Election
      Extended Community.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="sect-1" title="Introduction">
      <section anchor="sect-1.1" title="Problem Statement">
        <t><xref target="RFC7432"/> defines the Designated Forwarder (DF) in
        EVPN networks as the PE responsible for sending Broadcast, Unknown
        Unicast, and Multicast (BUM) traffic to a multi-homed device/network in
        the case of an all-active multi-homing Ethernet Segment, or BUM and
        unicast traffic to a multi-homed device or network in the case of
        single-active multi-homing. The Designated Forwarder is selected out
        of a candidate list of PEs that advertise the same Ethernet Segment
        Identifier (ESI) to the EVPN network, according to the Designated
        Forwarder Election Algorithm, or DF Alg as per <xref
        target="RFC8584"/>.</t>

        <t>While the Default Designated Forwarder Algorithm <xref
        target="RFC7432"/> and the Highest Random Weight (HRW) algorithm <xref
        target="RFC8584"/> provide an efficient and automated way of selecting
        the Designated Forwarder across different Ethernet Tags in the
        Ethernet Segment, there are some use cases where a more
        user-controlled method is required. At the same time, Network
        Operators require an easy way to force an on-demand Designated
        Forwarder switchover in order to carry out some maintenance tasks on
        the existing Designated Forwarder or control whether a new active PE
        can preempt the existing Designated Forwarder PE.</t>
      </section>

      <section anchor="sect-1.2" title="Solution Requirements">
        <t>The procedures described in this document meet the following
        requirements:<list style="letters">
            <t>The solution provides an administrative preference option so
            that the user can control in what order the candidate PEs may
            become Designated Forwarder, assuming they are all operationally
            ready to take over as Designated Forwarder. The operator can
            determine whether the Highest-Preference or Lowest-Preference PE
            among the PEs in the Ethernet Segment will be elected as
            Designated Forwarder, based on the DF Algorithms described in this
            document.</t>

            <t>The extensions in this document work for <xref
            target="RFC7432"/> Ethernet Segments and virtual Ethernet
            Segments, as defined in <xref
            target="I-D.ietf-bess-evpn-virtual-eth-segment"/>.</t>

            <t>The user may force a PE to preempt the existing Designated
            Forwarder for a given Ethernet Tag without re-configuring all the
            PEs in the Ethernet Segment, by simply modifying the existing
            administrative preference in that PE.</t>

            <t>The solution allows an option to NOT preempt the current
            Designated Forwarder ("Don't Preempt" capability), even if the
            former Designated Forwarder PE comes back up after a failure. This
            is also known as "non-revertive" behavior, as opposed to the <xref
            target="RFC7432"/> Designated Forwarder election procedures that
            are always revertive (because the winner PE of the default
            Designated Forwarder election algorithm always takes over as the
            operational Designated Forwarder).</t>

            <t>The procedures described in this document support single-active
            and all-active multi-homing Ethernet Segments.</t>
          </list></t>
      </section>

      <section anchor="sect-1.3" title="Solution Overview">
        <t>To provide a solution that satisfies the above requirements, we
        introduce two new DF Algorithms that can be advertised in the DF
        Election Extended Community (<xref target="sect-3"/>). The new DF
        Election Extended Community variants carry a DF election preference
        advertised by each PE, which influences which PE will become DF
        (<xref target="sect-4.1"/>). The advertised DF election preference
        can dynamically vary from the administratively configured preference
        to provide non-revertive behavior (<xref target="sect-4.3"/>). An
        optional solution is discussed in <xref target="sect-4.2"/>, for use
        in Ethernet Segments that support large numbers of Ethernet Tags and
        therefore need to balance load among multiple DFs.</t>
      </section>
    </section>

    <section anchor="sect-2" title="Requirements Language and Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when,
      they appear in all capitals, as shown here.</t>

      <t><list style="symbols">
          <t>AC - Attachment Circuit. An AC has an Ethernet Tag associated
          with it.</t>

          <t>CE - Customer Equipment router.</t>

          <t>DF - Designated Forwarder.</t>

          <t>DF Alg - refers to Designated Forwarder Election Algorithm. This
          is sometimes shortened to "Alg" in this document.</t>

          <t>DP - refers to the "Don't Preempt" (me) capability in the
          Designated Forwarder Election extended community.</t>

          <t>ENNI - Ethernet Network to Network Interface.</t>

          <t>ES and vES - Ethernet Segment and virtual Ethernet Segment.</t>

          <t>Ethernet A-D per EVI route - refers to <xref target="RFC7432"/>
          route type 1 or Auto-Discovery per EVPN Instance route.</t>

          <t>EVC - Ethernet Virtual Circuit.</t>

          <t>EVI - EVPN Instance.</t>

          <t>Ethernet Tag - used to represent a Broadcast Domain that is
          configured on a given Ethernet Segment for the purpose of Designated
          Forwarder election. Note that any of the following may be used to
          represent a Broadcast Domain: VIDs (including Q-in-Q tags),
          configured IDs, VNI (VXLAN Network Identifiers), normalized VID,
          I-SIDs (Service Instance Identifiers), etc., as long as the
          representation of the broadcast domains is configured consistently
          across the multi-homed PEs attached to that Ethernet Segment. The
          Ethernet Tag value MUST NOT be zero.</t>

          <t>HRW - Highest Random Weight, as per <xref target="RFC8584"/>.</t>

          <t>OAM - refers to Operations, Administration, and Maintenance protocols.</t>
        </list></t>
    </section>

    <section anchor="sect-3" title="EVPN BGP Attributes Extensions">
      <t>This solution reuses and extends the Designated Forwarder Election
      Extended Community defined in <xref target="RFC8584"/> that is
      advertised along with the Ethernet Segment route. It does so by
      redefining the last two reserved octets of the DF Election Extended
      Community as a DF Preference field when the DF Algorithm is set to
      Highest-Preference or Lowest-Preference. This document also defines a
      new capability, referred to as the "Don't Preempt" capability, which
      MAY be used with the Highest-Preference or Lowest-Preference DF
      Algorithms. The format of the DF Election Extended Community that is
      used in this document follows:</t>

      <figure anchor="df-election-extended-community"
              title="DF Election Extended Community">
        <artwork><![CDATA[
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06     | Sub-Type(0x06)| RSV |  DF Alg |    Bitmap     ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~     Bitmap    |   Reserved    |   DF Preference (2 octets)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
      </figure>

      <t>Where the above fields are defined as follows:</t>

      <t><list style="symbols">
          <t>DF Algorithm can have the following values:<list style="symbols">
              <t>Alg 0 - Default Designated Forwarder Election algorithm, or
              modulus-based algorithm as per <xref target="RFC7432"/>.</t>

              <t>Alg 1 - HRW algorithm as per <xref target="RFC8584"/>.</t>

              <t>Alg 2 - Highest-Preference algorithm (this document <xref
              target="sect-4.1"/>).</t>

              <t>Alg TBD - Lowest-Preference algorithm (this document <xref
              target="sect-4.1"/>). TBD will be replaced by the allocated
              value at the time of publication.</t>
            </list></t>
        </list><list style="symbols">
          <t>Bitmap (2 octets) encodes "capabilities" <xref
          target="RFC8584"/>, where this document defines the "Don't Preempt"
          capability, used to indicate if a PE supports a non-revertive
          behavior:</t>
        </list></t>

      <figure anchor="bitmap-field-in-the-df-election-extended-community"
              title="Bitmap field in the DF Election Extended Community">
        <artwork><![CDATA[
                       1 1 1 1 1 1
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |D|A|                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
      </figure>

      <t><list style="empty">
          <t><list style="symbols">
              <t>Bit 0 (corresponds to Bit 24 of the Designated Forwarder
              Election Extended Community and is defined by this document):
              the D bit or 'Don't Preempt' bit (DP hereafter) indicates
              whether the PE advertising the Ethernet Segment route requests
              the remote PEs in the Ethernet Segment not to preempt it as
              Designated Forwarder. The default value is DP=0, which is
              compatible with the 'preempt' or 'revertive' behavior in the
              Default DF Algorithm <xref target="RFC7432"/>. The DP capability
              is supported by the Highest-Preference or Lowest-Preference DF
              Algorithms. The procedures of the "Don't Preempt" capability for
              other DF Algorithms are out of the scope of this document. The
              procedures of the "Don't Preempt" capability for the
              Highest-Preference and Lowest-Preference DF Algorithms are
              described in <xref target="sect-4.1"/>.</t>

              <t>Bit 1: AC-DF or AC-Influenced Designated Forwarder Election
              is described in <xref target="RFC8584"/>. When set to 1, it
              indicates the desire to use AC-Influenced Designated Forwarder
              Election with the rest of the PEs in the Ethernet Segment. The
              AC-DF capability bit MAY be set along with the DP capability and
              the Highest-Preference or Lowest-Preference DF Algorithms.</t>
            </list></t>
        </list><list style="symbols">
          <t>Designated Forwarder (DF) Preference (described in this
          document): defines a 2-octet value that indicates the PE preference
          to become the Designated Forwarder in the Ethernet Segment, as
          described in <xref target="sect-4.1"/>. The allowed values are
          within the range 0-65535, and the default value MUST be 32767. This
          value is the midpoint in the allowed Preference range of values,
          which gives the operator the flexibility of choosing a significant
          number of values, above or below the default Preference. A
          numerically higher or lower value of this field is more preferred
          for Designated Forwarder election depending on the DF Algorithm
          being used, as explained in <xref target="sect-4.1"/>. The
          Designated Forwarder Preference field is specific to DF Algorithms
          Highest-Preference and Lowest-Preference, and this document does not
          define any meaning for other algorithms. If the DF Algorithm is
          different from Highest-Preference or Lowest-Preference, these two
          octets can be encoded differently.</t>

          <t>RSV and Reserved fields (from bit 16 to bit 18, and from bit 40
          to bit 47): when the DF Algorithm is set to Highest-Preference or
          Lowest-Preference, the values are set to zero when
          advertising the Ethernet Segment route, and they are ignored when
          receiving the Ethernet Segment route.</t>
        </list></t>
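      <t>As an informal illustration only, the field layout above can be
      sketched in Python as shown below. The function and constant names are
      hypothetical and are not defined by this document. The sketch assumes
      that Bitmap bit 0 is the most significant bit of the 2-octet Bitmap in
      network byte order. Only the Highest-Preference value (Alg 2) is given
      a constant, since the Lowest-Preference value is still to be
      allocated.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

# Hypothetical names; the layout follows the figure in this section.
EC_TYPE_EVPN = 0x06
EC_SUBTYPE_DF_ELECTION = 0x06
ALG_HIGHEST_PREFERENCE = 2    # Alg value listed in this section
DP_BIT = 0x8000               # Bitmap bit 0 ("Don't Preempt")
AC_DF_BIT = 0x4000            # Bitmap bit 1 (AC-DF)
DEFAULT_PREFERENCE = 32767    # default DF Preference

# Type, Sub-Type, RSV+DF Alg, Bitmap, Reserved, DF Preference
_LAYOUT = "!BBBHBH"           # 8 octets, network byte order


def encode_df_election_ec(alg, preference=DEFAULT_PREFERENCE,
                          dp=False, ac_df=False):
    """Build the 8-octet DF Election Extended Community."""
    if not 0 <= alg <= 0x1F:
        raise ValueError("DF Alg is a 5-bit field")
    if not 0 <= preference <= 0xFFFF:
        raise ValueError("DF Preference is a 2-octet field")
    bitmap = (DP_BIT if dp else 0) | (AC_DF_BIT if ac_df else 0)
    # RSV (upper 3 bits) and the Reserved octet are sent as zero.
    return struct.pack(_LAYOUT, EC_TYPE_EVPN, EC_SUBTYPE_DF_ELECTION,
                       alg, bitmap, 0, preference)


def decode_df_election_ec(data):
    """Parse the 8 octets; RSV and Reserved are ignored on receipt."""
    _type, _sub, rsv_alg, bitmap, _reserved, pref = struct.unpack(
        _LAYOUT, data)
    return {
        "alg": rsv_alg & 0x1F,
        "dp": bool(bitmap & DP_BIT),
        "ac_df": bool(bitmap & AC_DF_BIT),
        "preference": pref,
    }
```
]]></artwork>
      </figure>

      <t>In this sketch, a PE advertising (500,1,Highest-Preference) would
      call encode_df_election_ec(ALG_HIGHEST_PREFERENCE, 500, dp=True), and a
      receiver would recover the same three parameters with
      decode_df_election_ec().</t>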
    </section>

    <section anchor="sect-4" title="Solution Description">
      <t><xref target="es-and-deterministic-df-election"/> illustrates an
      example that will be used in the description of the solution.</t>

      <figure anchor="es-and-deterministic-df-election"
              title="Preference-based DF Election">
        <artwork><![CDATA[
              EVPN network
         +-------------------+
         |                +-------+  ENNI    Aggregation
         |   <---ESI1,500 |  PE1  |   /\  +----Network---+
         | <-----ESI2,100 |       |===||===              |
         |                |       |===||== \      vES1   |  +----+
     +-----+              |       |   \/  |\----------------+CE1 |
CE3--+ PE4 |              +-------+       | \   ------------+    |
     +-----+                 |            |  \ /         |  +----+
         |                   |            |   X          |
         |   <---ESI1,255  +-----+============ \         |
         | <-----ESI2,200  | PE2 |==========    \ vES2   | +----+
         |                 +-----+        | \    ----------+CE2 |
         |                   |            |  --------------+    |
         |                 +-----+   ----------------------+    |
         | <-----ESI2,300  | PE3 +--/     |              | +----+
         |                 +-----+        +--------------+
         --------------------+]]></artwork>
      </figure>

      <t><xref target="es-and-deterministic-df-election"/> shows three PEs
      that are connecting EVCs coming from the Aggregation Network to their
      EVIs in the EVPN network. CE1 is connected to vES1, which spans PE1 and
      PE2, and CE2 is connected to vES2, which is attached to PE1, PE2 and
      PE3.</t>

      <t>If the algorithm chosen for vES1 and vES2 is DF Algorithm
      Highest-Preference or Lowest-Preference, the PEs may become Designated
      Forwarder irrespective of their IP address and based on the
      administrative Preference value. The following sections provide some
      examples of the procedures and how they are applied in the use case of
      <xref target="es-and-deterministic-df-election"/>.</t>

      <section anchor="sect-4.1"
               title="Use of the Highest-Preference and Lowest-Preference Algorithms">
        <t>Assuming the operator wants to control, in a flexible way, which
        PE becomes the Designated Forwarder for a given virtual Ethernet
        Segment and the order in which the PEs become Designated Forwarder in
        case of multiple failures, the Highest-Preference or Lowest-Preference
        algorithms can be used. Using the example in <xref
        target="es-and-deterministic-df-election"/>, these algorithms are used
        as follows:</t>

        <t><list style="letters">
            <t>vES1 and vES2 are now configurable with three optional
            parameters that are signaled in the Designated Forwarder Election
            extended community. These parameters are the Preference,
            Preemption option (or "Don't Preempt" option) and DF Algorithm. We
            will represent these parameters as (Pref,DP,Alg). For instance,
            vES1 (Pref,DP,Alg) is configured as (500,0,Highest-Preference) in
            PE1, and (255,0,Highest-Preference) in PE2. vES2 is configured as
            (100,0,Highest-Preference), (200,0,Highest-Preference) and
            (300,0,Highest-Preference) in PE1, PE2 and PE3 respectively.</t>

            <t>The PEs advertise an Ethernet Segment route for each virtual
            Ethernet Segment, including the three parameters indicated in 'a'
            above, in the Designated Forwarder Election Extended Community
            (encoded as described in <xref target="sect-3"/>).</t>

            <t>According to <xref target="RFC8584"/>, each PE will run the
            Designated Forwarder election algorithm upon expiration of the DF
            Wait timer. Each PE runs the Highest-Preference or
            Lowest-Preference DF Algorithm for each Ethernet Segment as
            follows: <list style="symbols">
                <t>The PE will check the DF Algorithm value in each Ethernet
                Segment route, and assuming all the Ethernet Segment routes
                (including the local route) are consistent in this DF
                Algorithm (that is, all are configured for Highest-Preference
                or Lowest-Preference, but not a mix), the PE runs the
                procedure in this section. Otherwise, the procedure falls back
                to <xref target="RFC7432"/> Default Algorithm. The
                Highest-Preference and Lowest-Preference Algorithms are
                different Algorithms, therefore if two PEs configured for
                Highest-Preference and Lowest-Preference respectively, are
                attached to the same Ethernet Segment, the operational
                Designated Forwarder Election Algorithm will fall back to the
                Default Algorithm.</t>

                <t>If all the PEs attached to the Ethernet Segment advertise
                Highest-Preference Algorithm, each PE builds a list of
                candidate PEs, ordered by Preference value from the
                numerically highest value to lowest value. E.g., PE1 builds a
                list of candidate PEs for vES1 ordered by the Preference, from
                high to low: &lt;PE1, PE2&gt; (since PE1's preference is more
                preferred than PE2's). Hence, PE1 becomes the Designated
                Forwarder for vES1. In the same way, PE3 becomes the
                Designated Forwarder for vES2.</t>

                <t>If all the PEs attached to the Ethernet Segment advertise
                Lowest-Preference Algorithm, then the candidate list is
                ordered from the numerically lowest Preference value to the
                highest Preference value. E.g., PE1's ordered list for vES1 is
                &lt;PE2, PE1&gt;. Hence, PE2 becomes the Designated Forwarder
                for vES1. In the same way, PE1 becomes the Designated
                Forwarder for vES2.</t>
              </list></t>

            <t>If maintenance tasks have to be executed on a PE, the operator
            may want to make sure that the PE is not the Designated Forwarder
            for the Ethernet Segment, so that the impact on the service is
            minimized. E.g., if PE3 is going into maintenance and the DF
            Algorithm is Highest-Preference, the operator could change vES2's
            Preference on PE3 from 300 to, e.g., 50 (hence, the Ethernet
            Segment route from PE3 is updated with the new preference value)
            so that PE2 is forced to take over as Designated Forwarder for
            vES2 (irrespective of the DP capability). Once the maintenance
            task on PE3 is over, the operator could decide to keep the latest
            configured preference value or restore the initial preference
            value. A similar procedure can be used for the Lowest-Preference
            DF Algorithm. For example, suppose the algorithm for vES2 is
            Lowest-Preference, and PE1 (the DF) goes into maintenance mode.
            The operator could change vES2's Preference on PE1 from 100 to,
            e.g., 250, so that PE2 is forced to take over as Designated
            Forwarder for vES2.</t>

            <t>In case of equal Preference in two or more PEs in the Ethernet
            Segment, the DP bit and the numerically lowest IP address of the
            candidate PE(s) are used as tiebreakers. The procedures for the
            use of the DP bit are specified in <xref target="sect-4.3"/>. If
            more than one PE is advertising itself as the preferred Designated
            Forwarder, an implementation MUST first select the PE advertising
            the DP bit set, and then select the PE with the lowest IP address
            (if the DP bit selection does not yield a unique candidate). The
            PE's IP address is the address used in the candidate list and it
            is derived from the Originating Router's IP address of the
            Ethernet Segment route. In case PEs use the Originating Router's
            IP address of different families, an IPv4 address is always
            considered numerically lower than an IPv6 address. Some examples
            of the use of the DP bit and IP address tiebreakers follow: <list
                style="symbols">
                <t>If vES1 parameters were (500,0,Highest-Preference) in PE1
                and (500,1,Highest-Preference) in PE2, PE2 would be elected
                due to the DP bit. The same example applies if PE1 and PE2
                advertise Lowest-Preference DF Algorithm instead.</t>

                <t>If vES1 parameters were (500,0,Highest-Preference) in PE1
                and (500,0,Highest-Preference) in PE2, PE1 would be elected,
                if PE1's IP address is lower than PE2's. Or PE2 would be
                elected if PE2's IP address is lower than PE1's. The same
                example applies if PE1 and PE2 advertise Lowest-Preference DF
                Algorithm instead.</t>
              </list></t>

            <t>The Preference is an administrative option that MUST be
            configured on a per-Ethernet Segment basis, and it is normally
            configured from the management plane. The Preference value MAY
            also be dynamically changed based on the use of local policies
            that react to events on the PE. The following examples illustrate
            the use of local policy to change the Preference value in a
            dynamic way.<list style="empty">
                <t>E.g., on PE1, if the DF Algorithm is Highest-Preference,
                vES1's Preference value can be lowered from 500 to 100 in case
                the bandwidth on the ENNI port is decreased by 50% (that could
                happen if e.g., the 2-port Link Aggregation Group between PE1
                and the Aggregation Network loses one port).</t>

                <t>Local policy MAY also trigger dynamic Preference changes
                based on the PE's bandwidth availability in the core, specific
                ports going operationally down, etc.</t>

                <t>The definition of the actual local policies is out of scope
                of this document.</t>
              </list></t>
          </list></t>
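        <t>The selection steps in 'c' and 'e' above can be sketched informally
        as follows. This is an illustration under stated assumptions, not a
        normative procedure: the names Candidate, ip_key and elect_df are
        hypothetical, the candidate list is assumed to already contain one
        entry per received (or local) Ethernet Segment route, and the
        Preference and DP values are taken exactly as advertised. The
        non-revertive procedures of <xref target="sect-4.3"/> are not
        modeled. A return value of None stands for the fallback to the
        Default Algorithm.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress
from dataclasses import dataclass

HIGHEST = "Highest-Preference"
LOWEST = "Lowest-Preference"


@dataclass(frozen=True)
class Candidate:
    name: str   # label for the PE (illustrative only)
    ip: str     # Originating Router's IP address of the ES route
    pref: int   # advertised DF Preference
    dp: bool    # advertised "Don't Preempt" bit
    alg: str    # advertised DF Algorithm


def ip_key(addr):
    """Sort key: any IPv4 address is lower than any IPv6 address."""
    ip = ipaddress.ip_address(addr)
    return (ip.version, int(ip))


def elect_df(candidates):
    """Return the elected Candidate, or None to signal fallback."""
    algs = {c.alg for c in candidates}
    # All routes must agree on one of the two preference algorithms.
    if len(algs) != 1 or not algs <= {HIGHEST, LOWEST}:
        return None
    pick = max if HIGHEST in algs else min
    best = pick(c.pref for c in candidates)
    tied = [c for c in candidates if c.pref == best]
    # Tiebreakers: DP bit set first, then lowest IP address.
    pool = [c for c in tied if c.dp] or tied
    return min(pool, key=lambda c: ip_key(c.ip))
```
]]></artwork>
        </figure>

        <t>With the vES2 values of the example (100, 200 and 300 on PE1, PE2
        and PE3), this sketch selects PE3 under Highest-Preference and PE1
        under Lowest-Preference. If PE3's Preference is lowered to 50, it
        selects PE2 under Highest-Preference.</t>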

        <t>The Highest-Preference and Lowest-Preference Algorithms MAY be used
        along with the AC-DF capability. Assuming all the PEs in the Ethernet
        Segment are configured consistently with Highest-Preference or
        Lowest-Preference Algorithm and AC-DF capability, a given PE in the
        Ethernet Segment is not considered as a candidate for Designated
        Forwarder Election until its corresponding Ethernet A-D per ES and
        Ethernet A-D per EVI routes are received, as described in <xref
        target="RFC8584"/>.</t>

        <t>The Highest-Preference and Lowest-Preference DF Algorithms can be
        used in different virtual Ethernet Segments on the same PE. For
        instance, PE1 and PE2 can use Highest-Preference for vES1, while PE1,
        PE2 and PE3 use Lowest-Preference for vES2. The use of one DF Algorithm over
        the other is the operator's choice. The existence of both provides
        flexibility and full control to the operator.</t>

        <t>The procedures in this document can be used in <xref
        target="RFC7432"/>-based Ethernet Segment or virtual Ethernet Segment
        as in <xref target="I-D.ietf-bess-evpn-virtual-eth-segment"/>, and
        also EVPN networks as in <xref target="RFC8214"/>, <xref
        target="RFC7623"/> or <xref target="RFC8365"/>.</t>
      </section>

      <section anchor="sect-4.2"
               title="Use of the Highest-Preference or Lowest-Preference algorithm in [RFC7432] Ethernet Segments">
        <t>While the Highest-Preference or Lowest-Preference DF Algorithm
        described in <xref target="sect-4.1"/> is typically used in virtual
        Ethernet Segment scenarios where there is normally an individual
        Ethernet Tag per virtual Ethernet Segment, the existing <xref
        target="RFC7432"/> definition of an Ethernet Segment allows
        potentially up to thousands of Ethernet Tags on the same Ethernet
        Segment. In that case, if Highest-Preference or
        Lowest-Preference Algorithm is configured in all the PEs of the
        Ethernet Segment, the same PE will be the elected Designated Forwarder
        for all the Ethernet Tags of the Ethernet Segment. A potential way to
        achieve a more granular load balancing is described below.</t>

        <t>The Ethernet Segment is configured with an administrative
        Preference value and an administrative DF Algorithm, i.e.,
        Highest-Preference or Lowest-Preference Algorithm. However, the
        administrative DF Algorithm (which is used to signal the DF Algorithm
        for the Ethernet Segment) MAY be overridden to a different operational
        DF Algorithm for a range of Ethernet Tags. With this option, the PE
        builds a list of candidate PEs ordered by Preference; however, the
        Designated Forwarder for a given Ethernet Tag will be determined by
        the locally overridden DF Algorithm.</t>

        <t>For instance:</t>

        <t><list style="symbols">
            <t>Assuming ES3 is defined in PE1 and PE2, PE1 may be configured
            as (500,0,Highest-Preference) for ES3 and PE2 as
            (100,0,Highest-Preference). Both PEs will advertise the Ethernet
            Segment routes for ES3 with the indicated parameters in the DF
            Election Extended Community.</t>

            <t>In addition, assuming VLAN-based service interfaces and that
            the PEs are attached to all Ethernet Tags in the range 1-4000,
            both PE1 and PE2 may be configured with (Ethernet
            Tag-range,Lowest-Preference), e.g., (2001-4000,
            Lowest-Preference).</t>

            <t>This will result in PE1 being Designated Forwarder for Ethernet
            Tags 1-2000 (since they use the administrative Highest-Preference
            Algorithm) and PE2 being Designated Forwarder for Ethernet Tags
            2001-4000, due to the local policy overriding the
            Highest-Preference Algorithm.</t>
          </list></t>

        <t>While the above logic provides a perfect load-balancing
        distribution of Ethernet Tags per Designated Forwarder when there are
        only two PEs, an Ethernet Segment attached to three or more PEs would
        still have only two Designated Forwarder PEs for all its Ethernet
        Tags. Any other logic that provides a fair distribution of the
        Designated Forwarder function among the three or more PEs is valid, as
        long as that logic is consistent in all the PEs in the Ethernet
        Segment. It is important to note that, when a local policy overrides
        the Highest-Preference or Lowest-Preference signaled by all the PEs in
        the Ethernet Segment, this local policy MUST be consistent in all the
        PEs of the Ethernet Segment. If the local policy is inconsistent for a
        given Ethernet Tag in the Ethernet Segment, packet drops or packet
        duplication may occur on that Ethernet Tag. For all these reasons, the
        use of virtual Ethernet Segments is RECOMMENDED for cases where more
        than two PEs per Ethernet Segment exist and a good load-balancing
        distribution per Ethernet Tag of the Designated Forwarder function is
        desired.</t>
      </section>

      <section anchor="sect-4.3" title="The Non-Revertive Capability">
        <t>As discussed in <xref target="sect-1.2"/> (d), a capability to NOT
        preempt the existing Designated Forwarder (for all the Ethernet Tags
        in the Ethernet Segment) is required and therefore added to the
        Designated Forwarder Election extended community. This option allows a
        non-revertive behavior in the Designated Forwarder election.</t>

        <t>Note that when a given PE in an Ethernet Segment is taken down for
        maintenance operations, before bringing it back, the Preference may be
        changed in order to provide a non-revertive behavior. The DP bit and
        the mechanism explained in this section will be used for those cases
        when a former Designated Forwarder comes back up without any
        controlled maintenance operation, and the non-revertive option is
        desired in order to avoid service impact.</t>

        <t>In <xref target="es-and-deterministic-df-election"/>, we assume
        that based on the Highest-Preference Algorithm, PE3 is the Designated
        Forwarder for ESI2.</t>

        <t>If PE3 has a link, EVC or node failure, PE2 would take over as
        Designated Forwarder. If/when PE3 comes back up again, PE3 will take
        over, causing some unnecessary packet loss in the Ethernet
        Segment.</t>

        <t>The following procedure avoids preemption upon failure recovery
        (please refer to <xref target="es-and-deterministic-df-election"/>).
        The procedure supports a non-revertive mode that can be used along
        with: <list style="symbols">
            <t>Highest-Preference Algorithm</t>

            <t>Lowest-Preference Algorithm</t>

            <t>Highest-Preference or Lowest-Preference Algorithm, where a
            local policy overrides the Highest/Lowest-Preference tiebreaker
            for a range of Ethernet Tags <xref target="sect-4.2"/></t>
          </list>The procedure is described assuming Highest-Preference
        Algorithm in the Ethernet Segment, where local policy overrides the
        tiebreaker for a given Ethernet Tag. The other cases above are a
        sub-set of this one and the differences are explained.</t>

        <t><list style="numbers">
            <t>A "Don't Preempt" capability is defined on a
            per-PE/per-Ethernet Segment basis, as described in <xref
            target="sect-3"/>. If "Don't Preempt" is disabled (default
            behavior), the PE sets DP to zero and advertises it in an Ethernet
            Segment route. If "Don't Preempt" is enabled, the Ethernet Segment
            route from the PE indicates the desire of not being preempted by
            the other PEs in the Ethernet Segment. All the PEs in an Ethernet
            Segment should be consistent in their configuration of the DP
            capability; however, this document does not enforce the
            consistency across all the PEs. In case of inconsistency in the
            support of the DP capability in the PEs of the same Ethernet
            Segment, non-revertive behavior is not guaranteed. However, PEs
            supporting this capability still attempt this procedure.</t>

            <t>Assuming that preemption is to be avoided in all the PEs in the
            Ethernet Segment, the three PEs are configured with the "Don't
            Preempt" capability. In this example, we assume ESI2 is configured
            as 'DP=enabled' in the three PEs.</t>

            <t>We also assume vES2 is attached to Ethernet Tag-1 and Ethernet
            Tag-2. vES2 uses Highest-Preference as DF Algorithm and a local
            policy is configured in the three PEs to use Lowest-Preference for
            Ethernet Tag-2. When vES2 is enabled in the three PEs, the PEs
            will exchange the Ethernet Segment routes and select PE3 as
            Designated Forwarder for Ethernet Tag-1 (due to the
            Highest-Preference), and PE1 as Designated Forwarder for Ethernet
            Tag-2 (due to the Lowest-Preference).</t>

            <t>If PE3's vES2 goes down (due to an EVC failure detected by OAM,
            a port failure, or a node failure), PE2 will become the Designated
            Forwarder for Ethernet Tag-1. No changes will occur for Ethernet
            Tag-2.</t>

            <t>When PE3's vES2 comes back up, PE3 will start a boot-timer (if
            booting up) or hold-timer (if the port or EVC recovers). That
            timer will allow some time for PE3 to receive the Ethernet Segment
            routes from PE1 and PE2. This timer is applied between the INIT
            and the DF_WAIT states in the Designated Forwarder Election Finite
            State Machine described in <xref target="RFC8584"/>. PE3 will
            then:<list style="symbols">
                <t>Select a "reference-PE" among the Ethernet Segment routes
                in the virtual Ethernet Segment. If the Ethernet Segment uses
                the Highest-Preference algorithm, select a "Highest-PE". If it
                uses the Lowest-Preference algorithm, select a "Lowest-PE". If
                a local policy is in use to override the
                Highest/Lowest-Preference for a range of Ethernet Tags (as
                discussed in <xref target="sect-4.2"/>), it is necessary to
                select both a Highest-PE and a Lowest-PE. They are selected as
                follows: <list style="symbols">
                    <t>The Highest-PE is the PE with the highest Preference,
                    using the DP bit first (with DP=1 being better) and, after
                    that, the lower PE-IP address as tiebreakers.</t>

                    <t>The Lowest-PE is the PE with the lowest Preference,
                    using the DP bit first (with DP=1 being better) and, after
                    that, the lower PE-IP address as tiebreakers.</t>

                    <t>In our example, the Highest-Preference algorithm is
                    used, with a local policy to override it to use
                    Lowest-Preference for a range of Ethernet Tags. Therefore
                    PE3 selects a Highest-PE and a Lowest-PE. PE3 will select
                    PE2 as Highest-PE over PE1, since, when comparing
                    (Pref,DP,PE-IP), (200,1,PE2-IP) wins over (100,1,PE1-IP).
                    PE3 will select PE1 as Lowest-PE over PE2, since
                    (100,1,PE1-IP) wins over (200,1,PE2-IP). Note that if
                    there were only one remote PE in the Ethernet Segment,
                    Lowest and Highest PE would be the same PE.</t>
                  </list></t>

                <t>Check its own administrative Pref and compare it with the
                one of the Highest-PE and Lowest-PE that have the DP
                capability set in their Ethernet Segment routes. Depending on
                this comparison PE3 sends the Ethernet Segment route with a
                (Pref,DP) that may be different from its administrative
                (Pref,DP):<list style="symbols">
                    <t>If PE3's Pref value is higher than or equal to the
                    Highest-PE's, PE3 will send the Ethernet Segment route
                    with an 'in-use' operational Pref equal to the
                    Highest-PE's and DP=0.</t>

                    <t>If PE3's Pref value is lower than or equal to the
                    Lowest-PE's, PE3 will send the Ethernet Segment route with
                    an 'in-use' operational Pref equal to the Lowest-PE's and
                    DP=0.</t>

                    <t>If PE3's Pref value is neither higher than or equal to
                    the Highest-PE's nor lower than or equal to the
                    Lowest-PE's, PE3 will send the Ethernet Segment route with
                    its administrative (Pref,DP)=(300,1).</t>

                    <t>In this example, PE3's administrative Pref=300 is
                    higher than the Highest-PE with DP=1, that is, PE2
                    (Pref=200). Hence, PE3 will inherit PE2's preference and
                    send the Ethernet Segment route with an operational
                    'in-use' (Pref,DP)=(200,0).</t>
                  </list></t>

                <t>Note that a PE will always send its DP capability set to
                zero as long as the advertised Pref is the 'in-use'
                operational Pref (as opposed to the 'administrative'
                Pref).</t>

                <t>This Ethernet Segment route update sent by PE3, with
                (200,0,PE3-IP), will not cause any Designated Forwarder
                switchover for any Ethernet Tag. PE2 will continue being
                Designated Forwarder for Ethernet Tag-1. This is because the
                DP bit will be used as a tiebreaker in the Designated
                Forwarder election. That is, if a PE has two candidate PEs
                with the same Pref, it will pick the one with DP=1. There are
                no Designated Forwarder changes for Ethernet Tag-2 either.</t>
              </list></t>

            <t>For any subsequent received update/withdraw in the Ethernet
            Segment, the PEs will go through the process described in (5) to
            select Highest and Lowest-PEs, now considering themselves as
            candidates. For instance, if PE2 fails, upon receiving PE2's
            Ethernet Segment route withdrawal, PE3 and PE1 will go through the
            selection of new Highest and Lowest-PEs (considering their own
            active Ethernet Segment route) and then they will run the
            Designated Forwarder Election.<list style="symbols">
                <t>If a PE selects itself as new Highest or Lowest-PE and it
                was not before, the PE will then compare its operational
                'in-use' Pref with its administrative Pref. If different, the
                PE will send an Ethernet Segment route update with its
                administrative Pref and DP values. In the example, PE3 will be
                the new Highest-PE, therefore it will send an Ethernet Segment
                route update with (Pref,DP)=(300,1).</t>

                <t>After running the Designated Forwarder Election, PE3 will
                become the new Designated Forwarder for Ethernet Tag-1. No
                changes will occur for Ethernet Tag-2.</t>
              </list></t>
          </list></t>
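
        <t>As a non-normative illustration, steps 5 and 6 of the procedure
        above can be sketched in Python as follows. The sketch covers the
        case in which a local policy is in use, so that both a Highest-PE and
        a Lowest-PE are selected. It also assumes that the Designated
        Forwarder Election itself uses the same (Pref,DP,PE-IP) ordering as
        the reference-PE selection. The function names and the PE-IP
        addresses (taken from the 192.0.2.0/24 documentation range) are
        hypothetical and are not defined by this document.</t>

        <figure>
          <artwork><![CDATA[
```python
from ipaddress import ip_address


def reference_pe(routes, highest):
    """Select the Highest-PE (highest=True) or the Lowest-PE.

    routes: name -> (pref, dp, pe_ip), one entry per ES route.
    Tiebreakers: DP=1 is better, then the lower PE-IP address.
    """
    def rank(name):
        pref, dp, pe_ip = routes[name]
        signed_pref = -pref if highest else pref
        return (signed_pref, -dp, int(ip_address(pe_ip)))
    return min(routes, key=rank)


def in_use_pref_dp(admin_pref, admin_dp, remote_routes):
    """(Pref, DP) a recovering PE advertises after its timer expires."""
    highest_pe = reference_pe(remote_routes, True)
    lowest_pe = reference_pe(remote_routes, False)
    hi_pref, hi_dp, _ = remote_routes[highest_pe]
    lo_pref, lo_dp, _ = remote_routes[lowest_pe]
    if hi_dp == 1 and admin_pref >= hi_pref:
        return (hi_pref, 0)  # inherit the Highest-PE's Pref
    if lo_dp == 1 and admin_pref <= lo_pref:
        return (lo_pref, 0)  # inherit the Lowest-PE's Pref
    return (admin_pref, admin_dp)


# ES routes received by PE3; the PE-IP addresses are hypothetical.
remote = {
    "PE1": (100, 1, "192.0.2.1"),
    "PE2": (200, 1, "192.0.2.2"),
}
advertised = in_use_pref_dp(300, 1, remote)

# Assumption: the DF Election reuses the reference-PE ordering.
ves2 = dict(remote, PE3=advertised + ("192.0.2.3",))
df_tag1 = reference_pe(ves2, highest=True)   # Highest-Preference
df_tag2 = reference_pe(ves2, highest=False)  # local policy: Lowest
print(advertised, df_tag1, df_tag2)

# Step 6: PE2 withdraws its ES route; each PE reselects the
# reference PEs, now counting its own active ES route.
remaining = {pe: r for pe, r in ves2.items() if pe != "PE2"}
new_highest = reference_pe(remaining, highest=True)
```
]]></artwork>
        </figure>

        <t>With the values of the example, PE3 advertises (Pref,DP)=(200,0),
        PE2 remains Designated Forwarder for Ethernet Tag-1 because its DP=1
        wins the tie at Pref=200, and PE1 remains Designated Forwarder for
        Ethernet Tag-2. After PE2's withdrawal, PE3 is selected as the new
        Highest-PE, which is the condition under which it re-advertises its
        administrative (Pref,DP)=(300,1).</t>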

        <t>Note that, irrespective of the DP bit, when a PE or Ethernet
        Segment comes back and the PE advertises a Designated Forwarder
        Election Algorithm different from the one configured in the rest of
        the PEs in the Ethernet Segment, all the PEs in the Ethernet Segment
        MUST fall back to the Default <xref target="RFC7432"/> Algorithm.</t>

        <t>This document does not modify the use of the P and B bits in the
        Ethernet A-D per EVI routes <xref target="RFC8214"/> advertised by the
        PEs in the Ethernet Segment after running the Designated Forwarder
        Election, irrespective of the revertive or non-revertive behavior in
        the PE.</t>
      </section>
    </section>

    <section anchor="sect-5" title="Security Considerations">
      <t>This document describes a Designated Forwarder Election Algorithm
      that provides absolute control (by configuration) over which PE is the
      Designated Forwarder for a given Ethernet Tag. While this control is
      desired in many situations, a malicious user that gets access to the
      configuration of a PE in the Ethernet Segment may change the behavior of
      the network. In other DF Algorithms such as HRW, the Designated
      Forwarder Election is more automated and cannot be determined by
      configuration. With Highest-Preference or Lowest-Preference as DF
      Algorithm, an attacker may change the configuration of the Preference
      value on a PE and Ethernet Segment, and impact the traffic going through
      that PE and Ethernet Segment.</t>

      <t>The non-revertive capability described in this document may be seen
      as a security improvement over the regular EVPN revertive Designated
      Forwarder Election: an intentional link (or node) "flapping" on a PE
      will only cause service disruption once, when the PE goes to
      Non-Designated Forwarder state. However, an attacker who gets access to
      the configuration of a PE in the Ethernet Segment will be able to
      disable the non-revertive behavior, by advertising a conflicting DF
      election algorithm and thereby forcing fallback to the Default
      algorithm. </t>

      <t>The document also describes how a local policy can override the
      Highest-Preference or Lowest-Preference algorithms for a range of
      Ethernet Tags in the Ethernet Segment. If the local policy is not
      consistent across all PEs in the Ethernet Segment and there is an
      Ethernet Tag that ends up with an inconsistent use of Highest-Preference
      or Lowest-Preference in different PEs, packet drop or packet duplication
      may occur for that Ethernet Tag.</t>

      <t>Finally, the two Designated Forwarder Election Algorithms specified
      in this document (Highest-Preference and Lowest-Preference) do not
      change the way the PEs share their Ethernet Segment information,
      compared to the algorithms in <xref target="RFC7432"/> and <xref
      target="RFC8584"/>. Therefore the security considerations in <xref
      target="RFC7432"/> and <xref target="RFC8584"/> apply to this document
      too.</t>
    </section>

    <section anchor="sect-6" title="IANA Considerations">
      <t>This document solicits:</t>

      <t><list style="symbols">
          <t>The allocation of two new values in the "DF Alg" registry created
          by <xref target="RFC8584"/> as follows:<figure>
              <artwork><![CDATA[Alg         Name                               Reference
----        -----------------------------      -------------
2           Highest-Preference Algorithm       This document
TBD         Lowest-Preference Algorithm        This document]]></artwork>
            </figure></t>

          <t>The allocation of a new value in the "DF Election Capabilities"
          registry created by <xref target="RFC8584"/> for the 2-octet Bitmap
          field in the DF Election Extended Community (Border Gateway Protocol
          (BGP) Extended Communities registry), as follows:<figure>
              <artwork><![CDATA[Bit         Name                             Reference
----        -----------------------------    -------------
0           D (Don't Preempt) Capability     This document]]></artwork>
            </figure></t>
          <t>The update of the reference of the "DF Election Extended Community"
          field, in the EVPN Extended Community Sub-Types registry, as
          follows:<figure>
              <artwork><![CDATA[Sub-Type Value     Name                              Reference
--------------     ------------------------------    ---------------------------
0x06               DF Election Extended Community    [RFC8584] and This Document]]></artwork>
            </figure> </t>
        </list></t>
    </section>

    <section anchor="sect-7" title="Acknowledgments">
      <t>The authors would like to thank Kishore Tiruveedhula and Sasha
      Vainshtein for their review and comments. Thanks also to Luc Andre
      Burdet and Stephane Litkowski for their thorough review and suggestions
      for a new DF Algorithm for lowest-preference.</t>
    </section>

    <section anchor="sect-8" title="Contributors">
      <t>In addition to the authors listed, the following individuals also
      contributed to this document:</t>

      <t>Tony Przygienda, Juniper</t>

      <t>Satya Mohanty, Cisco</t>

      <t>Kiran Nagaraj, Nokia</t>

      <t>Vinod Prabhu, Nokia</t>

      <t>Selvakumar Sivaraj, Juniper</t>

      <t>Sami Boutros, VMware</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &RFC7432;

      &RFC8584;

      &RFC2119;

      &RFC8174;

      &I-D.ietf-bess-evpn-virtual-eth-segment;
    </references>

    <references title="Informative References">
      &RFC8214;

      &RFC8365;

      &RFC7623;
    </references>
  </back>
</rfc>
