<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std"
     consensus="true" docName="draft-ietf-pim-drlb-15" number="8775"
     ipr="trust200902" obsoletes="" updates="" submissionType="IETF"
     xml:lang="en" tocInclude="true" tocDepth="6" symRefs="true"
     sortRefs="true" version="3">

  <!-- xml2rfc v2v3 conversion 2.39.0 -->
  <!-- ***** FRONT MATTER ***** -->
  <front>
    <title abbrev="PIM Designated Router Load Balancing">PIM Designated Router
    Load Balancing</title>
    <seriesInfo name="RFC" value="8775"/>
    <author fullname="Yiqun Cai" initials="Y" surname="Cai">
      <organization>Alibaba Group</organization>
      <address>
	<postal>
	  <street>520 Almanor Avenue</street>
	  <city>Sunnyvale</city><region>CA</region>
	  <code>94085</code>
	  <country>United States of America</country>
	</postal>
        <email>yiqun.cai@alibaba-inc.com</email>
      </address>
    </author>
    <author initials="H" surname="Ou" fullname="Heidi Ou">
      <organization>Alibaba Group</organization>
      <address>
	<postal>
	  <street>520 Almanor Avenue</street>
	  <city>Sunnyvale</city><region>CA</region>
	  <code>94085</code>
	  <country>United States of America</country>
	</postal>
        <email>heidi.ou@alibaba-inc.com</email>
      </address>
    </author>
    <author initials="S" surname="Vallepalli" fullname="Sri Vallepalli">
      <address>
        <email>vallepal@yahoo.com</email>
      </address>
    </author>
    <author initials="M" surname="Mishra" fullname="Mankamana Mishra">
      <organization>Cisco Systems, Inc.</organization>
      <address>
        <postal>
          <street>821 Alder Drive,</street>
          <city>Milpitas</city>
          <region>CA</region>
	  <code>95035</code>
          <country>United States of America</country>
        </postal>
        <email>mankamis@cisco.com</email>
      </address>
    </author>
    <author initials="S" surname="Venaas" fullname="Stig Venaas">
      <organization>Cisco Systems, Inc.</organization>
      <address>
        <postal>
          <street>Tasman Drive</street>
          <city>San Jose</city>
          <region>CA</region>
	  <code>95134</code>
          <country>United States of America</country>
        </postal>
        <email>stig@cisco.com</email>
      </address>
    </author>
    <author initials="A" surname="Green" fullname="Andy Green">
      <organization>British Telecom</organization>
      <address>
        <postal>
          <street>Adastral Park</street>
          <city>Ipswich</city>
          <code>IP5 2RE</code>
          <country>United Kingdom</country>
        </postal>
        <email>andy.da.green@bt.com</email>
      </address>
    </author>
    <date year="2020" month="April" />
    <area>Routing</area>
    <keyword>Multicast</keyword>
    <abstract>
      <t>On a multi-access network, one of the PIM-SM (PIM Sparse Mode)
      routers is elected as a
      Designated Router. One of the responsibilities of the Designated Router
      is to track local multicast listeners and forward data to these
      listeners if the group is operating in PIM-SM. This
      document specifies a modification to the PIM-SM protocol that
      allows more than one of the PIM-SM routers to take on this responsibility
      so that the forwarding load can be distributed among multiple routers.
      </t>
    </abstract>
  </front>
  <!-- ***** MIDDLE MATTER ***** -->

  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>On a multi-access LAN (such as an Ethernet) with one or more PIM-SM
      (PIM Sparse Mode) <xref target="RFC7761" format="default"/> routers, one
      of the PIM-SM 
      routers is elected as a Designated Router (DR). The PIM DR has two
      responsibilities in the PIM-SM protocol. For any active sources on a LAN,
      the PIM DR is responsible for registering with the Rendezvous Point (RP)
      if the group is operating in PIM-SM. Also, the PIM DR is responsible for
      tracking local multicast listeners and forwarding data to these
      listeners if the group is operating in PIM-SM.
      </t>
      <t>Consider the following LAN in <xref target="LAN-REC"
      format="default"/>:</t> 
<figure anchor="LAN-REC">
<name>LAN with Receivers</name>
<artwork name="" type="" align="left" alt=""><![CDATA[            
                          (core networks)
                           |     |     |
                           |     |     |
                          R1    R2     R3
                           |     |     |
                           ----(LAN)----
                                 |
                                 |
                         (many receivers)
]]></artwork>
</figure>
      <t>Assume R1 is elected as the DR.  According to the
      PIM-SM protocol, R1 will be responsible for forwarding traffic
      to that LAN on behalf of all local members. In addition to keeping
      track of membership reports, R1 is also responsible for
      initiating the creation of source and/or shared trees towards the
      senders or the RPs. The membership reports would be IGMP or Multicast
      Listener Discovery (MLD)
      messages. This applies to any version of the IGMP and MLD protocols.
      The most recent versions are IGMPv3 <xref target="RFC3376" format="default"/> and
      MLDv2 <xref target="RFC3810" format="default"/>.
      </t>
      <t>Having a single router acting as DR and being responsible for
      data-plane forwarding leads to several issues.  One of the issues is
      that the
      aggregated bandwidth will be limited to what R1 can handle with
      regard to the capacity of its incoming links, its interface on the
      LAN, and its total forwarding capacity. It is very common for a LAN
      to consist of
      switches that run IGMP/MLD or PIM snooping <xref target="RFC4541"
      format="default"/>. 
      This allows the forwarding of multicast packets to be
      restricted only to segments leading to receivers that have indicated
      their interest in multicast groups using either IGMP or MLD.  The
      emergence of switched Ethernet allows the aggregated bandwidth to
      exceed, sometimes by a large factor, that of a single link.  For
      example, let us modify <xref target="LAN-REC" format="default"/> and
      introduce an Ethernet switch in <xref target="LAN-SWITCH"
      format="default"/>. 
      </t>
      <figure anchor="LAN-SWITCH">
	<name>LAN with Ethernet Switch</name>
      <artwork name="" type="" align="left" alt=""><![CDATA[
                         (core networks)
                          |     |     |
                          |     |     |
                         R1    R2     R3
                          |     |     |
                       +=gi1===gi2===gi3=+
                       +                 +
                       +      switch     +
                       +                 +
                       +=gi4===gi5===gi6=+
                          |     |     |
                         H1    H2     H3
]]></artwork>
      </figure>
      <t>Let us assume that each individual link is a Gigabit Ethernet.  Each
      router (R1, R2, and R3) and the switch have enough forwarding capacity
      to handle hundreds of gigabits of data.
      </t>
      <t>Let us further assume that each of the hosts requests 500 Mbps of
      unique multicast data. This totals 1.5 Gbps of data, which is less
      than what the switch or the combined uplink bandwidth across the
      routers can handle, even if a single router fails.
      </t>
      <t>On the other hand, the link between R1 and the switch, via port gi1, can
      only handle a throughput of 1 Gbps.  And if R1 is the only DR (the
      PIM DR elected using the procedure defined by <xref target="RFC7761"
      format="default"/>),
      at least 500 Mbps worth of data will be lost because the only link that
      can be used to draw the traffic from the routers to the switch is via
      gi1. In other words, the entire network's throughput is limited by the
      single connection between the PIM DR and the switch (or LAN, as in
      <xref target="LAN-REC" format="default"/>).
      </t>
      <t>Another important issue is related to failover.  If R1 is the only
      forwarder on a shared LAN, when R1
      goes out of service, multicast forwarding for the entire LAN has
      to be rebuilt by the newly elected PIM DR.  However, if there were a
      way that allowed multiple routers to forward to the LAN for
      different groups, failure of one of the routers would only lead to
      disruption to a subset of the flows, therefore improving the overall
      resilience of the network.
      </t>
      <t>This document specifies a modification to the PIM-SM protocol
      that allows more than one of these routers, called Group Designated
      Routers (GDRs), to be selected so that the forwarding load can be
      distributed among a number of routers.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>Terminology</name>
        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
    "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL 
    NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
    "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", 
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are
    to be interpreted as 
    described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/>
    when, and only when, they appear in all capitals, as shown here.
        </t>
      <t>With respect to PIM-SM, this document follows the terminology that
      has been defined in <xref target="RFC7761" format="default"/>.
      </t>
      <t>This document also introduces the following new terms:
      </t>
      <dl newline="false" spacing="normal">
        <dt>GDR (Group Designated Router):</dt>
	<dd>For each multicast
	  flow, either a (*,G) for Any-Source Multicast (ASM) or an (S,G)
	  for Source-Specific Multicast (SSM) <xref target="RFC4607"
	  format="default"/>, 
	  a hash algorithm (described below) is used to select one of the
	  routers as a GDR.  The GDR is responsible for initiating the
	  forwarding tree building process for the corresponding multicast
	  flow.
          </dd>
        <dt>GDR Candidate:</dt> 
	<dd>A router that has the potential to
          become a GDR. There might be multiple GDR Candidates on a LAN,
          but only one can become the GDR for a specific multicast flow.
          </dd>
      </dl>
    </section>
    <section numbered="true" toc="default">
      <name>Applicability</name>
      <t>The extension specified in this document applies to
      PIM-SM routers acting as last-hop routers (routers with directly
      connected receivers). It does not alter the behavior of a PIM DR or
      any other router on the first-hop network (the network with directly
      connected sources).
      This is because the source tree is built using the IP address of the
      sender, not the IP address of the PIM DR that sends PIM registers
      towards the RP.  The load balancing between first-hop routers can be
      achieved naturally if an IGP provides equal-cost multipath (ECMP)
      routing (which it usually does in practice).  Also, the benefit of
      distributing the source-registration load does not justify the
      additional complexity required to support it.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>Functional Overview</name>
      <t>In the PIM DR election as defined in <xref target="RFC7761"
      format="default"/>, when 
      multiple routers are connected to a multi-access LAN (for
      example, an Ethernet), one of them is elected to act as PIM DR.  The
      PIM DR is responsible for sending local Join/Prune messages towards the
      RP or source. In order to elect the PIM DR, each PIM router on the LAN
      examines the received PIM Hello messages and compares its own DR
      priority and IP address with those of its neighbors.  The router with
      the highest DR priority is the PIM DR.  If there are multiple such
      routers, their IP addresses are used as the tiebreaker, as described
      in <xref target="RFC7761" format="default"/>.
      </t>
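      <t>The election rule above can be illustrated with a short,
      non-normative Python sketch. The Router class and the elect_dr()
      function are hypothetical names used only for illustration, and the
      sketch omits the special handling in <xref target="RFC7761"
      format="default"/> for neighbors that do not advertise a DR priority.</t>
      <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Router:
    name: str
    address: str      # source address of the router's PIM Hello messages
    dr_priority: int  # value advertised in the DR Priority Hello Option


def elect_dr(routers):
    """Highest DR priority wins; the highest IP address breaks ties."""
    return max(routers, key=lambda r: (r.dr_priority, ip_address(r.address)))


lan = [Router("R1", "192.0.2.1", 1),
       Router("R2", "192.0.2.2", 1),
       Router("R3", "192.0.2.3", 0)]
print(elect_dr(lan).name)  # R2: R1 and R2 tie on priority, R2 has the higher address
```
]]></sourcecode>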
      <t>
        In order to share forwarding load among last-hop routers, besides the
        normal PIM DR election, one or more GDRs are elected on the
	multi-access LAN.  There is only one PIM DR on the multi-access
        LAN, but there might be multiple GDR Candidates.
      </t>
      <t>For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM,
      a hash algorithm (<xref target="maskalgo" format="default"/>) is used to
      select one of the routers to be the GDR.
      The new DR Load-Balancing Capability (DRLB-Cap) PIM Hello Option is
      used to announce the capability, as well as the hash algorithm type.
      Routers that advertise the new DRLB-Cap Option in their PIM Hello
      messages, use the same GDR election hash algorithm as the PIM DR, and
      have the same DR priority as the PIM DR are considered GDR Candidates.
      </t>
      <t>Hash masks are defined for Source, Group, and RP, separately, in
      order to handle PIM ASM/SSM.  The masks, as well as a sorted list of GDR
      Candidate addresses, are announced by the DR in a new DR Load-Balancing
      List (DRLB-List) PIM Hello Option.
      </t>
      <t>A hash algorithm based on the announced Source, Group, or RP masks
      allows one GDR to be assigned to a corresponding multicast state.
      That GDR is responsible for initiating the creation of the
      multicast forwarding tree for multicast traffic.
      </t>
      <section numbered="true" toc="default">
        <name>GDR Candidates</name>
        <t>The GDR is a new concept introduced by this specification.  GDR
        Candidates are routers eligible for GDR election on the LAN.  To
        become a GDR Candidate, a router must have the same DR priority and
	run the same GDR election hash algorithm as the DR on the LAN.
        </t>
        <t>For example, assume there are 4 routers on the LAN: R1, R2, R3, and
        R4, each announcing a DRLB-Cap Option. R1, R2, and R3 have the same
	DR priority, while R4's DR priority is less preferred.
        In this example, R4 will not be eligible for GDR election, because R4
        will not become a PIM DR unless all of R1, R2, and R3 go out of
        service.
        </t>
        <t>Furthermore, assume router R1 wins the PIM DR election, R1 and R2
        advertise the same hash algorithm for GDR election, while R3 advertises
	a different one. In this case, only R1 and R2 will be eligible for GDR
        election, while R3 will not.
        </t>
        <t>As the DR, R1 will include its own Load-Balancing Hash Masks and
        the identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello
        Option.
        </t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Protocol Specification</name>
      <section anchor="maskalgo" numbered="true" toc="default">
        <name>Hash Mask and Hash Algorithm</name>
        <t>A hash mask is used to extract a number of bits from the
        corresponding IP address field (32 for IPv4, 128 for IPv6) and
	calculate a hash value.  The hash value is used to select a GDR from GDR
        Candidates advertised by the PIM DR. Hash masks allow for certain flows
	to always be forwarded by the same GDR, by ignoring certain bits in the
	hash value calculation, so that the hash values are the same. For
	example, 0.0.255.0 defines a
        hash mask for an IPv4 address that masks out the first, second, and
        fourth octets, which means that only the third octet will
	influence the hash value computed. Note that the masks need not
	be a contiguous set of bits. For example, for IPv4, 15.15.15.15 would be a
	valid mask.
        </t>
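        <t>The effect of a hash mask can be seen in the following
        non-normative Python sketch, which uses the standard ipaddress
        module. The apply_mask() helper is a hypothetical name, and the group
        addresses are arbitrary examples.</t>
        <sourcecode type="python"><![CDATA[
```python
from ipaddress import IPv4Address


def apply_mask(address: str, mask: str) -> IPv4Address:
    """Keep only the address bits selected by a hash mask."""
    return IPv4Address(int(IPv4Address(address)) & int(IPv4Address(mask)))


# With mask 0.0.255.0 only the third octet survives, so these two groups
# produce the same masked value and therefore map to the same GDR.
print(apply_mask("232.1.7.9", "0.0.255.0"))     # 0.0.7.0
print(apply_mask("239.200.7.44", "0.0.255.0"))  # 0.0.7.0

# A non-contiguous mask such as 15.15.15.15 keeps the low nibble of each octet.
print(apply_mask("232.1.7.9", "15.15.15.15"))   # 8.1.7.9
```
]]></sourcecode>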
        <t>
	  In the text below, a hash mask is, in some places, said to be zero.
	  A hash mask is zero if no bits are set, that is,
	  0.0.0.0 for IPv4 and :: for IPv6. Also, a hash mask is said to be
	  an all-bits-set mask if it is 255.255.255.255 for IPv4 or
	  ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff for IPv6.
        </t>
        <t>There are three hash masks defined:
        </t>
        <ul spacing="normal">
          <li>RP Hash Mask</li>
          <li>Source Hash Mask</li>
          <li>Group Hash Mask</li>
        </ul>
        <t>The hash masks need to be configured on the PIM routers that can
        potentially become a PIM DR, unless the implementation provides
        default hash mask values.
        An implementation <bcp14>SHOULD</bcp14> have default hash mask values as follows.
	The default RP Hash Mask <bcp14>SHOULD</bcp14> be zero (no bits set). The default
	Source and Group Hash Masks <bcp14>SHOULD</bcp14> both be all-bits-set masks.
	These default values are likely acceptable for most deployments and
	simplify configuration. There is only a need to use other masks if
	one needs to ensure that certain flows are forwarded by the same GDR.
        </t>
        <t>
	  The DRLB-List Hello Option contains a list of GDR Candidates.
	  The first one listed has ordinal number 0, the second listed
	  ordinal number 1, and the last one has ordinal number N - 1 if
	  there are N candidates listed. The hash value computed will be
	  the ordinal number of the GDR Candidate that is acting as GDR for
	  the flow in question.
        </t>
        <t>The input to be hashed is determined as follows:
        </t>
        <ul spacing="normal">
          <li>If the group is in ASM mode and the RP Hash Mask announced by
            the PIM DR is not zero (at least one bit is set), calculate the
	    value of hashvalue_RP (<xref target="algorithm" format="default"/>) to determine
	    the GDR.
            </li>
          <li>If the group is in ASM mode and the RP Hash Mask announced by
            the PIM DR is zero (no bits are set), obtain the value of
            hashvalue_Group (<xref target="algorithm" format="default"/>) to determine the
	    GDR.
            </li>
          <li>If the group is in SSM mode, use
            hashvalue_SG (<xref target="algorithm" format="default"/>) to determine the GDR.
            </li>
        </ul>
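        <t>The selection rules above amount to a small decision procedure,
        sketched below in Python for illustration only. The Mode enumeration,
        the hash_input() function, and the DEFAULT_RP_HASH_MASK constant are
        hypothetical, and masks are represented as plain integers. With the
        recommended zero RP Hash Mask, ASM groups are hashed on the group
        address.</t>
        <sourcecode type="python"><![CDATA[
```python
from enum import Enum


class Mode(Enum):
    ASM = "any-source"
    SSM = "source-specific"


DEFAULT_RP_HASH_MASK = 0  # recommended default: zero (no bits set)


def hash_input(mode: Mode, rp_hash_mask: int) -> str:
    """Name the hash value that selects the GDR for a flow."""
    if mode is Mode.SSM:
        return "hashvalue_SG"
    if rp_hash_mask != 0:  # ASM and at least one RP mask bit is set
        return "hashvalue_RP"
    return "hashvalue_Group"  # ASM and the RP Hash Mask is zero


print(hash_input(Mode.ASM, DEFAULT_RP_HASH_MASK))  # hashvalue_Group
print(hash_input(Mode.ASM, 0x0000FF00))            # hashvalue_RP
print(hash_input(Mode.SSM, DEFAULT_RP_HASH_MASK))  # hashvalue_SG
```
]]></sourcecode>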
        <t>
	  A simple modulo hash algorithm is defined in this document.
          However, to allow another hash algorithm to be used, a 1-octet
          "Hash Algorithm" field is included in the DRLB-Cap Hello Option to
          specify the hash algorithm used by the router.
        </t>
        <t>If different hash algorithms are advertised among the routers
	  on a LAN, only the routers advertising the same hash algorithm
	  as the DR (as well as having the same DR priority as the DR) are
	  eligible for GDR election.
        </t>
      </section>
      <section anchor="algorithm" numbered="true" toc="default">
        <name>Modulo Hash Algorithm</name>
        <t>
	  As part of computing the hash, the notation LSZC(hash_mask) is used
	  to denote the number of zeroes
	  counted from the least significant bit of a hash mask
	  hash_mask. As an example, LSZC(255.255.255.128) is 7 and
	  LSZC(ffff:8000::) is 111. If all bits are set, LSZC will
	  be 0. If the mask is zero, then
	  LSZC will be 32 for IPv4 and 128 for IPv6.
        </t>
        <t>
	  The number of GDR Candidates is denoted as GDRC.
        </t>
        <t>
	  The idea behind the modulo hash algorithm is, in simple terms,
	  that the corresponding mask is applied to a value, then the result
	  is shifted right LSZC(mask) bits so that the least significant bits
	  that were masked out are not considered. Then, this result is masked
	  by 0xffffffff, keeping only the 32 least significant bits of the result
	  (this only makes a difference for IPv6). Finally, the hash value is
	  this result modulo the number of GDR Candidates (GDRC).
        </t>
        <t>
	  The modulo hash algorithm, for computing the values hashvalue_RP,
          hashvalue_Group, and hashvalue_SG, is defined as follows.
        </t>
        <t>
	  hashvalue_RP is calculated as:
	</t>
<artwork>
   (((RP_address &amp; RP_mask) &gt;&gt; LSZC(RP_mask)) &amp; 0xffffffff) % GDRC
</artwork>

<ul empty="true">
	<li>RP_address is the address of the RP defined for the group,
	and RP_mask is the RP Hash Mask.</li>
	</ul>

	  <t>
	  hashvalue_Group is calculated as:
	  </t>
        <artwork>
   (((Group_address &amp; Group_mask) &gt;&gt; LSZC(Group_mask)) &amp; 0xffffffff) 
   % GDRC
</artwork>
<ul empty="true">
          <li>
	      Group_address is the group address, and Group_mask is the
	      Group Hash Mask.</li>
        </ul>

	  <t>
	  hashvalue_SG is calculated as:
	  </t>
<artwork>
   ((((Source_address &amp; Source_mask) &gt;&gt; LSZC(Source_mask)) &amp; 
   0xffffffff) ^ (((Group_address &amp; Group_mask) &gt;&gt; LSZC(Group_mask))
   &amp; 0xffffffff)) % GDRC
</artwork>
        <ul empty="true">
          <li>
	      Source_address is the source address, Source_mask is the
	      Source Hash Mask, Group_address is the group address, and
	      Group_mask is the Group Hash Mask.</li>
	</ul>
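        <t>The three formulas can be transcribed into Python as a
        non-normative sketch. Addresses and masks are given as strings and
        parsed with the standard ipaddress module; the helper names _lszc(),
        _fold(), hashvalue_rp(), hashvalue_group(), and hashvalue_sg() are
        hypothetical, and the address-family check is an addition of this
        sketch.</t>
        <sourcecode type="python"><![CDATA[
```python
from ipaddress import ip_address


def _lszc(value: int, width: int) -> int:
    """LSZC: zero bits counted from the least significant end of a mask."""
    return width if value == 0 else (value & -value).bit_length() - 1


def _fold(address: str, mask: str) -> int:
    """((address & mask) >> LSZC(mask)) & 0xffffffff"""
    a, m = ip_address(address), ip_address(mask)
    if a.version != m.version:
        raise ValueError("address and mask must be in the same family")
    return ((int(a) & int(m)) >> _lszc(int(m), m.max_prefixlen)) & 0xFFFFFFFF


def hashvalue_rp(rp, rp_mask, gdrc):
    return _fold(rp, rp_mask) % gdrc


def hashvalue_group(group, group_mask, gdrc):
    return _fold(group, group_mask) % gdrc


def hashvalue_sg(source, source_mask, group, group_mask, gdrc):
    return (_fold(source, source_mask) ^ _fold(group, group_mask)) % gdrc


print(hashvalue_rp("192.0.2.1", "0.0.255.0", 3))   # 2
print(hashvalue_group("232.1.1.1", "0.0.0.0", 3))  # 0: a zero mask maps every group to ordinal 0
print(hashvalue_sg("198.51.100.7", "255.255.255.255",
                   "232.1.1.1", "255.255.255.255", 3))  # 2
```
]]></sourcecode>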
        <section numbered="true" toc="default">
          <name>Modulo Hash Algorithm Examples</name>
          <t>To help illustrate the algorithm, consider this example.
	  Router X with IPv4 address 203.0.113.1 receives a DRLB-List
	  Hello Option from the DR that announces RP Hash
	  Mask 0.0.255.0 and a list of GDR Candidates, sorted by IP
	  addresses from high to low: 203.0.113.3, 203.0.113.2, and
	  203.0.113.1.  The ordinal number assigned to those addresses
	  would be:
          </t>
	  <t>
          0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1
	  (Router X).</t>

          <t>Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2
	  198.51.100.2 for Group2.  Following the modulo hash algorithm:
          </t>
	  <ul spacing="normal">
          <li>LSZC(0.0.255.0) is 8, and GDRC is 3.
	  The hashvalue_RP for Group1 with RP RP1 is:
          </li>
	  </ul>
          <ul empty="true">
	  <li>
<artwork>
(((192.0.2.1 &amp; 0.0.255.0) &gt;&gt; 8) &amp; 0xffffffff) % 3
= 2 % 3
= 2
</artwork>
</li>
          <li>This matches the ordinal number assigned to Router X.
	  Router X will be the GDR for Group1.</li>
	  </ul>
	  <ul spacing="normal">
          <li>The hashvalue_RP for Group2 with RP RP2 is:</li>
	  </ul>
	  <ul empty="true">
          <li>
<artwork>
(((198.51.100.2 &amp; 0.0.255.0) &gt;&gt; 8) &amp; 0xffffffff) % 3
= 100 % 3
= 1
</artwork>
</li>
          <li>This is different from the ordinal number of Router X (2).
	  Hence, Router X will not be the GDR for Group2.</li>
	  </ul>
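          <t>The IPv4 example can be reproduced with the following
          non-normative Python sketch. The hashvalue_rp_v4() helper and the
          variable names are hypothetical; the addresses, mask, and expected
          ordinals are those of the example above.</t>
          <sourcecode type="python"><![CDATA[
```python
from ipaddress import IPv4Address


def hashvalue_rp_v4(rp: str, rp_mask: str, gdrc: int) -> int:
    address, mask = int(IPv4Address(rp)), int(IPv4Address(rp_mask))
    shift = 32 if mask == 0 else (mask & -mask).bit_length() - 1  # LSZC(mask)
    return (((address & mask) >> shift) & 0xFFFFFFFF) % gdrc


# DRLB-List contents from the example, sorted from high to low.
candidates = sorted(["203.0.113.1", "203.0.113.3", "203.0.113.2"],
                    key=IPv4Address, reverse=True)
rp_for_group = {"Group1": "192.0.2.1", "Group2": "198.51.100.2"}

gdr = {group: candidates[hashvalue_rp_v4(rp, "0.0.255.0", len(candidates))]
       for group, rp in rp_for_group.items()}
print(gdr)  # {'Group1': '203.0.113.1', 'Group2': '203.0.113.2'}
```
]]></sourcecode>
          <t>Router X (203.0.113.1) appears only as the GDR for Group1,
          matching the hand computation.</t>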
          <t>For IPv6, consider this example, similar to the above.
	  Router X with IPv6 address fe80::1 receives a DRLB-List
	  Hello Option from the DR that announces RP Hash
	  Mask ::ffff:ffff:ffff:0 and a list of GDR Candidates, sorted by IP
	  addresses from high to low: fe80::3, fe80::2, and fe80::1.
	  The ordinal number assigned to those addresses would be:
          </t>
	  <ul empty="true">
          <li>0 for fe80::3; 1 for fe80::2; 2 for fe80::1 (Router X).</li>
          </ul>
          <t>Assume there are 2 RPs: RP1 2001:db8::1:0:5678:1 for Group1 and
	  RP2 2001:db8::1:0:1234:2 for Group2.
	  Following the modulo hash algorithm:
          </t>
	  <ul spacing="normal">
          <li>LSZC(::ffff:ffff:ffff:0) is 16, and GDRC is 3.
	  The hashvalue_RP for Group1 with RP RP1 is:</li>
	  </ul>
	  <ul empty="true">
          <li>
<artwork>
(((2001:db8::1:0:5678:1 &amp; ::ffff:ffff:ffff:0) &gt;&gt; 16) &amp;
 0xffffffff) % 3
= ((::1:0:5678:0 &gt;&gt; 16) &amp; 0xffffffff) % 3
= (::1:0:5678 &amp; 0xffffffff) % 3
= ::5678 % 3
= 2
</artwork>
          </li>
          <li>This matches the ordinal number assigned to Router X.
	  Router X will be the GDR for Group1.</li>
	  </ul>
	  <ul spacing="normal">
          <li>The hashvalue_RP for Group2 with RP RP2 is:</li>
	  </ul>
	  <ul empty="true">
          <li>
<artwork>
(((2001:db8::1:0:1234:2 &amp; ::ffff:ffff:ffff:0) &gt;&gt; 16) &amp;
 0xffffffff) % 3
= ((::1:0:1234:0 &gt;&gt; 16) &amp; 0xffffffff) % 3
= (::1:0:1234 &amp; 0xffffffff) % 3
= ::1234 % 3
= 1
</artwork>
</li>
          <li>This is different from the ordinal number of Router X (2).
	  Hence, Router X will not be the GDR for Group2.</li>
	  </ul>
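          <t>The IPv6 example can be checked the same way. The following
          non-normative Python sketch uses a hypothetical helper,
          hashvalue_rp_v6(), together with the addresses and the RP Hash Mask
          from the example above.</t>
          <sourcecode type="python"><![CDATA[
```python
from ipaddress import IPv6Address


def hashvalue_rp_v6(rp: str, rp_mask: str, gdrc: int) -> int:
    address, mask = int(IPv6Address(rp)), int(IPv6Address(rp_mask))
    shift = 128 if mask == 0 else (mask & -mask).bit_length() - 1  # LSZC(mask)
    return (((address & mask) >> shift) & 0xFFFFFFFF) % gdrc


candidates = sorted(["fe80::1", "fe80::3", "fe80::2"],
                    key=IPv6Address, reverse=True)
rp_for_group = {"Group1": "2001:db8::1:0:5678:1",
                "Group2": "2001:db8::1:0:1234:2"}

gdr = {group: candidates[hashvalue_rp_v6(rp, "::ffff:ffff:ffff:0",
                                         len(candidates))]
       for group, rp in rp_for_group.items()}
print(gdr)  # {'Group1': 'fe80::1', 'Group2': 'fe80::2'}
```
]]></sourcecode>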
        </section>
        <section numbered="true" toc="default">
          <name>Limitations</name>
          <t>
            The modulo hash algorithm has poor failover characteristics when
            a shared LAN has more than two GDRs. In the
            case of more than two GDRs on a LAN, when one GDR fails, all
            of the groups may be reassigned to a different GDR, even if
	    they were not assigned to the failed GDR. However, many
	    deployments use only two routers on a shared LAN for redundancy
	    purposes. Future work may define new hash algorithms where only
	    groups assigned to the failed GDR get reassigned.
          </t>
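          <t>The failover behavior can be illustrated with a toy Python
          sketch. The twelve integers stand in for hypothetical 32-bit hash
          inputs (already masked, shifted, and truncated), and the candidate
          names C, B, and A are placeholders; only the final modulo step is
          modeled.</t>
          <sourcecode type="python"><![CDATA[
```python
# Hypothetical 32-bit hash inputs for twelve flows; only "% GDRC" is modeled.
flows = range(12)
before = ["C", "B", "A"]  # DRLB-List order while all three candidates are up
after = ["C", "A"]        # B has failed and the DR advertised a new list


def gdr(value: int, candidates: list) -> str:
    return candidates[value % len(candidates)]


# Flows that were NOT handled by B but still change GDR after B fails.
moved = [v for v in flows
         if gdr(v, before) != "B" and gdr(v, before) != gdr(v, after)]
print(moved)  # [2, 3, 8, 9]
```
]]></sourcecode>
          <t>In this toy case, four of the eight flows that B never handled
          are moved between the surviving routers anyway.</t>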
          <t>The modulo hash algorithm will use, at most, 32 consecutive bits of
	  the input addresses for its computation. Exactly which bits of the
	  source, group, or RP addresses are used depends on the respective
	  masks. This limitation may be an issue for IPv6 deployments,
	  since not all bits of the IPv6 addresses are considered. If this
	  causes operational issues, a new hash algorithm would need to be
	  defined.
          </t>
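          <t>As a non-normative Python illustration, the two IPv6 group
          addresses below are hypothetical and differ only in their upper
          bits. With an all-bits-set Group Hash Mask (LSZC of 0), only the
          low 32 bits survive, so both groups always map to the same GDR.</t>
          <sourcecode type="python"><![CDATA[
```python
from ipaddress import IPv6Address

ALL_BITS_SET = (1 << 128) - 1  # all-bits-set IPv6 Group Hash Mask, LSZC = 0


def group_input(group: str) -> int:
    # ((Group_address & Group_mask) >> 0) & 0xffffffff
    return (int(IPv6Address(group)) & ALL_BITS_SET) & 0xFFFFFFFF


a, b = "ff3e::8000:1", "ff15:1234::8000:1"
print(hex(group_input(a)), hex(group_input(b)))  # 0x80000001 0x80000001
```
]]></sourcecode>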
        </section>
      </section>
      <section numbered="true" toc="default">
        <name>PIM Hello Options</name>
        <t>PIM routers on which DR load balancing is enabled include a new
        option, called "DR Load-Balancing Capability (DRLB-Cap)", in their
        PIM Hello messages.
        </t>
        <t>Besides this DRLB-Cap Hello Option, the elected PIM DR also
	includes a new "DR Load-Balancing List (DRLB-List) Hello Option".
	The DRLB-List Hello Option consists of three hash masks, as defined
	above, and also a list of GDR Candidate addresses on the LAN. It is
	recommended that the GDR Candidate addresses be sorted in descending
	order. This ensures that, with algorithms such as the modulo hash
	algorithm in this document, it is predictable which GDR is
	responsible for which groups, regardless of the order in which the DR
	learned about the candidates.
        </t>
        <section numbered="true" toc="default">
          <name>PIM DR Load-Balancing Capability (DRLB-Cap) Hello Option</name>
	  <figure anchor="PIM-CAP">
	  <name>PIM DR Load-Balancing Capability Hello Option</name>
          <artwork align="center" name="" type="" alt=""><![CDATA[            
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type = 34           |         Length = 4            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Reserved                  |Hash Algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
	  </figure>
          <dl newline="false" spacing="normal">
            <dt>Type:</dt>
	    <dd>34</dd>
            <dt>Length:</dt> 
	    <dd>4</dd>
            <dt>Reserved:</dt>
	    <dd>Transmitted as zero, ignored on receipt.</dd>
            <dt>Hash Algorithm:</dt>
	    <dd>Hash algorithm type. A value listed in the
	      IANA "PIM Designated Router Load-Balancing Hash Algorithms"
	      registry. 0 is used for the hash algorithm defined in this
	      document.
	      </dd>
          </dl>
          <t>This DRLB-Cap Hello Option <bcp14>MUST</bcp14> be advertised by routers on
          all interfaces where DR Load Balancing is enabled. Note that the
	  option is included at most once.
          </t>
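          <t>The option layout can be encoded and decoded with the standard
          Python struct module, as in the non-normative sketch below. The
          constant and function names are hypothetical, and the sketch
          handles only this option, not a complete PIM Hello message.</t>
          <sourcecode type="python"><![CDATA[
```python
import struct

PIM_HELLO_OPT_DRLB_CAP = 34  # Type value from the figure above
HASH_ALG_MODULO = 0          # the modulo hash algorithm in this document


def encode_drlb_cap(hash_algorithm: int = HASH_ALG_MODULO) -> bytes:
    # Type (16 bits), Length = 4 (16 bits), Reserved (24 zero bits),
    # Hash Algorithm (8 bits); "!" selects network byte order.
    return struct.pack("!HH3xB", PIM_HELLO_OPT_DRLB_CAP, 4, hash_algorithm)


def decode_drlb_cap(data: bytes) -> int:
    """Return the advertised hash algorithm; Reserved bits are ignored."""
    if len(data) < 8:
        raise ValueError("truncated DRLB-Cap option")
    opt_type, length = struct.unpack_from("!HH", data)
    if opt_type != PIM_HELLO_OPT_DRLB_CAP or length != 4:
        raise ValueError("not a DRLB-Cap option")
    return data[7]


print(encode_drlb_cap().hex())                             # 0022000400000000
print(decode_drlb_cap(bytes.fromhex("002200040000ab00")))  # 0
```
]]></sourcecode>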
        </section>
        <section numbered="true" toc="default">
          <name>PIM DR Load-Balancing List (DRLB-List) Hello Option</name>
          <figure anchor="PIM-LIST">
	    <name>PIM DR Load-Balancing List Hello Option</name>
<artwork align="center" name="" type="" alt=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type = 35           |         Length                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Group Mask                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Source Mask                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            RP Mask                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    GDR Candidate Address(es)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     ]]></artwork>
          </figure>
          <dl newline="false" spacing="normal">
            <dt>Type:</dt>
	    <dd>35</dd>
            <dt>Length:</dt>
	    <dd>(3 + n) x (4 or 16) bytes, where n is the number
              of GDR Candidates.</dd>
            <dt>Group Mask (32/128 bits):</dt>
	    <dd>Mask applied to group addresses
	      as part of hash computation.</dd>
            <dt> Source Mask (32/128 bits):</dt>
	    <dd>Mask applied to source addresses
	      as part of hash computation.</dd>
            <dt>RP Mask (32/128 bits):</dt> 
	    <dd>Mask applied to RP addresses
	      as part of hash computation.</dd>
	  </dl>
             <t>All masks <bcp14>MUST</bcp14> have the same number of bits as the IP
	     source address in the PIM Hello IP header.
             </t>	     
	  <dl newline="false" spacing="normal">
	    <dt>GDR Candidate Address(es) (32/128 bits):</dt>
	    <dd><t>List of GDR Candidate(s)</t>
                <t>All addresses <bcp14>MUST</bcp14> be in the same address family as the
		PIM Hello IP header. It is recommended that the addresses be
		sorted in descending order.
		</t>
                  <t>If the "Interface ID" option, as specified in
		<xref target="RFC6395" format="default"/>, is present in a GDR Candidate's
		PIM Hello message and the "Router Identifier" portion is
		non-zero:
                  </t>
                  <ul spacing="normal">
                    <li>For IPv4, the "GDR Candidate Address" will be set directly
                  to the "Router Identifier".
                </li>
                    <li>For IPv6, the "GDR Candidate Address" will be 96 bits of
		zeroes, followed by the 32-bit Router Identifier.
                </li>
                  </ul>
                <t>If the "Interface ID" option is not present in a GDR
		Candidate's PIM Hello message or if the "Interface ID"
		option is present but the "Router Identifier" field is zero,
		the "GDR Candidate Address" will be the IPv4 or IPv6 source
		address of the PIM Hello message.
		</t>
                <t>This DRLB-List Hello Option <bcp14>MUST</bcp14> only be advertised by the
		elected PIM DR. It <bcp14>MUST</bcp14> be ignored if received from a non-DR.
		The option <bcp14>MUST</bcp14> also be ignored if the hash masks are not
		the correct number of bits or GDR Candidate addresses are in
		the wrong address family.
		</t>
	    </dd></dl>
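          <t>The non-normative Python sketch below shows one way to derive
          GDR Candidate Addresses, encode this option, and apply the
          receive-side checks above. All function and constant names are
          hypothetical; router_id stands for the Router Identifier from the
          Interface ID option of <xref target="RFC6395" format="default"/>,
          and the caller is assumed to know whether the sender is the
          elected DR.</t>
          <sourcecode type="python"><![CDATA[
```python
import struct
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Optional

PIM_HELLO_OPT_DRLB_LIST = 35


def candidate_address(hello_source: str, router_id: Optional[int]) -> str:
    """GDR Candidate Address, using a non-zero Router Identifier if present."""
    if not router_id:  # Interface ID option absent, or Router Identifier zero
        return hello_source
    if ip_address(hello_source).version == 4:
        return str(IPv4Address(router_id))
    return str(IPv6Address(router_id))  # 96 zero bits, then the 32-bit ID


def encode_drlb_list(group_mask, source_mask, rp_mask, candidates) -> bytes:
    addrs = [ip_address(a)
             for a in (group_mask, source_mask, rp_mask, *candidates)]
    if len({a.version for a in addrs}) != 1:
        raise ValueError("masks and candidates must use one address family")
    masks, cands = addrs[:3], sorted(addrs[3:], reverse=True)  # descending
    value = b"".join(a.packed for a in masks + cands)  # (3 + n) addresses
    return struct.pack("!HH", PIM_HELLO_OPT_DRLB_LIST, len(value)) + value


def decode_drlb_list(data: bytes, hello_source: str, from_dr: bool):
    """Return (group, source, RP masks, candidates), or None if ignored."""
    if not from_dr or len(data) < 4:
        return None
    opt_type, length = struct.unpack_from("!HH", data)
    width = len(ip_address(hello_source).packed)  # 4 (IPv4) or 16 (IPv6)
    value = data[4:4 + length]
    if (opt_type != PIM_HELLO_OPT_DRLB_LIST or len(value) != length
            or length % width or length < 3 * width):
        return None  # wrong type, truncated, or wrong address size
    addrs = [ip_address(value[i:i + width]) for i in range(0, length, width)]
    return addrs[0], addrs[1], addrs[2], addrs[3:]


opt = encode_drlb_list("255.255.255.255", "255.255.255.255", "0.0.255.0",
                       ["203.0.113.1", "203.0.113.3", "203.0.113.2"])
print(struct.unpack_from("!HH", opt))  # (35, 24): (3 + 3) x 4 bytes
*masks, cands = decode_drlb_list(opt, "203.0.113.3", from_dr=True)
print([str(c) for c in cands])  # ['203.0.113.3', '203.0.113.2', '203.0.113.1']
print(decode_drlb_list(opt, "203.0.113.2", from_dr=False))  # None
```
]]></sourcecode>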
        </section>
      </section>
      <section numbered="true" toc="default">
        <name>PIM DR Operation</name>
        <t>The DR election process is still the same as defined in
	<xref target="RFC7761" format="default"/>. The DR advertises the new DRLB-List Hello
	Option, which contains mask values from user configuration (or default
	values), followed by a list of GDR Candidate addresses. Note that
	if a router included the "Interface ID" option in the Hello message
	and the Router Identifier is non-zero, the Router Identifier will be used to form the
	GDR Candidate address of the router, as discussed in the previous
	section. It is recommended that the list be sorted from the highest
	value to the lowest value.  The reason for sorting the list is to
	make the behavior deterministic, regardless of the order in which the
	DR learns of new candidates.  Note that, as for non-DR routers, the DR
	also advertises the DRLB-Cap Hello Option to indicate its ability to
	support the new functionality and the type of GDR election hash
	algorithm it uses.
        </t>
        <t>If a PIM DR receives from a neighbor a DRLB-Cap Hello Option that
	contains the same hash algorithm as the DR's and the neighbor has the
	same DR priority as the DR, the PIM DR <bcp14>SHOULD</bcp14> consider the neighbor as a
	GDR Candidate and insert the GDR Candidate's Address into the
	list of the DRLB-List Option. However, the DR may have policies
	limiting which GDR Candidates, or how many of them, to
	include. Likewise, the DR <bcp14>SHOULD</bcp14> include itself in the list of GDR
	Candidates, but it is permissible not to do so, for instance, if there
	is some policy restricting the candidate set.
        </t>
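<t>The following Python sketch shows one way a DR might assemble the candidate list described in this section: it keeps neighbors that announce the DR's hash algorithm and DR priority, optionally adds the DR itself, and sorts the result from highest to lowest. The Neighbor record and the function name build_gdr_candidate_list are hypothetical, and any additional local policy is left out.</t>
<sourcecode type="python"><![CDATA[
```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Neighbor:
    # Hypothetical per-neighbor state kept by the DR.
    address: ipaddress.IPv4Address  # GDR Candidate Address
    dr_priority: int
    hash_algorithm: Optional[int]   # None if no DRLB-Cap option was seen


def build_gdr_candidate_list(dr: Neighbor,
                             neighbors: Iterable[Neighbor],
                             include_self: bool = True
                             ) -> List[ipaddress.IPv4Address]:
    """Return the GDR Candidate list the DR would advertise."""
    eligible = [
        n.address for n in neighbors
        if n.hash_algorithm is not None
        and n.hash_algorithm == dr.hash_algorithm
        and n.dr_priority == dr.dr_priority
    ]
    if include_self:  # the DR normally lists itself, absent local policy
        eligible.append(dr.address)
    # Highest to lowest, so the result does not depend on the order in
    # which the DR learned about the candidates.
    return sorted(eligible, reverse=True)
```
]]></sourcecode>
<t>Because the list is sorted, the DR produces the same list regardless of the order in which the neighbors were learned.</t>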
        <t>If a PIM neighbor included in the list expires, stops announcing
	the DRLB-Cap Hello Option, changes DR priority, changes hash algorithm,
	or otherwise becomes ineligible as a candidate, the DR <bcp14>SHOULD</bcp14>
	immediately send a triggered hello with a new list in the DRLB-List
	option, excluding the neighbor.
        </t>
        <t>If a new router becomes eligible as a candidate, there is no
	urgency in sending out an updated list. An updated list <bcp14>SHOULD</bcp14> be
	included in the next hello.
        </t>
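<t>The timing rules above reduce to a comparison between the previously announced list and the newly computed one, as in this hypothetical helper named update_timing. Router names stand in for GDR Candidate addresses; this is a sketch, not a required implementation.</t>
<sourcecode type="python"><![CDATA[
```python
from typing import Sequence


def update_timing(old_list: Sequence[str], new_list: Sequence[str]) -> str:
    """Classify a change to the advertised GDR Candidate list.

    Returns "triggered" when a listed candidate has to be dropped,
    "next-hello" when candidates were only added, else "unchanged".
    """
    if set(old_list) - set(new_list):
        return "triggered"    # send a triggered Hello right away
    if list(old_list) != list(new_list):
        return "next-hello"   # include the new list in the next Hello
    return "unchanged"
```
]]></sourcecode>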
      </section>
      <section numbered="true" toc="default">
        <name>PIM GDR Candidate Operation</name>
        <t>When an IGMP/MLD report is received, a hash algorithm is used by
	the GDR Candidates to determine which router is going to be responsible
	for building forwarding trees on behalf of the host.
        </t>
        <t>The router <bcp14>MUST</bcp14> include the DRLB-Cap Hello Option in all PIM Hello
	messages sent on the interface.  Note that the presence of the
	DRLB-Cap Option in the PIM Hello does not guarantee that the router
	will be considered as a GDR Candidate.  Once the DR election is done,
	the router receives from the current PIM DR a DRLB-List Hello Option
	containing the list of selected GDR Candidates.
        </t>
        <t>A router only acts as a GDR Candidate if it is included in the GDR
        Candidate list of the DRLB-List Hello Option. See the next section for
	details.
        </t>
      </section>
      <section numbered="true" toc="default">
        <name>DRLB-List Hello Option Processing</name>
        <t>
	  This section discusses processing of the DRLB-List Hello Option,
	  including the case where it was received in the previous hello
	  but not in the current hello.
	  All routers <bcp14>MUST</bcp14> ignore the DRLB-List Hello Option if it is
	  received from a PIM router that is not the DR. The option <bcp14>MUST</bcp14>
	  only be processed by routers that are announcing the DRLB-Cap Option
	  and only if the hash algorithm announced by the DR is the same as
	  the one announced by the local router.
	  All GDR Candidates <bcp14>MUST</bcp14> use the hash masks advertised
	  in the Option, 
	  even if they differ from those the candidate was configured with.
	  The DR <bcp14>MUST</bcp14> also process its own DRLB-List Hello Option.
        </t>
        <t>A router stores the contents of the most recently announced option,
	if any, replacing any previous contents. Before discarding the
	previous contents, the router <bcp14>MUST</bcp14> compare them with the new
	contents and, if there are any changes, continue processing as
	described below. Note that if the option does not pass the above
	checks, the processing below <bcp14>MUST</bcp14> be done as if the option were
	not announced.
        </t>
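<t>A minimal sketch of this bookkeeping follows, assuming a hypothetical DrlbList record and a hypothetical update_saved_option function. An option that is absent, or that fails the acceptance checks, is represented as None, so the same comparison also detects the first reception of the option and its disappearance.</t>
<sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DrlbList:
    # Hypothetical parsed form of an accepted DRLB-List option.
    masks: Tuple[int, ...]        # hash masks as integers
    candidates: Tuple[str, ...]   # GDR Candidate addresses, in order


def update_saved_option(saved: Optional[DrlbList],
                        received: Optional[DrlbList]
                        ) -> Tuple[Optional[DrlbList], bool]:
    """Replace the saved option and report whether to reprocess.

    received is None when the DR's Hello carries no DRLB-List option or
    the option failed the acceptance checks.
    """
    changed = saved != received   # compare before discarding the old copy
    return received, changed
```
]]></sourcecode>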
        <t>
	  If the contents of the DRLB-List Option (the masks or the candidate
	  list) differ from the previously saved copy, if the option is received
	  for the first time, or if it is no longer being received or accepted,
	  the option <bcp14>MUST</bcp14> be processed as follows.
        </t>
        <ol spacing="normal" type="1">
          <li>
            <t>The local router determines whether it is included in the
            "GDR Candidate Address(es)" field by looking for its own address
	    or, if it announces a non-zero Router ID, its own Router ID. If it
            is included, then for each group (or each source and group pair
            if the group is in SSM mode) with local receiver interest, the
            router <bcp14>MUST</bcp14> run the hash algorithm to determine
            which GDR Candidate is the GDR.
            </t>
            <ul spacing="normal">
              <li>If there is no change in the GDR status, then no further
	      action is required.
	      </li>
              <li>If the router becomes the new GDR, then a multicast
	      forwarding tree <bcp14>MUST</bcp14> be built <xref target="RFC7761" format="default"/>.
              </li>
              <li>
	      If the router is no longer the GDR, then it uses an Assert as
              explained in <xref target="assert" format="default"/>.
              </li>
            </ul>
          </li>


	  <li>
	    <t>If one of the following occurs:</t>
	    <ul>
	      <li>the local router is not included in the "GDR Candidate
              Address(es)" field,</li>
	      <li>the DRLB-List Hello Option is no longer included in the DR's
              Hello, or</li>
	      <li>the DR's Neighbor Liveness Timer expires <xref target="RFC7761" format="default"/>,</li>
	    </ul>
	    <t>
	      then for each group (or each source and group pair if the group
	      is in SSM mode) with local receiver interest, for which the
	      router is the GDR, the router uses an Assert as explained in
	      <xref target="assert"/>.
	    </t>
</li>

        </ol>
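<t>The per-flow decisions in the two steps above can be summarized by the sketch below. The hash algorithm is deliberately hidden behind a caller-supplied pick_gdr function, which is a hypothetical hook; the sketch does not implement the registered Modulo algorithm. A candidates value of None stands for the cases in step 2 where no usable candidate list is available, and the case where the list is present but omits the local router is handled by the membership test.</t>
<sourcecode type="python"><![CDATA[
```python
from collections.abc import Hashable
from enum import Enum
from typing import Callable, Dict, Iterable, Optional, Sequence, Set


class Action(Enum):
    NONE = "no action"
    BUILD_TREE = "build forwarding tree"
    ASSERT = "stop acting as GDR via the modified Assert"


def gdr_actions(flows: Iterable[Hashable], local_id: str,
                candidates: Optional[Sequence[str]],
                was_gdr: Set[Hashable],
                pick_gdr: Callable[[Hashable, Sequence[str]], str]
                ) -> Dict[Hashable, Action]:
    """Map each flow with local receiver interest to the action to take.

    pick_gdr(flow, candidates) is a hypothetical stand-in for the
    negotiated hash algorithm and returns the GDR for the flow.
    """
    included = candidates is not None and local_id in candidates
    actions = {}
    for flow in flows:
        old = flow in was_gdr
        new = included and pick_gdr(flow, candidates) == local_id
        if old == new:
            actions[flow] = Action.NONE        # GDR status unchanged
        elif new:
            actions[flow] = Action.BUILD_TREE  # newly the GDR
        else:
            actions[flow] = Action.ASSERT      # no longer the GDR
    return actions
```
]]></sourcecode>
<t>With a toy pick_gdr such as lambda flow, c: c[flow % len(c)] applied to integer flow identifiers (for illustration only), a router that was the GDR for a flow but no longer hashes to it gets Action.ASSERT for that flow.</t>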
      </section>
      <section anchor="assert" numbered="true" toc="default">
        <name>PIM Assert Modification</name>
        <t>GDR changes may occur due to configuration changes,
	GDR Candidates going down, or new routers coming up and
	becoming GDR Candidates. This may occur while flows are being
	forwarded. If the GDR for an active flow changes, there is likely
	to be some disruption, such as packet loss or duplicates.
	By using Asserts, packet loss is minimized at the cost of a small
	number of duplicate packets.
        </t>
        <t>When a router stops acting as the GDR for a group, or source and
	group pair if SSM, it <bcp14>MUST</bcp14> set the Assert metric preference to maximum
	(0x7fffffff) and the Assert metric to one less than maximum
	(0xfffffffe). That is, whenever it sends or receives an Assert for the
	group, it must use these values as the metric preference and metric
	rather than the values provided by the unicast routing protocol.
        </t>
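<t>The effect of these values can be seen with a simplified version of the Assert comparison in <xref target="RFC7761" format="default"/>, in which a lower metric preference wins, then a lower metric, then the higher IP address. The sketch below assumes that both Asserts carry the same RPT bit and uses IPv4 addresses; the names AssertMetric, local_assert_metric, and assert_winner are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
import ipaddress
from typing import NamedTuple

MAX_METRIC_PREFERENCE = 0x7FFFFFFF  # maximum Assert metric preference
NEAR_MAX_METRIC = 0xFFFFFFFE        # one less than the maximum metric


class AssertMetric(NamedTuple):
    preference: int
    metric: int
    address: ipaddress.IPv4Address


def local_assert_metric(no_longer_gdr: bool, unicast_preference: int,
                        unicast_metric: int,
                        address: ipaddress.IPv4Address) -> AssertMetric:
    """Metric a router places in, and compares against, its Asserts."""
    if no_longer_gdr:
        # Override the unicast routing values as this section requires.
        return AssertMetric(MAX_METRIC_PREFERENCE, NEAR_MAX_METRIC, address)
    return AssertMetric(unicast_preference, unicast_metric, address)


def assert_winner(a: AssertMetric, b: AssertMetric) -> AssertMetric:
    """Simplified comparison: lower preference wins, then lower metric,
    then higher address (RPT bit assumed equal on both sides)."""
    return max((a, b),
               key=lambda m: (-m.preference, -m.metric, int(m.address)))
```
]]></sourcecode>
<t>This is why a router that is no longer the GDR loses the Assert to a router advertising ordinary unicast metrics.</t>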
        <t>The rest of this section is just for illustration purposes and
	not part of the protocol definition.
        </t>
        <t>To illustrate the behavior when there is a GDR change, consider
	the following scenario where there are two flows:
        G1 and G2.  R1 is the GDR for G1, and R2 is the GDR for G2.
        When R3 comes up, it is possible that R3 becomes GDR for both
        G1 and G2; hence, R3 starts to build the forwarding tree for G1 and
        G2.  If R1 and R2 stop forwarding before R3 completes the process,
        packet loss might occur.  On the other hand, if R1 and R2 continue
        forwarding while R3 is building the forwarding trees, duplicates
        might occur.
        </t>
        <t>When the role of GDR changes as above, instead of immediately
        stopping forwarding, R1 and R2 continue forwarding to G1 and G2,
        respectively, while, at the same time, R3 builds forwarding trees for
        G1 and G2.  This will lead to PIM Asserts.
        </t>
        <t>For G1, using the functionality described in this document, R1
	and R3 determine the new GDR, which is R3.  With the modified Assert
	behavior, R1 sets its Assert metric preference and metric to the
	near-maximum values discussed above.  That will make R3, which has a
	normal metric in its Assert, the Assert winner.
        </t>
      </section>
      <section numbered="true" toc="default">
        <name>Backward Compatibility</name>
        <t>In the case of a hybrid Ethernet shared LAN (where some PIM routers
	support the functionality defined in this document and some do not):
        </t>
        <ul spacing="normal">
          <li>If the DR does not support the new functionality, then there
	  will be no load balancing.
          </li>

          <li>If non-DR routers do not support the new functionality, they
	  will not be considered as GDR Candidates and will not take part
	  in load balancing. Load balancing may still take place among the
	  routers on the link that do support it.
          </li>
        </ul>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Operational Considerations</name>
      <t>
	An administrator needs to consider what the total bandwidth
	requirements are and find a set of routers that together have
	enough available capacity while making sure that each of the routers
	can handle its part, assuming that the traffic is distributed
	roughly equally among the routers. Ideally, one should also have
	enough bandwidth to handle the case where at least one router fails.
	All routers should be able to reach the sources, and the RPs if
	applicable, through paths that do not traverse the LAN.
      </t>
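<t>The sizing rule can be made concrete with a small calculation that assumes, as the text does, a roughly equal split of traffic across the surviving routers. For example, 12 Gb/s of total traffic over four routers needs about 3 Gb/s per router, or 4 Gb/s per router to tolerate one failure. The helper name per_router_capacity is hypothetical, and real hash distributions will be less even.</t>
<sourcecode type="python"><![CDATA[
```python
def per_router_capacity(total_bps: float, routers: int,
                        tolerated_failures: int = 1) -> float:
    """Capacity each GDR needs if traffic spreads roughly equally.

    Assumes an even split across the surviving routers; real hash
    distributions will be uneven, so headroom is advisable.
    """
    survivors = routers - tolerated_failures
    if survivors < 1:
        raise ValueError("need at least one surviving router")
    return total_bps / survivors
```
]]></sourcecode>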
      <t>Care must be taken when choosing what hash masks to configure. One
      would typically configure the same masks on all the routers so that
      the masks in effect are the same, regardless of which router is
      elected as DR. The default masks are likely suitable for most
      deployments. The RP Hash Mask must be configured (the default is no
      bits set) if one wishes to hash based on the RP address rather than
      the group address for ASM. The default masks include the entire group
      address, and the source address for SSM, in the hash. An
      administrator may set other masks that mask out part of the addresses
      to ensure that certain flows always get hashed to the same router.
      How this is achieved depends on how the group addresses are allocated.
      </t>
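<t>The following sketch shows why masking out part of the group address keeps related flows together: once the mask is applied, the hash input for those groups is identical, so any deterministic hash places them on the same GDR. The masks and addresses used with it are examples only, masked_hash_input is a hypothetical helper, and the hash itself is not shown.</t>
<sourcecode type="python"><![CDATA[
```python
import ipaddress


def masked_hash_input(address: str, mask: str) -> int:
    """Apply a hash mask to an address, yielding the value that would be
    fed to the hash (the hash itself is out of scope here)."""
    value = int(ipaddress.ip_address(address))
    return value & int(ipaddress.ip_address(mask))
```
]]></sourcecode>
<t>For instance, with the example mask 255.255.255.0, the groups 232.1.1.1 and 232.1.1.200 both reduce to 232.1.1.0 and therefore hash to the same router, whereas a mask with all bits set keeps them distinct.</t>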
      <t>
	Only the routers announcing the same hash algorithm as the DR
        are considered as GDR Candidates. Network administrators
        need to make sure that the desired set of routers announce the
        same algorithm. Migration between different algorithms is
        not considered in this document.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>IANA has made these assignments in the "PIM-Hello Options" registry:
      value 34 for the PIM DR Load-Balancing Capability (DRLB-Cap) Hello
      Option (with Length of 4), and value 35 for the PIM DR Load-Balancing
      List (DRLB-List) Hello Option (with variable Length).
      </t>
      <t> 
      Per this document, IANA has created a registry called
      "PIM Designated Router Load-Balancing Hash Algorithms" in the
      "Protocol Independent Multicast (PIM)" branch of the registry tree.
      The registry lists hash algorithms for use by PIM Designated Router
      Load Balancing.
      </t>
      <section numbered="true" toc="default">
        <name>Initial Registry</name>
        <t>
          The initial content of the registry is as follows.
        </t>
	<table anchor="initial-reg" align="center">
	  <thead>
	    <tr>
	      <th>Type</th>
	      <th>Name</th>
	      <th>Reference</th>
	    </tr>
	  </thead>
	  <tbody>
	    <tr>
	      <td>0</td>
	      <td>Modulo</td>
	      <td>RFC 8775</td>
	    </tr>
	    <tr>
	      <td>1-255</td>
	      <td>Unassigned</td>
	      <td></td>
	    </tr>
	  </tbody>
	</table>
      </section>
      <section numbered="true" toc="default">
        <name>Assignment of New Hash Algorithms</name>
        <t>Assignment of new hash algorithms is done according to the "IETF
        Review" procedure; see <xref target="RFC8126" format="default"/>.
        </t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>Security of the new DR Load-Balancing PIM Hello Options is only
      guaranteed by the security of PIM Hello messages, so the security
      considerations for PIM Hello messages, as described in PIM-SM
      <xref target="RFC7761" format="default"/>, apply here.
      </t>
      <t>If the DR is subverted, it could omit or add certain GDRs or
      announce an unsupported algorithm. If another router is subverted, it
      could be made DR and cause similar issues. While these issues are
      specific to this specification, they are not very different from existing
      attacks, such as subverting a DR and lowering the DR priority, causing a
      different router to become the DR.
      </t>
      <t>If, for any reason, the DR includes in the announced list a GDR that
      announces a different algorithm from the one the DR announces, the GDR
      is required to ignore the announcement, and there will be no router
      acting as the DR for the flows that hash to that GDR.
      </t>
      <t>If a GDR is subverted, it could potentially be made to stop forwarding
      all the traffic it is expected to forward. This is similar to what
      can happen today if a DR is subverted.
      </t>
      <t>An administrator may be able to achieve the desired load balancing
      of known flows, but an attacker may send a single high-rate flow that
      is served by a single GDR or send multiple flows that are expected to
      be hashed to the same GDR.</t>
    </section>
  </middle>
  <!--  *****BACK MATTER ***** -->

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6395.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7761.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8126.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      </references>
      <references>
        <name>Informative References</name>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3376.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3810.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4541.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4607.xml"/>
      </references>
    </references>
    <section numbered="false" toc="default">
      <name>Acknowledgements</name>
      <t>
        The authors would like to thank <contact fullname="Steve Simlo"/> and
	<contact fullname="Taki Millonis"/> for
        helping with the original idea; <contact fullname="Alia Atlas"/>,
	<contact fullname="Bill Atwood"/>, <contact fullname="Joe Clarke"/>,
	<contact fullname="Alissa Cooper"/>, <contact fullname="Jake
	Holland"/>, <contact fullname="Bharat Joshi"/>, <contact
	fullname="Anish Kachinthaya"/>, 
	<contact fullname="Anvitha Kachinthaya"/>, <contact fullname="Benjamin
	Kaduk"/>, <contact fullname="Mirja Kühlewind"/>, <contact
	fullname="Barry Leiba"/>, 
	<contact fullname="Ben Niven-Jenkins"/>, <contact fullname="Alvaro
	Retana"/>, <contact fullname="Adam Roach"/>,
	<contact fullname="Michael Scharf"/>, <contact fullname="Éric
	Vyncke"/>, and <contact fullname="Carl Wallace"/>
	for reviews and comments; and <contact fullname="Toerless Eckert"/>
	and <contact fullname="Rishabh Parekh"/> for helpful conversation on
	the document.
      </t>
    </section>
  </back>

</rfc>
