<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="4"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc strict="no"?>
<?rfc rfcedstyle="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-bess-bgp-multicast-08" ipr="trust200902">
  <front>
    <title abbrev="bgp-mcast">BGP Based Multicast</title>

    <author fullname="Zhaohui Zhang" initials="Z." surname="Zhang">
      <organization>Juniper Networks</organization>
      <address>
        <email>zzhang@juniper.net</email>
      </address>
    </author>

    <author fullname="Lenny Giuliano" initials="L." surname="Giuliano">
      <organization>Juniper Networks</organization>
      <address>
        <email>lenny@juniper.net</email>
      </address>
    </author>

    <author fullname="Keyur Patel" initials="K." surname="Patel">
      <organization>Arrcus</organization>
      <address>
        <email>keyur@arrcus.com</email>
      </address>
    </author>

    <author fullname="IJsbrand Wijnands" initials="I." surname="Wijnands">
      <organization>Arrcus</organization>
      <address>
        <email>ice@braindump.be</email>
      </address>
    </author>
    <author fullname="Mankamana Mishra" initials="M." surname="Mishra">
      <organization>Cisco Systems</organization>
      <address>
        <email>mankamis@cisco.com</email>
      </address>
    </author>

    <author fullname="Arkadiy Gulko" initials="A." surname="Gulko">
      <organization>EdwardJones</organization>
      <address>
        <email>arkadiy.gulko@edwardjones.com</email>
      </address>
    </author>

    <workgroup>BESS</workgroup>

    <abstract>
      <t>This document specifies a BGP address family and related procedures
         that allow BGP to be used for setting up multicast distribution
         trees.  This document also specifies procedures that enable BGP to
         be used for multicast source discovery, and for showing interest in
         receiving particular multicast flows.  Taken together, these
         procedures allow BGP to be used as a replacement for other
         multicast routing protocols, such as PIM or mLDP.  The BGP
         procedures specified here are based on the BGP multicast procedures
         that were originally designed for use by providers of Multicast
         Virtual Private Network service.
      </t>
	  <t>This document also describes how various signaling mechanisms can be
	     used to set up end-to-end inter-region multicast trees.
    </t>
    </abstract>

    <note title="Requirements Language">
	<t>
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all
   capitals, as shown here.
	</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
	  <section title="Terminology">
		<t>This document assumes that readers are familiar with basic multicast
		concepts. Some terms are listed here for convenience.
		<list>
		  <t>PIM: Protocol Independent Multicast <xref target="RFC7761"/>.</t>
		  <t>ASM: All/Any Source Multicast. A multicast mode where a receiver
		  is interested in receiving traffic for a multicast group from anywhere.</t>
		  <t>SSM: Source-Specific Multicast <xref target="RFC7761"/>.</t>
		  <t>PIM-ASM: PIM procedures for ASM.</t>
		  <t>PIM-SSM: PIM procedures for SSM.</t>
		  <t>PIM-Port: PIM Over Reliable Transport <xref target="RFC6559"/>.</t>
		  <t>PIM-Bidir: PIM procedures for bidirectional shared trees connecting
   multicast sources and receivers <xref target="RFC5015"/>.</t>
		  <t>RP: Rendezvous Point, the root
      of the non-source-specific distribution tree for a multicast
      group <xref target="RFC7761"/>.</t>
		  <t>RPA: Rendezvous Point Address for the root of a bidirectional distribution tree for a range of multicast groups <xref target="RFC5015"/>.</t>
		  <t>RPL: Rendezvous Point Link, the link to which the RPA belongs.</t>
		  <t>RPF: Reverse Path Forwarding, by which multicast traffic is forwarded from the root of a multicast tree along the reverse of the path that the tree leaves would use to reach the root.</t>
		  <t>FHR: First Hop Router (connecting to a multicast source) <xref target="RFC7761"/>.</t>
		  <t>LHR: Last Hop Router (connecting to a multicast receiver) <xref target="RFC7761"/>.</t>
		  <t>(x,G): An IP multicast flow/tree for a group G, where x is either an 'S' for a specific source or a '*' for all sources.</t>
		  <t>IGMP: Internet Group Management Protocol <xref target="RFC3376"/>.</t>
		  <t>MLD: Multicast Listener Discovery <xref target="RFC3810"/>.</t>
		  <t>P2MP: Point-to-MultiPoint.
		  </t>
		  <t>MP2MP: MultiPoint-to-MultiPoint.
		  </t>
		  <t>mLDP: Label Distribution Protocol Extensions for P2MP and MP2MP
		  Label Switched Paths <xref target="RFC6388"/>.</t>
		  <t>S-PMSI: Selective Provider Multicast Service Interface.
		  In <xref target="RFC6514"/>, the term refers to a pseudo-interface
		  used to send customer multicast to a subset of Provider Edge routers,
		  and S-PMSI Auto-Discovery (A-D) routes are used to advertise the
		  binding of customer multicast flows to a tunnel that instantiates a
		  pseudo-interface. In this document, S-PMSI A-D routes are used for
		  various purposes as described in <xref target="lanproc"/> and
		  <xref target="upstream-spmsi"/>.
		  </t>
		  <t>PTA: PMSI Tunnel Attribute <xref target="RFC6514"/>. An attribute
		  carried in an S-PMSI A-D route that specifies the tunnel instantiating
		  the PMSI.
		  </t>
		  <t>EC: BGP Extended Community.
		  </t>
		  <t>RT: BGP Route Target.
		  </t>
		  <t>RTC: Route Target Constraint <xref target="RFC4684"/>.
		  </t>
		</list>
		</t>
	  </section>
    <section title="Motivation">
    <t>This section provides some motivation for BGP signaling for native
       and labeled multicast.
       One target deployment would be a Data Center (DC) that requires multicast
       but uses BGP as its only routing protocol <xref target="RFC7938"/>.
       In such a deployment, it would be desirable to
       support multicast by extending the deployed routing protocol, without
       requiring the deployment of tree-building protocols such as PIM or mLDP, and without requiring an IGP.
    </t>
    <t>Additionally, compared to PIM, BGP-based signaling has several advantages,
       as described in the following section, and may be desired in non-DC
       deployment scenarios as well.
    </t>
    <section title="Native/unlabeled Multicast">
    <t>Protocol Independent Multicast (PIM) <xref target="RFC7761"/> has been the prevailing
       multicast protocol for many years. Despite its success, it has two
       drawbacks:
       <list style="symbols">
          <t>The ASM model, which is prevalent, introduces complexity
             in the following areas: source discovery procedures, need for
             Rendezvous Points (RPs) and group-to-RP mappings, need to
             switch between RP-rooted trees and source-rooted trees, etc.
          </t>
          <t>Periodic protocol state refreshes, required by its soft-state nature.
          </t>
       </list>
    </t>
    <t>PIM-SSM removes much of the 
complexity of PIM-ASM by moving source discovery to the application layer.  
However, for various reasons, many legacy applications and devices still 
rely upon network-based source discovery.
       PIM Over Reliable Transport (PORT) <xref target="RFC6559"/> solves the soft state issue, though its
       deployment has also been limited for two reasons:
       <list style="symbols">
          <t>It does not remove the ASM complexities.
          </t>
          <t>In many of the scenarios where reliable transport is deemed
             important, BGP-based multicast (e.g. BGP-MVPN) has been used
             instead of PORT.
          </t>
       </list>
    </t>
    <t>Partly because of the above-mentioned problems, some Data Center
       operators have been avoiding deploying multicast in their networks.
    </t>
    <t>BGP-MVPN <xref target="RFC6514"/> uses BGP to signal VPN customer multicast state
       over provider networks. It removes the above-mentioned problems
       from the Service Provider (SP) environment and has been widely deployed.
	   While RFC 6514 enables an SP to provide MVPN
       service without running PIM on its backbone, it assumes
       that PIM (or mLDP) runs on the PE-CE links. <xref target="I-D.ietf-bess-mvpn-pe-ce"/>
       adapts the concept of BGP-MVPN to PE-CE links so that the use of PIM on
       the PE-CE links can be eliminated (though the PIM-ASM complexities
       still remain in the customer network). This document
       extends the concept further to general topologies, so that the procedures can
       be run on any router, as a replacement for PIM or mLDP.
    </t>
    <t>With that, PIM can be eliminated from the network. PIM
       soft state is replaced by BGP hard state. For ASM,
       source-specific trees are set up directly after simpler source
       discovery (data-driven on FHRs and control-driven elsewhere),
       all based on BGP. All the complexities related to source
       discovery and switching between shared and source trees are also eliminated.
       Additionally, the trees can be set up with MPLS labels, with just
       minor enhancements in the signaling.
    </t>
    </section>
    <section title="Labeled Multicast">
    <t>There could be two forms of labeled multicast signaled by BGP. The first
       one is labeled (x,G) multicast where 'x' stands for either 'S' or '*'.
       Basically, it is for a BGP-signaled multicast tree as described in
       the previous section but with labels.
       The second one is for mLDP tunnels with BGP signaling in part or whole
       through a BGP domain.
    </t>
    <t>For both cases, BGP is used because other label distribution mechanisms
       like mLDP may not be desired by some operators. For example, a DC
       operator may prefer to have a BGP-only deployment.
    </t>
    </section>
    </section>
    <section title="Overview">
	  <t>This overview section describes the
	  mode of operation and some considerations.</t>
    <t>At a very high level, PIM Join messages or mLDP Label Mapping
	  messages are replaced by BGP updates of the MCAST-TREE SAFI with the
	following NLRI format (<xref target="nlri"/>):
	
    <figure>
    <artwork>
      +-----------------------------------+
      |    Route Type (1 octet)           |
      +-----------------------------------+
      |     Length (1 octet)              |
      +-----------------------------------+
      | Route Type specific (variable)    |
      +-----------------------------------+
    </artwork>
    </figure>
	</t>
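    <t>As an illustration only, the generic NLRI layout above can be encoded
       and parsed as in the following Python sketch. It is not part of this
       specification. The function names encode_nlri and decode_nlris are
       hypothetical, the route type values passed to them are placeholders,
       and the sketch assumes that the Length field counts only the octets of
       the Route Type specific field.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import struct
from typing import List, Tuple


def encode_nlri(route_type: int, body: bytes) -> bytes:
    """Build one NLRI: Route Type (1 octet), Length (1 octet), body.

    Assumption: Length covers only the Route Type specific field.
    """
    if not 0 <= route_type <= 0xFF:
        raise ValueError("route type must fit in one octet")
    if len(body) > 0xFF:
        raise ValueError("route type specific field exceeds 255 octets")
    return struct.pack("!BB", route_type, len(body)) + body


def decode_nlris(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a buffer of concatenated NLRIs into (route_type, body) pairs."""
    routes = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated NLRI header")
        route_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        body = data[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")
        routes.append((route_type, body))
        offset += length
    return routes
```
]]></artwork>
    </figure>
    <t>A receiver that does not recognize a route type can still skip over it
       using the Length field, which is why the sketch parses the header
       before interpreting the body.
    </t>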
    <t>Different route types are described in this section as they are encountered.
    </t>
    <section title="(x,G) Multicast" anchor="pim">
    <t>PIM/mLDP-like functionality is provided, using BGP-based join
       signaling and BGP-based source discovery in the case of ASM.  The BGP-based
       join signaling supports both labeled multicast and IP multicast.
    </t>
    <t>The same RPF procedures
       as in PIM/mLDP are used for each router to determine the RPF neighbor for
       a particular source or RPA (in the case of Bidirectional Tree) or root.
	   Except in the Bidirectional Tree case and a special case described in
       <xref target="shared"/>, no (*,G) join is used - LHRs
       discover the sources for ASM and then join towards the sources
       directly. Data-driven mechanisms like PIM Assert are replaced by
       control-driven mechanisms (<xref target="lan"/>).
    </t>
	<t>
	One of the route types is the Leaf A-D route, the
	equivalent of a PIM Join message or mLDP Label Mapping message. Leaf
	A-D routes are
       targeted at the upstream neighbor by use of Route Targets. In some cases,
	   S-PMSI A-D routes are also used, as described in later sections.
	   <!--
       There are three benefits of using S-PMSI/Leaf routes for this purpose:
        a) when the routes go through RRs, we have to distinguish
        different routes based on upstream router and downstream router.
        This leads to Leaf routes. b) for labeled bidirectional trees, we need
        to signal "upstream fec". S-PMSI suits this very well. c) we may want
        to allow the option of setting up trees or parts of a tree from the
        root/upstream towards leaves/downstream and S-PMSI suits that
        very well.
       -->
    </t>
    <t>If the BGP updates carry labels (via the Tunnel Encapsulation Attribute
       <xref target="RFC9012"/>), then (S,G)
       multicast traffic can use the labels. This is very similar to mLDP
       Inband Signaling <xref target="RFC6826"/>, except that there are no corresponding
       "mLDP tunnels" for the PIM trees. As with mLDP, labeled traffic
       on transit Local Area Networks (LANs) is point-to-point. Of course,
	   traffic sent to receivers on a LAN by an LHR is native multicast.
    </t>
    <t>For labeled bidirectional (*,G) trees, downstream traffic (away from the RPA)
       is forwarded as in the (S,G) case. For upstream traffic (towards
       RPA), the upstream neighbor needs to advertise a label for its
       downstream neighbors. The label that the upstream neighbor
       advertises to its own upstream neighbor in a Leaf A-D route is the same label that
	   it advertises to its downstream neighbors using an S-PMSI A-D route.
    </t>
<!--
    <t>For Bidirectional PIM, DF election is required on each link to select a
       router to forward traffic to/from the RPA direction. This is based
       on DF messages exchanged rapidly among the BIDIR-PIM routers on the
       link. The procedure is complicated and may not be robust enough
       in all situations, and . In a typical provider
       network, transit LANs are rarely used therefore for simplicity this
       document does not support transit LANs for bidirectional trees.
    </t>
    <t>For resilience purpose the RPA is typically a "virtual address" on
       a multi-access link and is not associated with any routers. No DF
       election is needed on this RPL (Rendezvous Point Link), and all routers
       on the RPL forward traffic to/from the RPL. With Bidir-PIM,
       the RPL routers terminate the Join/Prune messages from downstream
       neighbors and the same applies if BGP is used for signaling.
    </t>
-->
    <section title="Source Discovery for ASM" anchor="source">
    <t>This document does not support ASM via shared trees (aka RP Tree,
       or RPT) with one exception discussed in the next section.
       Instead, FHRs, LHRs, and optionally RRs work together to
       propagate and discover source information via the control plane, and LHRs join
       source-specific Shortest Path Trees (SPTs) directly.
    </t>
<!--
    <t>The RPs are just Route Reflectors. Multicast data traffic does not
       necessarily go through them, and redundancy can be easily achieved
       by having multiple RRs. They do not participate in any multicast specific
       procedures, besides that they redistribute Source Active A-D routes.
    </t>
-->
    <t>An FHR originates Source Active (SA) A-D routes upon discovering sources for
    particular flows and advertises them to its peers. Route targets are
	used so that
       the SA routes only reach LHRs that are interested in receiving the
       traffic (<xref target="sa"/>). 
    </t>
    <t>Typically, a set of RRs is used; they maintain all Source Active
       routes but only distribute to interested LHRs on demand<!-- (upon receiving
       corresponding Route Target Membership routes, which are triggered on
       LHRs when they receive IGMP/MLD membership reports)-->. The rest of the
       document assumes that RRs are used, even though that is not required.
    </t>
    <t>Having the set of RRs maintain all SA routes is comparable to having
	the RPs in PIM-ASM maintain all (S,G) state in the network. In fact,
	in the PIM-ASM case the state is maintained in both the control plane and
	the data plane, while with BGP SA-based discovery the state is
	maintained only in the control plane, and the RRs can be placed outside
	the traffic path.
    </t>
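    <t>The RR behavior described above can be pictured with the following
       Python sketch, offered for illustration only. It abstracts away BGP
       itself: an LHR's group-specific interest, which in practice would be
       expressed through Route Targets, is modeled as a plain registration
       call, and an advertisement is modeled as an entry appended to a log.
       The class RouteReflector and its methods are hypothetical names, not
       procedures defined by this document.
    </t>
    <figure>
    <artwork><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SourceActive = Tuple[str, str]  # (source address, group address)


class RouteReflector:
    """Toy model: keep every SA route, hand them out only on demand."""

    def __init__(self) -> None:
        # group -> set of sources learned from FHRs (hypothetical store)
        self.sa_routes: Dict[str, Set[str]] = defaultdict(set)
        # group -> set of LHRs that registered interest in the group
        self.interest: Dict[str, Set[str]] = defaultdict(set)
        # log of (lhr, SA route) advertisements, stands in for BGP updates
        self.sent: List[Tuple[str, SourceActive]] = []

    def receive_sa(self, source: str, group: str) -> None:
        """An FHR announced an active source; forward to interested LHRs."""
        if source in self.sa_routes[group]:
            return  # already known, nothing new to distribute
        self.sa_routes[group].add(source)
        for lhr in sorted(self.interest[group]):
            self.sent.append((lhr, (source, group)))

    def register_interest(self, lhr: str, group: str) -> None:
        """An LHR asked for SA routes of a group; replay the stored ones."""
        if lhr in self.interest[group]:
            return
        self.interest[group].add(lhr)
        for source in sorted(self.sa_routes[group]):
            self.sent.append((lhr, (source, group)))
```
]]></artwork>
    </figure>
    <t>In this model the RR never touches multicast data. It only stores and
       forwards control-plane state, which is why it can sit outside the
       traffic path.
    </t>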
<!--
    <t>Because the RPs are only used for distributing SA route and not as
       data rendezvous points, a small number of them are enough and there is
       no need to have different RPs for different groups. As a result,
       static configuration is sufficient - no need for dynamic RP learning
       protocols like BSR and Auto-RP.
    </t>
-->
    <t>Note that data-driven source discovery followed by control-driven
	tree setup means that receivers will miss the initial packets of a multicast
	flow when it first starts or resumes. If it is important to avoid this,
	source discovery should be provided by the application layer instead of
	the network.
	</t>
    </section>
    <section title="ASM Shared-tree-only Mode" anchor="shared">
    <t>It may be desired that only a shared tree is used to distribute all
       traffic for a particular ASM group from its RP to all LHRs, as described
       in Section 4.1 "PIM Shared Tree Forwarding"  of <xref target="RFC7438"/>. This will
       significantly cut down the number of trees and works out very well
       in certain deployment scenarios. For example, all the sources could
       be connected to the RP, or clustered close to the RP. In the latter
       case, either the paths from FHRs to the RP do not intersect the
       shared tree so native forwarding can be used between the FHRs
       and the RP, or other means outside of this document could be used
       to forward traffic from FHRs to the RP.
    </t>
    <t>For native forwarding from FHRs to the RP, SA routes may be used
       to announce the sources so that the RP can join source-specific trees
       to pull traffic, but the LHRs do not advertise the group-specific Route
	   Target Membership routes as they do not need the SA routes.
    </t>
    <t>To establish the shared tree, (*,G) Leaf A-D routes are originated
	hop-by-hop towards the RP, and corresponding (*,G) forwarding states are
	established along the way, just as (S,G) Leaf A-D routes are
	originated hop-by-hop towards the source and (S,G) forwarding states are
	established along the way.
    </t>
    </section>
    <section title="Integration with BGP-MVPN" anchor="mvpn">
    <t>For each VPN, the distribution of Source Active routes in that VPN does not
       have to involve PEs at all (unless there are sources/receivers directly
       connected to some PEs) and is independent of MVPN SA routes.
       For example, FHRs and LHRs establish BGP
       sessions with RRs of that particular VPN for the purpose of SA
       distribution.
    </t>
<!--
    <t>Alternatively, one or more PEs can serve as the RRs for their local
       sites for the purpose of distributing SA routes. They can then translate
       the Source Active routes into BGP-MVPN SA routes. Compared to the
       approach in the previous paragraph, the PEs use a single session
       (vs. one session for each VPN) to exchange BGP-MVPN SA routes
       (MCAST-VPN SAFI) among themselves, following the procedures defined
       in Section 14 of RFC 6514. That's in addition to exchanging BGP SA
       routes (MCAST-TREE SAFI) between
       a PE and FHRs/LHRs that it is responsible for. Note that RFC 6514
       does not explicictly specify that an egress PE translate received
       BGP-MVPN SA A-D routes into PIM Null Register messages or MSDP SA routes
       (for the purpose of Anycast RP). In this document, a PE acting
       as a RR for SA A-D routes does translate received BGP-MVPN SA A-D routes
       to BGP SA A-D routes, and vice versa.
    </t>
-->
    <t>After source discovery, BGP multicast signaling is done from LHRs
       towards the sources. When the signaling reaches an egress PE,
       BGP-MVPN signaling takes over, as if a PIM (S,G) join was received
       on the PE-CE interface. When the BGP-MVPN signaling reaches the ingress
       PE, BGP multicast signaling as specified in this document takes over,
       similar to how BGP-MVPN triggers PIM (S,G) join on PE-CE
       interfaces.
    </t>
    </section>
    </section>
    <section title="BGP Inband Signaling for mLDP Tunnels">
    <t>Part or all of an mLDP tunnel can also be signaled via BGP and seamlessly
       integrated with the rest of the mLDP tunnel signaled natively via mLDP.
       All the procedures are similar to mLDP except that the signaling is done
       via BGP. The mLDP FEC is encoded in the BGP NLRI, with MCAST-TREE SAFI
       and S-PMSI/Leaf A-D Routes for mLDP defined
       in this document. The Leaf A-D routes correspond to mLDP Label Mapping
       messages and the S-PMSI A-D routes are used to signal upstream FEC for 
       MP2MP mLDP tunnels, similar to the bidirectional (*,G) case.
    </t>
    </section>
    <!--section title="BGP Inband Signaling for SR-P2MP">
    <t><xref target="I-D.ietf-pim-sr-p2mp-policy"/> describes an architecture
	to construct SR-P2MP trees using Segment Routing Replication Segments
	<xref target="I-D.ietf-spring-sr-replication-segment"/>. A P2MP tree
	is identified in the control plane by &lt;root-id, tree-id&gt;
	and a replication segment is identified in the control plane by
	&lt;root-id, tree-id, candidate-path-id, node-id&gt;.
	Besides the	controller-based tree calculation and signaling,
	this document specifices another option - hop-by-hop signaling via BGP
	using Replication State route type
	<xref target="I-D.ietf-bess-bgp-multicast-controller"/>. Besides the
	name/encoding difference, the proecudure for this route used for hop-by-hop
	signaling of SR-P2MP is the same as the procedure for Leaf A-D routes
	for mLDP tunnel signaling via BGP.
    </t>
    </section-->
    <section title="BGP Sessions">
    <t>In order for two BGP speakers to exchange MCAST-TREE NLRI, they MUST use
   BGP Capabilities Advertisement <xref target="RFC5492"/> to ensure that they both are
   capable of properly processing the MCAST-TREE NLRI.  This is done as
   specified in <xref target="RFC4760"/>, by using capability code 1 (multiprotocol
   BGP) with an AFI of IPv4 (1) or IPv6 (2) and a SAFI of MCAST-TREE (78).
    </t>
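    <t>As a non-normative illustration, the capability described above can
       be built and checked as in the following Python sketch. The code
       points (capability code 1, AFI 1 and 2, SAFI 78) are the ones given in
       the text, and the value layout (2-octet AFI, 1 reserved octet, 1-octet
       SAFI) follows <xref target="RFC4760"/>. The constant and function
       names are hypothetical, and the sketch handles only a flat sequence of
       capability TLVs, not the enclosing OPEN message.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import struct
from typing import Set, Tuple

CAP_MULTIPROTOCOL = 1   # capability code for multiprotocol BGP
AFI_IPV4 = 1
AFI_IPV6 = 2
SAFI_MCAST_TREE = 78    # value given in this document


def encode_mp_capability(afi: int, safi: int) -> bytes:
    """Encode one capability TLV: code, length, AFI, reserved, SAFI."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value


def parse_mp_capabilities(data: bytes) -> Set[Tuple[int, int]]:
    """Return the (AFI, SAFI) pairs found in a run of capability TLVs."""
    pairs = set()
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated capability header")
        code, length = struct.unpack_from("!BB", data, offset)
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated capability value")
        if code == CAP_MULTIPROTOCOL and length == 4:
            afi, _reserved, safi = struct.unpack("!HBB", value)
            pairs.add((afi, safi))
        offset += 2 + length  # other capability codes are skipped
    return pairs


def can_exchange_mcast_tree(local: bytes, remote: bytes, afi: int) -> bool:
    """True only if both peers advertised MCAST-TREE for this AFI."""
    wanted = (afi, SAFI_MCAST_TREE)
    return (wanted in parse_mp_capabilities(local)
            and wanted in parse_mp_capabilities(remote))
```
]]></artwork>
    </figure>
    <t>The check is deliberately symmetric: MCAST-TREE NLRI for a given AFI
       is exchanged only when both speakers have advertised it.
    </t>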
    <t>How the BGP peer sessions are provisioned, whether EBGP or IBGP, whether
       statically, automatically,
       or programmably via an external controller, is outside the scope of
       this document.
    </t>
    <t>In the case of IBGP, every router could peer with Route
       Reflectors, or hop-by-hop IBGP sessions could be used to exchange
       MCAST-TREE NLRIs for joins. In the latter case,
       unless desired otherwise for reasons outside of
       the scope of this document, the hop-by-hop IBGP sessions SHOULD only
       be used to exchange MCAST-TREE NLRIs.
    </t>
    <t>When multihop BGP is used, a router advertises its local interface
       addresses, for the same purposes that the Address List TLV in LDP
       serves. This is achieved by advertising the interface addresses
       as host prefixes with an IPv4/IPv6 Address Specific Extended Community (EC)
	   corresponding to the router's local address used for its BGP sessions
	   (<xref target="LAEC"/>).
    </t>
    <t>Because the BGP Capability Advertisement is only between two peers,
       when the sessions are only via RRs, a router needs another way to
       determine if its neighbor is capable of signaling multicast via BGP.
       The interface address advertisement can be used for that purpose -
       the inclusion of a Session Address EC indicates that the BGP speaker
       identified in the EC supports the MCAST-TREE NLRIs.
    </t>
    <t>FHRs and LHRs may also establish BGP sessions to some Route Reflectors
       for source discovery purposes (<xref target="source"/>).
    </t>
    <t>With traditional PIM, the FHRs and LHRs refer to the PIM Designated
	   Routers (DRs) on
       the source or receiver networks. With BGP based multicast, PIM may not
       be running at all, and the FHRs and LHRs refer to the IGMP/MLD queriers
       or the Designated Forwarders (DFs) elected per
       <xref target="I-D.wijnands-bier-mld-lan-election"/>. Alternatively,
       if it is known that a network only has senders then no IGMP/MLD or DF
       election is needed - any router may generate SA routes. That will not
       cause any issue other than redundant SA routes being originated.
    </t>
    </section>
    <section title="LAN and Parallel Links" anchor="lan">
    <t>There could be parallel links between two BGP peers. A single multi-hop
       session, whether IBGP or EBGP, between loopback addresses
       may be used. Except for LAN interfaces in the case of unlabeled (x,G)
       unidirectional trees
       (note that transit LAN interfaces are not supported for BGP-signaled
        (*,G) bidirectional trees, and that for mLDP tunnels, traffic on a transit
        LAN is point-to-point between neighbors), any link between
       the two peers can be automatically used by a downstream peer to
       receive traffic from the upstream peer, and it is for the upstream peer
       to decide which link to use. If one of the links goes down, the upstream
       peer switches to a different link and there is no change needed on the
       downstream peer.
    </t>
    <t>For unlabeled (x,G) unidirectional trees, the upstream peer may prefer
       LAN interfaces
       to send traffic (since multiple downstream peers may be reached
       simultaneously), or it may make a decision based
       on local policy, e.g., for load balancing purposes. Because different
       downstream peers might choose different upstream peers for RPF,
       when an upstream peer decides to use a LAN interface to send traffic,
       it originates an S-PMSI A-D route indicating that one or more LAN
        interfaces will be used. The route carries Route Targets specific
       to the LANs so that all the peers on the LANs import the route. If more
       than one router originates the route specifying the same LAN for the
       same (S,G) or (*,G) flow, then an assert procedure based on the
       S-PMSI A-D routes takes place and the assert losers stop sending traffic
       to the LAN.
    </t>
    <t>There may be multiple LAN interfaces between two neighbors, and the
	upstream neighbor may send traffic on both LAN interfaces because of
	other downstream neighbors on both LANs. In this case, a downstream
	neighbor will choose one of the LANs to receive traffic - the RTs in the
	S-PMSI route enable the downstream neighbor to determine that its
	upstream neighbor is sending on both interfaces, and it will choose only
	one on which to receive traffic.
    </t>
    </section>
    <section title="Transition">
    <t>A network currently running PIM can be incrementally transitioned to
       BGP based multicast. At any time, a router supporting BGP based
       multicast can use PIM with some neighbors (upstream or
       downstream) and BGP with some other neighbors. PIM and BGP
       MUST NOT be used simultaneously between two neighbors for multicast
       purposes, and routers connected to the same LAN MUST be
       transitioned during the same maintenance window.
    </t>
    <t>In the case of PIM-SSM, any router can be transitioned at any time (except
       on a LAN). It may
       receive source tree joins from a mixed set of BGP and PIM downstream
       neighbors and send source tree joins to its upstream neighbor using
       either PIM or BGP signaling.
    </t>
    <t>In the case of PIM-ASM, the RPs are first upgraded to support BGP based
       multicast. They learn sources
       either via PIM procedures from PIM FHRs, or via Source Active A-D
       routes from BGP FHRs. In the former case, the RPs can originate proxy
       Source Active A-D routes. There may be a mixed set of RPs/RRs - some
       capable of traditional PIM RP functionality as well, while others only
       redistribute SA routes.
    </t>
    <t>Then any router can be transitioned incrementally. A transitioned
       LHR will pull Source Active A-D routes
       from the RPs/RRs when it receives IGMP/MLD (*,G) joins for ASM groups,
       and may send either PIM (S,G) joins or BGP Source Tree Join routes.
       A transitioned
       transit router may receive (*,G) PIM joins but only send source
       tree joins after pulling Source Active A-D routes from RPs/RRs.
    </t>
    <t>Similarly, a network currently running mLDP can be incrementally
       transitioned to BGP signaling. Without the complication of ASM, any
       router can be transitioned at any time, even without the restriction
       of coordinated transition on a LAN. It may receive a mix of mLDP Label
       Mapping messages and BGP updates from different downstream neighbors, and may
       exchange either mLDP Label Mapping messages or BGP updates with its upstream
       neighbors, depending on whether the neighbor is using BGP-based signaling
       or not.
    </t>
    </section>
    <section title="Inter-region Multicast">
	<t>An end-to-end multicast tree or P2MP tunnel may span multiple regions,
	   where a region could be an IGP area (or even a sub-area) or an
       Autonomous System (AS), and different multicast signaling could be used
	   in different regions. There are several situations to consider.
	</t>
    <section title="Inband Signaling across a Region">
	<t>With inband signaling, the multicast tree/tunnel is signaled through
       a region and internal routers in the region maintain
       corresponding per-tree/tunnel state. A downstream region and
	   an upstream region may use the same or different signaling. For example,
       an (x,G) IP multicast tree with BGP signaling in a downstream region
	   can be signaled with mLDP Inband Signaling <xref target="RFC6826"/> or with PIM across
	   the upstream region, and a P2MP tunnel with BGP signaling in the
	   downstream region can be signaled with mLDP across the upstream region,
	   or vice versa. A Regional Border Router (RBR) will stitch the upstream
	   portion (e.g. PIM/mLDP-signaled) to the downstream portion
	   (e.g. BGP-signaled).
	</t>
	<t>If all routers in the region have routes towards the
       source/root of the tree/tunnel then there is nothing different
       from the intra-region case. On the other hand, if internal routers
       do not have routes towards the source/root, e.g. as with
       Seamless MPLS <xref target="I-D.ietf-mpls-seamless-mpls"/><!--or Seamless SR <xref target="I-D.hegde-spring-mpls-seamless-sr"/-->,
	   the internal routers need to do RPF towards an
       upstream RBR. To signal the RBR information
       to an internal upstream router, one of the following ways is used
	   depending on the signaling method:
       <list style="symbols">
         <t>With BGP signaling, the Leaf A-D Route carries a new
       BGP Extended Community referred to as Multicast RPF Address EC,
       similar to PIM RPF Vector <xref target="RFC5496"/> and mLDP Recursive FEC <xref target="RFC6512"/>.
         </t>
		 <t>With PIM signaling, PIM RPF Vector is used.
		 </t>
		 <t>With mLDP signaling, mLDP Recursive FEC is used.
		 </t>
       </list>
	</t>
    </section>
    <section title="Overlay Signaling Over a Region" anchor="overlay">
	<t>With overlay signaling, a downstream RBR signals to its
       upstream RBR over the region and the internal
       routers do not maintain the state of the (overlay) tree/tunnel.
	   This can be done with one of the following methods:
       <list style="symbols">
         <t>mLDP P2MP tunnels can be signaled over the region via targeted LDP
		 sessions <xref target="RFC7060"/>.
         </t>
		 <t>Both IP multicast tree and mLDP P2MP tunnels can be signaled over
		 a region via BGP-MVPN procedures <xref target="RFC6514"/>.
		 </t>
		 <t>Both IP multicast tree and mLDP P2MP tunnels can be signaled over
		 a region via BGP as discussed in the rest of this section.
		 </t>
       </list>
	</t>
	<t>All three methods are actually very similar in concept. The
       upstream RBR tunnels packets to the downstream RBR, just as in the
       intra-region case when two routers on the tree/tunnel are not
       directly connected. The rest of this section only discusses
	   BGP signaling.
	</t>
	<t>When a downstream RBR determines that the route towards the source/root
       has a BGP Next Hop towards a BGP speaker capable of multicast signaling
       via BGP as specified in this document, it signals to that
       BGP speaker (via a RR or not).
	</t>
	<t>Suppose an upstream RBR receives the signaling for the same tree/tunnel
       from several downstream RBRs. It could use Ingress Replication to
       replicate packets directly to those downstream RBRs, or it could use
       underlay P2MP tunnels instead.
	</t>
	<t>In the latter case, the upstream RBR advertises an S-PMSI A-D route
       with a PMSI Tunnel Attribute (PTA) specifying the underlay tunnel.
       This is very much like the "mLDP Over Targeted Sessions" <xref target="RFC7060"/> or
       BGP-MVPN <xref target="RFC6514"/> (though MCAST-VPN's C-Multicast routes are replaced
	   with MCAST-TREE's Leaf A-D routes). If the mapping between the
	   overlay tree/tunnel and the underlay tunnel is one-to-one, the
       MPLS Label field in the PTA is set to 0. Otherwise, it is set to a
       Domain-wide Common Block (DCB) label <xref target="RFC9573"/> or
       an upstream-assigned label corresponding to the overlay
       tree/tunnel.
	</t>
	<!--t>The underlay tunnel, whether P2P to individual downstream RBRs or P2MP
       to the set of downstream RBRs, can be of any type including Segment
       Routing (SR) <xref target="RFC8402"/> policies <xref target="RFC9256"/>
       [I-D.ietf-pim-sr-p2mp-policy].
	</t-->
    </section>
    <section title="Controller Based Signaling">
	<t><xref target="I-D.ietf-bess-bgp-multicast-controller"/> specifies the
       procedures for a controller to signal multicast forwarding state to each
       router on a multicast tree based on the controller's computation.
       Depending on deployment scenarios, in inter-region cases it is possible
       that the hop-by-hop signaling specified in this document and the
       controller based signaling may be used in different regions.
	</t>
	<t>Consider a situation where an RBR is connected to three regions A, B,
       and C, where hop-by-hop signaling is used in A and B, while controller
       based signaling is used in C.
	</t>
	<t>For a particular multicast tree, A is the upstream region, while B and
       C are two downstream regions. The RBR receives a Leaf A-D route from
       region B and a Leaf A-D route from C's controller, and sends a Leaf
       A-D route to its upstream router in A.
	</t>
	<t>For a different tree, C is the upstream region while A and B are
       downstream. The RBR receives two Leaf A-D routes for the tree from
       regions A and B, and one Leaf A-D route from C's controller. Note
       that the RBR needs to signal to the controller that it is a leaf of
       the tree (because of the Leaf A-D routes received from regions A and B).
	</t>
	<t>In both cases, the RBR stitches together the segments in
       different regions by creating forwarding state based on the Leaf A-D
       routes (and, optionally, on the S-PMSI A-D routes in regions A and
       B).
	</t>
    </section>
    </section>
    <section title="BGP Classful Transport Planes">
	  <t><xref target="I-D.ietf-idr-bgp-ct"/>
	  specifies an experimental framework for classifying underlay routes into transport
	  classes and mapping service routes to specific transport classes.
	  An underlay route signaled with BGP-CT SAFI carries a Transport Class
	  Route Target (TC-RT) both to indicate the transport class that the route
	  belongs to and to control the propagation and importation of the
	  underlay route. The recipients of the underlay routes use the TC-RT
	  to determine how the Protocol NH (PNH) is resolved. A service/overlay route
	  may carry a mapping community that maps to a transport class that
	  is used to resolve the service route's PNH.
	  </t>
	  <t>In the case of multicast, the selection of the link/tunnel between an
	  upstream and downstream tree node may be subject to the transport class
	  that the tree is for (in the case of an underlay tree) or the class of
	  transport that the tree should use (in the case of an overlay tree).
	  In both the underlay and overlay case, the transport class is indicated
	  by a mapping community attached to the BGP multicast routes,
	  which could be a color community or any community intended for mapping
      to the transport.
	  <!--derived from a Transport Class Route Target (TC-RT) as specified in
	  <xref target="I-D.zzhang-idr-rt-derived-community"/-->
	  </t>
	  <t>The mapping community not only affects an upstream node's selection
	  of link/tunnel to a downstream node, but may also affect a downstream
	  node's selection of its upstream node (i.e. the RPF procedure).
	  </t>
      <t><xref target="I-D.ietf-idr-bgp-car"/> is another experimental mechanism
	  that provides class/color-aware routing. Multicast signaled by BGP may
	  be integrated with that as well, but it is outside the scope of this
	  document.
      </t>
    </section>
	<section title="Flexible Algorithm and Multi-topology" anchor="ipa">
	<t>Similar to classful transport, in the case of multi-topology <xref target="RFC4915"/>
       <xref target="RFC5120"/> or Flexible Algorithm <xref target='RFC9350'/>,
       a multicast tree may be required to do RPF based on a particular topology
	   or Flexible Algorithm (IPA). To signal that, the BGP-MCAST Leaf A-D route
	   may carry an extended community to encode the topology and/or IPA.
       Note that this could also be an operator-defined mapping community
	   that maps to a transport class (that is
       associated with a topology or a Flexible Algorithm).
    </t>
	<t>In the broader inter-region scenario, if mLDP is to be
       used with Flexible Algorithm or Multi-topology for signaling in a
       particular region, <xref target='I-D.ietf-mpls-mldp-multi-topology'/>
       specifies how topology and/or IPA are encoded.
    </t>
	<t>Similarly, in the case of PIM, <xref target="RFC6420"/> specifies how topology information
       is encoded in PIM signaling and a similar mechanism can be specified for
       Flexible Algorithm. However, that mechanism, and the potential
       encoding of a transport class in PIM/mLDP, are outside the scope of
       this document.
    </t>
	</section>
    </section>
    </section>
    <section title="Specification" anchor="specification">
    <section title="BGP NLRIs and Attributes" anchor="nlri">
    <t>
   The BGP Multiprotocol Extensions <xref target="RFC4760"/> allow BGP to carry routes
   from multiple different "AFI/SAFIs".  This document defines a
   new SAFI known as the MCAST-TREE SAFI, with value 78 assigned by
   IANA.
    </t>
    <t>
   The MCAST-TREE NLRI defined below is carried in the BGP UPDATE messages
   <xref target="RFC4271"/> 
   using the BGP multiprotocol extensions <xref target="RFC4760"/> with an AFI
   of IPv4 (1) or IPv6 (2) and a MCAST-TREE SAFI (78).
    </t>
    <t>
   The Next Hop field of the MP_REACH_NLRI attribute SHALL be interpreted
   as an IPv4 address whenever the length of the Next Hop address is 4
   octets, and as an IPv6 address whenever the length of the Next Hop
   address is 16 octets.
    </t>
    <t>
   The NLRI field in the MP_REACH_NLRI and MP_UNREACH_NLRI is a prefix
   with a maximum length of 12 octets for IPv4 AFI and 36 octets for
   IPv6 AFI.  The following is the format of the MCAST-TREE NLRI:
    </t>
    <t>
    <figure>
    <artwork>
                   +-----------------------------------+
                   |    Route Type (1 octet)           |
                   +-----------------------------------+
                   |     Length (1 octet)              |
                   +-----------------------------------+
                   | Route Type specific (variable)    |
                   +-----------------------------------+
    </artwork>
    </figure>
    </t>
    <t>
   The Route Type field defines the encoding of the Route Type specific
   field of the MCAST-TREE NLRI.
    </t>
    <t>
   The Length field indicates the length in octets of the Route Type
   specific field of MCAST-TREE NLRI.
    </t>
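    <t>As a non-normative illustration, the Route Type / Length framing
       described above can be sketched in Python as follows. The function
       names encode_nlri and decode_nlri are hypothetical and are not
       defined by this document; the sketch only shows the two 1-octet
       header fields followed by the Route Type specific octets.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import struct


def encode_nlri(route_type: int, body: bytes) -> bytes:
    """Prepend the 1-octet Route Type and 1-octet Length."""
    if len(body) > 255:
        raise ValueError("body does not fit a 1-octet Length")
    return struct.pack("!BB", route_type, len(body)) + body


def decode_nlri(data: bytes):
    """Return (route_type, body, remaining octets)."""
    if len(data) < 2:
        raise ValueError("truncated NLRI header")
    route_type, length = struct.unpack_from("!BB", data)
    if len(data) < 2 + length:
        raise ValueError("truncated NLRI body")
    return route_type, data[2:2 + length], data[2 + length:]
```
]]></artwork>
    </figure>
    <t>Because the Length field counts only the Route Type specific
       octets, a parser of this shape can step over a route type it does
       not recognize and continue with the remaining octets.
    </t>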
    <t>The following new route types are defined:
    <figure>
	<artwork>
       3 -  S-PMSI A-D Route for (x,G)
       4 -  Leaf A-D Route
       5 -  Source Active A-D Route
    0x43 -  S-PMSI A-D Route for mLDP
	</artwork>
    </figure>
    </t>
    <t>Except for the Source Active A-D routes, the routes are to be consumed
       by targeted upstream/downstream neighbors and are not propagated
       further. This can be achieved by outbound filtering based on the
       RTs that lead to the importation of the routes.
    </t>
    <t>The Type-3/4 routes MAY carry a Tunnel Encapsulation Attribute (TEA)
       <xref target="RFC9012"/>.
       The Type-0x43 route MUST carry a TEA. When used for mLDP, the Type-4
       route MUST carry a TEA. The TEA includes one tunnel entry with an
	   MPLS Label Stack Sub-TLV that includes one label. This is the label
	   associated with the (x,G) labeled tree or mLDP tunnel.
    </t>
    <section title="S-PMSI A-D Route">
    <t>Similar to what is defined in <xref target="RFC6514"/>, an S-PMSI A-D Route Type specific
       MCAST-TREE NLRI consists of the following:
    <figure>
	<artwork>
      +-----------------------------------+
      |      RD   (8 octets)              |
      +-----------------------------------+
      | Multicast Source Length (1 octet) |
      +-----------------------------------+
      |  Multicast Source (variable)      |
      +-----------------------------------+
      |  Multicast Group Length (1 octet) |
      +-----------------------------------+
      |  Multicast Group   (variable)     |
      +-----------------------------------+
      |  Upstream Router's IP Address     |
      +-----------------------------------+
	</artwork>
    </figure>
    </t>
    <t>
   If the Multicast Source (or Group) field contains an IPv4 address,
   then the value of the Multicast Source (or Group) Length field is 32.
   If the Multicast Source (or Group) field contains an IPv6 address,
   then the value of the Multicast Source (or Group) Length field is 128.
    </t>
    <t>
   Usage of other values of the Multicast Source Length and Multicast
   Group Length fields is outside the scope of this document.
    </t>
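    <t>The field layout above can be sketched, non-normatively, in
       Python. The helper name encode_spmsi is hypothetical. The sketch
       assumes that the Upstream Router's IP Address is carried as a bare
       4-octet or 16-octet address without its own length field, which
       the figure above does not state explicitly.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import ipaddress
import struct


def _addr(text: str) -> bytes:
    return ipaddress.ip_address(text).packed


def encode_spmsi(rd: bytes, source: str, group: str,
                 upstream: str) -> bytes:
    """Build the Route Type specific part of an S-PMSI A-D route."""
    if len(rd) != 8:
        raise ValueError("RD must be 8 octets")
    src, grp = _addr(source), _addr(group)
    # Length fields are in bits: 32 for IPv4, 128 for IPv6.
    return (rd
            + struct.pack("!B", len(src) * 8) + src
            + struct.pack("!B", len(grp) * 8) + grp
            + _addr(upstream))  # assumed: bare address, no length
```
]]></artwork>
    </figure>
    <t>Under these assumptions an IPv4 (S,G) route occupies
       8 + 1 + 4 + 1 + 4 + 4 = 22 octets in the Route Type specific
       field.
    </t>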
    <t>There are three usages of the S-PMSI A-D route. They are described in
	   <xref target="overlay"/>, <xref target="lanproc"/> and <xref target="upstream-spmsi"/>
       respectively.
    </t>
    </section>
    <section title="Leaf A-D Route">
    <t>Similar to the Leaf A-D route in <xref target="RFC6514"/>, a MCAST-TREE Leaf A-D route's
       route key includes the corresponding S-PMSI NLRI, plus the
       Originating Router's IP Address.
    </t>
    <figure>
	<artwork>

      +-----------------------------------+
      |  S-PMSI NLRI                      |
      +-----------------------------------+
      |  Originating Router's IP Address  |
      +-----------------------------------+
	</artwork>
    </figure>
    <t>For example, the entire NLRI of a Leaf A-D route for an (x,G) tree is
        as follows:
    <figure>
	<artwork>

      +-     +-----------------------------------+
      |      |    Route Type - 4 (Leaf A-D)      |
      |      +-----------------------------------+
      |      |     Length (1 octet)              |
      |   +- +-----------------------------------+ --+
      |   |  |    Route Type - 3 (S-PMSI A-D)    |   |
    L | L |  +-----------------------------------+   | S
    E | E |  |     Length (1 octet)              |   | |
    A | A |  +-----------------------------------+   | P
    F | F |  |      RD   (8 octets)              |   | M
      |   |  +-----------------------------------+   | S
      |   |  | Multicast Source Length (1 octet) |   | I
      |   |  +-----------------------------------+   |
    N | R |  |  Multicast Source (variable)      |   | 
    L | O |  +-----------------------------------+   |
    R | U |  |  Multicast Group Length (1 octet) |   | N
    I | T |  +-----------------------------------+   | L
      | E |  |  Multicast Group   (variable)     |   | R
      |   |  +-----------------------------------+   | I
      | K |  |  Upstream Router's IP Address     |   |
      | E |  +-----------------------------------+ --+
      | Y |  |  Originating Router's IP Address  |
      +-  +- +-----------------------------------+
	</artwork>
    </figure>
    </t>
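    <t>The nesting shown in the figure can be sketched, non-normatively,
       in Python. The names tlv and encode_leaf_ad are hypothetical. The
       sketch assumes the Originating Router's IP Address is a bare
       4-octet or 16-octet address, and it takes the S-PMSI Route Type
       specific octets as already-encoded input.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import ipaddress
import struct


def tlv(route_type: int, body: bytes) -> bytes:
    """Route Type (1 octet), Length (1 octet), then the body."""
    return struct.pack("!BB", route_type, len(body)) + body


def encode_leaf_ad(spmsi_body: bytes, originator: str) -> bytes:
    """Wrap a complete Type-3 NLRI and append the originator."""
    spmsi_nlri = tlv(3, spmsi_body)      # inner S-PMSI NLRI
    orig = ipaddress.ip_address(originator).packed
    return tlv(4, spmsi_nlri + orig)     # outer Leaf A-D NLRI
```
]]></artwork>
    </figure>
    <t>Note that the inner S-PMSI NLRI keeps its own Route Type and
       Length octets, so the outer Length covers those two octets, the
       S-PMSI fields, and the Originating Router's IP Address.
    </t>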
    <!--t>Even though the MCAST-TREE Leaf A-D route is unsolicited, unlike the
       Leaf A-D route for GTM in <xref target="RFC7524"/>, it is encoded as if a corresponding
       S-PMSI A-D route had been received.
    </t>
    <t>When used for signaling mLDP tunnels, even though the Leaf A-D route
       is unsolicited, unlike the "Route-type 0x44 Leaf A-D route for
       C-multicast mLDP" as in <xref target="RFC7441"/>, it is Route-type 4 and encoded
       as if a corresponding S-PMSI A-D route had been received.
    </t-->
    </section>
    <section title="Source Active A-D Route">
    <t>Similar to what is defined in <xref target="RFC6514"/>, a Source Active A-D Route Type
       specific MCAST-TREE NLRI consists of the following:
    <figure>
	<artwork>
      +-----------------------------------+
      |      RD   (8 octets)              |
      +-----------------------------------+
      | Multicast Source Length (1 octet) |
      +-----------------------------------+
      |   Multicast Source (variable)     |
      +-----------------------------------+
      |  Multicast Group Length (1 octet) |
      +-----------------------------------+
      |  Multicast Group (variable)       |
      +-----------------------------------+
	</artwork>
    </figure>
    </t>
    <t>The definitions of the source/length and group/length fields are the
       same as for the S-PMSI A-D routes.
    </t>
<!--    <t>
   Source Active A-D routes with a Multicast group belonging to the
   Source-Specific Multicast (SSM) range (as defined in <xref target="RFC4607"/>, and
   potentially extended locally on a router) MUST NOT be advertised by a
   router and MUST be discarded if received.
    </t>
-->
    <t>
   Usage of Source Active A-D routes is described in <xref target="source"/>.
    </t>
    </section>
    <section title="S-PMSI A-D Route for mLDP">
    <t>The route is used to signal the upstream FEC for an MP2MP mLDP tunnel.
    The route key includes a Route Distinguisher, the mLDP FEC and the
	Upstream Router's IP Address field.
    </t>
    </section>
    <section title="Session Address Extended Community" anchor="LAEC">
    <t>For two BGP speakers to determine whether they are directly connected,
       each advertises its local interface addresses with a
       Session Address Extended Community. This is an IPv4/IPv6 Address
       Specific EC with the Global Administrator Field set to the local
	   address used for its multihop sessions and the Local Administrator
	   Field set to the prefix length corresponding to the interface's
	   network mask.
    </t>
    <t>As an IPv4 example, a router has two interfaces with addresses 192.0.2.1/28
       and 198.51.100.1/24 respectively (notice the different prefix lengths),
       and a loopback address 203.0.113.1/32 that is used for BGP sessions.
       It advertises prefix
       192.0.2.1/32 with a Session Address EC 203.0.113.1:28 and
       198.51.100.1/32 with a Session Address EC 203.0.113.1:24.
       If it also uses another loopback address 203.0.113.101/32 for other
       BGP sessions, then the routes will additionally carry Session Address EC
       203.0.113.101:28 and 203.0.113.101:24 respectively.
    </t>
    <t>As an IPv6 example, a router has two interfaces with addresses
	2001:DB8::1:1/112 and 2001:DB8::2:1/120 respectively (notice the
	different prefix lengths),
       and a loopback address 2001:DB8::3:1/128 that is used for BGP sessions.
       It advertises prefix 2001:DB8::1:1/128 with a Session Address EC
	   "2001:DB8::3:1":112 (the quoted part is the IPv6 address for the
	   Global Administrator field and the "112" is the Local Administrator
	   field) and prefix 2001:DB8::2:1/128 with a Session Address EC
	   "2001:DB8::3:1":120.
       If it also uses another loopback address 2001:DB8::3:101/128 for other
       BGP sessions, then the routes will additionally carry Session Address EC
       "2001:DB8::3:101":112 and "2001:DB8::3:101":120 respectively.
    </t>
    <t>This achieves what the Address List TLV in LDP Address Messages
       achieves, and can also be used to indicate that a router supports
       the BGP multicast signaling procedures specified in this document.
    </t>
    <t>Only those interface addresses that will be used as resolved RPF nexthops
       in the RIB need to be advertised with the Session Address EC. For example,
       the RPF lookup may say that the resolved nexthop address is A1,
       so the router needs to find out the corresponding BGP speaker with
       address A1 through the (interface address, session address) mapping
       built according to the interface address NLRI with the Session Address EC.
       For comparison, in LDP this is done via the (interface address,
       session address) mapping that is built by the LDP Address Messages.
    </t>
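    <t>A non-normative Python sketch of the encoding and of the
       (interface address, session address) mapping follows. The names
       session_address_ec and build_mapping are hypothetical, and the
       sub-type value 0xFF is a placeholder only, since the actual value
       is to be assigned by IANA. The high-order type octets are those of
       the transitive IPv4-address-specific (RFC 4360) and transitive
       IPv6-address-specific (RFC 5701) extended communities.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import ipaddress
import struct

# Hypothetical placeholder: the sub-type is not yet assigned by IANA.
SESSION_ADDR_SUBTYPE = 0xFF


def session_address_ec(session: str, prefix_len: int) -> bytes:
    """Encode a Session Address EC (IPv4: 8 octets, IPv6: 20)."""
    addr = ipaddress.ip_address(session)
    # 0x01: transitive IPv4-address-specific (RFC 4360);
    # 0x00: transitive IPv6-address-specific (RFC 5701).
    type_high = 0x01 if addr.version == 4 else 0x00
    return (struct.pack("!BB", type_high, SESSION_ADDR_SUBTYPE)
            + addr.packed + struct.pack("!H", prefix_len))


def build_mapping(routes):
    """Map interface address -> set of session addresses.

    routes: iterable of (host_prefix, [(session, prefix_len), ...]).
    """
    mapping = {}
    for prefix, ecs in routes:
        iface = ipaddress.ip_network(prefix).network_address
        sessions = mapping.setdefault(iface, set())
        sessions.update(ipaddress.ip_address(s) for s, _ in ecs)
    return mapping
```
]]></artwork>
    </figure>
    <t>With such a mapping, a resolved RPF nexthop address A1 is looked
       up as a key to find the session address (or addresses) of the BGP
       speaker that owns A1.
    </t>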
    </section>
    <section title="Multicast RPF Address Extended Community" anchor="rpfec">
	<t>This is an IPv4 or IPv6 Address Specific EC with the Global
       Administrator Field set to the address of the upstream RBR and the
       Local Administrator Field set to 0.
	</t>
    </section>
    <section title="Topology/IPA Extended Community">
	<t>This is a Transitive Opaque Extended Community with the following
       format:
    <figure>
	<artwork>
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       0x03    |   Sub-Type    |         Reserved              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              IPA              |        MT-ID                  |         
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
    </figure>
    </t>
	<t>IPA is the Flexible Algorithm number and MT-ID is the Multi-Topology
    Identifier to be used for setting up a multicast tree. The usage of this
	EC is described in <xref target="ipa"/>.
    </t>
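    <t>The 8-octet layout in the figure can be sketched, non-normatively,
       in Python. The function names are hypothetical, and the sub-type
       value 0xFF is a placeholder only, since the actual value is to be
       assigned by IANA.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import struct

OPAQUE_TRANSITIVE = 0x03
# Hypothetical placeholder: the sub-type is not yet assigned by IANA.
TOPOLOGY_IPA_SUBTYPE = 0xFF


def encode_topology_ipa_ec(ipa: int, mt_id: int) -> bytes:
    """Pack type, sub-type, 16 reserved bits, IPA and MT-ID."""
    return struct.pack("!BBHHH", OPAQUE_TRANSITIVE,
                       TOPOLOGY_IPA_SUBTYPE, 0, ipa, mt_id)


def decode_topology_ipa_ec(ec: bytes):
    """Return (ipa, mt_id) from an 8-octet community."""
    ec_type, _sub, _rsvd, ipa, mt_id = struct.unpack("!BBHHH", ec)
    if ec_type != OPAQUE_TRANSITIVE:
        raise ValueError("not a Transitive Opaque EC")
    return ipa, mt_id
```
]]></artwork>
    </figure>
    <t>For instance, Flexible Algorithm 128 with MT-ID 2 would be packed
       with the Reserved field set to zero, followed by 0x0080 and
       0x0002.
    </t>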
    </section>
    </section>
    <section title="Procedures">
    <section title="Source Discovery for ASM" anchor="sa">
    <t>When an FHR first receives a multicast packet addressed to an ASM
       group, it originates a Source Active route.
    </t>
    <t>The FHR withdraws the Source Active route once a certain amount
       of time has elapsed since it last received a packet of the (S,G)
       flow. The amount of time to wait is a local matter.
    </t>
    <t>The SA routes carry an IPv4 or IPv6 address specific Route Target.
	   The Global Administrator field is set to the group
       address of the flow, and the Local Administrator field is set to 0
       or a pre-assigned domain-wide unique value that identifies a VPN.
    </t>
    <t>
       When an LHR needs to join an ASM group (e.g., as the result of
	   receiving a (*,G) IGMP/MLD join), it advertises a Route Target
	   Membership route, with the Route Target
       field in the NLRI set according to the group, in the same way as an
	   FHR encodes the Route Target in its Source Active routes. The
       propagation of the SA routes is subject to cooperative export
       filtering as specified in <xref target="RFC4684"/>, referred to as
       the Route Target Constrain (RTC) mechanism
	   in this document. With that, the LHR only receives Source Active routes
	   for groups that it is interested in.
    </t>
    <t>Upon receiving the Source Active A-D routes, the LHR originates
	Leaf A-D routes as described
       below, as long as it still needs to receive traffic for the flows
       (i.e., the corresponding IGMP/MLD membership exists or join from
       downstream PIM/BGP neighbor exists).
    </t>
    </section>
    <section title="Originating Tree Join Routes" anchor="join-send">
    <section title="(x,G) Multicast Tree">
    <t>When a router needs to join a particular (S,G) tree,
       it determines the RPF nexthop address wrt the source, following the
       same RPF procedures as defined for PIM. It further finds the BGP router
       that advertised the nexthop address as one of its local addresses.
    </t>
    <t>If the RPF neighbor supports MCAST-TREE SAFI, this router originates a
       Leaf A-D route. Although it is unsolicited, it is constructed as if
       there was a corresponding S-PMSI A-D route. The Upstream Router's
       IP Address field is set to the RPF neighbor's session address (learnt
       via the EC attached to the host route for the RPF nexthop address).
       An Address Specific RT corresponding to the session address is attached
       to the route, with the Global Administrator Field set to the session
       address and the Local Administrator Field set to 0 or a pre-assigned
       domain-wide unique value that identifies a VPN. The route is advertised
	   to the RPF neighbor (in the case of EBGP or hop-by-hop IBGP),
	   or to one or more RRs.
    </t>
    <t>Similarly, when a router learns that it needs to join a bi-directional
       tree for a particular group, it determines the RPF neighbor wrt the RPA.
       If the neighbor supports MCAST-TREE SAFI, it originates a Leaf A-D Route.
    </t>
    <t>As the Leaf A-D route is originated, the router sets up the
	corresponding forwarding state such that the expected
       incoming interface list includes all non-LAN interfaces directly
       connecting to the upstream neighbor. LAN interfaces are added upon
       receiving corresponding S-PMSI A-D route (<xref target="spmsi-recv"/>).
       If the upstream neighbor is not directly connected, a corresponding
       S-PMSI A-D route advertised by the upstream router is used to determine
       the tunnel used to receive traffic, as described in <xref target="overlay"/>.
    </t>
    <t>When the upstream neighbor changes, the previously advertised Leaf A-D route
       is withdrawn. If there is a new upstream neighbor, a new Leaf A-D route
       is originated, corresponding to the new neighbor. Because NLRIs are
       different for the old and new Leaf A-D routes, make-before-break as
	   well as Multicast-only Fast Reroute (MoFRR) <xref target="RFC7431"/> can be achieved.
    </t>
    </section>
    <section title="BGP Inband Signaling for mLDP Tunnel">
    <t>The same mLDP procedures as defined in <xref target="RFC6388"/> are followed, except
       that where a label mapping message is sent in <xref target="RFC6388"/>, a Leaf A-D
       route is sent if the upstream neighbor supports BGP based signaling.
    </t>
    </section>
    </section>
    <section title="Receiving Tree Join Routes">
    <t>A router (auto-)configures Import RTs matching itself so that
       it can import tree join routes from its peers.
       Note that in this document, tree join routes are Leaf A-D routes.
    </t>
    <t>When a router receives a tree join route 
       and imports it, it determines if it needs to originate its own
       corresponding route and advertise further upstream wrt the source/RPA
       or mLDP tunnel root. If this router is the FHR or is on the RPL for a
       bidirectional group, or is the
       tunnel root, then it does not need to. Otherwise, the procedures
       in <xref target="join-send"/> are followed.
    </t>
    <t>Additionally, the router sets up its corresponding forwarding state
       such that traffic will be sent to the downstream neighbor, and received
       from the downstream neighbor in the case of bidirectional tree/tunnel.
       If the downstream neighbor is not directly connected, the tunnel announced
       in a corresponding S-PMSI route is used, as described in <xref target="overlay"/>.
    </t>
    </section>
    <section title="Withdrawal of Tree Join Routes">
    <t>For a particular tree or tunnel, if a downstream neighbor withdraws its
       Leaf A-D route, the neighbor is removed from the corresponding
       forwarding state. If all downstream neighbors withdraw their tree join
       routes and this router no longer has local receivers, it withdraws
       the tree join routes that it previously originated.
    </t>
    <t>As mentioned earlier, when the upstream neighbor changes, the previously
       advertised Leaf A-D route is also withdrawn. The corresponding incoming
       interfaces are also removed from the corresponding forwarding state.
    </t>
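    <t>The withdrawal rule for downstream neighbors can be sketched,
       non-normatively, in Python. The class TreeState and its attribute
       names are hypothetical. The sketch tracks only whether this router
       currently has its own tree join route advertised, and it does not
       model the FHR, RPL, or tunnel-root cases in which no upstream join
       is originated.
    </t>
    <figure>
    <artwork><![CDATA[
```python
class TreeState:
    """Per-tree state on one router (illustrative only)."""

    def __init__(self, local_receivers: bool = False):
        self.downstream = set()      # neighbors with a Leaf A-D route
        self.local_receivers = local_receivers
        self.upstream_joined = False

    def on_leaf_ad(self, neighbor: str) -> None:
        self.downstream.add(neighbor)
        self.upstream_joined = True  # originate our own join

    def on_withdraw(self, neighbor: str) -> None:
        self.downstream.discard(neighbor)
        if not self.downstream and not self.local_receivers:
            self.upstream_joined = False  # withdraw our own join
```
]]></artwork>
    </figure>
    <t>In this sketch the router's own join stays advertised as long as
       at least one downstream neighbor or a local receiver remains.
    </t>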
    </section>
    <section title="LAN procedures for (x,G) Unidirectional Tree" anchor="lanproc">
    <t>For a unidirectional (x,G) multicast tree, if there is a LAN
       interface connecting to the downstream neighbor, it MAY be preferred
       over non-LAN interfaces, but an S-PMSI A-D route MUST be originated
       to facilitate the analog of the Assert process (<xref
       target="spmsi-send"/>).
    </t>
    <section title="Originating S-PMSI A-D Routes" anchor="spmsi-send">
    <t>If this router chooses to use a LAN interface to send traffic
       to its neighbors for a particular (S,G) or (*,G) flow, it MUST
       announce that by originating a corresponding S-PMSI A-D route that
       does not include a PTA.
       The LAN interface is identified by an IP address specific RT,
       with the Global Administrator Field set to
       the LAN interface's address prefix and the Local Administrator Field
       set to the prefix length. The RT also serves to restrict
       importation of the route to the routers on the LAN. An operator
       MUST ensure that RTs encoded as above are not used for other purposes.
       In practice, this should not be an unreasonable requirement.
    </t>
    <t>If multiple LAN interfaces are to be used (to reach different sets of
       neighbors), then the route will include multiple RTs, one for each
       used LAN interface as described above.
    </t>
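    <t>As a non-normative illustration, the RT fields for a LAN interface
       can be derived in Python as follows. The names lan_rt and
       shared_lans are hypothetical, and the sketch assumes that "address
       prefix" means the network address of the interface's subnet, so
       that all routers on the same LAN derive the same RT.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import ipaddress


def lan_rt(interface: str):
    """Return (global_admin, local_admin) for a LAN interface.

    interface is an address with prefix length, e.g. "192.0.2.1/28".
    """
    net = ipaddress.ip_interface(interface).network
    # Assumed: the Global Administrator field is the subnet address.
    return net.network_address, net.prefixlen


def shared_lans(rts_a, rts_b):
    """RTs present on both routes identify LANs both routers share."""
    return set(rts_a) & set(rts_b)
```
]]></artwork>
    </figure>
    <t>Two routers with interface addresses 192.0.2.1/28 and
       192.0.2.14/28 would, under this assumption, both derive an RT
       with fields 192.0.2.0 and 28.
    </t>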
    </section>
    <section title="Receiving S-PMSI A-D Routes" anchor="spmsi-recv">
    <t>A router (auto-)configures an Import RT for each of its LAN interfaces
       over which BGP is used for multicast signaling. The construction
       of the RT is described in the previous section.
    </t>
    <t>When a router R1 imports an S-PMSI A-D route for flow (x,G) from
       router R2, R1 checks to see if it also originates an S-PMSI A-D
       route with the same NLRI except the Upstream Router's IP Address field.
       When a router R1 originates an S-PMSI
       A-D route, it checks to see if it also has installed an S-PMSI A-D
       route, from some other router R2, with the same NLRI except the
       Upstream Router's IP Address field.  In either
       case, R1 checks to see if the two routes have an RT in common and
       the RT is encoded as in <xref target="spmsi-send"/>.  If
       so, then there is a LAN attached to both R1 and R2, and both
       routers are prepared to send (S,G) traffic onto that LAN.
       This kicks off the assert procedure
       to elect a winner - the one with the highest Upstream Router's IP
       Address in the NLRI wins.
       An assert loser will not include the corresponding LAN interface
       in its outgoing interface list, but it keeps the S-PMSI A-D route
       that it originates.
    </t>
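    <t>The election rule above can be sketched, non-normatively, in
       Python. The class SPmsi and the function assert_winner are
       hypothetical names. The sketch assumes both upstream addresses are
       of the same address family and that the lan_rts field holds only
       RTs encoded as described in <xref target="spmsi-send"/>.
    </t>
    <figure>
    <artwork><![CDATA[
```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SPmsi:
    flow: tuple          # NLRI minus the Upstream Router's address
    upstream: str        # Upstream Router's IP Address
    lan_rts: frozenset   # RTs encoded as LAN identifiers


def assert_winner(mine: SPmsi, other: SPmsi):
    """Return the winning upstream address, or None if no contest."""
    if mine.flow != other.flow:
        return None      # different flows never compete
    if not (mine.lan_rts & other.lan_rts):
        return None      # no LAN in common
    # Compare numerically, not as strings.
    return max(mine.upstream, other.upstream,
               key=ipaddress.ip_address)
```
]]></artwork>
    </figure>
    <t>The comparison is done on parsed addresses because a plain string
       comparison would rank 203.0.113.9 above 203.0.113.101.
    </t>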
    <t>If this router does not have a matching S-PMSI route of its own
       with some common RTs, and the originator of the received S-PMSI route
       is a chosen upstream neighbor for the corresponding flow, then
       this router updates its forwarding state to include the LAN
       interface in the incoming interface list. When the last S-PMSI route
       with a RT matching the LAN is withdrawn later, the LAN interface is
       removed from the incoming interface list.
    </t>
    <t>Note that a downstream router on the LAN does not participate in the
       assert procedure. It adds/keeps the LAN interface in the
       expected incoming interfaces as long as its chosen upstream peer
       originates the S-PMSI A-D route. It does not switch to the assert winner
       as its upstream. An assert loser MAY keep sending joins upstream based
       on local policy even if it has no other downstream neighbors
       (this could be used for fast switchover in case the assert winner
       fails).
    </t>
	<t>If this router receives an S-PMSI A-D route from its upstream neighbor
	with multiple RTs for the LANs that this router is on, it MUST select
	only one of the LAN interfaces to receive traffic. Which LAN interface is
	selected is a local decision.
	</t>
    </section>
    </section>
    <section title="Distributing Label for Upstream Traffic for Bidirectional Tree/Tunnel" anchor="upstream-spmsi">
    <t>For MP2MP mLDP tunnels or labeled (*,G) bidirectional trees, an upstream
       router needs to advertise a label to all its downstream neighbors so
       that the downstream neighbors can send traffic to it.
    </t>
    <t>For MP2MP mLDP tunnels, the same procedures for mLDP are followed
       except that instead of MP2MP-U Label Mapping messages, S-PMSI A-D
       Routes for mLDP are used.
    </t>
    <t>For labeled (*,G) bidirectional trees, for a Leaf A-D route received
       from a downstream neighbor, a corresponding S-PMSI A-D route is sent
       back to the downstream router.
    </t>
    <t>In both cases, this router originates a single S-PMSI A-D route for
       each tree, carrying multiple RTs (one for each downstream neighbor on
       the tree). The TEA specifies the label that the upstream router has
       allocated for its downstream neighbors to use when sending traffic to
       it. Note that this is still a "downstream allocated" label (the
       upstream router is "downstream" from the point of view of the traffic
       direction).
    </t>
    <t>These S-PMSI A-D routes do not carry a PTA unless a tunnel is used
       to reach downstream neighbors, as described in <xref target="overlay"/>.
    </t>
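    <t>As a non-normative illustration, the origination behavior described
       in this section can be sketched in Python as follows. The class and
       field names (SPmsiRoute, UpstreamLabelAdvertiser, on_leaf_route), the
       representation of an RT as the downstream neighbor's address string,
       the per-tree label allocation, and the starting label value are all
       assumptions made for this example; none of them is defined by this
       document, and the sketch does not model encoding or route withdrawal.
    </t>
    <figure>
	<artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SPmsiRoute:
    """Hypothetical in-memory view of one S-PMSI A-D route."""
    tree_id: str
    # One RT per downstream neighbor on the tree (assumed: address string).
    route_targets: List[str] = field(default_factory=list)
    # Label carried in the TEA, allocated by the upstream router.
    tea_label: Optional[int] = None
    # No PTA unless a tunnel is used to reach downstream neighbors.
    pta: Optional[str] = None


class UpstreamLabelAdvertiser:
    """Keeps a single S-PMSI A-D route per tree on the upstream router."""

    def __init__(self, first_label: int = 1000) -> None:
        self._next_label = first_label
        self._routes: Dict[str, SPmsiRoute] = {}

    def on_leaf_route(self, tree_id: str, downstream: str) -> SPmsiRoute:
        """Handle a Leaf A-D route received from a downstream neighbor."""
        route = self._routes.get(tree_id)
        if route is None:
            # First downstream neighbor for this tree: allocate the label.
            route = SPmsiRoute(tree_id=tree_id, tea_label=self._next_label)
            self._next_label += 1
            self._routes[tree_id] = route
        # Same route, one more RT; duplicates are ignored.
        if downstream not in route.route_targets:
            route.route_targets.append(downstream)
        return route
```
]]></artwork>
    </figure>
    <t>In this sketch, two Leaf A-D routes for the same tree from different
       downstream neighbors yield one route object with two RTs and a single
       label, and the PTA stays unset.
    </t>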
    </section>
    </section>
    </section>
    <section title="IANA Considerations">
	<t>IANA has assigned BGP SAFI value 78 for the MCAST-TREE SAFI.
	</t>
	<t>This document requests IANA to create a new "BGP MCAST-TREE Route Types"
       registry, referencing this document. The following initial values are
       defined:
    <figure>
	<artwork>
       0-2   -  Reserved
       3     -  S-PMSI A-D Route for (x,G)
       4     -  Leaf A-D Route
       5     -  Source Active A-D Route
       0x43  -  S-PMSI A-D Route for mLDP
	</artwork>
    </figure>
	</t>
	<t>This document requests IANA to assign two Sub-Type values from the
       "Transitive IPv4-Address-Specific Extended Community Sub-Types"
       registry: one for the Session Address EC and one for the Multicast
       RPF Address EC.
	</t>
	<t>This document requests IANA to assign two Sub-Type values from the
       "Transitive IPv6-Address-Specific Extended Community Types"
       registry: one for the Session Address EC and one for the Multicast
       RPF Address EC.
	</t>
	<t>This document requests IANA to assign one Sub-Type value from the
       "Transitive Opaque Extended Community Sub-Types" registry
       for the Topology/IPA EC.
	</t>
    </section>
    <section anchor="Security" title="Security Considerations">
      <t>
This document shares many of the mechanisms and concepts of MVPN.
Accordingly, many of the security considerations described in
RFC 6513 and RFC 6514 apply here as well, although the distinctions
those documents make regarding PE-CE links and relationships are not relevant.
      </t>
      <t>
This document describes interworking with several multicast control 
protocols, including PIM-SM, PIM-SSM, PIM-Bidir, mLDP and IGMP/MLD.
Security considerations specified for those protocols are 
applicable to this document.
      </t>
      <t>
Implementations should include the multicast damping procedures specified in
RFC 7899 to protect the control plane from excessive churn caused by multicast
dynamicity. Implementations should also be able to rate-limit join state
creation on a per-peer and per-RIB basis, and to rate-limit Source Active
A-D route propagation on a per-source, per-peer, and per-RIB basis, with
configurable thresholds.
      </t>
    </section>
    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors thank Marco Rodrigues for his initial idea of, and
         request for, using BGP for multicast signaling beyond MVPN.
         We thank Eric Rosen for his questions, suggestions, and help in
         finding solutions to some issues. We also thank Luay Jalil,
         James Uttaro, and Shraddha Hegde for their comments on and support
         of the work. Special thanks go to Joe Halpern for his thorough
         review and comments, which significantly improved the quality of
         the document.
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
	  <?rfc include='reference.RFC.2119.xml'?>
	  <?rfc include='reference.RFC.8174.xml'?>
	  <?rfc include='reference.RFC.7761.xml'?>
	  <?rfc include='reference.RFC.6388.xml'?>
	  <?rfc include='reference.RFC.5015.xml'?>
	  <?rfc include='reference.RFC.6514.xml'?>
	  <?rfc include='reference.RFC.4684.xml'?>
	  <?rfc include='reference.RFC.4271.xml'?>
	  <?rfc include='reference.RFC.4760.xml'?>
	  <?rfc include='reference.RFC.5492.xml'?>
	  <?rfc include='reference.RFC.9012.xml'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.6826.xml'?>
	  <?rfc include='reference.RFC.7438.xml'?>
	  <?rfc include='reference.RFC.7431.xml'?>
	  <?rfc include='reference.RFC.7938.xml'?>
	  <?rfc include='reference.RFC.5496.xml'?>
	  <?rfc include='reference.RFC.6512.xml'?>
	  <?rfc include='reference.RFC.7060.xml'?>
	  <?rfc include='reference.RFC.6420.xml'?>
	  <?rfc include='reference.RFC.4915.xml'?>
	  <?rfc include='reference.RFC.5120.xml'?>
	  <?rfc include='reference.RFC.9350.xml'?>
	  <?rfc include='reference.RFC.6559.xml'?>
	  <?rfc include='reference.RFC.3376.xml'?>
	  <?rfc include='reference.RFC.3810.xml'?>
	  <?rfc include='reference.RFC.9573.xml'?>
	  <?rfc include='reference.I-D.ietf-idr-bgp-ct.xml'?>
	  <?rfc include='reference.I-D.ietf-idr-bgp-car.xml'?>
      <?rfc include='reference.I-D.ietf-bess-mvpn-pe-ce.xml'?>
      <?rfc include='reference.I-D.ietf-mpls-seamless-mpls.xml'?>
      <?rfc include='reference.I-D.ietf-bess-bgp-multicast-controller.xml'?>
      <?rfc include='reference.I-D.wijnands-bier-mld-lan-election.xml'?>
      <?rfc include='reference.I-D.ietf-mpls-mldp-multi-topology.xml'?>
    </references>
  </back>
</rfc>
