<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-cth-rtgwg-bgp-control-13"
     ipr="trust200902">
  <front>
    <title abbrev="Architecture for BGP Controller">
    Architecture for Use of BGP as Central Controller</title>

    <author fullname="Yujia" initials="Y" surname="Luo">
      <organization>China Telecom Co., Ltd.</organization>
      <address>
        <postal>
          <street>109 West Zhongshan Ave, Tianhe District</street>
          <city>Guangzhou</city>
          <code>510630</code>
          <country>China</country>
        </postal>
        <email>luoyuj@sdu.edu.cn</email>
      </address>
    </author>

    <author fullname="Liang" initials="L" surname="Qu">
      <organization>China Telecom Co., Ltd.</organization>
      <address>
        <postal>
          <street>109 West Zhongshan Ave, Tianhe District</street>
          <city>Guangzhou</city>
          <code>510630</code>
          <country>China</country>
        </postal>
        <email>ouliang@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Xiang" initials="X" surname="Huang">
      <organization>Tencent</organization>
      <address>
        <postal>
          <street> </street>
          <city> </city>
          <code> </code>
          <country> </country>
        </postal>
        <email>terranhuang@tencent.com</email>
      </address>
    </author>

    <author fullname="Gyan S. Mishra" initials="G" surname="Mishra">
      <organization>Verizon Inc.</organization>
      <address>
        <postal>
          <street>13101 Columbia Pike</street>
          <city>Silver Spring</city>
          <code>MD 20904</code>
          <country>USA</country>
        </postal>
        <phone> 301 502-1347</phone>
        <email>gyan.s.mishra@verizon.com</email>
      </address>
    </author>

     <author initials="H" surname="Chen" fullname="Huaimo Chen">
      <organization>Futurewei</organization>
      <address>
        <postal>
          <street></street>
          <city>Boston, MA</city>
          <region></region>
          <code></code>
          <country>USA</country>
        </postal>
        <email>hchen.ietf@gmail.com</email>
      </address>
    </author>

    <author fullname="Shunwan Zhuang" initials="S" surname="Zhuang">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>Huawei Bld., No.156 Beiqing Rd.</street>
          <city>Beijing</city>
          <code>100095</code>
          <country>China</country>
        </postal>
        <email>zhuangshunwan@huawei.com</email>
      </address>
    </author>

    <author fullname="Zhenbin Li" initials="Z" surname="Li">
      <organization>Huawei</organization>
      <address>
        <postal>
          <street>Huawei Bld., No.156 Beiqing Rd.</street>
          <city>Beijing</city>
          <code>100095</code>
          <country>China</country>
        </postal>
        <email>lizhenbin@huawei.com</email>
      </address>
    </author>

    <date year="2024"/>

    <abstract>
      <t>BGP is a core part of many networks, including Software-Defined
      Networking (SDN) systems.
      It holds traffic engineering information about the network topology
      and can compute optimal paths for a given traffic flow across the
      network.</t>
<!--
      <t>PCE as a central controller is proposed. 
      However, to use PCE as a central controller, 
      operators have to deploy PCE protocol in their networks. 
      In addition, they need to make PCE to obtain the traffic engineering 
      information about the network topology from other protocols 
      such as BGP-LS. 
      It is natural and beneficial to use BGP as a central controller, 
      where, operators do not need deploy and maintain 
      an extra PCE protocol in their networks. </t>
-->
      <t>This document describes some reference architectures
      for BGP as a central controller.
      A BGP-based central controller can simplify network operations
      and use network resources efficiently to provide
      high-quality services.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Border Gateway Protocol (BGP) <xref target="RFC1771"></xref>
      is an exterior gateway protocol (EGP).
      It was developed to exchange routing information among routers
      in different autonomous systems (ASes).
      Over the course of its development, BGP has been extended to provide
      numerous new functions.
      It collects link-state information, including traffic engineering (TE)
      information, from other protocols such as IGPs
      and distributes it among routers in different ASes
      <xref target="RFC7752"></xref>.
      It also controls the redirection of traffic flows
      <xref target="RFC5575"></xref>.
      Furthermore, it distributes MPLS labels
      <xref target="RFC3107"></xref>.
      For scalability, BGP has been extended with Route Reflectors (RRs)
      <xref target="RFC4456"></xref>.</t>

      <t>For segment routing (SR), BGP has been extended to advertise
      SR policies with candidate paths to the policy headend routers,
      which are typically ingress routers
      <xref target="I-D.ietf-idr-segment-routing-te-policy"></xref>.
      The SR-specific extensions to the Path Computation Element
      Communication Protocol (PCEP) are defined in
      <xref target="I-D.ietf-pce-segment-routing"></xref>.
      Using these extensions, a stateful Path Computation Element (PCE)
      can compute an SR traffic engineering (SR-TE) path
      satisfying a set of constraints and initiate that path
      on a headend router.</t>

      <t>An SDN controller (or controller for short) is the core of
      an SDN system or network.
      It sits between network elements (NEs), such as routers or
      switches, at one end and applications, such as an
      Operational Support System (OSS) or a Network Management System (NMS),
      at the other end.
      The essential function of a controller is to steer traffic flows
      across the network in order to provide more services at higher quality.
      It manages network resources such as link bandwidth,
      computes paths for carrying traffic flows based on
      the available network resources,
      programs the network elements to create tunnels along
      those paths, and redirects traffic flows into the corresponding
      tunnels.</t>

     <t>Given its existing capabilities, it is natural, beneficial and
     relatively simple to extend BGP to act as a controller.
     Using BGP as the controller for a network can simplify
     the operations on the network, because it
     avoids deploying, operating and maintaining
     an additional component or protocol, such as PCE,
     as a controller in the network.
     </t>

     <t>This document describes some reference architectures for BGP as
     a central controller and introduces some scenarios to which 
     the BGP controller can be applied.</t>

    </section> <!-- Introduction -->

    <section title="Terminology">
      <t><list style="symbols">
          <t>SR: Segment Routing</t>
          <t>RR: Route Reflector</t>
          <t>SID: Segment Identifier</t>
          <t>SR-Path: Segment Routing Path</t>
          <t>SR-Tunnel: Segment Routing Tunnel</t>
          <t>TEDB: Traffic Engineering Database</t>
          <t>LSDB: Link State Database</t>
          <t>SLDB: SID/Label Database</t>
          <t>TPDB: Tunnel and Path Database</t>
          <t>CSPF: Constrained Shortest Path First</t>
          <t>TM: Tunnel Manager</t>
          <t>NMS: Network Management System</t>
          <t>SRLB: SR Local Block</t>
          <t>NE: Network Element</t>
          <t>PCE: Path Computation Element</t>
          <t>AS:  Autonomous System</t>
          <t>QoS: Quality of Service</t>
          <t>ISP: Internet Service Provider</t>
          <t>MAN: Metropolitan Area Network</t>
          <t>OTT: Over the Top</t>
          <t>OTTSP: Over the Top Service Provider, or Content Operator</t>
          <t>AR: Access Router</t>
        </list></t>
    </section>

    <section title="Architectures">
      <t>An architecture for the use of BGP as a central controller is
      based on the essential function of a controller.
      It is constructed from a number of building blocks or components.
      After an introduction to the building blocks,
      a few reference architectures are described in this section.</t>

      <section title="Building Blocks">
        <t>This section briefly describes the critical building blocks:
        the Traffic Engineering Database (TEDB, or TED for short),
        the SID/Label Database (SLDB),
        the Tunnel and Path Database (TPDB),
        the Constrained Shortest Path First (CSPF) module, and
        the Tunnel Manager (TM).</t>

        <section title="TEDB">
          <t>The Traffic Engineering Database (TEDB) stores
          the Traffic Engineering (TE) information about the network.
          It includes the unreserved bandwidth at each of the eight priority
          levels for every link in the network.</t>

          <t>The TEDB can be a standalone block that is constructed from
          the link state information received.
          Alternatively, it may be embedded into the link state database
          (LSDB) in BGP when BGP creates or updates the LSDB from the
          link state information it receives.</t>
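
          <t>As a minimal sketch of how such a TEDB might be organized,
          the following Python fragment keeps eight unreserved-bandwidth
          values per link. The names Tedb, TeLink, add_link and reserve are
          hypothetical and are not defined by this document. The sketch
          assumes that a reservation at a given priority reduces the
          unreserved bandwidth at that priority and at all less important
          (numerically higher) priorities.</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass, field

NUM_PRIORITIES = 8  # priority 0 is the most important, 7 the least


@dataclass
class TeLink:
    """TE record for one link (hypothetical layout)."""
    max_bandwidth: float
    unreserved: list = field(default_factory=list)

    def __post_init__(self):
        # Initially nothing is reserved at any priority level.
        if not self.unreserved:
            self.unreserved = [self.max_bandwidth] * NUM_PRIORITIES


class Tedb:
    """In-memory TEDB keyed by (source node, destination node)."""

    def __init__(self):
        self.links = {}

    def add_link(self, src, dst, max_bandwidth):
        self.links[(src, dst)] = TeLink(max_bandwidth)

    def reserve(self, src, dst, bandwidth, priority):
        """Reserve bandwidth on one link at the given priority."""
        link = self.links[(src, dst)]
        if link.unreserved[priority] < bandwidth:
            raise ValueError("insufficient unreserved bandwidth")
        # Assumption: the reservation is visible at this priority
        # and at every less important one.
        for level in range(priority, NUM_PRIORITIES):
            link.unreserved[level] -= bandwidth
```
]]></artwork>
          </figure>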
        </section> <!-- TEDB -->

        <section title="SLDB">
          <t>The SID/Label Database (SLDB) records and maintains the
          status of every Segment Identifier (SID) and label for every
          node, interface/link and/or prefix in the network that the
          controller controls.
          The status of a SID/label indicates whether the SID/label is
          assigned. If it is assigned, the object to which it is assigned,
          such as a node, link or prefix, is also recorded.</t>

          <t>The SLDB can be a standalone block that is constructed from
          the link state information, such as the SR Local Block (SRLB),
          that BGP receives.
          Alternatively, it may be embedded into the link state database
          (LSDB) in BGP when BGP creates the LSDB from the link state
          information it receives.</t>
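
          <t>The bookkeeping described above could look roughly like the
          following Python sketch. The class name Sldb, its methods, and the
          idea of handing out the lowest free value from a contiguous range
          are illustrative assumptions, not part of this document.</t>

          <figure>
            <artwork><![CDATA[
```python
class Sldb:
    """Tracks which SID/labels are assigned, and to which object."""

    def __init__(self, first, last):
        # None means "not assigned"; otherwise the owning object.
        self.owner = {sid: None for sid in range(first, last + 1)}

    def assign(self, obj):
        """Assign the lowest free SID/label to obj and return it."""
        for sid, current in self.owner.items():
            if current is None:
                self.owner[sid] = obj
                return sid
        raise RuntimeError("no free SID/label")

    def release(self, sid):
        self.owner[sid] = None

    def is_assigned(self, sid):
        return self.owner[sid] is not None
```
]]></artwork>
          </figure>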
        </section> <!-- SLDB -->

        <section title="TPDB">
          <t>The Tunnel and Path Database (TPDB) stores the information 
          for every tunnel, which includes:

          <list style="hanging">
            <t hangText="o">
              the parameters received for the tunnel from a user/application,
            </t>

            <t hangText="o">
              the path computed for the tunnel, 
            </t>

            <t hangText="o">
              the resources such as link bandwidth reserved along the path
              for the tunnel,
            </t>

            <t hangText="o">
              the SID/labels assigned along the path for the tunnel, and
            </t>

            <t hangText="o">
              the status of the tunnel.
            </t>

          </list>
          </t>
        </section> <!-- TPDB -->

        <section title="CSPF">
          <t>The Constrained Shortest Path First (CSPF) module computes a
          path for a tunnel, such as an SR tunnel or a Label Switched Path
          (LSP) tunnel, that satisfies a set of given constraints using the
          information in the TEDB.</t>
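
          <t>This document does not prescribe a CSPF algorithm. One common
          approach, sketched below under that assumption, is to prune every
          link that cannot satisfy the constraint (here, a minimum available
          bandwidth) and then run Dijkstra's shortest-path algorithm on what
          remains. The function name cspf and the shape of the links
          argument are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
import heapq


def cspf(links, src, dst, min_bandwidth):
    """Return the cheapest node path from src to dst, or None.

    links: {(u, v): (cost, available_bandwidth)} for directed links.
    """
    # Step 1: prune links that violate the bandwidth constraint.
    adjacency = {}
    for (u, v), (cost, available) in links.items():
        if available >= min_bandwidth:
            adjacency.setdefault(u, []).append((v, cost))

    # Step 2: Dijkstra on the pruned graph.
    best = {src: 0}
    previous = {}
    queue = [(0, src)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in adjacency.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))

    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(previous[path[-1]])
    return path[::-1]
```
]]></artwork>
          </figure>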
        </section> <!-- CSPF -->

        <section title="TM">
          <t>The Tunnel Manager (TM) receives a request for an operation
          on a tunnel from a user or an application such as a Network
          Management System (NMS). The operation may be the creation of a
          new tunnel, the deletion of an existing tunnel, or a change to
          an existing tunnel.</t>

          <t>When receiving a request for creating a new tunnel, 
          the TM asks the CSPF to compute a path for the tunnel 
          that satisfies the constraints given for the tunnel.</t>

          <t>After obtaining the path for the tunnel from the CSPF, the TM
          requests the SLDB to assign SID/labels along the path for the
          tunnel and asks the TEDB to reserve the resources such as link
          bandwidth along the path for the tunnel.</t>
         
          <t>The TM in a central controller may set up the tunnel
          along the path in the network by programming each of the NEs
          along the path through the API to the network. In an SR network,
          the TM initiates an SR tunnel in the network by sending a sequence
          of SID/labels to the source NE of the tunnel.</t>

          <t>The TM records the information for the tunnel in the Tunnel and 
          Path Database (TPDB).  The information includes
          the path computed for the tunnel, 
          the resources such as bandwidth reserved along the path,
          the SID/labels assigned along the path for the tunnel, and
          the status of the tunnel.</t>
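
          <t>The creation steps described in this section can be sketched
          as follows. This is an illustration only: the function
          create_tunnel, the TunnelRecord layout, the plain dictionaries and
          lists standing in for the TEDB, SLDB and TPDB, and the "created"
          status value are all assumptions. The path is taken to be the
          node list already returned by the CSPF.</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class TunnelRecord:
    """One TPDB entry (hypothetical layout)."""
    path: list
    bandwidth: float
    sids: list
    status: str


def create_tunnel(name, path, bandwidth, tedb, free_sids, tpdb):
    """Reserve resources for a tunnel and record it in the TPDB.

    tedb:      {(u, v): available bandwidth} per directed link
    free_sids: list of unassigned SID/labels (stands in for the SLDB)
    tpdb:      {tunnel name: TunnelRecord}
    Returns the SID/label sequence to send to the source NE.
    """
    hops = list(zip(path, path[1:]))
    # Check everything first so a failure leaves no partial state.
    if any(tedb[hop] < bandwidth for hop in hops):
        raise ValueError("insufficient bandwidth along the path")
    if len(free_sids) < len(hops):
        raise ValueError("not enough free SID/labels")
    sids = []
    for hop in hops:
        tedb[hop] -= bandwidth         # reserve bandwidth (TEDB)
        sids.append(free_sids.pop(0))  # assign a SID/label (SLDB)
    tpdb[name] = TunnelRecord(path, bandwidth, sids, "created")
    return sids
```
]]></artwork>
          </figure>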
        </section> <!-- TM -->
      </section> <!-- Building Blocks -->

      <section title="One Controller">
        <t>The figure below illustrates a reference architecture for
        using BGP as a central controller that controls
        a network.
        In this reference architecture, BGP as a controller
        controls the network through an API to the network such as BGP+/RR+
        (extensions to BGP for a central controller).
        The BGP controller is responsible for creating and maintaining
        every tunnel in the network. It also controls the redirection
        of traffic flows into each tunnel.</t>

        <t>The BGP controller comprises a number of modules, including
        a TM, a CSPF, a TEDB, an SLDB and a TPDB.
        The interfaces among these modules are shown in the figure below
        and described in the list that follows it.</t>

        <t><figure>
          <artwork> <![CDATA[ 
            +------------------------------------------+
            | Users/Applications(Orchestrator/OSS/NMS) |
            +------------------------------------------+
                                 |
          +----------------------------------------------+
          | BGP as Controller                            |
          |                      +---------------+       |
          |         /------------|       TM      |       |
          |        /     Ia      +---------------+       |
          |  +--------+           | |  |       \         |
          |  |  CSPF  |   ________| |  |        \Id      |
          |  +--------+  /   Ib    /Ic |    +---------+  |
          |        \Ie  /         /    |    |   TPDB  |  |
          |     +---------+ +-------+  |    +---------+  |
          |     |  TEDB   | |  SLDB |  |                 |
          |     +---------+ +-------+  |                 |
          |           \         \      |In               |
          +----------------API to Network(RR+)-----------+
                             /       \
                            /         \____
                           /           \   \____
                          /\  .---. .---+       \
                         |  \(     '    |'.---. |
                         |---\  Network |      '+.
                        (o    \         |       | )
                         (     |        |       o)
                          (    |        |       )
                           (   o        o    .-'
                            '               )
                             '---._.-.     )
                                      '---'  
          ]]>
          </artwork>
          </figure> </t>

        <t><list style="symbols">
         <t> 
          Interface Ia between the TM and the CSPF.
          Through this interface, the TM requests the CSPF
          to compute a path for a tunnel with a set of constraints,
          and the CSPF responds to the TM with the computed path
          that satisfies the constraints.
         </t>

         <t> 
          Interface Ib between the TM and the TEDB.
          When a tunnel is to be created, 
          through this interface, the TM reserves in the TEDB 
          the TE resources such as link bandwidths 
          on every link along the path computed for the tunnel.

          When a tunnel is deleted, 
          the TM releases  
          the TE resources such as link bandwidths 
          on every link along the path for the tunnel.
         </t>

         <t> 
          Interface Ic between the TM and the SLDB. 
          When a tunnel is to be created, 
          through this interface, the TM reserves in the SLDB 
          a SID/label for every link or some links along the path 
          computed for the tunnel.

          When a tunnel is deleted, 
          the TM releases  
          the SID/label for every link or some links 
          along the path for the tunnel.
         </t>

         <t> 
          Interface Id between the TM and the TPDB.
          The TM updates the information for every tunnel in the TPDB
          through this interface.
         </t>

         <t> 
          Interface Ie between the CSPF and the TEDB. Through this interface,
          the CSPF accesses the traffic engineering information such as link 
          bandwidths when it computes a path for a tunnel.
         </t>
        </list>
        </t>
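
        <t>As a rough illustration of the release operations on
        interfaces Ib, Ic and Id when a tunnel is deleted, consider the
        Python sketch below. The function release_tunnel, the dictionary
        layout of the TPDB entry, and the "deleted" status value are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def release_tunnel(record, tedb, free_sids):
    """Undo the reservations held by a deleted tunnel.

    record:    TPDB entry with "path", "bandwidth", "sids", "status"
    tedb:      {(u, v): available bandwidth} per directed link
    free_sids: list of unassigned SID/labels (stands in for the SLDB)
    """
    path = record["path"]
    for hop in zip(path, path[1:]):
        tedb[hop] += record["bandwidth"]  # Ib: release bandwidth
    free_sids.extend(record["sids"])      # Ic: release SID/labels
    record["status"] = "deleted"          # Id: update the TPDB entry
```
]]></artwork>
        </figure>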

        <t>
        There is an interface In between the BGP controller and the network.
        In fact, there is a control channel (or interface) between
        the BGP controller and every (edge) node in the network.
        </t>

        <t>
        Initially, the TEDB obtains the initial traffic engineering (TE)
        information, such as link bandwidths, for every link in the network
        through the interface In (i.e., the API to the network).
        The SLDB obtains the initial SID/label resources for every node,
        link and prefix in the network through the same interface.
        </t>

<!--
        <t>
        Then the TE information in the TEDB is updated mostly by the 
        following events.
        <list style="symbols">
          <t>
          When a tunnel is to be created, 
          the TM reserves in the TEDB bandwidths on every link 
          along the path for the tunnel.
          </t>

          <t>
          When a tunnel is deleted, 
          the TM releases bandwidths on every link 
          along the path for the tunnel.
          </t>

          <t>
          When a link in the network is down, 
          the TE information about the link is removed from the TEDB.
          </t>

          <t>
          When a link in the network is up, 
          the TE information about the link is added into the TEDB.
          </t>
        </list>
        </t>

        <t>
        The SID/label resources in the SLDB may be updated as follows:
        <list style="symbols">
          <t>
          When a tunnel is to be created, 
          the TM reserves in the SLDB a SID/label for every link 
          along the path for the tunnel. 
          </t>

          <t>
          When a tunnel is deleted, 
          the TM releases the SID/label for every link 
          along the path for the tunnel.
          </t>

          <t>
          When a node in the network is down, 
          the SID/label resources on the node is removed from the SLDB.

          When a link in the network is down, 
          the SID/label resources on the link is removed from the SLDB.
          </t>

          <t>
          When a node in the network is up, 
          the SID/label resources on the node is added into the SLDB.

          When a link in the network is up, 
          the SID/label resources on the link is added into the SLDB.
          </t>
        </list>
        </t>
-->
      </section> <!-- One Controller -->

      <section title="Controller Cluster">
        <t>A critical issue in a network with a central controller
        is the failure of the controller, 
        which is a single point of failure (SPOF).
        If the controller fails, the entire network may not work.</t>

        <t>A controller cluster (i.e., a group of controllers)
        works as a single controller from the user's point of view.
        A simple controller cluster consists of two controllers.
        One works as the active (or primary) controller,
        and the other as the standby (or secondary) controller.
        In normal operations, the active controller is responsible
        for the network it controls. It also synchronizes with the
        standby controller. When the active controller fails,
        the standby controller becomes the new active controller
        and controls the network.</t>

        <t>The figure below illustrates a simple controller cluster
        containing two BGP-based controllers:
        an Active BGP-based Controller and a Standby BGP-based Controller.
        In normal operations, the active controller interacts with
        users and/or applications. For example, it receives
        configurations for tunnels and for the traffic flows to be carried
        by those tunnels from users. The active controller instructs the
        network elements in the network to provide the services requested
        by users and/or applications. For example, after receiving the
        configurations for a tunnel and a traffic flow to the tunnel,
        the active controller computes a path for the tunnel,
        programs the network elements along
        the path to create the tunnel, and instructs the ingress
        of the tunnel to direct the traffic flow into the tunnel.</t>

        <t><figure>
            <artwork align="center"><![CDATA[
       +-------------------------------------------+
       |  Users/Applications(Orchestrator/OSS/NMS) |
       +-------------------------------------------+
                              ^
                              |
   +--------------------------+------------------------+
   | Controller ______________|_____________           |
   | Cluster   |                            |          |
   |           |    ___________________     |          |
   |           |   |  Synchronization  |    |          |
   |           v   v                   v    v          |
   |    +------------+               +------------+    |
   |    | Active     |               | Standby    |    |
   |    | BGP-based  |               | BGP-based  |    |
   |    | Controller |               | Controller |    |
   |    +------------+               +------------+    |
   |           ^                            ^          |
   |           |____________________________|          |
   |                          |                        |
   |                          v                        |
   +-----------------API to Network(RR+)---------------+
                         /       \
                        /         \____
                       /           \   \____
                      /\  .---. .---+       \
                     |  \(     '    |'.---. |
                     |---\  Network |      '+.
                    (o    \         |       | )
                     (     |        |       o)
                      (    |        |       )
                       (   o        o    .-'
                        '               )
                         '---._.-.     )
                                  '---'
]]></artwork>
          </figure></t>
       <t>During this process, the status information about the network
        is updated in the active controller. The information includes
        the traffic engineering information in its TEDB,
        the SID/label information in its SLDB, and
        the configurations, paths, resources and status for tunnels
        in its TPDB.
        The active controller synchronizes this information with
        the standby controller, so that
        the two controllers hold the same status information
        about the network.
        When the active controller fails,
        the standby controller takes over the role of the active
        controller smoothly and becomes the active controller.</t>
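
        <t>A minimal sketch of this active/standby behavior follows.
        It makes a simplifying assumption that every update is followed by
        a full copy of the active controller's state to the standby;
        a real cluster would more likely synchronize incrementally. The
        class and method names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import copy


class Controller:
    def __init__(self, name):
        self.name = name
        self.state = {"tedb": {}, "sldb": {}, "tpdb": {}}


class Cluster:
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def update(self, database, key, value):
        """Apply a change on the active controller, then sync."""
        self.active.state[database][key] = value
        # Assumption: full-state copy instead of incremental sync.
        self.standby.state = copy.deepcopy(self.active.state)

    def fail_over(self):
        """The standby takes over; no standby remains afterwards."""
        self.active, self.standby = self.standby, None
```
]]></artwork>
        </figure>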
      </section> <!-- Controller Cluster -->

      <section title="Hierarchical Controllers">
         <t>The Figure below illustrates a system with 
         hierarchical controllers. 
         There is one Parent Controller and four Child Controllers:
         Child Controller 1, Child Controller 2, Child Controller 3 and 
         Child Controller 4.</t>

         <t><figure>
            <artwork align="center"><![CDATA[
         +-------------------------------------------+
         |  Users/Applications(Orchestrator/OSS/NMS) |
         +----------------------+--------------------+
                                |
                      +---------+---------+
                      | Parent Controller |
                      +--+---------+----+-+
                       _/|          \    \____
                     _/  |           \        \____
                   _/    |            \            \__
                __/      |   +---------+---------+    \
             __/         |   |Child Controller 3 |    |
            /            |   +-------------------+    |
 +---------+---------+   |       /       \            |
 |Child Controller 1 |   |     .---. .---,\           |
 +-------------------+   |    (     '     ')          |
      /       \          |    (  Domain 3 )           |
    .---. .---,\         |     (         )  +---------+---------+
   (     '     ')        |      '-o-.--o)   |Child Controller 4 |
   (  Domain 1 )         |             |    +-------------------+
    (         )          |             |        /         \____
     '-o-.---)  +--------+----------+  \       /           \   \____
       |        |Child Controller 2 |   \     /\  .---. .---+       \
       |        +-------------------+    \   |  \(     '    |'.---. |
       |            /         \____       \_ |---\ Domain 4 |      '+,
       \           /           \   \____    (o    \         |       | )
        \         /\  .---. .---+       \    (     |        |       o)
         \       |  \(     '    |'.---. |     (    |        |       )
          \      |---\ Domain 2 |      '+.     (   o        o    .-'
           \____(o    \         |       | )     '               )
                 (     |        |       o)-------o---._.-.-----)
                  (    |        |       )
                   (   o        o    .-'
                    '               )
                     '---._.-.-----)
  ]]>
  </artwork>
          </figure></t>

        <t>The parent controller communicates with and controls these
        four child controllers. Each child controller controls
        (or is responsible for) one domain:
        Child controller 1 controls domain 1,
        Child controller 2 controls domain 2,
        Child controller 3 controls domain 3, and
        Child controller 4 controls domain 4.</t>

        <t>One level of hierarchy of controllers is illustrated 
        in the figure above. There is one parent controller at top level, 
        which is not a child controller. Under the parent controller, 
        there are four child controllers, 
        which are not parent controllers. </t>

        <t>In the general case, there is one parent controller at the
        top level that is not a child controller, some intermediate
        controllers that are both parent controllers and child
        controllers, and a number of child controllers
        that are not parent controllers.  This forms a system
        with multiple levels of hierarchy: the top-level parent
        controller controls or communicates with a number
        of child controllers; some of those are themselves parent
        controllers, each of which controls or communicates with
        its own child controllers; and so on.</t>

        <t>The parent controller receives requests for creating
        end-to-end tunnels from users or applications.
        For each request, the parent controller is responsible
        for obtaining a path for the tunnel and for
        creating the tunnel along the path by sending instructions
        to the corresponding child controllers.</t>
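
        <t>The delegation from parent to child controllers might be
        sketched as below. How the end-to-end path is obtained and split
        per domain is outside the sketch; the domain_path argument is
        assumed to list, in order, each domain the tunnel crosses with its
        ingress and egress nodes. All class and method names are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
class ChildController:
    def __init__(self, domain):
        self.domain = domain
        self.segments = []

    def create_segment(self, tunnel, ingress, egress):
        """Record the intra-domain segment; report success."""
        self.segments.append((tunnel, ingress, egress))
        return True


class ParentController:
    def __init__(self, children):
        self.children = {child.domain: child for child in children}

    def create_tunnel(self, tunnel, domain_path):
        """domain_path: ordered (domain, ingress, egress) tuples."""
        # Stops at the first child that reports a failure.
        return all(
            self.children[domain].create_segment(tunnel, ingress, egress)
            for domain, ingress, egress in domain_path
        )
```
]]></artwork>
        </figure>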

      </section> <!-- Hierarchical Controllers -->
    </section> <!-- Architecture -->

    <section title="Application Scenarios">
      <t>This section introduces a set of scenarios to which 
      the controller can be applied.</t>

      <section title="Business-oriented Traffic Steering">
        <t>It is commercially reasonable to provide multiple paths
        to the same destination with differentiated experiences for
        preferential users/services.
        This approach helps providers make fuller use of their
        network resources, increase their profit, and offer more choices
        to network users.</t>

        <section title="Preferential Users">
        <t>In the figure below, which shows an ISP network,
        there are three kinds of users in Sydney,
        namely Gold, Silver and Bronze, and they wish to visit a website
        located in Hong Kong.  The ISP provides three different paths with
        different experiences according to the users' priority.  Gold users
        may use Path1, which has lower latency and loss.  Silver users may
        use Path2 through Singapore, which has low latency but possibly
        some congestion.  Bronze users may use Path3 through LA, which has
        some latency and loss.</t>
        <t><figure>
            <artwork align="center"><![CDATA[
                 +----------+
                 | HongKong |
               --+----------+--
            ---       |        ---
         ---          |           ---
       --             |              --
   +----------+       |         +----------+
   |Singapore |       |         |    LA    |
   +----------+       |         +----------+
       --             |Path1         --
         ---          |           ---
    Path2   ---       |        ---  Path3
               --+----------+--
                 |  Sydney  |
                 +----------+
                      |
                      |
          +-----------+-----------+
          |           |           |
      +-------+   +-------+   +-------+
      |Silver |   |Gold   |   |Bronze |
      |Users  |   |Users  |   |Users  |
      +-------+   +-------+   +-------+
]]></artwork>
          </figure></t>
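
        <t>A controller could express this policy as a simple mapping
        from user class to path, as in the Python sketch below. The node
        names are taken from the figure; the table PATH_BY_CLASS and the
        function select_path are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
# Path per user class, written as the list of sites traversed.
PATH_BY_CLASS = {
    "gold": ["Sydney", "HongKong"],                 # Path1
    "silver": ["Sydney", "Singapore", "HongKong"],  # Path2
    "bronze": ["Sydney", "LA", "HongKong"],         # Path3
}


def select_path(user_class):
    """Return the path for a user class; KeyError if unknown."""
    return PATH_BY_CLASS[user_class.lower()]
```
]]></artwork>
        </figure>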

        </section> <!-- Preferential Users -->

        <section title="Preferential Services">
        <t>As depicted in the figure below,
        the OTTSP has three exits with one ISP, which are
        located in City A, City B and City C.  The content is obtained from
        the Content Server and sent to the exits through the AR. An OTTSP
        may base its steering strategy on different services.  For example,
        the OTTSP in the figure may choose exit R21 for video service and
        exit R22 for web service, which requires a mechanism or system that
        can identify different services in the traffic flow.</t>
        <t><figure>
            <artwork align="center"><![CDATA[
               *           *
        City A *  City B   * City C
               *           *
               *  +-----+  *
               *  |Users|  *
               *  +-----+  *
               *     |     *
         +-----------+-----------+
         |     *     |     *     |
      +-----+  *  +-----+  *  +-----+
      | R11 |-----| R12 |-----| R13 |
      +-----+  *  +-----+  *  +-----+  ISP
         |     *     |     *     |
    *****|***********|***********|*********
         |     *     |     *     |
         |     *     |     *     |     OTT
      +-----+  *  +-----+  *  +-----+
      | R21 |-----| R22 |-----| R23 |
      +-----+  *  +-----+  *  +-----+
         |     *     |     *     |
         +-----------+-----------+
               *     |     *
               *  +-----+  *     +-------+
               *  | AR  |--------|Content|
               *  +-----+  *     |Server |
                                 +-------+
]]></artwork>
          </figure></t>
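        <t>A minimal sketch of service-based exit selection follows.  It
        assumes, purely for illustration, that services are recognized
        by well-known destination port and that unclassified traffic
        leaves through R23; real service identification is likely to be
        more involved.  All names in the sketch are hypothetical.</t>
        <t><figure>
            <artwork align="left"><![CDATA[
```python
# Toy classifier (assumption): destination port -> service label.
# 80/443 are HTTP/HTTPS, 554 is RTSP, 1935 is RTMP.
PORT_TO_SERVICE = {80: "web", 443: "web", 554: "video", 1935: "video"}

# Service -> exit router, following the example in the text.
SERVICE_TO_EXIT = {"video": "R21", "web": "R22"}


def choose_exit(dst_port, default_exit="R23"):
    """Pick an exit router for a flow from its destination port.

    Flows that the toy classifier cannot identify use default_exit,
    which is an assumption of this sketch.
    """
    service = PORT_TO_SERVICE.get(dst_port)
    return SERVICE_TO_EXIT.get(service, default_exit)
```
]]></artwork>
          </figure></t>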

        </section> <!-- Preferential Services -->
      </section> <!-- Business-oriented Steering -->

      <section title="Traffic Congestion Mitigation">
        <t>It is a persistent goal for providers to increase the utilization
        ratio of their current network resources and to mitigate traffic
        congestion.  Congestion can occur anywhere in the
        ISP network (MAN, IDC, core and the links between them), because
        Internet traffic is hard to predict.  For example, there might be
        local online events that the network operators did not know about
        beforehand, or a sudden attack.  Even for big
        events that can be predicted, such as the annual online sales of
        e-commerce companies or an iOS update from Apple Inc., the absence
        of congestion cannot be guaranteed.  Since network capacity
        expansion is usually an annual operation, delays can occur at any
        stage of the engineering work.  As a result, temporary traffic
        steering is always needed.  The same applies to OTT
        networks.</t>

        <t>Note that traffic steering is not a
        global behavior.  It acts on only part of the network, and it is
        temporary.</t>

        <section title="Congestion Mitigation in Core">
        <t>As depicted in the Figure below,
        traffic from MAN C1 to MAN D2 follows the path
        Core C->Core B->Core D as the primary path, but the load on that
        path may become too high.  It is reasonable to transfer some of the
        traffic load to the less utilized path Core C->Core A->Core D when
        the primary path is congested.</t>

        <t><figure>
            <artwork align="center"><![CDATA[
                               Core

                            +----------+
                            | Core A   |
   +------+               --+----------+--                +------+
   |MAN C1|-+          ---                ---           +-|MAN D1|
   +------+ |       ---                      ---        | +------+
            |     --                            --      |
            | +----------+                 +----------+ |
            +-| Core C   |                 |  Core D  |-+
            | +----------+                 +----------+ |
            |     --                            --      |
   +------+ |       ---                      ---        | +------+
   |MAN C2|-+          ---                ---           +-|MAN D2|
   +------+               --+----------+--                +------+
                            | Core B   |
                            +----------+
]]></artwork>
          </figure></t>
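        <t>The amount of traffic to move off the primary path can be
        estimated as sketched below.  This is a simplified illustration,
        not a prescribed algorithm: the 80% utilization threshold, the
        function name and its parameters are assumptions of the sketch.
        Loads and capacities are expected in the same unit (for example
        Gbit/s).</t>
        <t><figure>
            <artwork align="left"><![CDATA[
```python
def load_to_shift(primary_load, primary_cap, backup_load, backup_cap,
                  threshold=0.8):
    """Return how much traffic to move from the primary to the backup.

    The primary path is brought down to `threshold` utilization, but
    never by more than the backup can absorb while staying under the
    same threshold.  The threshold value is an assumption.
    """
    excess = primary_load - threshold * primary_cap
    if excess <= 0:
        return 0.0  # primary path is not congested
    headroom = max(0.0, threshold * backup_cap - backup_load)
    return min(excess, headroom)
```
]]></artwork>
          </figure></t>
        <t>In this sketch the primary path corresponds to
        Core C->Core B->Core D and the backup to
        Core C->Core A->Core D.  Capping the shift by the backup headroom
        avoids simply relocating the congestion.</t>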

         </section> <!-- Congestion Mitigation in Core -->

        <section title="Congestion Mitigation among ISPs">
        <t>As depicted in the Figure below,
        ISP1 and ISP2 are interconnected by 3 exits, which
        are located in 3 different cities.  The links between ISP1 and
        ISP2 within the same city are called local links, and the rest are
        long-distance links.  Traffic from IXP C1 to Core A in ISP 2 usually
        passes through link IXP C1->IXP A2->Core A.  This is a long-distance
        route, directly connecting City C and City A.  Part of the traffic
        could be transferred to link IXP.</t>

        <t><figure>
            <artwork align="center"><![CDATA[
                 *            *
         City A  *   City B   *  City C
                 *            *
       +-------+ *  +-------+ * +-------+
       |IXP A1 |----|IXP  B1|---|IXP C1 |
       +-------+ *  +-------+ * +-------+  ISP 1
          |      *      |     *   |  |
   *******|*************|*********|**|**********
          |  +----------|---------+  |
          |  |   *      |     *      |     ISP 2
          |  |   *      |     *      |
        +------+ *  +------+  * +------+
        |IXP A2|----|IXP B2|----|IXP C2|
        +------+ *  +------+  * +------+
          |      *      |     *      |
          |      *      |     *      |
       +-------+ *  +-------+ * +-------+
       |Core A |----|Core B |---|Core C |
       +-------+ *  +-------+ * +-------+
]]></artwork>
          </figure></t>

         </section> <!-- Congestion Mitigation among ISPs -->

         <section title="Congestion Mitigation at International Edge">
         <t>An ISP usually interconnects with more than 2 transit networks at
         the international edge, so it is quite common for multiple paths to
         exist for the same foreign destination.  Paths with
         better QoS properties, such as latency, loss and jitter, are usually
         preferred.  Since these properties change over time,
         the path selection decision has to be made dynamically.</t>

         <t>As depicted in the Figure below,
         traffic to the foreign destination H from the IP
         core network (AS C1) has two choices of transit network, namely
         Transit A and Transit B.  Under normal conditions, Transit B is the
         primary choice, but Transit A will be preferred when the QoS of
         Transit B degrades.  In that case, the same traffic will go through
         Transit A instead.</t>

        <t><figure>
            <artwork align="center"><![CDATA[
                 *            *
         City A  *   City B   *  City C
                 *            *
       +-------+ *  +-------+ * +-------+
       |IXP A1 |----|IXP  B1|---|IXP C1 |
       +-------+ *  +-------+ * +-------+  ISP 1
          |      *      |     *   |  |
   *******|*************|*********|**|**********
          |  +----------|---------+  |
          |  |   *      |     *      |     ISP 2
          |  |   *      |     *      |
        +------+ *  +------+  * +------+
        |IXP A2|----|IXP B2|----|IXP C2|
        +------+ *  +------+  * +------+
          |      *      |     *      |
          |      *      |     *      |
       +-------+ *  +-------+ * +-------+
       |Core A |----|Core B |---|Core C |
       +-------+ *  +-------+ * +-------+
]]></artwork>
          </figure></t>
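        <t>The dynamic choice between Transit B and Transit A described
        above can be sketched as a threshold check on measured QoS.  The
        latency and loss limits, the function name and the shape of the
        measurement input are assumptions made for illustration; this
        document does not define them.</t>
        <t><figure>
            <artwork align="left"><![CDATA[
```python
def pick_transit(qos, primary="Transit B", backup="Transit A",
                 max_latency_ms=200.0, max_loss=0.01):
    """Choose a transit network from current QoS measurements.

    `qos` maps a transit name to a (latency_ms, loss_ratio) tuple.
    The primary is kept while it stays within both limits; otherwise
    the backup is returned.  The limits are hypothetical values.
    """
    latency_ms, loss = qos[primary]
    if latency_ms <= max_latency_ms and loss <= max_loss:
        return primary
    return backup
```
]]></artwork>
          </figure></t>
        <t>The sketch does not check the health of the backup transit; a
        deployment would likely compare both candidates before moving
        traffic.</t>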

         </section> <!-- Congestion Mitigation at International Edge -->
      </section> <!-- Traffic Congestion Mitigation -->
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The interactions with a BGP-based controller are
      similar to those with any other SDN controller.  The
      security implications of SDN controllers have not been fully
      discussed or described.
      Therefore, protocol and applicability work on
      solutions built around this architecture must take proper account of
      these concerns.</t>
    </section>


    <section anchor="IANA" title="IANA Considerations">
      <t>This document does not require any IANA actions.</t>
    </section>


    <section title="Acknowledgements">
      <t>The authors would like to thank
      Chris Bowers and Jeff Tantsura
      for their valuable suggestions and comments on this draft.</t>
    </section>

    <section title="Contributors">
      <t><figure>
        <artwork align="center"><![CDATA[
       Nan Wu
       Huawei
       Email: eric.wu@huawei.com
]]></artwork>
      </figure></t>
    </section>

  </middle>


  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
      <?rfc include="reference.RFC.1771"?>
      <?rfc include="reference.RFC.3107"?>
      <?rfc include="reference.RFC.4271"?>
      <?rfc include="reference.RFC.4456"?>
      <?rfc include="reference.RFC.5575"?>
      <?rfc include="reference.RFC.7752"?>

    </references>

    <references title="Informative References">
      <?rfc include='reference.I-D.ietf-idr-segment-routing-te-policy'?>
      <?rfc include='reference.I-D.ietf-pce-segment-routing'?>
      <?rfc include='reference.I-D.ietf-idr-flowspec-path-redirect'?>
      <?rfc include='reference.I-D.ietf-isis-segment-routing-extensions'?>
      <?rfc include='reference.I-D.ietf-spring-segment-routing'?>
      <?rfc include='reference.I-D.ietf-rtgwg-bgp-routing-large-dc'?>
      <?rfc include='reference.I-D.ietf-idr-bgpls-segment-routing-epe'?>
    </references>
  </back>
</rfc>
