<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-ietf-idr-performance-routing-04"
     ipr="trust200902">
  <front>
    <title abbrev="">Performance-based BGP Routing Mechanism</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <email>xuxiaohu_ietf@hotmail.com</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>

    <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shraddha@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Ketan Talaulikar" initials="K." surname="Talaulikar">
      <organization>Cisco</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>ketant@cisco.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Mohamed Boucadair" initials="M." surname="Boucadair">
      <organization>France Telecom</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>mohamed.boucadair@orange.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Christian Jacquenet" initials="C." surname="Jacquenet">
      <organization>France Telecom</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>christian.jacquenet@orange.com</email>

        <uri/>
      </address>
    </author>

    <date day="26" month="August" year="2024"/>

    <abstract>
      <t>The current BGP specification doesn't use network performance metrics
      (e.g., network latency) in the route selection decision process. This
      document describes a performance-based BGP routing mechanism in which
      the network latency metric is taken as one of the route selection
      criteria. This routing mechanism is useful for service providers with
      global reach that want to deliver low-latency network connectivity
      services to their customers.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Network latency is widely recognized as one of the major obstacles
      to migrating business applications to the cloud, since cloud-based
      applications usually have clearly defined and stringent network
      latency requirements. Service providers with global reach aim to
      deliver low-latency network connectivity services to their cloud
      service customers as a competitive advantage. Sometimes, the network
      connectivity may traverse more than one Autonomous System (AS)
      under their administration. However, BGP [RFC4271], which is used for
      path selection across ASes, doesn't use network latency in the route
      selection process. As such, the best route selected based upon the
      existing BGP route selection criteria may not be the best from the
      customer experience perspective.</t>

      <t>This document describes a performance-based BGP routing paradigm in
      which the network latency metric is disseminated via a new TLV of the
      AIGP attribute [RFC7311] and that metric is used as an input to the
      route selection process. This mechanism is useful for service providers
      with global reach, which usually own more than one AS, to deliver
      low-latency network connectivity services to their customers.</t>

      <t>Furthermore, in order to be backward compatible with existing BGP
      implementations and to have no impact on the stability of the overall
      routing system, the performance routing paradigm is expected to coexist
      with the vanilla routing paradigm. Service providers could thus provide
      low-latency routing services while still offering the vanilla routing
      services, depending on customers' requirements.</t>

      <t>For the sake of simplicity, this document considers only one network
      performance metric, namely network latency. The support of multiple
      network performance metrics is out of scope of this document. In
      addition, this document focuses exclusively on BGP matters; matters
      unrelated to BGP, such as the mechanisms for measuring network latency,
      are outside the scope of this document.</t>

      <t>A variant of this performance-based BGP routing mechanism has been
      implemented (see http://www.ist-mescal.org/roadmap/qbgp-demo.avi).</t>

    </section>

    <section anchor="Terminology" title="Terminology">
      <t>This memo makes use of the terms defined in <xref
      target="RFC4271"/>.</t>

      <t>Network latency indicates the amount of time it takes for a packet to
      traverse a given network path [RFC2679]. If a packet is forwarded along
      a path that contains multiple links and routers, the network latency is
      the sum of the transmission latency of each link (i.e., link latency),
      plus the sum of the internal delay incurred within each router (i.e.,
      router latency), which includes queuing latency and processing latency.
      The sum of the link latencies is also known as the cumulative link
      latency. In today's service provider networks, which usually span a
      wide geographical area, the cumulative link latency is the major part of
      the network latency, since the total internal latency incurred within
      the high-capacity routers is small compared to the cumulative link
      latency. In other words, the cumulative link latency can approximately
      represent the network latency in such networks.</t>

      <t>Furthermore, since the link latency is more stable than the router
      latency, the approximate network latency represented by the cumulative
      link latency is also more stable. Therefore, if there is a way to
      calculate the cumulative link latency of a given network path, it is
      strongly recommended to use that cumulative link latency to
      approximately represent the network latency. Otherwise, the network
      latency would have to be measured frequently by some means (e.g., PING
      or other measurement tools).</t>
    </section>

    <section anchor="Advertising" title="Performance Route Advertisement">
      <t>Performance (i.e., low latency) routes SHOULD be exchanged between
      BGP peers by means of a specific Subsequent Address Family Identifier
      (SAFI) whose value is TBD (see the IANA Considerations section) and
      also be carried as labeled routes as per [RFC3107]. In other words,
      performance routes can be regarded as labeled routes that are
      associated with a network latency metric.</t>

      <t>A BGP speaker SHOULD NOT advertise performance routes to a particular
      BGP peer unless that peer indicates, through BGP capability
      advertisement (see Section 4), that it can process update messages with
      that specific SAFI field.</t>

      <t>The network latency metric is attached to the performance routes via
      a new TLV of the AIGP attribute, referred to as the NETWORK_LATENCY TLV.
      The value of this TLV indicates the network latency in microseconds from
      the BGP speaker identified by the NEXT_HOP path attribute to the address
      identified by the NLRI prefix. The type code of this TLV is TBD (see the
      IANA Considerations section), and the value field is 4 octets in length.
      In some abnormal cases, if the cumulative link latency exceeds the
      maximum value of 0xFFFFFFFF, the value field SHOULD be set to
      0xFFFFFFFF. Note that the NETWORK_LATENCY TLV MUST NOT coexist with the
      AIGP TLV within the same AIGP attribute.</t>
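
      <t>The following non-normative sketch illustrates one possible encoding
      of the NETWORK_LATENCY TLV. It assumes the generic AIGP TLV layout of
      [RFC7311] (a 1-octet type, then a 2-octet length that covers the whole
      TLV, then the value). The type code 0xFE and all function names are
      hypothetical placeholders, since the real type code is still TBD.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

# Hypothetical placeholder: the real type code is TBD (assigned by IANA).
NETWORK_LATENCY_TLV_TYPE = 0xFE
# Largest latency, in microseconds, that fits the 4-octet value field.
MAX_LATENCY_US = 0xFFFFFFFF
# Assumed total TLV length: 1 (type) + 2 (length) + 4 (value) octets.
NETWORK_LATENCY_TLV_LEN = 7


def encode_network_latency_tlv(latency_us: int) -> bytes:
    """Encode a latency in microseconds, saturating at 0xFFFFFFFF."""
    if latency_us < 0:
        raise ValueError("latency must be non-negative")
    value = min(latency_us, MAX_LATENCY_US)
    return struct.pack("!BHI", NETWORK_LATENCY_TLV_TYPE,
                       NETWORK_LATENCY_TLV_LEN, value)


def decode_network_latency_tlv(data: bytes) -> int:
    """Return the latency in microseconds carried by an encoded TLV."""
    tlv_type, length, value = struct.unpack("!BHI", data)
    if tlv_type != NETWORK_LATENCY_TLV_TYPE:
        raise ValueError("not a NETWORK_LATENCY TLV")
    if length != NETWORK_LATENCY_TLV_LEN:
        raise ValueError("unexpected TLV length")
    return value
```
]]></artwork>
      </figure>

      <t>For instance, under these assumptions a latency of 1500
      microseconds would be encoded as the 7 octets FE 00 07 00 00 05 DC.</t>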

      <t>A BGP speaker SHOULD be configurable to enable or disable the
      origination of performance routes. If enabled, a local latency value for
      a given to-be-originated performance route MUST be configured on the BGP
      speaker so that it can be carried in the NETWORK_LATENCY TLV of that
      performance route.</t>

      <t>A BGP speaker that is enabled to process the NETWORK_LATENCY TLV but
      has not been provisioned with the local latency value SHOULD remove the
      NETWORK_LATENCY TLV when it advertises the corresponding route
      downstream.</t>

      <t>When distributing a performance route learnt from a BGP peer, if the
      BGP speaker has set itself as the NEXT_HOP of that route, the value of
      the NETWORK_LATENCY TLV SHOULD be increased by adding the network
      latency from itself to the previous NEXT_HOP of that route. Otherwise,
      the NETWORK_LATENCY TLV of that route MUST NOT be modified.</t>
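
      <t>The propagation rule above can be sketched as follows. This is a
      non-normative illustration; the function and parameter names are
      hypothetical, and capping the sum at 0xFFFFFFFF is an assumption
      carried over from the overflow rule given for the value field.</t>

      <figure>
        <artwork><![CDATA[
```python
MAX_LATENCY_US = 0xFFFFFFFF


def latency_to_advertise(received_us: int,
                         to_previous_next_hop_us: int,
                         next_hop_self: bool) -> int:
    """Return the NETWORK_LATENCY value to place in a re-advertised route.

    received_us: value carried in the route as learnt from the peer.
    to_previous_next_hop_us: latency from this speaker to the NEXT_HOP
        that the route carried when it was received.
    next_hop_self: True if this speaker rewrites NEXT_HOP to itself.
    """
    if not next_hop_self:
        # NEXT_HOP is unchanged, so the TLV must be passed on untouched.
        return received_us
    # Accumulate this speaker's own segment, saturating on overflow.
    return min(received_us + to_previous_next_hop_us, MAX_LATENCY_US)
```
]]></artwork>
      </figure>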

      <t>How to obtain the network latency to a given BGP NEXT_HOP is outside
      the scope of this document. However, note that the path latency to the
      NEXT_HOP SHOULD approximately represent the network latency of the
      exact forwarding path towards the NEXT_HOP. For example, if a BGP
      speaker uses a Traffic Engineering (TE) Label Switched Path (LSP) from
      itself to the NEXT_HOP, rather than the shortest path calculated by the
      Interior Gateway Protocol (IGP), the latency to the NEXT_HOP SHOULD
      reflect the network latency of that TE LSP, rather than that of the IGP
      shortest path. If the latency to the NEXT_HOP cannot be obtained for
      any reason, that latency SHOULD be set to 0xFFFFFFFF by default.</t>
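
      <t>A minimal, non-normative sketch of this choice is given below. It
      assumes the speaker already holds optional latency figures for the TE
      LSP and for the IGP shortest path; how those figures are obtained is out
      of scope, and all names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Optional

# Default used when the latency to the NEXT_HOP cannot be obtained.
UNKNOWN_LATENCY_US = 0xFFFFFFFF


def next_hop_latency_us(uses_te_lsp: bool,
                        te_lsp_us: Optional[int],
                        igp_path_us: Optional[int]) -> int:
    """Pick the latency of the path actually used to reach the NEXT_HOP."""
    # The figure must describe the real forwarding path: the TE LSP when
    # one is in use, otherwise the IGP shortest path.
    measured = te_lsp_us if uses_te_lsp else igp_path_us
    if measured is None:
        return UNKNOWN_LATENCY_US
    return min(measured, UNKNOWN_LATENCY_US)
```
]]></artwork>
      </figure>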

      <t>To keep performance routes stable enough, a BGP speaker SHOULD use a
      configurable threshold for network latency fluctuation to avoid sending
      any update which would otherwise be triggered by a minor network latency
      fluctuation below that threshold.</t>
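
      <t>One possible way to apply such a threshold is sketched below. This
      is a non-normative illustration with hypothetical names. Comparing
      each new measurement against the last advertised value, rather than
      against the previous measurement, is a design assumption made here so
      that a slow drift is eventually advertised.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Optional


class LatencyDampener:
    """Suppress updates caused by latency changes below a threshold."""

    def __init__(self, threshold_us: int) -> None:
        if threshold_us < 0:
            raise ValueError("threshold must be non-negative")
        self.threshold_us = threshold_us
        self.last_advertised_us: Optional[int] = None

    def should_advertise(self, measured_us: int) -> bool:
        """Return True, and record the value, if an update is warranted."""
        first = self.last_advertised_us is None
        if first or abs(measured_us - self.last_advertised_us) \
                >= self.threshold_us:
            self.last_advertised_us = measured_us
            return True
        # Fluctuation is below the threshold: keep the advertised value.
        return False
```
]]></artwork>
      </figure>

      <t>With a threshold of 500 microseconds and a first advertised value
      of 10000, measurements of 10300 and 10499 would be suppressed, while a
      measurement of 10500 would trigger an update.</t>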
    </section>

    <section title="Capability Advertisement">
      <t>A BGP speaker that uses multiprotocol extensions to advertise
      performance routes SHOULD use the Capabilities Optional Parameter, as
      defined in [RFC5492], to inform its peers about this capability.</t>

      <t>The MP_EXT Capability Code, as defined in [RFC4760], is used to
      advertise the (AFI, SAFI) pairs available on a particular
      connection.</t>

      <t>A BGP speaker that implements the Performance Routing Capability MUST
      support the BGP Labeled Route Capability, as defined in [RFC3107]. A BGP
      speaker that advertises the Performance Routing Capability to a peer
      using BGP Capabilities advertisement [RFC5492] does not have to
      advertise the BGP Labeled Route Capability to that peer.</t>
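
      <t>As a non-normative illustration, the Multiprotocol Extensions
      capability carrying the (AFI, SAFI) pair could be built as sketched
      below, following the capability layout of [RFC4760] (code 1, length 4,
      then AFI, a reserved octet, and SAFI). The SAFI value 0xF1 is a
      hypothetical placeholder because the real value is TBD, and the helper
      names are assumptions.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct
from typing import Iterable, Tuple

MP_EXT_CAPABILITY_CODE = 1   # Multiprotocol Extensions capability
AFI_IPV4 = 1
# Hypothetical placeholder: the performance routing SAFI is TBD.
PERFORMANCE_SAFI = 0xF1


def encode_mp_ext_capability(afi: int, safi: int) -> bytes:
    """Encode one capability: code, length, AFI, reserved octet, SAFI."""
    return struct.pack("!BBHBB", MP_EXT_CAPABILITY_CODE, 4, afi, 0, safi)


def peer_accepts_performance_routes(
        peer_pairs: Iterable[Tuple[int, int]], afi: int) -> bool:
    """True if the peer announced the performance SAFI for this AFI."""
    return (afi, PERFORMANCE_SAFI) in set(peer_pairs)
```
]]></artwork>
      </figure>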
    </section>

    <section title="Performance Route Selection">
      <t>Performance route selection only requires the following modification
      to the tie-breaking procedures of the BGP route selection decision
      (phase 2) described in [RFC4271]: network latency metric comparison
      SHOULD be executed just ahead of the AS-Path Length comparison step.
      Prior to executing the network latency metric comparison, the value of
      the NETWORK_LATENCY TLV SHOULD be increased by adding the network
      latency from the BGP speaker to the NEXT_HOP of that route.</t>
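
      <t>The modified comparison order can be sketched as follows. This is a
      simplified, non-normative illustration: only the degree of preference,
      the latency comparison, and the AS-Path Length comparison are modeled,
      the remaining tie-breakers of [RFC4271] are omitted, and all names are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Sequence

MAX_LATENCY_US = 0xFFFFFFFF


@dataclass(frozen=True)
class PerformanceRoute:
    next_hop: str
    local_pref: int              # degree of preference, higher wins
    network_latency_us: int      # value carried in the NETWORK_LATENCY TLV
    latency_to_next_hop_us: int  # from this speaker to the NEXT_HOP
    as_path_len: int

    def effective_latency_us(self) -> int:
        # Add the local segment before comparing, saturating on overflow.
        total = self.network_latency_us + self.latency_to_next_hop_us
        return min(total, MAX_LATENCY_US)


def select_best(routes: Sequence[PerformanceRoute]) -> PerformanceRoute:
    """Pick the best route; latency is compared just before AS path length."""
    return min(routes, key=lambda r: (-r.local_pref,
                                      r.effective_latency_us(),
                                      r.as_path_len))
```
]]></artwork>
      </figure>

      <t>In this sketch, a route with a longer AS path is preferred over one
      with a shorter AS path whenever its effective latency is lower and the
      degree of preference is equal.</t>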

      <t>The Loc-RIB of the performance routing paradigm is independent from
      that of the vanilla routing paradigm. Accordingly, the routing table of
      the performance routing paradigm is independent from that of the vanilla
      routing paradigm. Whether the performance routing paradigm or the
      vanilla routing paradigm would be applied to a given packet is a local
      policy issue which is outside the scope of this document. For example,
      by leveraging the CoS-Based Forwarding (CBF) capability which allows
      routers to have distinct routing and forwarding tables for each type of
      traffic, the selected performance routes could be installed in the
      routing and forwarding tables corresponding to high-priority
      traffic.</t>

      <section title="Deployment Considerations">
        <t>This section is not normative.</t>

        <t>Enabling the performance-based BGP routing at large (i.e., among
        domains that do not belong to the same administrative entity) may be
        conditioned by other administrative settlement considerations that are
        out of scope of this document. Nevertheless, this document neither
        requires nor excludes activating the proposed route selection scheme
        between domains that are managed by distinct administrative
        entities.</t>

        <t>The main deployment case targeted by this specification is where
        the involved domains are managed by the same administrative entity.
        Concretely, this performance-based BGP routing mechanism can
        advantageously be enabled in a multi-domain environment where all the
        involved domains are operated by the same administrative entity, so
        that the processing of the low-latency routes can be consistent
        throughout the domains. Besides security considerations that may arise
        (and which are further discussed in Section 9), there is indeed a need
        to consistently enforce a low-latency-based BGP routing policy within
        a set of domains that belong to the same administrative entity. This
        is motivated by the processing of traffic of very different natures,
        which may have different QoS requirements. Moreover, the combined use
        of BGP-inferred low-latency information with traffic engineering
        tools that compute and establish traffic-engineered LSPs between "low
        latency"-enabled BGP peers, based upon the manipulation of the
        Unidirectional Link Delay sub-TLV [RFC7810] [RFC7471], would help
        guarantee the overall consistency of the low-latency information
        within each domain. Furthermore, a BGP color extended community could
        be attached to the performance routes so as to associate a low-latency
        Segment Routing (SR) LSP towards the BGP NEXT_HOP with these
        low-latency BGP routes. In this way, traffic matching the low-latency
        BGP routes would be forwarded to the BGP NEXT_HOP via the low-latency
        SR LSP towards that BGP NEXT_HOP.</t>

        <t>In network environments where route reflectors are deployed but
        next-hop-self is disabled on them, route reflectors usually reflect
        those received routes which are optimal (i.e., lowest latency) from
        their own perspective but which may not be optimal from the receivers'
        perspectives. Some existing solutions, as described in [RFC7911],
        [I-D.ietf-idr-bgp-optimal-route-reflection] and [RFC6774], can be used
        to address this issue.</t>

        <t>From a network provider perspective, the ability to manipulate
        low-latency routes may lead to different, presumably service-specific
        designs. In particular, there is a need to assess the impact of the
        additional tie-breaking operation on the route computation and
        selection performance of the BGP peers. A typical use case would
        consist in selecting low-latency routes for traffic that, for example,
        pertains to VoIP, or whose nature demands the selection of the
        lowest-latency route in the Adj-RIB-Out database of the corresponding
        BGP peers. Typically, live broadcasting services or some e-health
        services could take advantage of such a capability. It is out of scope
        of this document to exhaustively elaborate on such service-specific
        designs, which are deployment-specific.</t>
      </section>
    </section>

    <section title="Contributors">
      <figure>
        <artwork><![CDATA[   Ning So
   Reliance
   Email: Ning.So@ril.com


   Yimin Shen
   Juniper
   Email: yshen@juniper.net


   Uma Chunduri
   Huawei
   Email: uma.chunduri@huawei.com


   Hui Ni
   Huawei
   Email: nihui@huawei.com


   Yongbing Fan
   China Telecom
   Email: fanyb@gsta.com


   Luis M. Contreras
   Telefonica I+D
   Email: luismiguel.contrerasmurillo@telefonica.com
]]></artwork>
      </figure>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to Joel Halpern, Alvaro Retana, Jim Uttaro, Robert Raszuk,
      Eric Rosen, Bruno Decraene, Qing Zeng, Jie Dong, Mach Chen, Saikat Ray,
      Wes George, Jeff Haas, John Scudder, Stephane Litkowski and Sriganesh
      Kini for their valuable comments on this document. Special thanks should
      be given to Jim Uttaro and Eric Rosen for their proposal of using a new
      TLV of the AIGP attribute to convey the network latency metric.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>IANA is requested to allocate a new BGP Capability Code for the
      Performance Routing Capability, a new SAFI specific to performance
      routing, and a new type code for the NETWORK_LATENCY TLV of the AIGP
      attribute.</t>

      <!---->
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>In addition to the considerations discussed in [RFC4271], the
      following items should be considered as well:</t>

      <t><list style="letters">
          <t>Tweaking the value of the NETWORK_LATENCY TLV by an illegitimate
          party may influence the route selection results. Therefore, the
          Performance Routing Capability negotiation between BGP peers which
          belong to different administrative domains MUST be disabled by
          default. Furthermore, a BGP speaker MUST discard all performance
          routes received from any BGP peer for which the Performance Routing
          Capability negotiation has been disabled.</t>

          <t>Frequent updates of the NETWORK_LATENCY TLV may have a severe
          impact on the stability of the routing system. Such practice SHOULD
          be avoided by setting a reasonable threshold for network latency
          fluctuation.</t>
        </list></t>
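
      <t>The first item can be illustrated by the non-normative sketch below.
      The names are hypothetical. Enabling the negotiation by default for
      peers in the same administrative domain is an assumption of this
      sketch; this document only mandates the default for peers in different
      administrative domains.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    name: str
    same_admin_domain: bool
    # Explicit operator setting; None means "use the default".
    negotiation_override: Optional[bool] = None


def negotiation_enabled(peer: Peer) -> bool:
    """Capability negotiation is off by default across domains."""
    if peer.negotiation_override is not None:
        return peer.negotiation_override
    return peer.same_admin_domain


def filter_performance_routes(peer: Peer, routes: List[str]) -> List[str]:
    """Discard every performance route from a peer with negotiation off."""
    return list(routes) if negotiation_enabled(peer) else []
```
]]></artwork>
      </figure>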

      <!---->
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.4271'?>

      <?rfc include='reference.RFC.5492'?>

      <?rfc include='reference.RFC.4760'?>

      <?rfc include='reference.RFC.3107'?>

      <?rfc include='reference.RFC.7311'?>

      <!---->
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.2679'?>

      <?rfc include='reference.RFC.3630'?>

      <?rfc include='reference.RFC.5305'?>

      <?rfc include='reference.RFC.6774'?>

      <?rfc include='reference.I-D.ietf-idr-bgp-optimal-route-reflection'?>

      <?rfc include='reference.I-D.ietf-spring-segment-routing-policy'?>

      <?rfc include='reference.RFC.7911'?>

      <?rfc include='reference.RFC.7471'?>

      <?rfc include='reference.RFC.7810'?>

      <!---->
    </references>
  </back>
</rfc>
