<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="rfc7991bis.rnc"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" docName="draft-ietf-rmcat-rtp-cc-feedback-12" number="9392" submissionType="IETF" category="info" consensus="true" ipr="trust200902" tocInclude="true" sortRefs="true" symRefs="true" xml:lang="en" updates="" obsoletes="" version="3">

  <front>
    <title abbrev="RTCP Feedback for Congestion Control">
      Sending RTP Control Protocol (RTCP) Feedback for Congestion Control in 
      Interactive Multimedia Conferences
    </title>
    <seriesInfo name="RFC" value="9392"/>
    <author fullname="Colin Perkins" initials="C." surname="Perkins">
      <organization>University of Glasgow</organization>
      <address>
        <postal>
          <extaddr>School of Computing Science</extaddr>
          <city>Glasgow</city>
          <code>G12 8QQ</code>
          <country>United Kingdom</country>
        </postal>
        <email>csp@csperkins.org</email>
      </address>
    </author>
    <date month="April" year="2023" />
    <area>tsv</area>
    <workgroup>rmcat</workgroup>
    <keyword>RTP</keyword>
    <keyword>Congestion Control</keyword>
    <keyword>VoIP</keyword>
    <keyword>Video Conferencing</keyword>

  <abstract>
      <t>
	This memo discusses the rate at which congestion control feedback can
	be sent using the RTP Control Protocol (RTCP) and the suitability of RTCP for
	implementing congestion control for unicast multimedia applications.
      </t>
    </abstract>
  </front>
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <middle>
    <section title="Introduction">
      <t>
        The deployment of WebRTC systems <xref target="RFC8825"/> has resulted 
        in high-quality video conferencing seeing extremely wide use.  To ensure 
        the stability of the network in the face of this use, WebRTC systems 
        need to use some form of congestion control for their RTP-based media
        traffic <xref target="RFC2914"/> <xref target="RFC8083"/> 
	<xref target="RFC8085"/> <xref target="RFC8834"/>, allowing them to
        adapt and adjust the media data they send to match
        changes in the available network capacity. In addition to ensuring
        the stable operation of the network, such adaptation is critical to
        ensuring a good user experience, since it allows the sender to match
        the media to the network capacity, rather than forcing the receiver
        to compensate for uncontrolled packet loss when the available capacity
        is exceeded.
      </t>

      <t>
        To develop such congestion control, it is necessary to
        understand the sort of congestion feedback that can be provided within
        the framework of RTP <xref target="RFC3550"/> and the RTP Control 
        Protocol (RTCP). It then becomes possible to determine if this is 
        sufficient for congestion control or if some form of RTP extension
        is needed.
      </t>

      <t>
        As this memo will show, if it is desired to use RTCP in something
        close to its current form for congestion feedback, the multimedia
        congestion control algorithm needs to be designed to work with
        detailed feedback sent every few frames, rather than per-frame
        acknowledgement, to match the constraints of RTCP.
      </t>

      <t>
        This memo considers unicast congestion feedback that can be sent using
        RTCP under the RTP/SAVPF profile <xref target="RFC5124"/> (the secure
        version of the RTP/AVPF profile <xref target="RFC4585"/>).
	This
        profile was chosen because it forms the basis for media transport in WebRTC
        <xref target="RFC8834"/> systems. However, nothing in this memo is specific to
        the secure version of the profile or to WebRTC. It is also
        assumed that the congestion control feedback mechanism described in
        <xref target="RFC8888"/> and common RTCP extensions for efficient feedback
        <xref target="RFC5506"/> <xref target="RFC8108"/>
        <xref target="RFC8861"/> <xref target="RFC8872"/> are used.
      </t>
      <section title="Terminology">
	<dl newline="false" spacing="normal">
	  <dt>Nr:</dt> <dd>number of frames between feedback reports</dd>
	  <dt>Nrs:</dt> <dd>number of reduced-size RTCP packets sent for every compound RTCP packet</dd>
	  <dt>Na:</dt> <dd>number of audio packets per report</dd>
	  <dt>Nv:</dt> <dd>number of video packets per report</dd>
	  <dt>Sc:</dt> <dd>size of a compound RTCP packet</dd>
	  <dt>Srs:</dt> <dd>size of a reduced-size RTCP packet</dd>
	  <dt>Tf:</dt> <dd>duration of a media frame in seconds </dd>
	  <dt>Rf:</dt> <dd>frame rate 1/Tf</dd>
</dl>
</section>
    </section>

    <section title="Considerations for RTCP Feedback">
      <t>
        Several questions need to be answered when providing RTCP 
        feedback for congestion control purposes. These include:
      </t>
      <ul>
        <li> How often is feedback needed? </li>
        <li> How much overhead is acceptable? </li>
        <li> How much and what data does each report contain? </li>
      </ul>
      <t>
	However, the key question is as follows: how often does the receiver need
	to send feedback on the reception quality it is experiencing and hence the
	congestion state of the network?
      </t>

      <t>
        Widely used transport protocols, such as TCP, send acknowledgements
        frequently. For example, a TCP receiver will send an acknowledgement
        at least once every 0.5 seconds or when new data equal to twice the maximum
        segment size has been received <xref target="RFC9293"/>.
        That has relatively low overhead when traffic is bidirectional
        and acknowledgements can be piggybacked onto return path data packets.
        It can also be acceptable, and can have reasonable overhead, to send
        separate acknowledgement packets when those packets are much smaller
        than data packets.
      </t>

      <t>
        Frequent acknowledgements can become a problem, however, when there
        is no return traffic on which to piggyback feedback or if separate
        feedback and data packets are sent and the feedback is similar in
        size to the data being acknowledged. This can be the case for some
        forms of media traffic, especially for Voice over IP (VoIP) flows, leading
        to high overhead when using a transport protocol that sends frequent
        feedback. Approaches like in-network filtering of acknowledgements
        that have been proposed to reduce acknowledgement overheads on highly
        asymmetric links (e.g., as mentioned in <xref target="RFC3449"/>)
        can also reduce the feedback frequency and overhead for multimedia traffic, but this
        so-called "stretch-ACK" behavior is nonstandard and not guaranteed.
      </t>

      <t>
        Accordingly, when implementing congestion control for RTP-based multimedia traffic,
        it might make sense to give the option of sending congestion feedback less often
        than TCP does.  For example, it might be possible to send a feedback packet
        once per video frame, every few frames, or once per network round-trip
        time (RTT). This could still give sufficiently frequent feedback for
        the congestion control loop to be stable and responsive while keeping
        the overhead reasonable when the feedback cannot be piggybacked onto
        returning data. In this case, it is important to note that RTCP can
        send much more detailed feedback than simple acknowledgements.
	For example, if it were useful, it could be possible to use an RTCP extended
	report (XR) packet <xref target="RFC3611"/> to send feedback once per RTT;
	the feedback could comprise a
	bitmap of lost and received packets, with reception times, over that
	RTT. As long as feedback is sent
        frequently enough that the control loop is stable and the sender is
        kept informed when data leaves the network (to provide an equivalent
        to acknowledgement (ACK) clocking in TCP), it is not necessary to report on every packet
        at the instant it is received. Indeed, it is unlikely that a video
        codec can react instantly to a rate change, and there is little 
        point in providing feedback more often than the codec can adapt.
        This suggests that an RTP receiver needs to be configured to provide
        feedback at a rate that matches the rate of adaptation of the sender.
        In the best case, this will match the media frame rate but might
        often be slower.
      </t>

      <t>
        Reducing the feedback frequency compared to TCP will reduce feedback
        overhead but will lead multimedia flows to adapt to congestion more
        slowly than TCP, raising concerns about inter-flow fairness. Similar
        concerns are noted in <xref target="RFC5348"/>, and accordingly, the
        congestion control algorithm described therein aims for "reasonable"
        fairness and a sending rate that is "generally within a factor of
        two" of what TCP would achieve under the same conditions.  It is
        to be noted, however, that TCP exhibits inter-flow unfairness when
        flows with differing round-trip times compete, and stretch
        acknowledgements due to in-network traffic manipulation are not
        uncommon and also raise fairness concerns. Implementations need
        to balance potential unfairness against feedback overhead.
      </t>

      <t>
        Generating and processing feedback consumes resources at the sender
        and receiver. The feedback packets also incur forwarding costs, contribute
        to link utilization, and can affect the timing of other traffic on the
        network. This can affect performance on some types of networks that can be
        impacted by the rate, timing, and size of feedback packets, as well as
        the overall volume of feedback bytes.
      </t>

      <t>
        The amount of overhead due to congestion control feedback that is
        considered acceptable has to be determined.  RTCP feedback is sent in
        separate packets to RTP data, and this has some cost in terms of
        additional header overhead compared to protocols that piggyback
        feedback on return path data packets. The RTP standards have long said
        that a 5% overhead for RTCP traffic is generally acceptable. Is this still
	the case
        for congestion control feedback? Is there a desire to provide
        more responsive feedback and congestion control, possibly with a
        higher overhead? Or is lower overhead wanted, accepting that this
        might reduce responsiveness of the congestion control algorithm?
      </t>

      <t>
        Finally, the details of how much and what data is to be sent in 
        each report will affect the frequency and/or overhead of feedback.
        There is a fundamental trade-off that the more frequently feedback
        packets are sent, the less data can be included in each packet to
        keep the overhead constant. Does the congestion control need a high
        rate but simple feedback (e.g., like TCP acknowledgements), or is
        it acceptable to send more complex feedback less often? 
        Is it useful for the congestion control to receive frequent feedback,
        perhaps to provide more accurate round-trip time estimates, or to
        provide robustness in case feedback packets are lost, even if the
        media sending rate cannot quickly be changed? Or is low-rate feedback,
        resulting in slowly responsive changes to the sending rate, acceptable?
        Different combinations of the congestion control algorithm and media
        codec might require different trade-offs, and the correct trade-off
        for interactive, self-paced, real-time multimedia traffic might not
        be the same as that for TCP congestion control.
      </t>
    </section>

    <section title="What Feedback is Achievable with RTCP?">
      <t>
        The following sections illustrate how the RTCP congestion control
        feedback report <xref target="RFC8888"/> can be used in different
        scenarios and illustrate the overheads of this approach.
      </t>

      <section title="Scenario 1: Voice Telephony" anchor="sec-p2p-audio">
        <t>
          In many ways, point-to-point voice telephony is the simplest
          scenario for congestion control, since there is only a single
          media stream to control. It is complicated, however, by the severe
          bandwidth constraints that apply to the feedback if the overhead
          is to be kept manageable.
        </t>

        <t>
          Assume a two-party, point-to-point VoIP call, using RTP
          over UDP/IP. A rate-adaptive speech codec, such as Opus, is used,
          encoded into RTP packets in frames of a duration of Tf seconds (Tf =
          0.020 s in many cases, but values up to 0.060 s are not uncommon). The
          congestion control algorithm requires feedback every Nr frames,
          i.e., every Nr * Tf seconds, to ensure effective control.  Both
          parties in the call send speech data or comfort noise with
          sufficient frequency that they are counted as senders for the
          purpose of the RTCP reporting interval calculation.
        </t>

        <t>
          RTCP feedback packets can be full (compound) RTCP feedback
          packets or reduced-size RTCP packets <xref target="RFC5506"/>. 
          A compound RTCP packet is sent once for every Nrs reduced-size
          RTCP packets. 
        </t>

        <t>
          Compound RTCP packets contain a Sender Report (SR) packet, a
          Source Description (SDES) packet, and an RTP Congestion Control 
          Feedback (CCFB) packet <xref target="RFC8888"/>. Reduced-size
          RTCP packets contain only the CCFB packet. Since each participant
          sends only a single RTP media stream, the extensions for RTCP report
          aggregation <xref target="RFC8108"/> and reporting group optimization 
          <xref target="RFC8861"/> are not used. 
        </t>

        <t>
          Within each compound RTCP packet, the SR packet will contain a
          sender information block (28 octets) and a single reception
          report block (24 octets), for a total of 52 octets.  A minimal
          SDES packet will contain a header (4 octets), a single chunk
          containing a synchronization source (SSRC) (4 octets), and a CNAME item, and if the
          recommendations for choosing the CNAME <xref target="RFC7022"/>
          are followed, the CNAME item will comprise a 2-octet header, 16
          octets of data, and 2 octets of padding, for a total SDES packet
          size of 28 octets.
	  The CCFB packets contain an RTCP header
          and SSRC (8 octets), a report timestamp (4 octets), the other party's
          SSRC, beginning and ending sequence numbers (8 octets), and 2 * Nr
          octets of reports, for a total of 20 + (2 * Nr) octets.
          The compound Secure RTCP (SRTCP) packet will include 4 octets of trailer,
          followed by an 80-bit (10-octet) authentication tag if HMAC-SHA1
          authentication is used.
          If IPv4 is used, with no IP options, the UDP/IP header will be
          28 octets in size. This gives a total compound RTCP packet size
          of Sc = 142 + (2 * Nr) octets.
        </t>
        <t>
          The reduced-size RTCP packets will comprise just the CCFB packet,
          SRTCP trailer and authentication tag, and a UDP/IP header. It can
          be seen that these packets will be Srs = 62 + (2 * Nr) octets in size.
        </t>
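        <t>
          As an illustrative sketch only, the packet sizes derived above can
          be tallied in a few lines of Python. The constant and function names
          are assumptions introduced here for illustration; the octet counts
          are those given in the text for IPv4 with no IP options, SRTCP, and
          an 80-bit authentication tag. Like the formulas in the text, the
          sketch does not add any padding to the CCFB packet.
        </t>
        <sourcecode type="python"><![CDATA[
```python
# Octet counts taken from the text (IPv4, no IP options).
SR_OCTETS = 28 + 24             # sender information block + one report block
SDES_OCTETS = 4 + 4 + 20        # header + SSRC + CNAME item (2 + 16 + 2)
CCFB_FIXED_OCTETS = 8 + 4 + 8   # header/SSRC + timestamp + SSRC and sequence range
SRTCP_OCTETS = 4 + 10           # SRTCP trailer + 80-bit authentication tag
UDP_IPV4_OCTETS = 28            # UDP/IPv4 header


def ccfb_octets(nr):
    """Size of a CCFB packet reporting on nr frames (2 octets per report)."""
    return CCFB_FIXED_OCTETS + 2 * nr


def compound_size(nr):
    """Sc: SR + SDES + CCFB + SRTCP overhead + UDP/IPv4 header."""
    return (SR_OCTETS + SDES_OCTETS + ccfb_octets(nr)
            + SRTCP_OCTETS + UDP_IPV4_OCTETS)


def reduced_size(nr):
    """Srs: CCFB + SRTCP overhead + UDP/IPv4 header."""
    return ccfb_octets(nr) + SRTCP_OCTETS + UDP_IPV4_OCTETS
```
        ]]></sourcecode>
        <t>
          With these definitions, compound_size(nr) evaluates to
          142 + (2 * nr) and reduced_size(nr) to 62 + (2 * nr), in
          agreement with the expressions for Sc and Srs.
        </t>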

        <t>
          The RTCP reporting interval calculation (Sections <xref target="RFC3550" section="6.2" sectionFormat="bare"/> and <xref target="RFC3550" section="6.3" sectionFormat="bare"/> of <xref target="RFC3550"/> and <xref target="RFC4585"/>) for a two-party session where both participants 
          are senders reduces to:
        </t>
        <artwork name="" type="" align="left"><![CDATA[
   Trtcp = n * Srtcp / Brtcp
        ]]></artwork>
        <t>
          where Srtcp = (Sc + Nrs * Srs) / (1 + Nrs) is the average RTCP packet
          size in octets, Brtcp is the bandwidth allocated to RTCP in octets
          per second, and n is the number of participants in the RTP session
          (in this scenario, n = 2).
        </t>

        <t>
          To ensure an RTCP report containing congestion control feedback is
          sent after every Nr frames of audio, it is necessary to set the RTCP
          reporting interval to Trtcp = Nr * Tf, which when substituted into the
          previous, gives Nr * Tf = n * Srtcp / Brtcp.
          Solving this to give the RTCP bandwidth (Brtcp) and expanding the
          definition of Srtcp gives:
        </t>
        <artwork name="" type="" align="left"><![CDATA[
   Brtcp = (n * (Sc + Nrs * Srs)) / (Nr * Tf * (1 + Nrs))
        ]]></artwork>

        <t>
          If we assume every report is a compound RTCP packet (i.e., Nrs = 0),
          the frame duration is Tf = 20 ms, and an RTCP report is sent for every
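          second frame (i.e., Nr = 2, or 25 RTCP reports per second), this
          gives Sc = 146 octets and an RTCP feedback bandwidth of
          Brtcp = 7300 octets per second, or approximately 57 kbps (taking
          1 kbps as 1024 bits per second, which reproduces the values
          tabulated here). Increasing the frame duration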
          or reducing the frequency of reports will reduce the RTCP bandwidth,
          as shown in <xref target="voip_rtcp_bw"/>.
        </t>
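        <t>
          The calculation can be sketched in Python as follows. This is a
          minimal illustration rather than a normative procedure: the
          function and parameter names are hypothetical, and the conversion
          assumes 1 kbps = 1024 bits per second, a convention chosen because
          it reproduces the 57 kbps figure.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def rtcp_bandwidth_kbps(nr, tf, nrs=0, n=2, ip_udp=28):
    """Brtcp for the two-party VoIP scenario, converted to kbps.

    nr: frames per report; tf: frame duration in seconds;
    nrs: reduced-size packets per compound packet; n: participants;
    ip_udp: UDP/IP header size in octets (28 for IPv4, no options).
    Assumes 1 kbps = 1024 bits per second.
    """
    sc = 114 + ip_udp + 2 * nr     # compound packet size, octets
    srs = 34 + ip_udp + 2 * nr     # reduced-size packet size, octets
    octets_per_second = n * (sc + nrs * srs) / (nr * tf * (1 + nrs))
    return octets_per_second * 8 / 1024


print(round(rtcp_bandwidth_kbps(nr=2, tf=0.020), 1))   # 57.0
```
        ]]></sourcecode>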

        <table anchor='voip_rtcp_bw'>
          <name>RTCP Bandwidth Needed for VoIP Feedback (Compound Reports Only)</name>
          <thead>
            <tr>
              <th align='center'>Tf (seconds)</th>
              <th align='center'>Nr (frames)</th>
              <th align='center'>Brtcp (kbps)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td> 0.020 </td>
              <td>  2 </td>
              <td> 57.0 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  4 </td>
              <td> 29.3 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  8 </td>
              <td> 15.4 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td> 16 </td>
              <td>  8.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  2 </td>
              <td> 19.0 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  4 </td>
              <td>  9.8 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  8 </td>
              <td>  5.1 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td> 16 </td>
              <td>  2.8 </td>
            </tr>
          </tbody>
        </table>

        <t>
          The final row of <xref target="voip_rtcp_bw"/> (60 ms frames, reporting
          every 16 frames) sends RTCP reports approximately once per second
          (every 0.96 s), giving an RTCP bandwidth overhead of 2.8 kbps.
        </t>

        <t>
          The overhead can be reduced by sending some reports in reduced-size
          RTCP packets <xref target="RFC5506"/>. For example, if we alternate
          compound and reduced-size RTCP packets, i.e., Nrs = 1, the calculation
          gives the results shown in <xref target="voip_rtcp_bw_non-compound"/>.
        </t>

        <table anchor='voip_rtcp_bw_non-compound'>
          <name>Required RTCP Bandwidth for VoIP Feedback (Alternating Compound and Reduced-Size Reports)</name>
          <thead>
            <tr>
              <th align='center'>Tf (seconds)</th>
              <th align='center'>Nr (frames)</th>
              <th align='center'>Brtcp (kbps)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td> 0.020 </td>
              <td>  2 </td>
              <td> 41.4 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  4 </td>
              <td> 21.5 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  8 </td>
              <td> 11.5 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td> 16 </td>
              <td>  6.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  2 </td>
              <td> 13.8 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  4 </td>
              <td>  7.2 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  8 </td>
              <td>  3.8 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td> 16 </td>
              <td>  2.2 </td>
            </tr>
          </tbody>
        </table>

        <t>
          The RTCP bandwidth needed for 60 ms frames, reporting every 16
          frames (approximately once per second), can be seen to drop to
          2.2 kbps. This
          calculation can be repeated for other patterns of compound and
          reduced-size RTCP packets, feedback frequency, and frame duration,
          as needed.
        </t>
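        <t>
          As a hedged sketch of how such a repetition might look, the Python
          below sweeps the number of reduced-size packets sent per compound
          packet for 60 ms frames and reports every 16 frames. The function
          names are assumptions made for illustration, IPv4 packet sizes are
          used, and 1 kbps is taken as 1024 bits per second.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def average_rtcp_size(nr, nrs):
    """Srtcp: one compound packet per nrs reduced-size packets (IPv4)."""
    sc = 142 + 2 * nr     # compound packet size, octets
    srs = 62 + 2 * nr     # reduced-size packet size, octets
    return (sc + nrs * srs) / (1 + nrs)


def brtcp_kbps(nr, tf, nrs, n=2):
    """Brtcp = n * Srtcp / Trtcp with Trtcp = nr * tf, in kbps."""
    octets_per_second = n * average_rtcp_size(nr, nrs) / (nr * tf)
    return octets_per_second * 8 / 1024


# Bandwidth for several compound/reduced-size patterns (Nrs values).
sweep = {nrs: round(brtcp_kbps(nr=16, tf=0.060, nrs=nrs), 1)
         for nrs in (0, 1, 3, 7)}
print(sweep)
```
        ]]></sourcecode>
        <t>
          The entries for Nrs = 0 and Nrs = 1 correspond to the 2.8 kbps and
          2.2 kbps figures above. Larger values of Nrs reduce the bandwidth
          further, but with diminishing returns, since the average packet
          size can never fall below Srs.
        </t>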
	
        <aside><t>
          Note: To achieve the RTCP transmission intervals above, the
          RTP/SAVPF profile with T_rr_interval=0 is used, since even when
          using the reduced minimal transmission interval, the RTP/SAVP
          profile would only allow sending RTCP at most every 0.11 s (every
          third frame of video). Using RTP/SAVPF with T_rr_interval=0,
	  however, enables full utilization of the configured 5% RTCP bandwidth
	  fraction.
        </t> </aside>

        <t>
          The use of IPv6 will increase the overhead by 20 octets per packet,
          due to the increased size of the IPv6 header compared to IPv4,
          assuming no IP options in either case. This increases the size
          of compound packets to Sc = 162 + (2 * Nr) octets and reduced-size
          packets to Srs = 82 + (2 * Nr). Rerunning the calculations from
          <xref target="voip_rtcp_bw"/> with these packet sizes gives the
          results shown in <xref target="voip_rtcp_bw_ipv6"/>.
          As can be seen, the use of IPv6 increases the overhead by
          roughly 11% to 14% for these parameters.
        </t>
        <table anchor='voip_rtcp_bw_ipv6'>
          <name>RTCP Bandwidth Needed for VoIP Feedback (Compound Reports Only) Using IPv6</name>
          <thead>
            <tr>
              <th align='center'>Tf (seconds)</th>
              <th align='center'>Nr (frames)</th>
              <th align='center'>Brtcp (kbps)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td> 0.020 </td>
              <td>  2 </td>
              <td> 64.8 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  4 </td>
              <td> 33.2 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  8 </td>
              <td> 17.4 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td> 16 </td>
              <td>  9.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  2 </td>
              <td> 21.6 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  4 </td>
              <td> 11.1 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  8 </td>
              <td>  5.8 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td> 16 </td>
              <td>  3.2 </td>
            </tr>
          </tbody>
        </table>

        <t>
          Repeating the calculations from <xref target="voip_rtcp_bw_non-compound"/>
          using IPv6 gives the results shown in <xref target="voip_rtcp_bw_non-compound_ipv6"/>.
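          As can be seen, the overhead still increases with IPv6 when
          a mix of compound and reduced-size reports is used. Each packet
          grows by the same 20 octets, so the absolute increase matches
          that for compound reports only, while the relative increase is
          larger (roughly 15% to 19%) because the average packet is smaller.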
        </t>
        <table anchor='voip_rtcp_bw_non-compound_ipv6'>
          <name>Required RTCP Bandwidth for VoIP Feedback (Alternating Compound and Reduced-Size Reports) Using IPv6</name>
          <thead>
            <tr>
              <th align='center'>Tf (seconds)</th>
              <th align='center'>Nr (frames)</th>
              <th align='center'>Brtcp (kbps)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td> 0.020 </td>
              <td>  2 </td>
              <td> 49.2 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  4 </td>
              <td> 25.4 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td>  8 </td>
              <td> 13.5 </td>
            </tr>
            <tr>
              <td> 0.020 </td>
              <td> 16 </td>
              <td>  7.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  2 </td>
              <td> 16.4 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  4 </td>
              <td>  8.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td>  8 </td>
              <td>  4.5 </td>
            </tr>
            <tr>
              <td> 0.060 </td>
              <td> 16 </td>
              <td>  2.5 </td>
            </tr>
          </tbody>
        </table>

      </section>

      <section title="Scenario 2: Point-to-Point Video Conference" anchor="sec-p2p-video">
        <t>
          Consider a point-to-point
          video call between two end systems. There will be four RTP flows in 
          this scenario (two audio and two video), with all four flows being
          active for essentially all the time (the audio flows will likely
          use voice activity detection and comfort noise to reduce the packet 
          rate during silent periods, but this does not cause the transmissions to
          stop). 
        </t>

        <t>
          Assume all four flows are sent in a single RTP session, each using
          a separate SSRC. The RTCP reports from the co-located audio and video 
          SSRCs at each end point are aggregated <xref target="RFC8108"/>,
          the optimizations in <xref target="RFC8861"/> are used, and RTCP
          congestion control feedback is sent <xref target="RFC8888"/>.
        </t>

        <t>
          As in <xref target="sec-p2p-audio"/>, when all members are senders,
          the RTCP reporting interval calculation in Sections <xref target="RFC3550" section="6.2"
          sectionFormat="bare"/> and <xref target="RFC3550"
          section="6.3" sectionFormat="bare"/> of
          <xref target="RFC3550"/> and in <xref target="RFC4585"/> reduces to:
        </t>
        <artwork name="" type="" align="left"><![CDATA[
   Trtcp = n * Srtcp / Brtcp
        ]]></artwork>
        <t>
          where n is the number of members in the session, Srtcp is the
          average RTCP packet size in octets, and Brtcp is the RTCP
          bandwidth in octets per second.
        </t>

        <t>
          The average RTCP packet size (Srtcp) depends on the amount of feedback
          sent in each RTCP packet, the number of members in the
          session, the size of source description (RTCP SDES) information
          sent, and the amount of congestion control feedback sent in each
          packet.
        </t>

        <t>
          As a baseline, each RTCP packet will be a compound RTCP packet that
          contains an aggregate of a compound RTCP packet generated by the 
          video SSRC and a compound RTCP packet generated by the audio SSRC.
          When the RTCP reporting group extensions are used, one of these
          SSRCs will be a reporting SSRC, to which the other SSRC will have
          delegated its reports. No reduced-size RTCP packets are sent.
        </t>

        <t>
          The aggregated compound RTCP packet from the non-reporting SSRC
          will contain an RTCP SR packet, an RTCP SDES packet, and an RTCP
          Reporting Group Reporting Sources (RGRS) packet. The RTCP SR packet
          contains a 28-octet header and sender information block but no
          report blocks (since the reporting is delegated); the UDP/IP header
          is counted once for the complete packet, as described below.
          The RTCP SDES packet will comprise a header (4 octets),
          the originating SSRC (4 octets), a CNAME chunk, a terminating chunk, 
          and any padding.  If the CNAME follows <xref target="RFC7022"/> and
          <xref target="RFC8834"/>, the CNAME chunk will be 18 octets in
          size and will be followed by one octet of padding and one terminating
          null octet to align the SDES packet to a 32-bit boundary
          (<xref target="RFC3550" sectionFormat="comma" section="6.5"/>), making the SDES packet 28
          octets in size. The RTCP RGRS packet will be 12 octets in size. 
          This gives a total of 28 + 28 + 12 = 68 octets.
        </t>

        <t>
          The aggregated compound RTCP packet from the reporting SSRC will
          contain an RTCP SR packet, an RTCP SDES packet, and an RTCP 
          congestion control feedback packet. 
          The RTCP SR packet will contain two report blocks, one for each of
          the remote SSRCs (the report for the other local SSRC is suppressed
          by the reporting group extension), for a total of 28 + (2 * 24) =
          76 octets. The RTCP SDES packet will
	  comprise a header (4 octets), originating SSRC (4 octets), a CNAME
	  chunk, a Reporting Group (RGRP) chunk, a terminating chunk, and any
	  padding.  If the CNAME follows <xref target="RFC7022"/> and <xref
	  target="RFC8834"/>, it will be 18 octets in size.
	  The RGRP chunk similarly comprises 18 octets, the terminating
	  chunk comprises 1 octet, and 3 octets of padding are needed,
	  for a total of 48 octets.
          The RTCP congestion control feedback (CCFB) report comprises an 8-octet
          RTCP header and SSRC, a 4-octet report timestamp, and for
	  each of the remote audio and video SSRCs, an 8-octet report header, 2 octets per packet
          reported upon, and padding to a 4-octet boundary if needed; that is, 
          8 + 4 + 8 + (2 * Nv) + 8 + (2 * Na), where Nv is the number of video
          packets per report and Na is the number of audio packets per report.
        </t>

        <t>
          The complete compound RTCP packet contains the RTCP packets from
          both the reporting and non-reporting SSRCs, an SRTCP trailer and authentication
          tag, and a UDP/IPv4 header. The size of this RTCP packet is therefore
          262 + (2 * Nv) + (2 * Na) octets.
          Since the aggregate RTCP packet contains reports from two SSRCs, the
          RTCP packet size is halved before use <xref target="RFC8108"/>.
          Accordingly, the size of the RTCP packets is:
        </t>
        <artwork name="" type="" align="left"><![CDATA[
   Srtcp = (262 + (2 * Nv) + (2 * Na)) / 2
        ]]></artwork>
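        <t>
          The 262-octet constant can be checked by summing the components
          listed above, as in the following Python sketch. The function names
          are hypothetical. The sketch assumes the same 4-octet SRTCP trailer
          and 10-octet authentication tag as in
          <xref target="sec-p2p-audio"/>, and, like the formula, it omits any
          padding of the per-SSRC report blocks.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def aggregate_rtcp_octets(nv, na):
    """Size of the aggregated compound RTCP packet (IPv4, SRTCP)."""
    non_reporting = 28 + 28 + 12          # SR + SDES + RGRS
    sr = 28 + 2 * 24                      # SR with two report blocks
    sdes = 4 + 4 + 18 + 18 + 1 + 3        # header, SSRC, CNAME, RGRP, end, pad
    ccfb = 8 + 4 + (8 + 2 * nv) + (8 + 2 * na)
    srtcp_overhead = 4 + 10               # trailer + authentication tag (assumed)
    udp_ipv4 = 28
    return non_reporting + sr + sdes + ccfb + srtcp_overhead + udp_ipv4


def average_size(nv, na):
    """Srtcp: the aggregate carries reports from two SSRCs, so halve it."""
    return aggregate_rtcp_octets(nv, na) / 2
```
        ]]></sourcecode>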

        <t>
          How many RTP packets does the RTCP congestion control feedback
          (CCFB) packet, included in these compound RTCP packets, report on?
          That is,
          what are the values of Nv and Na?
          This depends on the RTCP reporting interval (Trtcp), the video bit
          rate and frame rate (Rf), the audio bit rate and framing interval,
          and whether the receiver chooses to send congestion control feedback
          in each RTCP packet it sends.
        </t>

        <t>
          To simplify the calculation, assume it is desired to send one RTCP
          report for each frame of video received (i.e., Trtcp = 1 / Rf) and
          to include a congestion control feedback packet in each report.
          Assume that video has a constant bit rate and frame rate and that
          each frame of video has to fit into a 1500-octet MTU. Further,
          assume that the audio takes negligible bandwidth and that the
          audio framing interval can be varied within reasonable bounds, so
          that an integral number of audio frames align with video frame
          boundaries.
        </t>

        <t>
          <xref target="scenario2-compound"/> shows the resulting values of
          Nv and Na (the number of video and audio packets covered by each
          congestion control feedback report) for a range of data rates and
          video frame rates, assuming congestion control feedback is sent
          once per video frame.
          The table also shows the result of inverting the RTCP reporting
          interval calculation to find the corresponding RTCP bandwidth
          (Brtcp). The RTCP bandwidth is given in kbps and as a fraction of
          the data rate.
        </t>

        <t>
          It can be seen that, for example, with a data rate of 1024 kbps
          and video sent at 30 frames per second, the RTCP congestion control
          feedback report sent for each video frame will include reports on
          3 video packets and 2 audio packets. The RTCP bandwidth needed to
          sustain this reporting rate is 127.5 kbps (12% of the data rate).
          This assumes an audio framing interval of 16.67 ms, so that 2
          audio packets are sent for each video frame.
        </t>

        <table anchor='scenario2-compound'>
          <name>Required RTCP Bandwidth, Reporting on Every Frame</name>
          <thead>
            <tr>
              <th align='center'> Data Rate (kbps) </th>
              <th align='center'> Video Frame Rate: Rf </th>
              <th align='center'> Video Packets per Report: Nv </th>
              <th align='center'> Audio Packets per Report: Na </th>
              <th align='center'> Required RTCP Bandwidth: Brtcp (kbps) </th>
            </tr>
          </thead>
          <tbody>
          <tr>
            <td> 100 </td>
            <td> 8 </td>
            <td> 1 </td>
            <td> 6 </td>
            <td> 34.5 (34%) </td>
          </tr>

          <tr>
            <td> 200 </td>
            <td> 16 </td>
            <td> 1 </td>
            <td> 3 </td>
            <td> 67.5 (33%) </td>
          </tr>

          <tr>
            <td> 350 </td>
            <td> 30 </td>
            <td> 1 </td>
            <td> 2 </td>
            <td> 125.6 (35%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 30 </td>
            <td> 2 </td>
            <td> 2 </td>
            <td> 126.6 (18%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 60 </td>
            <td> 1 </td>
            <td> 1 </td>
            <td> 249.4 (35%) </td>
          </tr>

          <tr>
            <td> 1024 </td>
            <td> 30 </td>
            <td> 3 </td>
            <td> 2 </td>
            <td> 127.5 (12%) </td>
          </tr>

          <tr>
            <td> 1400 </td>
            <td> 60 </td>
            <td> 2 </td>
            <td> 1 </td>
            <td> 251.2 (17%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 30 </td>
            <td> 6 </td>
            <td> 2 </td>
            <td> 130.3 ( 6%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 60 </td>
            <td> 3 </td>
            <td> 1 </td>
            <td> 253.1 (12%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 30 </td>
            <td> 12 </td>
            <td> 2 </td>
            <td> 135.9 ( 3%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 60 </td>
            <td> 6 </td>
            <td> 1 </td>
            <td> 258.8 ( 6%) </td>
          </tr>

          </tbody>
        </table>
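
        <t>
          The entries in <xref target="scenario2-compound"/> can be
          reproduced by inverting the reporting interval calculation. The
          following Python sketch assumes n = 4 SSRCs (audio and video for
          each of two participants), Trtcp = 1 / Rf, and 1 kbps = 1024 bits
          per second; these values are inferred from the table entries
          rather than stated in this section, and the function name is
          illustrative. Nv and Na are taken from the table rather than
          derived.
        </t>
        <sourcecode type="python"><![CDATA[
def brtcp_kbps(rf, nv, na, n_ssrc=4):
    """RTCP bandwidth (kbps) needed to send one compound report per frame."""
    srtcp = (262 + 2 * nv + 2 * na) / 2       # halved packet size, octets
    # Brtcp = n * Srtcp / Trtcp with Trtcp = 1 / Rf, converted to kbps
    return n_ssrc * srtcp * rf * 8 / 1024


rows = [(100, 8, 1, 6), (1024, 30, 3, 2), (4096, 60, 6, 1)]
for rate, rf, nv, na in rows:
    b = brtcp_kbps(rf, nv, na)
    # The table appears to truncate, not round, the percentage.
    print(f"{rate} kbps at {rf} fps: {b:.1f} kbps ({int(100 * b / rate)}%)")
]]></sourcecode>
        <t>
          For the 1024 kbps, 30 frames per second case, this gives
          127.5 kbps, matching the worked example above.
        </t>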

        <t>
          Use of reduced-size RTCP <xref target="RFC5506"/> would allow the SR
          and SDES packets to be omitted from some reports. These reduced-size
          RTCP packets would contain an RTCP RGRS packet from the
          non-reporting SSRC, and an RTCP SDES RGRP packet and a congestion
          control feedback packet from the reporting SSRC. This will be
          12 + 28 + 12 + 8 + (2 * Nv) + 8 + (2 * Na) octets,
          plus the SRTCP trailer and authentication tag and a UDP/IP header.
          That is, after halving as before, the size of the reduced-size
          packets would be (110 + (2 * Nv) + (2 * Na)) / 2
          octets. Repeating the analysis above,
          but alternating compound and reduced-size reports, gives the results shown
          in <xref target="scenario2-compound-noncompound"/>.
        </t>

        <table anchor='scenario2-compound-noncompound'>
          <name>Required RTCP Bandwidth, Reporting on Every Frame, with Reduced-Size Reports</name>
          <thead>
            <tr>
              <th align='center'> Data Rate (kbps) </th>
              <th align='center'> Video Frame Rate: Rf </th>
              <th align='center'> Video Packets per Report: Nv </th>
              <th align='center'> Audio Packets per Report: Na </th>
              <th align='center'> Required RTCP Bandwidth: Brtcp (kbps) </th>
            </tr>
          </thead>
          <tbody>
          <tr>
            <td> 100 </td>
            <td> 8 </td>
            <td> 1 </td>
            <td> 6 </td>
            <td> 25.0 (25%) </td>
          </tr>

          <tr>
            <td> 200 </td>
            <td> 16 </td>
            <td> 1 </td>
            <td> 3 </td>
            <td> 48.5 (24%) </td>
          </tr>

          <tr>
            <td> 350 </td>
            <td> 30 </td>
            <td> 1 </td>
            <td> 2 </td>
            <td> 90.0 (25%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 30 </td>
            <td> 2 </td>
            <td> 2 </td>
            <td> 90.9 (12%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 60 </td>
            <td> 1 </td>
            <td> 1 </td>
            <td> 178.1 (25%) </td>
          </tr>

          <tr>
            <td> 1024 </td>
            <td> 30 </td>
            <td> 3 </td>
            <td> 2 </td>
            <td> 91.9 ( 8%) </td>
          </tr>

          <tr>
            <td> 1400 </td>
            <td> 60 </td>
            <td> 2 </td>
            <td> 1 </td>
            <td> 180.0 (12%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 30 </td>
            <td> 6 </td>
            <td> 2 </td>
            <td> 94.7 ( 4%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 60 </td>
            <td> 3 </td>
            <td> 1 </td>
            <td> 181.9 ( 8%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 30 </td>
            <td> 12 </td>
            <td> 2 </td>
            <td> 100.3 ( 2%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 60 </td>
            <td> 6 </td>
            <td> 1 </td>
            <td> 187.5 ( 4%) </td>
          </tr>

          </tbody>
        </table>
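
        <t>
          The calculation behind
          <xref target="scenario2-compound-noncompound"/> can be sketched
          in the same way. The Python fragment below is illustrative only:
          it assumes that compound and reduced-size reports strictly
          alternate, so that the average of the two halved packet sizes is
          used, and it assumes n = 4 SSRCs and 1 kbps = 1024 bits per
          second, values inferred from the table entries. The function name
          is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
def brtcp_alternating_kbps(rf, nv, na, n_ssrc=4):
    """RTCP bandwidth (kbps) when compound and reduced-size reports alternate."""
    reports = 2 * nv + 2 * na            # per-packet report octets
    compound = (262 + reports) / 2       # halved compound packet size
    reduced = (110 + reports) / 2        # halved reduced-size packet size
    srtcp = (compound + reduced) / 2     # average over alternating reports
    return n_ssrc * srtcp * rf * 8 / 1024


print(brtcp_alternating_kbps(8, 1, 6))    # 25.0
print(brtcp_alternating_kbps(30, 3, 2))   # 91.875
]]></sourcecode>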

        <t>
          The use of reduced-size RTCP gives a noticeable reduction in the 
          needed RTCP bandwidth and can be combined with reporting every
          few frames, rather than every frame. Overall, it is clear that
          the RTCP overhead can be reasonable across the range of data and
          frame rates if RTCP is configured carefully.
        </t>

        <t>
          As discussed in <xref target="sec-p2p-audio"/>, the reporting overhead will
          increase if IPv6 is used, due to the increased size of the IPv6
          header. <xref target="scenario2-compound-noncompound-ipv6"/> shows
          the overhead in this case, compared to 
          <xref target="scenario2-compound-noncompound"/>. As can be seen,
          the increase in overhead due to IPv6 rapidly becomes less significant as the data
          rate increases.
        </t>
        <table anchor='scenario2-compound-noncompound-ipv6'>
          <name>Required RTCP Bandwidth, Reporting on Every Frame, with Reduced-Size Reports, Using IPv6</name>
          <thead>
            <tr>
              <th align='center'> Data Rate (kbps) </th>
              <th align='center'> Video Frame Rate: Rf </th>
              <th align='center'> Video Packets per Report: Nv </th>
              <th align='center'> Audio Packets per Report: Na </th>
              <th align='center'> Required RTCP Bandwidth: Brtcp (kbps) </th>
            </tr>
          </thead>
          <tbody>
          <tr>
            <td> 100 </td>
            <td> 8 </td>
            <td> 1 </td>
            <td> 6 </td>
            <td> 27.5 (27%) </td>
          </tr>

          <tr>
            <td> 200 </td>
            <td> 16 </td>
            <td> 1 </td>
            <td> 3 </td>
            <td> 53.5 (26%) </td>
          </tr>

          <tr>
            <td> 350 </td>
            <td> 30 </td>
            <td> 1 </td>
            <td> 2 </td>
            <td> 99.4 (28%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 30 </td>
            <td> 2 </td>
            <td> 2 </td>
            <td> 100.3 (14%) </td>
          </tr>

          <tr>
            <td> 700 </td>
            <td> 60 </td>
            <td> 1 </td>
            <td> 1 </td>
            <td> 196.9 (28%) </td>
          </tr>

          <tr>
            <td> 1024 </td>
            <td> 30 </td>
            <td> 3 </td>
            <td> 2 </td>
            <td> 101.2 ( 9%) </td>
          </tr>

          <tr>
            <td> 1400 </td>
            <td> 60 </td>
            <td> 2 </td>
            <td> 1 </td>
            <td> 198.8 (14%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 30 </td>
            <td> 6 </td>
            <td> 2 </td>
            <td> 104.1 ( 5%) </td>
          </tr>

          <tr>
            <td> 2048 </td>
            <td> 60 </td>
            <td> 3 </td>
            <td> 1 </td>
            <td> 200.6 ( 9%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 30 </td>
            <td> 12 </td>
            <td> 2 </td>
            <td> 109.7 ( 2%) </td>
          </tr>

          <tr>
            <td> 4096 </td>
            <td> 60 </td>
            <td> 6 </td>
            <td> 1 </td>
            <td> 206.2 ( 5%) </td>
          </tr>

          </tbody>
        </table>
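
        <t>
          The difference between
          <xref target="scenario2-compound-noncompound"/> and
          <xref target="scenario2-compound-noncompound-ipv6"/> depends only
          on the frame rate, since every RTCP packet carries one IP header.
          The following Python sketch illustrates this. It assumes that the
          IPv6 header is 20 octets larger than the IPv4 header, that the
          packet size is halved as before, and that n = 4 SSRCs and
          1 kbps = 1024 bits per second; the function name is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
def ipv6_extra_kbps(rf, n_ssrc=4, extra_octets=20):
    """Additional RTCP bandwidth (kbps) due to the larger IPv6 header."""
    # The extra octets are halved along with the rest of the packet.
    return n_ssrc * (extra_octets / 2) * rf * 8 / 1024


for rate, rf in [(100, 8), (1024, 30), (4096, 60)]:
    extra = ipv6_extra_kbps(rf)
    share = 100 * extra / rate
    print(f"{rate} kbps at {rf} fps: +{extra} kbps ({share:.2f}% of data rate)")
]]></sourcecode>
        <t>
          At 8 frames per second, the increment is 2.5 kbps (27.5 kbps
          compared to 25.0 kbps at a data rate of 100 kbps); at 60 frames
          per second, it is 18.75 kbps, which is a much smaller share of a
          4096 kbps session.
        </t>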
      </section>

    </section>

    <section title="Discussion and Conclusions">
      <t>
        Practical systems will generally send some non-media traffic on the
        same path as the media traffic. This can include Session Traversal
        Utilities for NAT (STUN) / Traversal Using Relays around NAT (TURN)
        packets to keep NAT bindings alive <xref target="RFC8445"/>, WebRTC data
        channel packets <xref target="RFC8831"/>, etc. Such traffic also
        needs congestion control, but the means by which this is achieved
        is outside the scope of this memo.
      </t>

      <t>
        RTCP, as it is currently specified, cannot be used to send per-packet
        congestion feedback with reasonable overhead. 
      </t>

      <t>
        RTCP can, however, be used to send congestion
        feedback on each frame of video sent, provided the session bandwidth
        exceeds a couple of megabits per second (the exact rate depends on
        the number of session participants, the RTCP bandwidth fraction,
        which RTCP extensions are enabled, and how detailed the feedback
        needs to be). For lower-rate sessions, the overhead of reporting on every
        frame becomes high but can be reduced to a reasonable level by
        sending reports once per N frames (e.g., every second frame) or by
        sending reduced-size RTCP reports in between the regular reports.
        For a given video quality level, the improved compression of new
        video codecs lowers the data rate and so increases the relative
        reporting overhead, although this
        is to some extent countered by the use of higher-quality video
        over time.
      </t>
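
      <t>
        The effect of reporting once per N frames can be estimated by
        extending the per-frame calculation. The Python sketch below is an
        illustration under stated assumptions, not a result from this memo:
        it assumes compound reports only, n = 4 SSRCs, 1 kbps = 1024 bits
        per second, a reporting interval of N / Rf, and that each report
        covers N times as many packets as a per-frame report. The function
        name is hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
def brtcp_every_n_frames_kbps(rf, nv, na, n_frames, n_ssrc=4):
    """Estimated RTCP bandwidth (kbps) when reporting once per n_frames."""
    # Assumption: each report covers n_frames times as many packets.
    srtcp = (262 + 2 * n_frames * (nv + na)) / 2   # halved size, octets
    reports_per_second = rf / n_frames             # 1 / Trtcp
    return n_ssrc * srtcp * reports_per_second * 8 / 1024


# 350 kbps session at 30 frames per second, with Nv = 1 and Na = 2
# per frame, as in the per-frame analysis.
for n_frames in (1, 2, 4):
    b = brtcp_every_n_frames_kbps(30, 1, 2, n_frames)
    print(f"every {n_frames} frame(s): {b:.1f} kbps")
]]></sourcecode>
      <t>
        Under these assumptions, the fixed per-packet cost is paid less
        often, so the required bandwidth falls roughly in proportion to N
        while the per-packet report octets grow only slowly.
      </t>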

      <t>
        If it is desired to use RTCP in something close to its current form 
        for congestion feedback in WebRTC, the multimedia congestion control 
        algorithm needs to be designed to work with feedback sent every few
        frames, since that fits within the limitations of RTCP. The provided feedback
        will be more detailed than just an acknowledgement, however, and will provide
        a loss bitmap, relative arrival time, and received Explicit Congestion Notification (ECN) marks for each
        packet sent. This will allow effective, if slowly responsive,
        congestion control to be implemented (guidance
        on providing effective congestion control is given in <xref target="RFC8085" section="3.1" sectionFormat="of"/>).
      </t>
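
      <t>
        To illustrate how this per-packet detail fits in the 2 octets
        allowed for each packet, the following Python sketch packs and
        unpacks a single per-packet report. It assumes the layout given in
        <xref target="RFC8888"/>: a 1-bit received flag, a 2-bit ECN field,
        and a 13-bit arrival time offset in units of 1/1024 seconds before
        the report timestamp. The function names are hypothetical, and the
        sketch does not handle the reserved over-range and unavailable
        offset values.
      </t>
      <sourcecode type="python"><![CDATA[
import struct


def pack_metric(received, ecn, ato):
    """Pack one per-packet report into 2 octets, network byte order."""
    if not (0 <= ecn <= 3 and 0 <= ato <= 0x1FFF):
        raise ValueError("field out of range")
    word = (int(bool(received)) << 15) | (ecn << 13) | ato
    return struct.pack("!H", word)


def unpack_metric(data):
    """Return (received, ecn, ato) from a 2-octet per-packet report."""
    (word,) = struct.unpack("!H", data)
    return bool(word >> 15), (word >> 13) & 0x3, word & 0x1FFF


# Packet received, ECN field 0b01, arrived 512/1024 s before the timestamp.
block = pack_metric(True, 0b01, 512)
print(block.hex())             # a200
print(unpack_metric(block))    # (True, 1, 512)
]]></sourcecode>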

      <t>
        The format described in <xref target="RFC8888"/>
        seems sufficient for the needs of congestion control feedback. There
        is little point in optimizing this format: the main overhead comes from
        the UDP/IP headers and the other RTCP packets included in the compound
        packets. This overhead can be lowered by using the extensions described in
        <xref target="RFC5506"/> and by sending reports less frequently. The use of header
        compression <xref target="RFC2508"/> <xref target="RFC3545"/>
        <xref target="RFC5795"/> can also be beneficial.
      </t>
        
      <t>
        Further study of the scenarios of interest is needed to ensure that
        the analysis presented is applicable to other media topologies
        <xref target="RFC7667"/> and to sessions with different data rates
        and sizes of membership.
      </t>

    </section>

    <section title="Security Considerations">
      <t>
        An attacker that can modify or spoof RTCP congestion control feedback
        packets can manipulate the sender behavior to cause denial of service. 
        This can be prevented by authentication and integrity protection of
        RTCP packets, for example, using the secure RTP profile 
        <xref target="RFC3711"/> <xref target="RFC5124"/> or other means
        as discussed in <xref target="RFC7201"/>.
      </t>
    </section>

    <section title="IANA Considerations">
      <t>
	This document has no IANA actions.
      </t>
    </section>

  </middle>
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <back>
    <references title="Normative References">
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2914.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3550.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3711.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4585.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5124.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5506.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7022.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7201.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8108.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8083.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8085.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8825.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8834.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8861.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8872.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8888.xml"/>
    </references>
    <references title="Informative References">
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2508.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3449.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3545.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3611.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5348.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5795.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7667.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8445.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8831.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9293.xml"/>
    </references>
    <section anchor="Acknowledgements" title="Acknowledgements" numbered="false">
      <t>
        Thanks to <contact fullname="Bernard Aboba"/>, <contact
        fullname="Martin Duke"/>, <contact fullname="Linda Dunbar"/>,
        <contact fullname="Gorry Fairhurst"/>, <contact
        fullname="Ingemar Johansson"/>, <contact fullname="Shuping
        Peng"/>, <contact fullname="Alvaro Retana"/>, <contact
        fullname="Zahed Sarker"/>, <contact fullname="John Scudder"/>,
        <contact fullname="Éric Vyncke"/>, <contact fullname="Magnus Westerlund"/>, and the members of the RMCAT
        feedback design team for their feedback.
      </t>
    </section>
  </back>
</rfc>
<!-- vim: set ts=2 sw=2 tw=77 et ai: -->
