<?xml version="1.0" encoding="UTF-8"?>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902"
     docName="draft-ietf-cose-hash-algs-09" number="9054"
     submissionType="IETF" category="info" consensus="true" tocInclude="true"
     symRefs="true" sortRefs="true" xml:lang="en"  version="3">

  <front>
    <title abbrev="COSE Hashes">CBOR Object Signing and Encryption (COSE): Hash Algorithms</title>
    <seriesInfo name="RFC" value="9054"/>
    <author initials="J." surname="Schaad" fullname="Jim Schaad">
      <organization>August Cellars</organization>
      <address />
    </author>
    <date year="2022" month="August"/>
    <area>Security</area>
    <workgroup>COSE Working Group</workgroup> 

    <keyword>SHA-1 Hash Algorithm</keyword>
    <keyword>SHA-2 HAsh Algorithm</keyword>
    <keyword>SHAKE Algorithm</keyword>

    <abstract>
      <t>
        The CBOR Object Signing and
	Encryption (COSE) syntax (see RFC 9052) does not define any
	direct methods for using hash algorithms.
        There are, however, circumstances where hash algorithms are used, such
	as indirect signatures, where the hash of one or more contents are
	signed, and identification of an X.509 certificate or other object by the
	use of a fingerprint.
        This document defines hash algorithms that are identified by COSE algorithm identifiers.
      </t>
    </abstract>
  </front>
  <middle>
    <section anchor="introduction">
      <name>Introduction</name>
      <t>
        The CBOR Object Signing and Encryption (COSE) syntax <xref target="RFC9052" format="default"/> does not define any direct methods for the use of hash algorithms.
        It also does not define a structure syntax that is used to encode a digested object structure along the lines of the DigestedData ASN.1 structure in <xref target="RFC5652"/>.
        This omission was intentional, as a structure consisting of just a digest identifier, the content, and a digest value does not, by itself, provide any strong security service.
        Additionally, an application is going to be better off defining this type of structure so that it can include any additional data that needs to be hashed, as well as methods of obtaining the data.
      </t>
      <t>
        While the above is true, there are some cases where having some standard hash algorithms defined for COSE with a common identifier makes a great deal of sense.
        Two of the cases where these are going to be used are:
      </t>
      <ul>
        <li>
          Indirect signing of content, and
        </li>
        <li>
          Object identification.
        </li>
      </ul>
      <t>
        Indirect signing of content is a paradigm where the content is not
	directly signed, but instead a hash of the content is computed, and
	that hash value -- along with an identifier for the hash algorithm -- is
	included in the content that will be signed.
        Indirect signing allows for a signature to be validated without first
	downloading all of the content associated with the signature.
        Rather, the signature can be validated on all of the hash values and
	pointers to the associated contents; those associated parts can then
	be downloaded, then the hash value of that part can be computed and
	compared to the hash value in the signed content.
        This capability can be of even greater importance in a constrained
	environment, as not all of the content signed may be needed by the
	device. An example of how this is used can be found in <xref
	target="I-D.ietf-suit-manifest" sectionFormat="of" section="5.4"/>.
      </t>
      <t>
        The use of hashes to identify objects is something that has been very common.
        One of the primary things that has been identified by a hash function in a secure message is a certificate.
        Two examples of this can be found in <xref target="RFC2634"/> and the COSE equivalents in <xref target="I-D.ietf-cose-x509"/>.
      </t>
      <section anchor="requirements-terminology">
        <name>Requirements Terminology</name>
        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
    NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
    described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> 
    when, and only when, they appear in all capitals, as shown here.
        </t>

      </section>

    </section>

    <section>
      <name>Hash Algorithm Usage</name>

      <t>
        As noted in the previous section, hash functions can be used for a
	variety of purposes.
        Some of these purposes require that a hash function be cryptographically strong.
        These include direct and indirect signatures -- that is, using the
	hash as part of the signature or using the hash as part of the body to
	be signed.
        Other uses of hash functions may not require the same level of strength.
      </t>
      
      <t>
        This document contains some hash functions that are not designed to be used for cryptographic operations.
        An application that is using a hash function needs to carefully evaluate exactly what hash properties are needed and which hash functions are going to provide them.
        Applications should also make sure that the ability to change hash
functions is part of the base design, as cryptographic advances are sure to
reduce the strength of any given hash function <xref target="BCP201"/>.
      </t>

      <t>
        A hash function is a map from one, normally large, bit string to a second, usually smaller, bit string.
        As the number of possible input values is far greater than the number of possible output values, it is inevitable that there are going to be collisions.
        The trick is to make sure that it is difficult to find two values that are going to map to the same output value.
        A "Collision Attack" is one where an attacker can find two different messages that have the same hash value.
        A hash function that is susceptible to practical collision attacks <bcp14>SHOULD NOT</bcp14> be used for a cryptographic purpose.
        The discovery of theoretical collision attacks against a given hash
	function <bcp14>SHOULD</bcp14> trigger protocol maintainers and users
	to review the continued suitability of the algorithm if
	alternatives are available and migration is viable.
        The only reason such a hash function is used is when there is
	absolutely no other choice (e.g., a Hardware Security Module (HSM)
	that cannot be replaced), and only after looking at the possible
	security issues.
        Cryptographic purposes would include the creation of signatures or the use of hashes for indirect signatures.
        These functions may still be usable for noncryptographic purposes.
      </t>

      <t>
        An example of a noncryptographic use of a hash is filtering from a
	collection of values to find a set of possible candidates; the
	candidates can then be checked to see if they can successfully be
	used.
        A simple example of this is the classic fingerprint of a certificate.
        If the fingerprint is used to verify that it is the correct certificate, then that usage is a cryptographic one and is subject to the warning above about collision attack.
        If, however, the fingerprint is used to sort through a collection of certificates to find those that might be used for the purpose of verifying a signature, a simple filter capability is sufficient.
        In this case, one still needs to confirm that the public key validates
the signature (and that the certificate is trusted), and all certificates that don't contain a key that validates the signature can be discarded as false positives.
      </t>

      <t>
        To distinguish between these two cases, a new value in the Recommended
	column of the "COSE Algorithms" registry has been added.
        "Filter Only" indicates that the only purpose of a hash function
	should be to filter results; it is not intended for applications that
	require a cryptographically strong algorithm.
      </t>

      <section>
        <name>
          Example CBOR Hash Structure
        </name>

        <t>
          <xref target="RFC8152"/> did not provide a default structure for
	  holding a hash value both because no separate hash algorithms
	  were defined and because the way the structure is set up is frequently
	  application specific.
          There are four fields that are often included as part of a hash structure:
        </t>

        <ul>
          <li>
            The hash algorithm identifier.
          </li>
          <li>
            The hash value.
          </li>
          <li>
            A pointer to the value that was hashed.
            This could be a pointer to a file, an object that can be obtained
	    from the network, a pointer to someplace in the message, or
	    something very application specific.
          </li>
          <li>
            Additional data. This can be something as simple as a random value
	    (i.e., salt) to make finding hash collisions slightly harder (because
	    the payload handed to the application could have been selected to
	    have a collision), or as complicated as a set of processing
	    instructions that is used with the object that is pointed to.
            The additional data can be dealt with in a number of ways,
	    prepending or appending to the content, but it is strongly
	    suggested that either it be a fixed known size, or the lengths of
	    the pieces being hashed be included so that the resulting byte
            string has a unique interpretation as the additional data.
            (Encoding as a CBOR array accomplishes this requirement.)
          </li>
        </ul>

        <t>
          An example of a structure that permits all of the above fields to exist would look like the following:
        </t>

        <sourcecode type="cddl">
COSE_Hash_V = (
    1 : int / tstr, # Algorithm identifier
    2 : bstr, # Hash value
    ? 3 : tstr, # Location of object that was hashed
    ? 4 : any   # object containing other details and things
    )
        </sourcecode>

        <t>
          Below is an alternative structure that could be used in situations where one is searching a group of objects for a matching hash value.
          In this case, the location would not be needed, and adding extra data to the hash would be counterproductive.
          This results in a structure that looks like this:
        </t>

        <sourcecode type="cddl">
COSE_Hash_Find = [
    hashAlg : int / tstr,
    hashValue : bstr
]
        </sourcecode>

      </section>
    </section>
    
    <section>
        <name>Hash Algorithm Identifiers</name>

        <section>
          <name>SHA-1 Hash Algorithm</name>
          <t>
            The SHA-1 hash algorithm <xref target="RFC3174"/> was designed by
	    the United States National Security Agency and published in
	    1995. Since that time, a large amount of cryptographic analysis
	    has been applied to this algorithm, and a successful collision
	    attack has been created <xref target="SHA-1-collision"/>.
            The IETF formally started discouraging the use of SHA-1 in <xref target="RFC6194"/>.
          </t>

          <t>
            Despite these facts, there are still times where SHA-1 needs to be
	    used; therefore, it makes sense to assign a code point for the
	    use of this hash algorithm.
            Some of these situations involve historic HSMs where only SHA-1 is
	    implemented; in other situations, the SHA-1 value is used
	    for the purpose of filtering; thus, the collision-resistance
	    property is not needed.
          </t>

          <t>
            Because of the known issues for SHA-1 and the fact that it should no longer be used, the algorithm will be registered with the recommendation of "Filter Only".
            This provides guidance about when the algorithm is safe for use, while discouraging usage where it is not safe.
          </t>

          <t>
            The COSE capabilities for this algorithm is an empty array.
          </t>



<!-- table 1 -->
          <table align="center" anchor="SHA1-Algs">
          <name>SHA-1 Hash Algorithm</name>
            <thead>
              <tr>
                <th>Name</th>
                <th>Value</th>
                <th>Description</th>
                <th>Capabilities</th>
                <th>Reference</th>
                <th>Recommended</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>SHA-1</td>
                <td>-14</td>
                <td>SHA-1 Hash</td>
                <td>[]</td>
                <td>RFC 9054</td>
                <td>Filter Only</td>
              </tr>
            </tbody>
          </table>

        </section>

      <section>
        <name>SHA-2 Hash Algorithms</name>
        <t>
          The family of SHA-2 hash algorithms <xref target="FIPS-180-4"/> was designed by the United States National Security Agency and published in 2001.
          Since that time, some additional algorithms have been added to the original set to deal with length-extension attacks and some performance issues.
          While the SHA-3 hash algorithms have been published since that time, the SHA-2 algorithms are still broadly used.
        </t>

        <t>
          There are a number of different parameters for the SHA-2 hash functions.
          The set of hash functions that has been chosen for inclusion in
	  this document is based on those different parameters and some of
	  the trade-offs involved.
        </t>
          <ul>
            <li>
              <t>
              <strong>SHA-256/64</strong> provides a truncated hash.
              The length of the truncation is designed to allow for smaller transmission size.
              The trade-off is that the odds that a collision will occur increase proportionally.
              Use of this hash function requires analysis of the potential
	      problems that could result from a collision, or it must be
	      limited to where the purpose of the hash is noncryptographic.
              </t>
              <t>
                The latter is the case for some of the scenarios identified in <xref target="I-D.ietf-cose-x509"/>,
                specifically, for the cases when the hash value is used to select among possible certificates: if
		there are multiple choices remaining, then each choice can be
		tested by using the public key.
              </t>
            </li>
            <li>
              <strong>SHA-256</strong> is probably the most common hash function used currently.
              SHA-256 is an efficient hash algorithm for 32-bit hardware.
            </li>
            <li>
              <strong>SHA-384</strong> and <strong>SHA-512</strong> hash functions are efficient for 64-bit hardware.
            </li>
            <li>
              <strong>SHA-512/256</strong> provides a hash function that runs more efficiently on 64-bit hardware but offers the same security level as SHA-256.
            </li>
          </ul>

	  <aside><t>NOTE: SHA-256/64 is a simple truncation of SHA-256 to 64 bits defined in this specification. SHA-512/256 is a modified variant of SHA-512 truncated to 256 bits, as defined in <xref target="FIPS-180-4"/>.</t></aside>
          <t>
            The COSE capabilities array for these algorithms is empty.
          </t>
<!-- table 2 -->

        <table align="center" anchor="SHA2-Algs">
          <name>SHA-2 Hash Algorithms</name>
          <thead>
            <tr>
              <th>Name</th>
              <th>Value</th>
              <th>Description</th>
              <th>Capabilities</th>
              <th>Reference</th>
              <th>Recommended</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>SHA-256/64</td>
              <td>-15</td>
              <td>SHA-2 256-bit Hash truncated to 64-bits</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Filter Only</td>
            </tr>
            <tr>
              <td>SHA-256</td>
              <td>-16</td>
              <td>SHA-2 256-bit Hash</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>SHA-384</td>
              <td>-43</td>
              <td>SHA-2 384-bit Hash</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>SHA-512</td>
              <td>-44</td>
              <td>SHA-2 512-bit Hash</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>SHA-512/256</td>
              <td>-17</td>
              <td>SHA-2 512-bit Hash truncated to 256-bits</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
          </tbody>
        </table>
      </section>

      <section>
        <name>SHAKE Algorithms</name>

        <t>
          The family of SHA-3 hash algorithms <xref target="FIPS-202"/> was the result of a competition run by NIST.
          The pair of algorithms known as SHAKE-128 and SHAKE-256 are the instances of SHA-3 that are currently being standardized in the IETF.

          This is the reason for including these algorithms in this document.          
        </t>

        <t>
          The SHA-3 hash algorithms have a significantly different structure than the SHA-2 hash algorithms.
        </t>

        <t>
          Unlike the SHA-2 hash functions, no algorithm identifier is created for shorter lengths.
          The length of the hash value stored is 256 bits for SHAKE-128 and
	  512 bits for SHAKE-256.
        </t>
        
          <t>
            The COSE capabilities array for these algorithms is empty.
          </t>
<!-- table 3 -->
        <table align="center" anchor="SHAKE-Algs">
          <name>SHAKE Hash Functions</name>
          <thead>
            <tr>
              <th>Name</th>
              <th>Value</th>
              <th>Description</th>
              <th>Capabilities</th>
              <th>Reference</th>
              <th>Recommended</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>SHAKE128</td>
              <td>-18</td>
              <td>SHAKE-128 256-bit Hash Value</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>SHAKE256</td>
              <td>-45</td>
              <td>SHAKE-256 512-bit Hash Value</td>
              <td>[]</td>
              <td>RFC 9054</td>
              <td>Yes</td>
            </tr>
          </tbody>
        </table>
      </section>
    </section>
    
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      
      <section anchor="cose-algorithm-registry">
        <name>COSE Algorithm Registry</name>
        <t>
          IANA has registered the following algorithms in the <eref target="https://www.iana.org/assignments/cose/">"COSE Algorithms" registry</eref>.
        </t>
        
        <ul>
          <li>
            The SHA-1 hash function found in <xref target="SHA1-Algs"/>.
          </li>
          <li>
            The set of SHA-2 hash functions found in <xref target="SHA2-Algs"/>.
          </li>
          <li>
            The set of SHAKE hash functions found in <xref target="SHAKE-Algs"/>.
          </li>
        </ul>

        <!-- IANA
             The following paragraph is retained for historic reasons only.
        -->
        
        <t>
          Many of the hash values produced are relatively long; as such,
	  use of a two-byte algorithm identifier seems reasonable.
          SHA-1 is tagged as "Filter Only", so a longer algorithm identifier is appropriate even though it is a shorter hash value.
        </t>

        <t>
          IANA has added the value of "Filter Only" to the set of
	  legal values for the Recommended column.
          This value is only to be used for hash functions and indicates that
	  it is not to be used for purposes that require collision
	  resistance. As a result of this addition, IANA has added this document as a reference for the "COSE Algorithms" registry.

        </t>
          
      </section>
    </section>
    
    <section anchor="security-considerations">
      <name>Security Considerations</name>
        <t>
          Protocols need to perform a careful analysis of the properties of a hash function that are needed and how they map onto the possible attacks.
          In particular, one needs to distinguish between those uses that need the cryptographic properties, such as collision resistance, and uses that only need properties that correspond to possible object identification.
          The different attacks correspond to who or what is being protected: is it the originator that is the attacker or a third party?
          This is the difference between collision resistance and second pre-image resistance.
          As a general rule, longer hash values are "better" than short ones, but trade-offs of transmission size, timeliness, and security all need to be included as part of this analysis.
          In many cases, the value being hashed is a public value and, as
such, (first) pre-image resistance is not part of this analysis.
        </t>
        <t>
          Algorithm agility needs to be considered a requirement for any use of hash functions <xref target="BCP201"/>.
          As with any cryptographic function, hash functions are under
	  constant attack, and the cryptographic strength of hash algorithms
	  will be reduced over time.
        </t>
    </section>
  </middle>

<back>  

    <displayreference target="RFC2634" to="ESS"/>
    <displayreference target="RFC5652" to="CMS"/>
    <displayreference target="RFC8152" to="COSE"/>
    <displayreference target="I-D.ietf-cose-x509" to="COSE-x509"/>
    <displayreference target="I-D.ietf-suit-manifest" to="SUIT-MANIFEST"/>

<references>
    <name>References</name>

    <references title='Normative References'>

      <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3174.xml"/>



      <reference anchor="FIPS-180-4">
        <front>
          <title>Secure Hash Standard</title>
          <author>
            <organization>NIST</organization>
          </author>
          <date month="August" year="2015"/>
        </front>
        <seriesInfo name="FIPS PUB" value="180-4"/>
        <seriesInfo name="DOI" value="10.6028/NIST.FIPS.180-4"/>
      </reference>

      <reference anchor="FIPS-202">
        <front>
          <title>SHA-3 Standard: Permutation-Based Hash and Extendable-Output
	  Functions</title>
          <author initials="M.J." surname="Dworkin">
            <organization>National Institute of Standards and Technology</organization>
          </author>
          <date month="August" year="2015"/>
        </front>
        <seriesInfo name="FIPS PUB" value="202"/>
        <seriesInfo name="DOI" value="10.6028/NIST.FIPS.202"/>
      </reference>

<reference anchor="RFC9052" target="https://www.rfc-editor.org/info/rfc9052">
<front>
<title>CBOR Object Signing and Encryption (COSE): Structures and Process</title>
<author initials='J' surname='Schaad' fullname='Jim Schaad'>
  <organization/>
</author>
<date month='August' year='2022'/>
</front>
<seriesInfo name="STD" value="96"/>
<seriesInfo name="RFC" value="9052"/>
<seriesInfo name="DOI" value="10.17487/RFC9052"/>
</reference>
      
     
    </references>

    <references title='Informative References'>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5652.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2634.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6194.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8152.xml"/>

<!--  [I-D.ietf-cose-x509] Approved-announcement to be sent::Revised I-D Needed  -->
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-cose-x509.xml"/>


<!-- [I-D.ietf-suit-manifest] IESG state I-D Exists -->
<xi:include
    href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-suit-manifest.xml"/>


      <!--
      <xi:include href="bibxml/reference.RFC.2585.xml"/>
      <xi:include href="bibxml/reference.RFC.5246.xml"/>
      <xi:include href="bibxml/reference.RFC.7468.xml"/>
      <xi:include href="bibxml/reference.RFC.8152.xml"/>
      <xi:include href="bibxml/reference.RFC.8392.xml"/>
      <xi:include href="bibxml/reference.I-D.ietf-lamps-rfc5751-bis.xml"/>
      <xi:include href="bibxml/reference.I-D.ietf-cbor-cddl.xml"/>
      <xi:include href="bibxml/reference.I-D.selander-ace-cose-ecdhe.xml"/>
      -->

                <!-- <xi:include href="bibxml/reference.BCP.0201.xml"/> -->

   <reference anchor="BCP201" target="https://www.rfc-editor.org/info/bcp201">

     <front>
       <title>Guidelines for Cryptographic Algorithm Agility and Selecting Mandatory-to-Implement 
Algorithms</title>

       <author initials="R." surname="Housley" fullname="Russ Housley">
         <organization />
       </author>

       <date month="November" year="2015" />
     </front>
     <seriesInfo name="BCP" value="201" />
     <seriesInfo name="RFC" value="7696"/>
   </reference>


<reference anchor="SHA-1-collision" target="https://shattered.io/static/shattered.pdf">
        <front>
          <title>The first collision for full SHA-1</title>
          <author initials="M." surname="Stevens"/>
          <author initials="E." surname="Bursztein"/>
          <author initials="P." surname="Karpman"/>
          <author initials="A." surname="Albertini"/>
          <author initials="Y." surname="Markov"/>
          <date month="Feb" year="2017"/>
        </front>
      </reference>
    </references>
  </references>
  </back>
</rfc>
