<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>

<rfc
 category="std"
 docName="draft-ietf-nfsv4-layoutwcc-02"
 ipr="trust200902"
 obsoletes=""
 scripts="Common,Latin"
 sortRefs="true"
 submissionType="IETF"
 symRefs="true"
 tocDepth="3"
 tocInclude="true"
 updates="8435"
 version="3"
 xml:lang="en">

<front>
  <title abbrev="LAYOUT_WCC">
    Add LAYOUT_WCC to NFSv4.2's Flex File Layout Type
  </title>
  <seriesInfo name="Internet-Draft" value="draft-ietf-nfsv4-layoutwcc-02"/>
  <author fullname="Thomas Haynes" initials="T." surname="Haynes">
    <organization abbrev="Hammerspace">Hammerspace</organization>
    <address>
      <email>loghyr@hammerspace.com</email>
    </address>
  </author>
  <author fullname="Trond Myklebust" initials="T." surname="Myklebust">
    <organization abbrev="Hammerspace">Hammerspace</organization>
    <address>
      <email>trondmy@hammerspace.com</email>
    </address>
  </author>
  <date year="2024" month="April" day="23"/>
  <area>Transport</area>
  <workgroup>Network File System Version 4</workgroup>
  <keyword>NFSv4</keyword>
  <abstract>
    <t>
      The Parallel Network File System (pNFS) Flexible File Layout allows
      for a file's metadata (MDS) and data (DS) to be on different
      servers. It does not provide a mechanism for the data server to
      update the metadata server of changes to the data part of the
      file. The client has knowledge of such updates, but lacks the
      ability to update the metadata server.  This document presents a
      refinement to RFC8435 to allow the client to update the metadata
      server to changes on the data server.
    </t>
  </abstract>

  <note removeInRFC="true">
    <t>
      Discussion of this draft takes place
      on the NFSv4 working group mailing list (nfsv4@ietf.org),
      which is archived at
      <eref target="https://mailarchive.ietf.org/arch/browse/nfsv4/"/>.
      Working Group information can be found at
      <eref target="https://datatracker.ietf.org/wg/nfsv4/about/"/>.
    </t>
  </note>
</front>

<middle>

<section anchor="sec_intro" numbered="true" removeInRFC="false" toc="default">
  <name>Introduction</name>
  <t>
    In the Network File System version4 (NFSv4) with a Parallel NFS
    (pNFS) Flexible File Layout (<xref target="RFC8435" format="default"
    sectionFormat="of"/>) server, there is no mechanism for the data
    servers to update the metadata servers for when the data portion
    of the file is modified.  The metadata server needs this knowledge
    to correspondingly update the metadata portion of the file. If the
    client is using NFSv3 as the protocol with the data server, it can
    leverage weak cache consistency (WCC) to update the metadata server
    of the attribute changes.  In this document, we introduce a new
    operation called LAYOUT_WCC which allows the client to periodically
    report the attributes of the data files to the metadata server.
  </t>

  <t>
    Using the process detailed in <xref target="RFC8178" format="default"
    sectionFormat="of"/>, the revisions in this document become an
    extension of NFSv4.2 <xref target="RFC7862" format="default"
    sectionFormat="of"/>. They are built on top of the external data
    representation (XDR) <xref target="RFC4506" format="default"
    sectionFormat="of"/> generated from <xref target="RFC7863"
    format="default" sectionFormat="of"/>.
  </t>

  <section anchor="sec_defs" numbered="true" removeInRFC="false" toc="default">
    <name>Definitions</name>
    <dl newline="false" spacing="normal">
      <dt>(file) data:</dt>
      <dd>
        that part of the file system object that contains the
        data to be read or written.  It is the contents of the object
        rather than the attributes of the object.
      </dd>

      <dt>data server (DS):</dt>
      <dd>
        a pNFS server that provides the file's data when
        the file system object is accessed over a file-based protocol.
      </dd>

      <dt>(file) metadata:</dt>
      <dd>
        the part of the file system object that contains
        various descriptive data relevant to the file object, as opposed
        to the file data itself.  This could include the time of last
        modification, access time, EOF position, etc.
      </dd>

      <dt>metadata server (MDS):</dt>
      <dd>
        the pNFS server that provides metadata
        information for a file system object.
      </dd>

      <dt>weak cache consistency (WCC):</dt>
      <dd>
        In NFSv3, WCC allows the client to check for file attribute changes
        before and after an operation.  (See Section 2.6 of <xref target="RFC1813"
        format="default" sectionFormat="of"/>)
      </dd>
    </dl>
  </section>
  <section numbered="true" removeInRFC="false" toc="default">
    <name>Requirements Language</name>
    <t>
      The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
      "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
      "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this
      document are to be interpreted as described in BCP&nbsp;14 <xref
      target="RFC2119" format="default" sectionFormat="of"/> <xref
      target="RFC8174" format="default" sectionFormat="of"/> when,
      and only when, they appear in all capitals, as shown here.
    </t>
  </section>
</section>

<section anchor="op_LAYOUT_WCC" numbered="true" removeInRFC="false" toc="default">
  <name>Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency</name>
  <section toc="exclude" anchor="ss_op_LAYOUT_WCC_A" numbered="true">
    <name>ARGUMENT</name>
    <sourcecode name="" type="" markers="true"><![CDATA[
/// struct LAYOUT_WCC4args {
///         stateid4        lowa_stateid;
///         layouttype4     lowa_type;
///         opaque          lowa_body<>;
/// };
]]></sourcecode>
   </section>
   <section toc="exclude" anchor="ss_op_LAYOUT_WCC_R" numbered="true">
     <name>RESULT</name>
     <sourcecode name="" type="" markers="true"><![CDATA[
/// struct LAYOUT_WCC4res {
///         nfsstat4                lowr_status;
/// };
]]></sourcecode>
   </section>
   <section toc="exclude" anchor="ss_op_LAYOUT_WCC_D" numbered="true">
     <name>DESCRIPTION</name>
     <t>
       When using pNFS (See Section 12 of <xref target="RFC8881"
       format="default" sectionFormat="of"/>), the client is most likely
       to be performing file operations to the storage device and not the
       metadata server. With a NFSv3 data server in the flexible
       files layout type (in <xref target="RFC8435" format="default"
       sectionFormat="of"/>) there is no control protocol (<xref target="RFC8434"
       format="default" sectionFormat="of"/>) between the
       metadata server and the storage device. In order to update the
       metadata state of the file, the metadata server will need to track
       the metadata state of the data file - once the layout is issued,
       it is not able to see the NFSv3 file operations from the client
       to the storage device. Thus the metadata server will be required
       to query the storage device for the data file attributes.
     </t>
     <t>
       For example, the metadata server
       would issue a NFSv3 GETATTR to the storage device. These queries
       are most likely triggered in response to a NFSv4 GETATTR to the
       metadata server. Not only are these NFSv3 GETATTRs to the storage device
       individually expensive, the storage device can become inundated by
       a storm of such requests. NFSv3 solved a similar issue by having
       the READ and WRITE operations employ a post-operation attribute
       to report the weak cache consistency (WCC) data (See Section 2.6
       of <xref target="RFC1813" format="default" sectionFormat="of"/>).
     </t>
     <t>
       Each NFSv3 operation corresponds to one round trip between the
       client and server. So a WRITE followed by a GETATTR would require
       two round trips. In that scenario, the attribute information
       retrieved is considered to be strict server-client consistency.
       For NFSv4, the WRITE and GETATTR can
       be issued together inside a compound, which only requires one round
       trip between the client and server. And this is also considered
       to be a strict server-client consistency. In essence, the NFSv4
       READ and WRITE operations drop the post-operation attributes,
       allowing the client to decide if it needs that information.
     </t>
     <t>
       With the flexible files layout type, the client can leverage
       the NFSv3 WCC to service the proxying of times (See Section 4 of <xref
       target="delstid" format="default" sectionFormat="of"/>).
       But the granularity of this data is limited. With client side
       mirroring (See Section 8 of <xref target="RFC8435" format="default"
       sectionFormat="of"/>), the client has to aggregate the N mirrored
       files in order to send one piece of information instead of N
       pieces of information. Also, the client is limited to sending
       that information only when it returns the delegation.
     </t>
     <t>
       The current filehandle and the lowa_stateid identifies the
       particular layout for the LAYOUT_WCC operation. The lowa_type
       indicates how to unpack the layout type specific payload inside
       the lowa_body field. The lowa_type is defined to be a value from
       the IANA registry for "pNFS Layout Types Registry".
     </t>
     <t>
       The lowa_body will contain the data file attributes. The client
       will be responsible for mapping the NFSv3 post-operation attributes
       to those in a fattr4.  Just as the post-operation attributes may be
       ignored by the client, the server may ignore the attributes inside
       the LAYOUT_WCC.  But the server can also use those attributes to
       avoid querying the storage device for the data file attributes.
       Note that as these attributes are optional and there is nothing
       the client can do if the server ignores one, there is no need to
       return a bitmap4 of which attributes were accepted in the result
       of the LAYOUT_WCC.
     </t>
   </section>

   <section anchor="ss_op_LAYOUT_WCC_impl" numbered="true" removeInRFC="false" toc="default">
     <name>Implementation</name>
     <section anchor="ss_op_LAYOUT_WCC_examples" numbered="true" removeInRFC="false" toc="default">
       <name>Examples of when to use LAYOUT_WCC</name>
       <t>
         The only way for the metadata server to detect modifications
         to the data file is to probe the data servers via a GETATTR. It
         can compare the mtime results across multiple calls to detect a
         NFSv3 WRITE operation by the client. Likewise, the atime results
         indicate the client having issued a NFSv3 READ operation. As such,
         the client should leverage the LAYOUT_WCC operation whenever it
         has the belief that the metadata server would need to refresh
         the attributes of the data files.  While The client can send a
         LAYOUT_WCC at any time, there are times it will want to do this
         operation in order to avoid having the metadata server issue
         NFSv3 GETATTR requests to the data servers:
       </t>
       <ul spacing="normal">
         <li>
           Whenever it sends a GETATTR for any of the following attributes: size, space_used, change, 
           time_access, time_metadata, and time_modify.
         </li>
         <li>
           Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or LAYOUTERROR - it could
           have already gotten the NFSv3 uid and gid values back in the WCC of the WRITE,
           READ, or COMMIT operation which got the error.
         </li>
         <li>
           Whenever it sends a SETATTR to refresh the proxied times ((See Section 4 of <xref
           target="delstid" format="default" sectionFormat="of"/>) - the metadata server is
           going to want to correlate these times in order to detect later modification to
           the data file.
         </li>
       </ul>
     </section>

     <section anchor="ss_op_LAYOUT_WCC_payload" numbered="true" removeInRFC="false" toc="default">
       <name>Examples of what to send in the LAYOUT_WCC</name>
       <t>
         The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT are a smaller subset
         of what can be transmitted as a NFSv4 attribute. The mapping of NFSv3 to NFSv4 attributes
         shown in <xref target="table_mappings"/> also details which attributes
         the LAYOUT_WCC <bcp14>SHOULD</bcp14> be providing to the metadata server, Both the uid
         and gid are stringified into their respective attributes of owner and owner_group.
         The reason to provide these two attributes is in case of NFS4ERR_ACCESS, the metadata
         server can compare what it expects the values of the uid and gid of the data file
         to be versus the actual values. It can then repair the permissions as needed or
         modify the expected values it has cached.
       </t>

       <table anchor='table_mappings'>
         <tbody>
         <tr><th>NFSv3 Attribute</th> <th>NFSv4 Attribute</th></tr>

         <tr><td>size</td>  <td>size</td></tr>
         <tr><td>used</td>  <td>space_used</td></tr>
         <tr><td>uid</td>   <td>owner</td></tr>
         <tr><td>gid</td>   <td>owner_group</td></tr>
         <tr><td>atime</td> <td>time_access</td></tr>
         <tr><td>mtime</td> <td>time_modify</td></tr>
         <tr><td>ctime</td> <td>time_metadata</td></tr>
         </tbody>
       </table>

     </section>
   </section>

   <section anchor="ss_op_LAYOUT_WCC_errs" numbered="true" removeInRFC="false" toc="default">
     <name>Allowed Errors</name>
     <t>
      The LAYOUT_WCC operation can raise the errors in
      <xref target="tbl_op_error_returns" format="default" sectionFormat="of"/>.
      When an error is encountered, the metadata server can decide to ignore
      the entire operation or depending on the layout type
      specific payload, it could decide to apply a portion of the payload.
     </t>
     <t keepWithNext="true">Valid Error Returns for LAYOUT_WCC</t>
     <table anchor="tbl_op_error_returns" align="center">
       <thead>
         <tr>
           <th align="left">Errors</th>
         </tr>
       </thead>
       <tbody>
         <tr>
           <td align="left">
        NFS4ERR_ADMIN_REVOKED,
        NFS4ERR_BADXDR,
        NFS4ERR_BAD_STATEID,
        NFS4ERR_DEADSESSION,
        NFS4ERR_DELAY,
        NFS4ERR_DELEG_REVOKED,
        NFS4ERR_EXPIRED,
        NFS4ERR_FHEXPIRED,
        NFS4ERR_GRACE,
        NFS4ERR_INVAL,
        NFS4ERR_ISDIR,
        NFS4ERR_MOVED,
        NFS4ERR_NOFILEHANDLE,
        NFS4ERR_NOTSUPP,
        NFS4ERR_NO_GRACE,
        NFS4ERR_OLD_STATEID,
        NFS4ERR_OP_NOT_IN_SESSION,
        NFS4ERR_REP_TOO_BIG,
        NFS4ERR_REP_TOO_BIG_TO_CACHE,
        NFS4ERR_REQ_TOO_BIG,
        NFS4ERR_RETRY_UNCACHED_REP,
        NFS4ERR_SERVERFAULT,
        NFS4ERR_STALE,
        NFS4ERR_TOO_MANY_OPS,
        NFS4ERR_UNKNOWN_LAYOUTTYPE,
        NFS4ERR_WRONG_CRED,
        NFS4ERR_WRONG_TYPE
          </td>
        </tr>
      </tbody>
    </table>
  </section>
  <section anchor="ss_op_LAYOUT_WCC_opt" numbered="true" removeInRFC="false" toc="default">
    <name>Extension of Existing Implementations</name>
    <t>
      The new LAYOUT_WCC operation is <bcp14>OPTIONAL</bcp14> for both
      NFSv4.2 (<xref target="RFC7863" format="default" sectionFormat="of"/>)
      and the flexible files layout type (<xref target="RFC8435" format="default"
       sectionFormat="of"/>).
    </t>
  </section>
  <section anchor="ss_op_LAYOUT_WCC_ff" numbered="true" removeInRFC="false" toc="default">
    <name>Flex Files Layout Type</name>
    <sourcecode name="" type="" markers="true"><![CDATA[
/// struct ff_data_server_wcc4 {
///             deviceid4            ffdsw_deviceid;
///             stateid4             ffdsw_stateid;
///             nfs_fh4              ffdsw_fh_vers<>;
///             fattr4               ffdsw_attributes;
/// };
///
/// struct ff_mirror_wcc4 {
///             ff_data_server_wcc4  ffmw_data_servers<>;
/// };
///
/// struct ff_layout_wcc4 {
///             ff_mirror_wcc4       fflw_mirrors<>;
/// };
]]>
    </sourcecode>
    <t>
      The flex file layout type specific results <bcp14>MUST</bcp14> correspond
      to the ff_layout4 data structure as defined in Section 5.1 of
      <xref target="RFC8435" format="default" sectionFormat="of"/>.
      There <bcp14>MUST</bcp14> be a one-to-one correspondence between:
    </t>
    <ul spacing="normal">
      <li>
        ff_data_server4 -&gt; ff_data_server_wcc4
      </li>
      <li>
        ff_mirror4 -&gt; ff_mirror_wcc4
      </li>
      <li>
        ff_layout4 -&gt; ff_layout_wcc4
      </li>
    </ul>
    <t>
      Each ff_layout4 has an array of ff_mirror4, which have an array of ff_data_server4.
      Based on the current filehandle and the lowa_stateid, the server can match the
      reported attributes.
    </t>
    <t>
      But the positional correspondence between the elements is not
      sufficient to determine the attributes to update. Consider the
      case where a layout had three mirrors and two of them had updated
      attributes, but the third did not. A client could decide to present
      all three mirrors, with one mirror having an attribute mask with
      no attributes present. Or it could decide to present only the
      two mirrors which had been changed.
    </t>
    <t>
      In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and
      ffdsw_fh_vers will uniquely identify the attributes to be updated.
      All three arguments are required. A layout might have multiple data
      files on the same storage device, in which case the ffdsw_deviceid and
      ffdsw_stateid would match, but the ffdsw_fh_vers would not.
    </t>
    <t>
      The ffdsw_attributes are processed similar to the obj_attributes in
      the SETATTR arguments (See Section 18.34 of <xref target="RFC8881" format="default" sectionFormat="of"/>).
    </t>
  </section>
</section>

<section anchor="xdr_desc" numbered="true" removeInRFC="false" toc="default">
  <name>Extraction of XDR</name>
  <t>
    This document contains the external data representation (XDR)
    <xref target="RFC4506" format="default" sectionFormat="of"/> description of the new open
    flags for delegating the file to the client.
    The XDR description is embedded in this
    document in a way that makes it simple for the reader to extract
    into a ready-to-compile form.  The reader can feed this document
    into the following shell script to produce the machine readable
    XDR description of the new flags:
  </t>
  <sourcecode name="" type="" markers="true"><![CDATA[
#!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
    ]]>
  </sourcecode>
  <t>
    That is, if the above script is stored in a file called "extract.sh", and
    this document is in a file called "spec.txt", then the reader can do:
  </t>
  <sourcecode name="" type="" markers="true"><![CDATA[
sh extract.sh < spec.txt > layout_wcc.x
    ]]>
  </sourcecode>
  <t>
    The effect of the script is to remove leading white space from each
    line, plus a sentinel sequence of "///".  XDR descriptions with the
    sentinel sequence are embedded throughout the document.
  </t>
  <t>
    Note that the XDR code contained in this document depends on types
    from the NFSv4.2 nfs4_prot.x file (generated from <xref target="RFC7863" format="default" sectionFormat="of"/>).
    This includes both nfs types that end with a 4, such as offset4,
    length4, etc., as well as more generic types such as uint32_t and
    uint64_t.
  </t>
  <t>
    While the XDR can be appended to that from <xref target="RFC7863" format="default" sectionFormat="of"/>,
    the various code snippets belong in their respective areas of the
    that XDR.
  </t>
  <section anchor="code_copyright" numbered="true" removeInRFC="false" toc="default">
    <name>Code Components Licensing Notice</name>
    <t>
       Both the XDR description and the scripts used for extracting the
       XDR description are Code Components as described in Section 4 of
       <xref target="LEGAL" format="default" sectionFormat="of">"Legal
       Provisions Relating to IETF Documents"</xref>.  These Code
       Components are licensed according to the terms of that document.
    </t>
  </section>
</section>

<section anchor="sec_security" numbered="true" removeInRFC="false" toc="default">
  <name>Security Considerations</name>
  <t>
    There are no new security considerations beyond those in
    <xref target="RFC7862" format="default" sectionFormat="of"/>.
  </t>
</section>

<section anchor="sec_iana" numbered="true" removeInRFC="false" toc="default">
  <name>IANA Considerations</name>
  <t>
    IANA should use the current document (RFC-TBD) as the reference for the new entries.
  </t>
</section>

</middle>

<back>

<references>
  <name>References</name>

  <references>
  <name>Normative References</name>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4506.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7862.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7863.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8178.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8434.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8435.xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8881.xml"/>

    <reference anchor='delstid'>
      <front>
        <title>Extending the Opening of Files in NFSv4.2</title>
        <author initials="T." surname="Haynes" fullname="T. Haynes">
          <organization>Hammerspace</organization>
        </author>
        <author initials="T." surname="Myklebust" fullname="T. Myklebust">
          <organization>Hammerspace</organization>
        </author>
        <date year="2023" month="February" />
      </front>
      <seriesInfo name="draft-ietf-nfsv4-delstid-02.xml" value="(Work In Progress)"/>
    </reference>

  </references>

  <references>
  <name>Informative References</name>
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1813.xml"/>
    <reference anchor="LEGAL" target="http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf">
      <front>
        <title abbrev="Legal Provisions">Legal Provisions Relating to IETF Documents</title>
        <author>
          <organization>IETF Trust</organization>
        </author>
        <date month="November" year="2008"/>
      </front>
    </reference>
  </references>
</references>

<section numbered="true" removeInRFC="false" toc="default">
      <name>Acknowledgments</name>
      <t>
        Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of the document.
      </t>
    </section>

</back>

</rfc>
