rfc9766v1.txt   rfc9766.txt 
Internet Engineering Task Force (IETF) T. Haynes Internet Engineering Task Force (IETF) T. Haynes
Request for Comments: 9766 T. Myklebust Request for Comments: 9766 T. Myklebust
Category: Standards Track Hammerspace Category: Standards Track Hammerspace
ISSN: 2070-1721 February 2025 ISSN: 2070-1721 April 2025
Addition of LAYOUT_WCC to NFSv4.2's Flexible File Layout Type Extensions for Weak Cache Consistency in NFSv4.2's Flexible File Layout
Abstract Abstract
This document specifies extensions to the Parallel Network File This document specifies extensions to NFSv4.2 for improving Weak
System (NFS) version 4 (pNFS) for improving write cache consistency. Cache Consistency (WCC). These extensions introduce mechanisms that
These extensions introduce mechanisms that ensure partial writes ensure partial writes performed under a Parallel NFS (pNFS) layout
performed under a pNFS layout remain coherent and correctly tracked. remain coherent and correctly tracked. The solution addresses
The solution addresses concurrency and data integrity concerns that concurrency and data integrity concerns that may arise when multiple
may arise when multiple clients write to the same file through clients write to the same file through separate data servers. By
separate data servers. By defining additional interactions among defining additional interactions among clients, metadata servers, and
clients, metadata servers, and data servers, this specification data servers, this specification enhances the reliability of NFSv4 in
enhances the reliability of NFSv4 in parallel-access environments and parallel-access environments and ensures consistency across diverse
ensures consistency across diverse deployment scenarios. deployment scenarios.
Status of This Memo Status of This Memo
This is an Internet Standards Track document. This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841. Internet Standards is available in Section 2 of RFC 7841.
skipping to change at line 78 skipping to change at line 78
5. Security Considerations 5. Security Considerations
6. IANA Considerations 6. IANA Considerations
7. References 7. References
7.1. Normative References 7.1. Normative References
7.2. Informative References 7.2. Informative References
Acknowledgments Acknowledgments
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
In the Network File System version 4 (NFSv4) with a Parallel NFS In the Parallel NFS (pNFS) flexible file layout (see [RFC8435]),
(pNFS) flexible file layout (see Section 12 of [RFC8435]) server,
there is no mechanism for the data servers to update the metadata there is no mechanism for the data servers to update the metadata
servers when the data portion of the file is modified. The metadata servers when the data portion of the file is modified. The metadata
server needs this knowledge to correspondingly update the metadata server needs this knowledge to correspondingly update the metadata
portion of the file. If the client is using NFSv3 as the protocol portion of the file. If the client is using NFSv3 as the protocol
with the data server, it can leverage Weak Cache Consistency (WCC) to with the data server, it can leverage Weak Cache Consistency (WCC) to
update the metadata server of the attribute changes. In this update the metadata server of the attribute changes. In this
document, we introduce a new operation called LAYOUT_WCC to NFSv4.2, document, we introduce a new operation called LAYOUT_WCC to NFSv4.2,
which allows the client to periodically report the attributes of the which allows the client to periodically report the attributes of the
data files to the metadata server. data files to the metadata server.
skipping to change at line 121 skipping to change at line 120
metadata server (MDS): the pNFS server that provides metadata metadata server (MDS): the pNFS server that provides metadata
information for a file system object. information for a file system object.
storage device: the target to which clients may direct I/O requests storage device: the target to which clients may direct I/O requests
when they hold an appropriate layout. Note that each data server when they hold an appropriate layout. Note that each data server
is a storage device but that some storage devices are not data is a storage device but that some storage devices are not data
servers. (See Section 2.1 of [RFC8434] for a discussion on the servers. (See Section 2.1 of [RFC8434] for a discussion on the
difference between a data server and a storage device.) difference between a data server and a storage device.)
weak cache consistency (WCC): In NFSv3, WCC allows the client to weak cache consistency (WCC): the mechanism in NFSv3 that allows the
check for file attribute changes before and after an operation client to check for file attribute changes before and after an
(see Section 2.6 of [RFC1813]). operation (see Section 2.6 of [RFC1813]).
1.2. Requirements Language 1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
2. Weak Cache Consistency (WCC) 2. Weak Cache Consistency (WCC)
A pNFS layout type enables the metadata server to inform the client A pNFS layout type enables the metadata server to inform the client
of both the storage protocol and the locations of the data that the of both the storage protocol and the locations of the data that the
client should use when communicating with the storage devices. The client should use when communicating with the storage devices. The
flexible file layout type, as specified in [RFC8435], describes how flexible file layout type, as specified in [RFC8435], describes how
data servers using NFSv3 can be accessed. The client is restricted data servers using NFSv3 can be accessed. The client is restricted
to performing the following NFSv3 operations on the filehandles to performing the following NFSv3 operations on the filehandles
provided in the layout: READ (Section 3.3.6 of [RFC1813]), WRITE provided in the layout: READ, WRITE, and COMMIT (see Sections 3.3.6,
(Section 3.3.7 of [RFC1813]), and COMMIT (Section 3.3.21 of 3.3.7, and 3.3.21 of [RFC1813], respectively). In other words, the
[RFC1813]). In other words, the client may only use NFSv3 operations client may only use NFSv3 operations that act directly on the data
that act directly on the data portion of the file. portion of the file.
Because there is no control protocol (see [RFC8434]) possible with Because there is no control protocol (see [RFC8434]) possible with
all data servers, NFSv3 is used as the control protocol. As such, all data servers, NFSv3 is used as the control protocol. As such,
the following NFSv3 operations are commonly used by the metadata the following NFSv3 operations are commonly used by the metadata
server: CREATE (see Section 3.3.8 of [RFC1813]), GETATTR (see server: CREATE, GETATTR, and SETATTR (see Sections 3.3.8, 3.3.1, and
Section 3.3.1 of [RFC1813]), and SETATTR (see Section 3.3.2 of 3.3.2 of [RFC1813], respectively). That is, the metadata server is
[RFC1813]). That is, the metadata server is only allowed to use only allowed to use NFSv3 operations that directly act on the
NFSv3 operations that directly act on the metadata portion of the metadata portion of the data file. GETATTR allows the metadata
data file. GETATTR allows the metadata server to mainly retrieve the server to mainly retrieve the mtime (modify time), ctime (change
mtime (modify time), ctime (change time), and atime (access time). time), and atime (access time). The metadata server can use this
The metadata server can use this information to determine if the information to determine if the client modified the file whilst it
client modified the file whilst it held an iomode of LAYOUTIOMODE4_RW held an iomode of LAYOUTIOMODE4_RW (see Section 3.3.20 of [RFC8881]).
(see Section 3.3.20 of [RFC8881]). Then it can determine the Then it can determine the following for the metadata file:
following for the metadata file: time_modify (see Section 5.8.2.43 of time_modify, time_metadata, and time_access (see Sections 5.8.2.43,
[RFC8881]), time_metadata (see Section 5.8.2.42 of [RFC8881]), and 5.8.2.42, and 5.8.2.37 of [RFC8881], respectively). That is, it can
time_access (see Section 5.8.2.37 of [RFC8881]). That is, it can
determine the information to return to clients in an NFSv4.2 GETATTR determine the information to return to clients in an NFSv4.2 GETATTR
response. response.
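As a minimal sketch of the mapping described above, the following POSIX sh function translates an NFSv3 time attribute name retrieved by GETATTR into the NFSv4.2 attribute that the metadata server would derive from it. The function name map_v3_time_attr is hypothetical. Only the attribute pairings come from the text.

```sh
#!/bin/sh
# Hypothetical helper (not defined by any RFC): map an NFSv3 time
# attribute name to the NFSv4.2 attribute derived from it.
map_v3_time_attr() {
    case "$1" in
        mtime) echo "time_modify" ;;    # Section 5.8.2.43 of [RFC8881]
        ctime) echo "time_metadata" ;;  # Section 5.8.2.42 of [RFC8881]
        atime) echo "time_access" ;;    # Section 5.8.2.37 of [RFC8881]
        *)     return 1 ;;              # not one of the three times
    esac
}

# Print each pairing, for example "mtime -> time_modify".
for attr in mtime ctime atime; do
    printf '%s -> %s\n' "$attr" "$(map_v3_time_attr "$attr")"
done
```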
For example, the metadata server might issue an NFSv3 GETATTR For example, the metadata server might issue an NFSv3 GETATTR
operation to the data server, which is typically triggered by a operation to the data server, which is typically triggered by a
client's NFSv4 GETATTR request to the metadata server. In addition client's NFSv4 GETATTR request to the metadata server. In addition
to the cost of each individual GETATTR operation, the data server can to the cost of each individual GETATTR operation, the data server can
be overwhelmed by a large volume of such requests. NFSv3 addressed a be overwhelmed by a large volume of such requests. NFSv3 addressed a
similar challenge by including a post-operation attribute in the READ similar challenge by including a post-operation attribute in the READ
and WRITE operations to report WCC data (see Section 2.6 of and WRITE operations to report WCC data (see Section 2.6 of
[RFC1813]). [RFC1813]).
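The WCC check can be sketched as follows, assuming that the client caches the size, mtime, and ctime that NFSv3 carries as pre-operation attributes. If the pre-operation attributes in the reply match the cached values, no other change occurred between the client's last attribute fetch and this operation. The client can then adopt the post-operation attributes. Otherwise, it would invalidate its cache. The function name wcc_decision and its argument order are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper sketching the NFSv3 WCC decision (Section 2.6
# of [RFC1813]). Arguments:
#   $1-$3  cached size, mtime, ctime held by the client
#   $4-$6  pre-operation size, mtime, ctime from the reply
# Prints "update" when the cache may take the post-operation
# attributes, or "invalidate" when another change intervened.
wcc_decision() {
    if [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" = "$6" ]; then
        echo "update"
    else
        echo "invalidate"
    fi
}

# Pre-op attributes match the cache: only this WRITE changed the file.
wcc_decision 4096 1700000000 1700000000 4096 1700000000 1700000000
# Pre-op mtime and ctime differ: another writer intervened.
wcc_decision 4096 1700000000 1700000000 4096 1700000050 1700000050
```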
Each NFSv3 operation entails a single round trip between the client Each NFSv3 operation entails a single round trip between the client
and server. Consequently, issuing a WRITE followed by a GETATTR and server. Consequently, issuing a WRITE followed by a GETATTR
would require two round trips. In that situation, the retrieved would require two round trips. In that situation, the retrieved
attribute information is regarded as strict server-client attribute information is regarded as having strict server-client
consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be
combined within a compound operation, which requires only one round combined within a compound operation, which requires only one round
trip. This combined approach is likewise considered strict server- trip. This combined approach is likewise considered to have strict
client consistency. Essentially, NFSv4 READ and WRITE operations server-client consistency. Essentially, NFSv4 READ and WRITE
omit post-operation attributes, allowing the client to determine operations omit post-operation attributes, allowing the client to
whether it requires that information. determine whether it requires that information.
Whilst NFSv4 got rid of the requirement for WCC information to be Whilst NFSv4 got rid of the requirement for WCC information to be
supplied by the WRITE or READ operations, the introduction of pNFS supplied by the WRITE or READ operations, the introduction of pNFS
reintroduces the same problem. The metadata server has to reintroduces the same problem. The metadata server has to
communicate with the data server in order to get the data that could communicate with the data server in order to get the data that could
be provided by a WCC model. be provided by a WCC model.
With the flexible file layout type, the client can leverage the NFSv3 With the flexible file layout type, the client can leverage the NFSv3
WCC to service the proxying of times (see Section 5 of [RFC9754]), WCC to service the proxying of times (see Section 5 of [RFC9754]),
but the granularity of this data is limited. With client-side but the granularity of this data is limited. With client-side
skipping to change at line 290 skipping to change at line 288
- time_modify (see Section 5.8.2.43 of [RFC8881]) - time_modify (see Section 5.8.2.43 of [RFC8881])
* Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or * Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or
LAYOUTERROR. It could have already gotten the NFSv3 uid and gid LAYOUTERROR. It could have already gotten the NFSv3 uid and gid
values back in the WCC of the WRITE, READ, or COMMIT operation values back in the WCC of the WRITE, READ, or COMMIT operation
that got the error. Thus, it could report that information back that got the error. Thus, it could report that information back
to the metadata server, saving it from querying that information to the metadata server, saving it from querying that information
via an NFSv3 GETATTR. via an NFSv3 GETATTR.
* Whenever it sends a SETATTR to refresh the proxied times (see * Whenever it sends a SETATTR to refresh the proxied times (see
Section 5 of [RFC9754]). The metadata server is going to want to Section 5 of [RFC9754]). The metadata server will correlate these
correlate these times in order to detect later modification to the times in order to detect later modification to the data file.
data file.
3.4.2. Examples of What to Send in LAYOUT_WCC 3.4.2. Examples of What to Send in LAYOUT_WCC
The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT
operations are a smaller subset of what can be transmitted as an operations are a smaller subset of what can be transmitted as an
NFSv4 attribute. The mapping of NFSv3 to NFSv4 attributes is shown NFSv4 attribute. The mapping of NFSv3 to NFSv4 attributes is shown
in Table 1. The LAYOUT_WCC MUST provide all of these attributes to in Table 1. The LAYOUT_WCC MUST provide all of these attributes to
the metadata server. Both the uid and gid are stringified into their the metadata server. Both the uid and gid are stringified into their
respective attributes of owner and owner_group. In the case of respective attributes of owner and owner_group. In the case of
NFS4ERR_ACCESS, the reason to provide these two attributes is that NFS4ERR_ACCESS, the reason to provide these two attributes is that
skipping to change at line 416 skipping to change at line 413
attributes present. Or it could decide to present only the two attributes present. Or it could decide to present only the two
mirrors that had been changed. mirrors that had been changed.
In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and
ffdsw_fh_vers will uniquely identify the attributes to be updated. ffdsw_fh_vers will uniquely identify the attributes to be updated.
All three arguments are required. A layout might have multiple data All three arguments are required. A layout might have multiple data
files on the same storage device, in which case the ffdsw_deviceid files on the same storage device, in which case the ffdsw_deviceid
and ffdsw_stateid would match, but the ffdsw_fh_vers would not. and ffdsw_stateid would match, but the ffdsw_fh_vers would not.
The ffdsw_attributes are processed similar to the obj_attributes in The ffdsw_attributes are processed similar to the obj_attributes in
the SETATTR arguments (see Section 18.34 of [RFC8881]). the SETATTR arguments (see Section 18.30 of [RFC8881]).
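To illustrate why all three fields are required, the sketch below lists hypothetical entries, one per line as "deviceid stateid fh_vers", in which two data files share a storage device and a stateid. Checking for duplicate keys with sort and uniq shows that the ffdsw_deviceid and ffdsw_stateid pair alone is ambiguous, while the full triple still distinguishes every entry. The identifiers and the one-line-per-entry format are illustrative assumptions, not the on-the-wire encoding.

```sh
#!/bin/sh
# Hypothetical entries: "deviceid stateid fh_vers" (illustrative
# values only; the real fields are opaque protocol objects).
entries='dev1 sid1 fhA
dev1 sid1 fhB
dev2 sid2 fhC'

# Duplicate (deviceid, stateid) pairs: prints "dev1 sid1".
printf '%s\n' "$entries" | cut -d ' ' -f 1,2 | sort | uniq -d

# Duplicate full triples: prints nothing, so each entry is unique.
printf '%s\n' "$entries" | sort | uniq -d
```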
4. Extraction of XDR 4. Extraction of XDR
This document contains the XDR [RFC4506] description of the new open This document contains the XDR [RFC4506] description of the new
flags for delegating the file to the client. The XDR description is NFSv4.2 operation LAYOUT_WCC. The XDR description is embedded in
embedded in this document in a way that makes it simple for the this document in a way that makes it simple for the reader to extract
reader to extract into a ready-to-compile form. The reader can feed into a ready-to-compile form. The reader can feed this document into
this document into the following shell script to produce the machine- the following shell script to produce the machine-readable XDR
readable XDR description of the new flags: description of the new NFSv4.2 operation LAYOUT_WCC.
<CODE BEGINS> <CODE BEGINS>
#!/bin/sh #!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
<CODE ENDS> <CODE ENDS>
That is, if the above script is stored in a file called 'extract.sh', That is, if the above script is stored in a file called 'extract.sh',
and this document is in a file called 'spec.txt', then the reader can and this document is in a file called 'spec.txt', then the reader can
do: do:
<CODE BEGINS> <CODE BEGINS>
sh extract.sh < spec.txt > layout_wcc.x sh extract.sh < spec.txt > layout_wcc.x
<CODE ENDS> <CODE ENDS>
The effect of the script is to remove leading white space from each The effect of the script is to remove leading blank space from each
line, plus a sentinel sequence of '///'. XDR descriptions with the line, plus a sentinel sequence of '///'. XDR descriptions with the
sentinel sequence are embedded throughout the document. sentinel sequence are embedded throughout the document.
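As a self-contained illustration, the sketch below wraps the pipeline from the script above in a function and feeds it a few hypothetical lines in the embedding style. Lines that carry the '///' sentinel are kept with the leading blanks and sentinel removed, and a bare '///' becomes an empty line. Lines without the sentinel are dropped by grep. The function name extract_xdr and the sample XDR are assumptions for demonstration only.

```sh
#!/bin/sh
# Same grep/sed pipeline as extract.sh, reading standard input.
extract_xdr() {
    grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///$??'
}

# Hypothetical input: three sentinel lines, one bare sentinel, and
# one ordinary prose line without the sentinel.
extract_xdr <<'EOF'
   /// struct example4 {
   ///         uint32_t  field;
   /// };
   ///
   This prose line has no sentinel and is dropped.
EOF
```

The output is the three XDR lines followed by one empty line. The prose line does not appear.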
Note that the XDR code contained in this document depends on types Note that the XDR code contained in this document depends on types
from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This
includes both nfs types that end with a 4 (such as offset4 and includes both nfs types that end with a 4 (such as offset4 and
length4) as well as more generic types (such as uint32_t and length4) as well as more generic types (such as uint32_t and
uint64_t). uint64_t).
While the XDR can be appended to that from [RFC7863], the various While the XDR can be appended to that from [RFC7863], the various
End of changes. 13 change blocks.
49 lines changed or deleted 46 lines changed or added

This html diff was produced by rfcdiff 1.48.