Hi All,

John and I have written the following review, focused on the Lustre
client procfs stats file. John's talk at LUG [1] provides some of the
background to this work. We welcome your thoughts.

John Hammond (TACC)
Richard Henwood (Whamcloud)


Introduction
------------

The Lustre proc filesystem (procfs) is a convenient way to modify and
review a Lustre filesystem. The Lustre procfs is a solid interface for
tool writers to build upon. Tools that deliver precise and accurate
metrics are valuable in troubleshooting a production Lustre
filesystem.

Documentation is available [2,3] for Lustre procfs but it is not
complete. In particular, it does not describe the contents of the
/lustre/llite/<mount_id>/stats file. This file is a natural place to
store filesystem metrics for each Lustre filesystem the client has
mounted.

This document describes deficiencies in, and proposed enhancements
to, Lustre procfs. The scope of this document is limited to the stats
file in procfs of a mounted Lustre filesystem as seen by a client. Our
intention is to air our suggestions, modify our ideas based on
feedback, and create a patch for review.

Lustre client metrics
---------------------

One of the primary objectives for the support staff at TACC is to
maintain computing capability. A simple measure of capability is that
clients of the Lustre filesystem are consuming data. An accurate
measure of the quantity of data consumed by a given client is useful
in sharing resources and scheduling jobs. This measure (the number of
bytes read) should be simple to collect.

Declarative stats file
----------------------

The existing proc filesystem on a client provides a 'stats' file. It
is located at /proc/fs/lustre/llite/<mount_id>/stats.
The contents of this file initially include:

  snapshot_time             1304977515.150559 secs.usecs
  ioctl                     1 samples [regs]
  alloc_inode               1 samples [regs]
  inode_permission          1 samples [regs]

*ENHANCEMENT* stats should declare all metrics that are recorded, even
if they are zero. Currently tool developers must maintain their own
lookups of all possible values and test for their absence. Declaring
all the metrics removes the need to consult source code to identify
all possible metrics.

Read bytes
----------

A client of a Lustre filesystem will be interested in the total bytes
transferred over the fabric. The stats file appears to provide a
valuable snapshot of high-level data transfer metrics. However, after
investigation the values recorded are of limited value. read_bytes
returns the number of bytes that have been requested. This is not the
same as the number of bytes that have been read. The example below
illustrates this confusion:

  [root@rhel6_21 ~]# echo "hello lustre" > /mnt/lustre/test.txt
  [root@rhel6_21 ~]# cat /mnt/lustre/test.txt > /dev/null
  [root@rhel6_21 ~]# cat /proc/fs/lustre/llite/lustre-ffff88001aa95c00/stats
  ...
  read_bytes                1 samples [bytes] 2097152 2097152 2097152
  write_bytes               1 samples [bytes] 13 13 13
  ...

In this example, the read on the file was performed by cat. cat
requests the number of bytes it needs to fill its internal buffer. It
continues to do this until the read returns zero. So, in our example,
the internal buffer size of cat is 1KB and it performs two reads. As
it stands, this metric may be misleading to the uninformed.

*ENHANCEMENT* read_bytes should return the number of bytes that have
been read, consistent with the behavior of write_bytes. This will
avoid confusion for users and give a more accurate measure of the
traffic over the filesystem.

Cache misses
------------

The Lustre client has a cache. File reads may be serviced by this
cache, or they may need to be completed by the backend filesystem (a
cache miss).
It is possible to discover if a cache miss has taken place on the
client, but it is time consuming and subject to race conditions.

*ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
in the stats file. This will enable a detailed view of the client and
network interaction with the filesystem.

Conclusions
-----------

This document outlines changes to the procfs client stats file based
on experience gained using Lustre in production at TACC. The authors
welcome feedback on these changes.

1. http://www.olcf.ornl.gov/wp-content/events/lug2011/4-13-2011/330-400_John_Hammond_hammond-lug.pdf
2. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
3. http://wiki.lustre.org/manual/LustreManual20_HTML/SystemConfigurationUtilities_HTML.html#50438219_pgfId-1294840

--
Richard.Henwood at whamcloud.com
Whamcloud Inc.
tel: +1 512 410 9612
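
As an illustration of the "Declarative stats file" point, here is a
minimal Python sketch of what a tool writer has to do today: parse the
stats text and fill in zeros for counters the kernel has not printed.
It is a sketch under stated assumptions, not a reference parser. The
three trailing numbers are assumed to be min, max and sum (the message
above does not say), and the list of expected counter names is
hypothetical and has to be maintained by the tool.

```python
import re
from typing import Dict, Iterable, NamedTuple


class Counter(NamedTuple):
    samples: int
    unit: str
    minimum: int
    maximum: int
    total: int


# Matches lines such as
#   "read_bytes 1 samples [bytes] 2097152 2097152 2097152"
#   "ioctl 1 samples [regs]"
# The three trailing values are optional. Assumption: they are min, max, sum.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<samples>\d+)\s+samples\s+\[(?P<unit>[^\]]+)\]"
    r"(?:\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<sum>\d+))?"
)


def parse_stats(text: str, expected: Iterable[str] = ()) -> Dict[str, Counter]:
    """Parse stats text; counters in `expected` that are absent become zero."""
    # Pre-declare every expected counter so callers never test for absence.
    stats = {name: Counter(0, "", 0, 0, 0) for name in expected}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skips snapshot_time, "..." and blank lines
        stats[match["name"]] = Counter(
            samples=int(match["samples"]),
            unit=match["unit"],
            minimum=int(match["min"] or 0),
            maximum=int(match["max"] or 0),
            total=int(match["sum"] or 0),
        )
    return stats


SAMPLE = """\
snapshot_time             1304977515.150559 secs.usecs
ioctl                     1 samples [regs]
read_bytes                1 samples [bytes] 2097152 2097152 2097152
"""

# Hypothetical list of counters the tool cares about.
EXPECTED = ["read_bytes", "write_bytes", "ioctl"]

if __name__ == "__main__":
    for name, counter in parse_stats(SAMPLE, EXPECTED).items():
        print(name, counter.samples, counter.total)
```

If the stats file declared every counter, the EXPECTED list and the
zero-filling step would be unnecessary.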
On 2011-05-10, at 3:14 PM, Richard Henwood <rhenwood at whamcloud.com> wrote:
> John and I have written the following review, focused on the Lustre
> client procfs stats file. John's talk at LUG [1] provides some of the
> background to this work. We welcome your thoughts.
>
> John Hammond (TACC)
> Richard Henwood (Whamcloud)
>
>
> Introduction
> ------------
>
> The Lustre proc filesystem (procfs) is a convenient way to modify and
> review a Lustre filesystem. The Lustre procfs is a solid interface for
> tool writers to build upon. Tools that deliver precise and accurate
> metrics are valuable in troubleshooting a production Lustre
> filesystem.
>
> Documentation is available [2,3] for Lustre procfs but it is not
> complete. In particular, it does not describe the contents of the
> /lustre/llite/<mount_id>/stats file. This file is a natural place to
> store filesystem metrics for each Lustre filesystem the client has
> mounted.
>
> This document is concerned with describing deficiencies and
> enhancements to Lustre procfs. The scope of this document is limited
> to the stats file in procfs of a mounted Lustre filesystem as seen by
> a client. Our intention is to air our suggestions, make modifications
> to our ideas based on feedback and create a patch for review.
>
> Lustre client metrics
> ---------------------
>
> One of the primary objectives for the support staff at TACC is to
> maintain computing capability. A simple measure of capability is that
> clients of the Lustre filesystem are consuming data. An accurate
> measure of the quantity of data consumed by a given client is useful
> in sharing resources and scheduling jobs. This measure (the number of
> bytes read) should be simple to collect.
>
> Declarative stats file
> ----------------------
>
> The existing proc filesystem on a client provides a 'stats' file. This
> is located: /proc/fs/lustre/llite/<mount_id>/stats.
> The contents of this file initially include:
>
>   snapshot_time             1304977515.150559 secs.usecs
>   ioctl                     1 samples [regs]
>   alloc_inode               1 samples [regs]
>   inode_permission          1 samples [regs]
>
> *ENHANCEMENT* stats should declare all metrics that are recorded, even
> if they are zero. Currently tool developers must maintain their own
> lookups of all possible values and test for their absence. Declaring
> all the metrics removes the need to consult source code to identify
> all possible metrics.

In fact, this is how the "stats" file used to operate. However, to
avoid printing a lot of stats counters that are always zero for a
given device, the kernel filters out any values that have never been
hit in the code. In order to keep parts of the stats setup generic
between the different layers of Lustre, the stats structure may
contain counters that are never used.

> Read bytes
> ----------
>
> A client of a Lustre filesystem will be interested in the total bytes
> transferred over the fabric. The stats file appears to provide a
> valuable snapshot of high-level data transfer metrics. However, after
> investigation the values recorded are of limited value. read_bytes
> returns the number of bytes that have been requested. This is not the
> same as the number of bytes that have been read. The example below
> illustrates this confusion:
>
>   [root@rhel6_21 ~]# echo "hello lustre" > /mnt/lustre/test.txt
>   [root@rhel6_21 ~]# cat /mnt/lustre/test.txt > /dev/null
>   [root@rhel6_21 ~]# cat /proc/fs/lustre/llite/lustre-ffff88001aa95c00/stats
>   ...
>   read_bytes                1 samples [bytes] 2097152 2097152 2097152
>   write_bytes               1 samples [bytes] 13 13 13
>   ...
>
> In this example, the read on the file was performed by cat. This
> requests the number of bytes it needs to fill its internal buffer. It
> continues to do this until the read returns zero. So, in our example,
> the internal buffer size of cat is 1KB and it performs two reads.
> As it stands, this metric may be misleading to the uninformed.
>
> *ENHANCEMENT* read_bytes should return the number of bytes that have
> been read, consistent with the behavior of write_bytes. This will
> avoid confusion for users and give a more accurate measure of the
> traffic over the filesystem.

I agree this is misleading and could probably be fixed fairly easily.

> Cache misses
> ------------
>
> The Lustre client has a cache. File reads may be serviced by this
> cache, or they may need to be completed by the backend filesystem (a
> cache miss). It is possible to discover if a cache miss has taken
> place on the client, but it is time consuming and subject to race
> conditions.
>
> *ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
> in the stats file. This will enable a detailed view of the client and
> network interaction with the filesystem.
>
> Conclusions
> -----------
> This document outlines changes to the procfs client stats file based
> on experience gained using Lustre in production at TACC. The authors
> welcome feedback on these changes.
>
> 1. http://www.olcf.ornl.gov/wp-content/events/lug2011/4-13-2011/330-400_John_Hammond_hammond-lug.pdf
> 2. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
> 3. http://wiki.lustre.org/manual/LustreManual20_HTML/SystemConfigurationUtilities_HTML.html#50438219_pgfId-1294840
>
> --
> Richard.Henwood at whamcloud.com
> Whamcloud Inc.
> tel: +1 512 410 9612
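
To make the requested-versus-read distinction concrete, here is a
minimal Python sketch that mimics the read loop described for cat. It
runs against a local temporary file, not Lustre, and the function
names are hypothetical. The 1024-byte buffer follows the 1KB figure
given in the prose above and is an assumption about cat, not a
measured value. The loop tallies both the bytes asked for and the
bytes actually returned by each read call.

```python
import os
import tempfile


def read_like_cat(path, buffer_size=1024):
    """Read `path` until read() returns zero bytes, as cat is described to do.

    Returns (calls, requested, returned), where `requested` sums the buffer
    size passed to each read and `returned` sums the bytes actually read.
    """
    calls = 0
    requested = 0
    returned = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, buffer_size)
            calls += 1
            requested += buffer_size
            returned += len(chunk)
            if not chunk:
                break  # zero-length read signals end of file
    finally:
        os.close(fd)
    return calls, requested, returned


def demo():
    """Write the 13-byte test string to a temporary file and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello lustre\n")
        path = tmp.name
    try:
        return read_like_cat(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    calls, requested, returned = demo()
    print(f"calls={calls} requested={requested} returned={returned}")
```

A counter that sums the request sizes reports far more than the 13
bytes in the file, while a counter that sums the return values reports
exactly 13, which is the behavior the proposed enhancement asks for.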
Thanks for the feedback. We are going to address our issues with
Lustre client procfs stats as follows:

On Tue, May 10, 2011 at 4:14 PM, Richard Henwood
<rhenwood at whamcloud.com> wrote:

<snip>

> *ENHANCEMENT* stats should declare all metrics that are recorded, even
> if they are zero. Currently tool developers must maintain their own
> lookups of all possible values and test for their absence. Declaring
> all the metrics removes the need to consult source code to identify
> all possible metrics.

*RESOLUTION* Update the Manual.

<snip>

> *ENHANCEMENT* read_bytes should return the number of bytes that have
> been read, consistent with the behavior of write_bytes. This will
> avoid confusion for users and give a more accurate measure of the
> traffic over the filesystem.
> Cache misses

*RESOLUTION* Assert this behavior is a bug. Create a Jira ticket and
discuss the solution design there.
http://jira.whamcloud.com/browse/LU-333

<snip>

> *ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
> in the stats file. This will enable a detailed view of the client and
> network interaction with the filesystem.

*RESOLUTION* Assert this is a valuable enhancement. Create a Jira
ticket and discuss the solution design there.
http://jira.whamcloud.com/browse/LU-334

richard,

--
Richard.Henwood at whamcloud.com
Whamcloud Inc.
tel: +1 512 410 9612