thr3ads.net - Lustre discuss - [Lustre-discuss] review: Lustre client procfs stats [May 2011]

Richard Henwood

2011-May-10 21:14 UTC

[Lustre-discuss] review: Lustre client procfs stats

Hi All,
John and I have written the following review, focused on the Lustre
client procfs stats file. John's talk at LUG [1] provides some of the
background to this work. We welcome your thoughts.

John Hammond (TACC)
Richard Henwood (Whamcloud)

Introduction
------------

The Lustre proc filesystem (procfs) is a convenient way to modify and
review a Lustre filesystem. The Lustre procfs is a solid interface for
tool writers to build upon. Tools that deliver precise and accurate
metrics are valuable in trouble-shooting a production Lustre
filesystem.

Documentation is available [2,3] for Lustre procfs but it is not
complete. In particular, it does not describe the contents of the
/proc/fs/lustre/llite/<mount_id>/stats file. This file is a natural place to
store filesystem metrics for each Lustre filesystem the client has
mounted.

This document is concerned with describing deficiencies and
enhancements to Lustre procfs. The scope of this document is limited
to the stats file in procfs of a mounted Lustre filesystem as seen by
a client. Our intention is to air our suggestions, make modifications
to our ideas based on feedback and create a patch for review.

Lustre client metrics
---------------------

One of the primary objectives for the support staff at TACC is to
maintain computing capability. A simple measure of capability is that
clients of the Lustre filesystem are consuming data. An accurate
measure of the quantity of data consumed by a given client is useful
in sharing resources and scheduling jobs. This measure (the number of
bytes read) should be simple to collect.

Declarative stats file
----------------------

The existing proc filesystem on a client provides a 'stats' file. It is
located at /proc/fs/lustre/llite/<mount_id>/stats. The contents of
this file initially include:

snapshot_time 1304977515.150559 secs.usecs
ioctl 1 samples [regs]
alloc_inode 1 samples [regs]
inode_permission 1 samples [regs]

*ENHANCEMENT* stats should declare all metrics that are recorded, even
if they are zero. Currently tool developers must maintain their own
lookups of all possible values and test for their absence. Declaring
all the metrics avoids the need to consult source code to identify all
possible metrics.
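
Until the file declares every counter, a tool can approximate the proposed
behavior by seeding its own table with zeros. The Python sketch below is
illustrative only: the function name parse_llite_stats and the list of
expected counters are hypothetical, and reading the three trailing numbers
as min, max and sum is an assumption that this review does not state.

```python
def parse_llite_stats(text, expected=()):
    """Parse the text of an llite stats file into a dict keyed by counter name.

    Counters named in `expected` but absent from the file are reported with
    zero samples, which mimics the "declare everything" behavior proposed above.
    """
    zero = {"samples": 0, "unit": None, "min": 0, "max": 0, "sum": 0}
    stats = {name: dict(zero) for name in expected}

    for line in text.splitlines():
        fields = line.split()
        # Counter lines look like: <name> <count> samples [<unit>] [min max sum]
        # Anything else (snapshot_time, blank lines) is skipped.
        if len(fields) < 4 or fields[2] != "samples":
            continue
        entry = dict(zero)
        entry["samples"] = int(fields[1])
        entry["unit"] = fields[3].strip("[]")
        if len(fields) >= 7:
            # Assumption: the three trailing values are min, max and sum.
            entry["min"], entry["max"], entry["sum"] = (int(v) for v in fields[4:7])
        stats[fields[0]] = entry
    return stats


sample = """snapshot_time 1304977515.150559 secs.usecs
ioctl 1 samples [regs]
read_bytes 1 samples [bytes] 2097152 2097152 2097152
"""
stats = parse_llite_stats(sample, expected=("read_bytes", "write_bytes"))
print(stats["write_bytes"]["samples"])  # counter missing from the file
```

The caller still has to supply the list of expected names, which is exactly
the burden the enhancement would remove.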

Read bytes
----------

A client of a Lustre filesystem will be interested in the total bytes
transferred over the fabric. The stats file appears to provide a
valuable snapshot of high-level data transfer metrics. However, after
investigation the values recorded are of limited value. read_bytes
returns the number of bytes that have been requested. This is not the
same as the number of bytes that have been read. The example below
illustrates this confusion:

[root@rhel6_21 ~]# echo "hello lustre" > /mnt/lustre/test.txt
[root@rhel6_21 ~]# cat /mnt/lustre/test.txt > /dev/null
[root@rhel6_21 ~]# cat /proc/fs/lustre/llite/lustre-ffff88001aa95c00/stats
...
read_bytes 1 samples [bytes] 2097152 2097152 2097152
write_bytes 1 samples [bytes] 13 13 13
...

In this example, the read on the file was performed by cat. This
requests the number of bytes it needs to fill its internal buffer. It
continues to do this until the read returns zero. So, in our example,
the internal buffer size of cat is 1KB and it performs two reads. As
it stands, this metric may be misleading to the uninformed.
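
The gap between bytes requested and bytes returned can be reproduced on any
filesystem with a cat-style read loop. The Python sketch below is an
illustration, not Lustre code: read_accounting is a hypothetical helper, and
the 4096-byte buffer is an arbitrary choice rather than the buffer size cat
actually uses.

```python
import os
import tempfile


def read_accounting(path, bufsize=4096):
    """Read a file the way cat does and return (requested, returned, calls).

    `requested` sums the buffer size passed to every read call, `returned`
    sums the bytes each call actually delivered.
    """
    requested = returned = calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, bufsize)
            calls += 1
            requested += bufsize
            returned += len(chunk)
            if not chunk:  # a zero-length read signals end of file
                break
    finally:
        os.close(fd)
    return requested, returned, calls


# Same 13-byte payload as the shell example above, written to a temp file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello lustre\n")
demo_path = tmp.name

try:
    requested, returned, calls = read_accounting(demo_path)
finally:
    os.unlink(demo_path)

print("requested:", requested, "returned:", returned, "calls:", calls)
```

For the 13-byte file the loop issues two reads and requests 8192 bytes in
total, while only 13 bytes are returned. A counter that records the
requested size overstates the data delivered in the same way.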

*ENHANCEMENT* read_bytes should return the number of bytes that have
been read, consistent with the behavior of write_bytes. This will
avoid confusion for users and give a more accurate measure of the
traffic over the filesystem.

Cache misses
------------

The Lustre client has a cache. File reads may be serviced by this
cache, or they may need to be completed by the backend filesystem (a
cache miss). It is possible to discover if a cache miss has taken
place on the client, but it is time consuming and subject to race
conditions.

*ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
in the stats file. This will enable a detailed view of the client and
network interaction with the filesystem.

Conclusions
-----------

This document outlines changes to the procfs client stats file based
on experience gained using Lustre in production at TACC. The authors
welcome feedback on these changes.

1. http://www.olcf.ornl.gov/wp-content/events/lug2011/4-13-2011/330-400_John_Hammond_hammond-lug.pdf
2. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
3. http://wiki.lustre.org/manual/LustreManual20_HTML/SystemConfigurationUtilities_HTML.html#50438219_pgfId-1294840

--
Richard.Henwood at whamcloud.com
Whamcloud Inc.
tel: +1 512 410 9612

Andreas Dilger

2011-May-12 06:48 UTC

On 2011-05-10, at 3:14 PM, Richard Henwood <rhenwood at whamcloud.com> wrote:
> John and I have written the following review, focused on the Lustre
> client procfs stats file. John's talk at LUG [1] provides some of the
> background to this work. We welcome your thoughts.
> 
> John Hammond (TACC)
> Richard Henwood (Whamcloud)
> 
> 
> Introduction
> ------------
> 
> The Lustre proc filesystem (procfs) is a convenient way to modify and
> review a Lustre filesystem. The Lustre procfs is a solid interface for
> tool writers to build upon. Tools that deliver precise and accurate
> metrics are valuable in trouble-shooting a production Lustre
> filesystem.
> 
> Documentation is available [2,3] for Lustre procfs but it is not
> complete. In particular, it does not describe the contents of the
> /proc/fs/lustre/llite/<mount_id>/stats file. This file is a natural place to
> store filesystem metrics for each Lustre filesystem the client has
> mounted.
> 
> This document is concerned with describing deficiencies and
> enhancements to Lustre procfs. The scope of this document is limited
> to the stats file in procfs of a mounted Lustre filesystem as seen by
> a client. Our intention is to air our suggestions, make modifications
> to our ideas based on feedback and create a patch for review.
> 
> Lustre client metrics
> ---------------------
> 
> One of the primary objectives for the support staff at TACC is to
> maintain computing capability. A simple measure of capability is that
> clients of the Lustre filesystem are consuming data. An accurate
> measure of the quantity of data consumed by a given client is useful
> in sharing resources and scheduling jobs. This measure (the number of
> bytes read) should be simple to collect.
> 
> Declarative stats file
> ----------------------
> 
> The existing proc filesystem on a client provides a 'stats' file. It is
> located at /proc/fs/lustre/llite/<mount_id>/stats. The contents of
> this file initially include:
> 
> snapshot_time             1304977515.150559 secs.usecs
> ioctl                     1 samples [regs]
> alloc_inode               1 samples [regs]
> inode_permission          1 samples [regs]
> 
> *ENHANCEMENT* stats should declare all metrics that are recorded, even
> if they are zero. Currently tool developers must maintain their own
> lookups of all possible values and test for their absence. Declaring
> all the metrics avoids the need to consult source code to identify all
> possible metrics.

In fact, this is how the "stats" file used to operate. However, to avoid
printing a lot of stats counters that are always zero for a given device,
the kernel filters out any values that have never been hit in the code.

In order to keep parts of the stats setup generic between the different
layers of Lustre, the stats structure may contain counters that are never
used.

> Read bytes
> ----------
> 
> A client of a Lustre filesystem will be interested in the total bytes
> transferred over the fabric. The stats file appears to provide a
> valuable snapshot of high-level data transfer metrics. However, after
> investigation the values recorded are of limited value. read_bytes
> returns the number of bytes that have been requested. This is not the
> same as the number of bytes that have been read. The example below
> illustrates this confusion:
> 
> [root@rhel6_21 ~]# echo "hello lustre" > /mnt/lustre/test.txt
> [root@rhel6_21 ~]# cat /mnt/lustre/test.txt > /dev/null
> [root@rhel6_21 ~]# cat /proc/fs/lustre/llite/lustre-ffff88001aa95c00/stats
> ...
> read_bytes                1 samples [bytes] 2097152 2097152 2097152
> write_bytes               1 samples [bytes] 13 13 13
> ...
> 
> In this example, the read on the file was performed by cat. This
> requests the number of bytes it needs to fill its internal buffer. It
> continues to do this until the read returns zero. So, in our example,
> the internal buffer size of cat is 1KB and it performs two reads. As
> it stands, this metric may be misleading to the uninformed.
> 
> *ENHANCEMENT* read_bytes should return the number of bytes that have
> been read, consistent with the behavior of write_bytes. This will
> avoid confusion for users and give a more accurate measure of the
> traffic over the filesystem.

I agree this is misleading and could probably be fixed fairly easily.

> Cache misses
> ------------
>
> The Lustre client has a cache. File reads may be serviced by this
> cache, or they may need to be completed by the backend filesystem (a
> cache miss). It is possible to discover if a cache miss has taken
> place on the client, but it is time consuming and subject to race
> conditions.
> 
> *ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
> in the stats file. This will enable a detailed view of the client and
> network interaction with the filesystem.
>
> Conclusions
> -----------
>
> This document outlines changes to the procfs client stats file based
> on experience gained using Lustre in production at TACC. The authors
> welcome feedback on these changes.
> 
> 1. http://www.olcf.ornl.gov/wp-content/events/lug2011/4-13-2011/330-400_John_Hammond_hammond-lug.pdf
> 2. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
> 3. http://wiki.lustre.org/manual/LustreManual20_HTML/SystemConfigurationUtilities_HTML.html#50438219_pgfId-1294840

Richard Henwood

2011-May-17 14:54 UTC

Thanks for the feedback.

We are going to address our issues with Lustre client procfs stats as follows:

On Tue, May 10, 2011 at 4:14 PM, Richard Henwood <rhenwood at whamcloud.com> wrote:
<snip>
> *ENHANCEMENT* stats should declare all metrics that are recorded, even
> if they are zero. Currently tool developers must maintain their own
> lookups of all possible values and test for their absence. Declaring
> all the metrics avoids the need to consult source code to identify all
> possible metrics.
>
*RESOLUTION* Update the Manual.

<snip>
> *ENHANCEMENT* read_bytes should return the number of bytes that have
> been read, consistent with the behavior of write_bytes. This will
> avoid confusion for users and give a more accurate measure of the
> traffic over the filesystem.
> Cache misses
>
*RESOLUTION* Assert this behavior is a bug. Create a Jira ticket and
discuss the solution design there.
http://jira.whamcloud.com/browse/LU-333

<snip>
> *ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
> in the stats file. This will enable a detailed view of the client and
> network interaction with the filesystem.
>
*RESOLUTION* Assert this is a valuable enhancement. Create a Jira
ticket and discuss the solution design there.
http://jira.whamcloud.com/browse/LU-334



richard,
-- 
Richard.Henwood at whamcloud.com
Whamcloud Inc.
tel: +1 512 410 9612
