Hello Lustre experts,

I am trying to solve a problem with very slow "ls" and other operations on large numbers of files, despite good overall read/write rates.

We are running a small cluster of 3 OSSs with 9 OSTs, 1 MDS (with an SSD MDT) and currently two clients. All server nodes are CentOS 5.2 with Lustre 1.8.1, while the clients are CentOS 5.4 with Lustre 1.8.3. All components are networked with DDR IB. Striping is set to 1 or 2 for different folders.

From the very beginning of our tests we had rather slow metadata operations. File creation maxed out at 250/s per client. "ls" of a dir with 1000 files takes about 40-70 seconds, almost independently of the files' sizes. Dirs that have recently been accessed are of course much faster due to caching. There is no general performance problem, as we are getting almost 1 GB/s when reading/writing big files from two clients in several threads. But when creating lots of files with lmdd in a single-threaded test script, there are also hangs of a few seconds before the rates return to normal.

I've been searching the mailing list archives for similar problems but only found the usual "improving performance for small files" hints. All of these suggestions have been tested but did not, or only slightly, improve performance. Can anyone please tell me:

- Is there a way to check the amount of time that the different parts of a file operation take (like 1ms requesting metadata, 1ms receiving metadata, 123ms reading blocks from OST, ...)?
- Does anyone have a hint on what the problem could be, where to search or what to do?

Any help is much appreciated!

Thanks!
Robert
Andreas Dilger
2010-Aug-12 16:22 UTC
[Lustre-discuss] How to track down a latency/timing problem
On 2010-08-12, at 10:10, robert wrote:
> I am trying to solve a problem with very slow "ls" and other operations
> on large numbers of files, despite good overall read/write rates.
>
> We are running a small cluster of 3 OSSs with 9 OSTs, 1 MDS (with an SSD
> MDT) and currently two clients. All server nodes are CentOS 5.2 with
> Lustre 1.8.1, while the clients are CentOS 5.4 with Lustre 1.8.3. All
> components are networked with DDR IB. Striping is set to 1 or 2 for
> different folders.

I don't recall if there are specific metadata improvements in 1.8.3 vs. 1.8.1, but it is usually better to have newer RPMs than older. Note that if you have 1.8.1 (vs. 1.8.1.1) there is a corruption bug in the MDS, and you definitely need to upgrade.

> From the very beginning of our tests we had rather slow metadata
> operations. File creation maxed out at 250/s per client.

For DDR IB that is definitely slow.

> "ls" of a dir with 1000 files takes about 40-70 seconds, almost
> independently of the files' sizes. Dirs that have recently been accessed
> are of course much faster due to caching. There is no general
> performance problem, as we are getting almost 1 GB/s when reading/writing
> big files from two clients in several threads. But when creating lots of
> files with lmdd in a single-threaded test script, there are also hangs
> of a few seconds before the rates return to normal.
>
> I've been searching the mailing list archives for similar problems but
> only found the usual "improving performance for small files" hints. All
> of these suggestions have been tested but did not, or only slightly,
> improve performance. Can anyone please tell me:
>
> - Is there a way to check the amount of time that the different parts of
> a file operation take (like 1ms requesting metadata, 1ms receiving
> metadata, 123ms reading blocks from OST, ...)?

You can use "strace -ttt" on the client to see the time of each syscall. You can look at the various "stats" files on the MDS to see the wait+processing time of each RPC type.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
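As a concrete sketch of both suggestions (the directory name and the /proc path are examples; the exact stats locations vary between Lustre versions):

    # Client side: print a microsecond timestamp for every syscall that
    # "ls -l" makes, so slow stat()/getxattr() calls stand out.
    strace -ttt -o /tmp/ls.trace /bin/ls -l /mnt/lustre/testdir
    less /tmp/ls.trace

    # MDS side: per-RPC-type counters, including wait and processing
    # times; under 1.8 the stats files live somewhere like:
    cat /proc/fs/lustre/mds/*/stats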
Ashley Pittman
2010-Aug-12 16:25 UTC
[Lustre-discuss] How to track down a latency/timing problem
On 12 Aug 2010, at 17:10, robert wrote:
> - Is there a way to check the amount of time that the different parts of
> a file operation take (like 1ms requesting metadata, 1ms receiving
> metadata, 123ms reading blocks from OST, ...)?

"strace ls" is interesting to watch if you haven't unaliased ls as described below...

> - Does anyone have a hint on what the problem could be, where to search
> or what to do?

One very simple trick that I was unaware of until it was mentioned at LUG is to unalias ls. Most, if not all, distributions alias ls to "ls --color", which causes ls to stat every single file in a directory to determine its type; running /bin/ls instead will improve ls performance by several orders of magnitude.

Ashley.

--
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
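A quick way to see the difference on your own client (the test directory is just an example):

    # See what "ls" really runs; on CentOS it is typically aliased to
    # "ls --color=tty", which stat()s every entry to choose a colour.
    type ls

    # Aliased ls: one stat per file, i.e. per-file RPCs on Lustre.
    time ls /mnt/lustre/testdir

    # The bare binary bypasses the alias and just reads the directory.
    time /bin/ls /mnt/lustre/testdir

    # Or drop the alias for the current shell session:
    unalias ls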
Robert Pinnow, rise | fx
2010-Aug-13 17:49 UTC
Re: How to track down a latency/timing problem
Thank you Andreas and Ashley. Together your answers really helped. Though the file creation rates only improved by 2x, all file listing operations are now about 20x as fast.

The problem was that ACLs were not activated on the MDS but were requested by the default aliased "ls". Under strace, every single line of "ls -l" returned an "Operation not supported" error and took about 100-300ms.

Thanks again!
Robert

--
Robert Pinnow - Managing Director - r i s e | fx
t: +49 30 201 803 00  robert @ risefx.com
c: +49 172 384 2183   www.risefx.com
r i s e | fx GmbH
Schlesische Strasse 28, Aufgang B
10997 Berlin
Geschaeftsfuehrer: Sven Pannicke, Robert Pinnow
Handelsregister Berlin HRB 106667 B
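For anyone hitting the same symptom: in an "strace -ttt ls -l" trace, the failed ACL lookups show up as slow getxattr() calls. A hedged sketch of the check and the fix (the device and mount point are examples; check the Lustre manual for your version's exact syntax):

    # Symptom on the client: each directory entry triggers an ACL request
    # that the MDS rejects, e.g. lines like:
    #   getxattr("file0001", "system.posix_acl_access", ...) = -1
    #       EOPNOTSUPP (Operation not supported)

    # Fix on the MDS (Lustre 1.8): mount the MDT with the "acl" option so
    # ACL support is advertised to clients:
    mount -t lustre -o acl /dev/mdt_device /mnt/mdt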