Howdy,

Lustre 1.8.5 (CentOS 5.5 x86_64, using Lustre-provided RPMs)
3 x OSS, each serving 2 OSTs
1 x MDS
Client machines are connected via InfiniBand to the Lustre filesystem.

One of our users reported that their tar files are taking abnormally long to extract on the Lustre filesystem (25 minutes), yet on their NFS-mounted home the extraction takes approximately 10 seconds.

I've duplicated the issue using their tar file. It's not compressed, is 126MB in size, and contains 9000 files in a single subdirectory; most of the files in the tar file are 20K or less in size. Here are the results:

* Extraction
  * NFS home: 10 seconds
  * Lustre scratch: 20 minutes
* md5sum of all 9000 files
  * NFS home: 7 seconds
  * Lustre scratch: 6 minutes

The command "ls -l" in the directory returns with similar speed on both Lustre and NFS home. I've run the same test extraction on several different clients with the same results. I've also set the stripe count to 1 on the extraction directory without any change.

Anyone have any suggestions or ideas?

Thanks, Mike
Sorry, I typo'd the "setstripe 1" line in my earlier message. It should have read: with "setstripe 1" the extraction took 3 minutes. The directory previously had a stripe count of 2.

An improvement from 25 minutes to 3 minutes is significant, but still a long way from the 10 seconds it takes on the NFS-mounted $HOME.

________________________________________
From: lustre-discuss-bounces at lists.lustre.org [lustre-discuss-bounces at lists.lustre.org] On Behalf Of Mike Hanby [mhanby at uab.edu]
Sent: Sunday, March 27, 2011 12:21 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Slow Tar File Extraction

[ ... ]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
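For what it's worth, a metadata-heavy workload like this extraction can be reproduced without the user's tar file. Below is a minimal sketch (the function name, paths, and counts are mine, purely illustrative) that creates many ~20K files, matching the reported tar contents; pointing it at both the Lustre scratch directory and an NFS directory gives a direct comparison:

```shell
#!/bin/sh
# make_small_files DIR COUNT: create COUNT files of ~20K each in DIR,
# mimicking the reported workload (9000 files of 20K or less).
make_small_files() {
    dir=$1
    count=$2
    mkdir -p "$dir"
    i=0
    while [ "$i" -lt "$count" ]; do
        # 20 x 1K blocks of zeros per file
        dd if=/dev/zero of="$dir/f$i" bs=1k count=20 2>/dev/null
        i=$((i + 1))
    done
}

# Example usage (illustrative paths):
#   time make_small_files /lustre/scratch/$USER/smalltest 9000
#   time make_small_files "$HOME/smalltest" 9000
```

Timing the two runs should show the same gap as the tar extraction if the problem is per-file create/commit latency rather than anything specific to tar.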
Hi Mike.

On 03/27/2011 04:57 PM, Mike Hanby wrote:
> An improvement of 25 minutes to 3 minutes is significant, but still a long way from the 10 seconds it takes on NFS mounted $HOME.

Is it possible that the MDS is slow to resolve uids? Are you using local accounts or a network name service? If it's slow network lookups then nscd might help. Does nscd work on an MDS?

Brett

--
/) _ _ _/_/ / / / _ _// /_)/</= / / (_(_/()/< ///
Howdy Brett, thanks for the reply.

The MDS is using local passwd and group files that are mirrored around the cluster, so uid lookups should be fast. I also tried turning the default debugging level to 0 on the client and didn't see a noticeable difference.

________________________________________
From: brett.worth at gmail.com [brett.worth at gmail.com] On Behalf Of Brett Worth [brett at worth.id.au]
Sent: Sunday, March 27, 2011 7:42 AM
To: Mike Hanby
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Slow Tar File Extraction

[ ... ]
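One way to confirm that uid lookups really are fast on the MDS is to time them through the same NSS path the server uses. A quick sanity check along these lines (run on the MDS; the commands are standard glibc/nscd tooling, the expectation is mine):

```shell
# Time a passwd and a group lookup via NSS. With local files these
# should return essentially instantly (well under a millisecond).
time getent passwd root >/dev/null
time getent group root >/dev/null

# If nscd is running, its statistics show cache hit rates and
# lookup counts (requires root):
#   nscd -g
```

If these are instant on the MDS, uid resolution can probably be ruled out as the bottleneck.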
[ ... ]

>> I've duplicated the issue using their tar file. It's not
>> compressed, is 126MB in size and contains 9000 files in a
>> single subdirectory, most of the files in the tar file are
>> 20K or less in size.

Many small files in a single directory are bad news for any filesystem. Especially if they lock the directory (as they should) and do synchronous metadata updates (as they should) and commit files when closed (as they should). It still amazes me that people with 9000 small records want to store them as individual files instead of, say, in a ZIP/'ar' archive (there is a reason '.a' files are used to hold many small '.o' files) or, even better, a BDB or similar file, but this is very common. Especially for Lustre, which was designed for large, highly parallel data streaming, not for small sequential metadata workloads.

>> Here are the results:
>> * Extraction
>>   * NFS home: 10 seconds
>>   * Lustre scratch: 20 minutes
>> * md5sum for all 9000 files
>>   * NFS home: 7 seconds
>>   * Lustre scratch: 6 minutes

The NFS numbers are very low indeed. For writing that's about 12MB/s and 900 inodes/s. Unless the clients are on 100Mb/s, that's several times slower than expected. As for reading from NFS, 18MB/s and roughly 1300 inodes/s are again quite a bit slower than the roughly 90MB/s expected on a 1Gb/s link (and the link must be 1Gb/s, as demonstrated by the reported read speed being higher than the ~12MB/s a 100Mb/s link could deliver). Surely NFS, like any network filesystem, has performance issues in the many-small-files case, but it should not be that bad.

I have seen similarly bad (or worse) numbers from a major science facility where the servers were rather mis-set-up, but if that is not your case then maybe check the other reasons why this could be bad, such as misconfiguration of the client, or the network being overloaded or lossy or misconfigured, ... as these could be affecting the Lustre case too.

> Sorry, I typo'd the "setstripe 1" line, it should have read,
> "setstripe 1" the extraction took 3 minutes. The directory had
> a stripe of 2.
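The rates quoted above are simple arithmetic (size/time and files/time). A tiny helper for redoing that arithmetic on other runs (the function name is mine, not a standard tool):

```shell
# rates SIZE_MB NFILES SECONDS: print data throughput and metadata rate.
rates() {
    awk -v mb="$1" -v n="$2" -v s="$3" \
        'BEGIN { printf "%.1f MB/s, %.0f files/s\n", mb / s, n / s }'
}

rates 126 9000 10    # NFS extraction (10s) -> 12.6 MB/s, 900 files/s
rates 126 9000 1200  # Lustre extraction (20 min): ~0.1 MB/s, ~8 files/s
```

Running it for each of the reported timings reproduces the per-case figures discussed in this thread.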
[ ... ]

The numbers for Lustre are excessively bad: striped it does about 100KB/s and 8 inodes/s writing, and 350KB/s and 25 inodes/s reading, while unstriped it does about 700KB/s and 50 inodes/s writing. The large difference in inode rates between the striped and unstriped cases in particular suggests that the metadata updates are done synchronously, with several high-latency exchanges between the MDS and the OSS or OSSes involved. This points to poor network and/or disk (most likely disk) latency, especially on the MDS, often due to extreme mis-setup of the MDS storage, the network, or both. IIRC there is a vast difference between most versions of Lustre and NFS as to write buffering on the client, and 'stat' is especially expensive and synchronous (a multi-node operation) on Lustre.

Do the usual basic checks just to establish a baseline:

* Bandwidth test between client and MDS, MDS and OSS, and client and OSS, using something like 'nuttcp'.
* Copy the '.tar' file to an OST on the OSS itself (not over the network) using 'dd bs=1M oflag=direct'.
* Copy the '.tar' file to Lustre from a client using 'dd bs=1M conv=fsync'.
* Create 9000 empty files in a newly created directory on the MDT, on the MDS itself.
* Create 9000 empty files in a newly created directory on one OST, on the OSS itself.
* Create 9000 empty files in a newly created directory from a Lustre client.
* While doing the above, check the IO rates on the MDS with 'iostat -xd 1'.
* Likewise, check the IO rates on the OSS with 'iostat -xd 1'.

I guess that you will have some non-amusing surprises.
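The checks above can be sketched as shell fragments. This is a sketch under my own assumptions (function name and all paths are illustrative); run the empty-file creator against the MDT, an OST, and a Lustre client directory in turn, and keep 'iostat' running on the servers while each test executes:

```shell
#!/bin/sh
# create_empty DIR N: create N empty files in a fresh directory --
# a pure-metadata workload, no data blocks written.
create_empty() {
    mkdir -p "$1"
    i=0
    while [ "$i" -lt "$2" ]; do
        : > "$1/empty.$i"    # open+close, nothing else
        i=$((i + 1))
    done
}

# Sequential-write baselines for the 126MB '.tar' file:
#   dd if=archive.tar of=/mnt/ost0/archive.tar bs=1M oflag=direct      # on the OSS, local
#   dd if=archive.tar of=/lustre/scratch/archive.tar bs=1M conv=fsync  # from a client
#
# Watch per-disk latency and utilization on the MDS/OSS meanwhile:
#   iostat -xd 1
#
# Example: time create_empty /lustre/scratch/$USER/emptytest 9000
```

Comparing the empty-file rate on the MDT locally against the rate from a client isolates how much of the cost is MDS disk latency versus client-to-server round trips.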