Howdy,

Lustre 1.8.5 (CentOS 5.5 x86_64, using Lustre-provided RPMs)
3 x OSS, each serving 2 OSTs
1 x MDS
Client machines are connected via InfiniBand to the Lustre filesystem.

One of our users reported that their tar files are taking abnormally long to extract on the Lustre filesystem (25 minutes), yet on their NFS-mounted home the extraction takes approximately 10 seconds.

I've duplicated the issue using their tar file. It's not compressed, is 126MB in size, and contains 9000 files in a single subdirectory; most of the files in the tar file are 20K or less in size. Here are the results:

* Extraction
  * NFS home: 10 seconds
  * Lustre scratch: 20 minutes
* md5sum of all 9000 files
  * NFS home: 7 seconds
  * Lustre scratch: 6 minutes

The command "ls -l" in the directory returns with similar speed on both Lustre and NFS home. I've run the same test extraction on several different clients with the same results. I've also set the stripe count to 1 on the extraction directory without any change.

Anyone have any suggestions or ideas?

Thanks, Mike
Sorry, I typo'd the "setstripe 1" line in my earlier message. It should have read: with "setstripe 1" the extraction took 3 minutes. The directory previously had a stripe count of 2.

An improvement from 25 minutes to 3 minutes is significant, but still a long way from the 10 seconds it takes on the NFS-mounted $HOME.

________________________________________
From: lustre-discuss-bounces at lists.lustre.org [lustre-discuss-bounces at lists.lustre.org] On Behalf Of Mike Hanby [mhanby at uab.edu]
Sent: Sunday, March 27, 2011 12:21 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Slow Tar File Extraction

[ ... ]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
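For what it's worth, a metadata-heavy workload like this extraction can be reproduced without the user's tar file. Below is a minimal sketch (the function name, paths, and counts are mine, purely illustrative) that creates many ~20K files, matching the reported tar contents; pointing it at both the Lustre scratch directory and an NFS directory gives a direct comparison:

```shell
#!/bin/sh
# make_small_files DIR COUNT: create COUNT files of ~20K each in DIR,
# mimicking the reported workload (9000 files of 20K or less).
make_small_files() {
    dir=$1
    count=$2
    mkdir -p "$dir"
    i=0
    while [ "$i" -lt "$count" ]; do
        # 20 x 1K blocks of zeros per file
        dd if=/dev/zero of="$dir/f$i" bs=1k count=20 2>/dev/null
        i=$((i + 1))
    done
}

# Example usage (illustrative paths):
#   time make_small_files /lustre/scratch/$USER/smalltest 9000
#   time make_small_files "$HOME/smalltest" 9000
```

Timing the two runs should show the same gap as the tar extraction if the problem is per-file create/commit latency rather than anything specific to tar.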
Hi Mike.

On 03/27/2011 04:57 PM, Mike Hanby wrote:
> An improvement of 25 minutes to 3 minutes is significant, but still a long way from the 10 seconds it takes on NFS mounted $HOME.

Is it possible that the MDS is slow to resolve uids? Are you using local accounts or a network name service? If it's slow network lookups then nscd might help. Does nscd work on an MDS?

Brett

--
/) _ _ _/_/ / / / _ _// /_)/</= / / (_(_/()/< ///
Howdy Brett, thanks for the reply.

The MDS is using local passwd and group files that are mirrored around the cluster, so uid lookups should be fast. I also tried turning the default debugging level to 0 on the client and didn't see a noticeable difference.

________________________________________
From: brett.worth at gmail.com [brett.worth at gmail.com] On Behalf Of Brett Worth [brett at worth.id.au]
Sent: Sunday, March 27, 2011 7:42 AM
To: Mike Hanby
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Slow Tar File Extraction

[ ... ]
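One way to confirm that uid lookups really are fast on the MDS is to time them through the same NSS path the server uses. A quick sanity check along these lines (run on the MDS; the commands are standard glibc/nscd tooling, the expectation is mine):

```shell
# Time a passwd and a group lookup via NSS. With local files these
# should return essentially instantly (well under a millisecond).
time getent passwd root >/dev/null
time getent group root >/dev/null

# If nscd is running, its statistics show cache hit rates and
# lookup counts (requires root):
#   nscd -g
```

If these are instant on the MDS, uid resolution can probably be ruled out as the bottleneck.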
[ ... ]

>> I've duplicated the issue using their tar file. It's not
>> compressed, is 126MB in size and contains 9000 files in a
>> single subdirectory, most of the files in the tar file are
>> 20K or less in size.

Many small files in a single directory are bad news for any filesystem. Especially if they lock the directory (as they should) and do synchronous metadata updates (as they should) and commit files when closed (as they should). It still amazes me that people with 9000 small records want to store them as individual files instead of, say, in a ZIP/'ar' archive (there is a reason '.a' files are used to hold many small '.o' files) or, even better, a BDB or similar file, but this is very common. Especially for Lustre, which was designed for large, highly parallel data streaming, not for small sequential metadata workloads.

>> Here are the results:
>> * Extraction
>>   * NFS home: 10 seconds
>>   * Lustre scratch: 20 minutes
>> * md5sum for all 9000 files
>>   * NFS home: 7 seconds
>>   * Lustre scratch: 6 minutes

The NFS numbers are very low indeed. For writing that's about 12MB/s and 900 inodes/s. Unless the clients are on 100Mb/s, that's several times slower than expected. As for reading from NFS, 18MB/s and roughly 1300 inodes/s are again quite a bit slower than the roughly 90MB/s expected on a 1Gb/s link (and the link must be 1Gb/s, as demonstrated by the reported read speed being higher than the ~12MB/s a 100Mb/s link could deliver). Surely NFS, like any network filesystem, has performance issues in the many-small-files case, but it should not be that bad.

I have seen similarly bad (or worse) numbers from a major science facility where the servers were rather mis-set-up, but if that is not your case then maybe check the other reasons why this could be bad, such as misconfiguration of the client, or the network being overloaded or lossy or misconfigured, ... as these could be affecting the Lustre case too.

> Sorry, I typo'd the "setstripe 1" line, it should have read,
> "setstripe 1" the extraction took 3 minutes. The directory had
> a stripe of 2.
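The rates quoted above are simple arithmetic (size/time and files/time). A tiny helper for redoing that arithmetic on other runs (the function name is mine, not a standard tool):

```shell
# rates SIZE_MB NFILES SECONDS: print data throughput and metadata rate.
rates() {
    awk -v mb="$1" -v n="$2" -v s="$3" \
        'BEGIN { printf "%.1f MB/s, %.0f files/s\n", mb / s, n / s }'
}

rates 126 9000 10    # NFS extraction (10s) -> 12.6 MB/s, 900 files/s
rates 126 9000 1200  # Lustre extraction (20 min): ~0.1 MB/s, ~8 files/s
```

Running it for each of the reported timings reproduces the per-case figures discussed in this thread.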
[ ... ]

The numbers for Lustre are excessively bad: striped it does about 100KB/s and 8 inodes/s writing, and 350KB/s and 25 inodes/s reading, while unstriped it does about 700KB/s and 50 inodes/s writing. The large difference in inode rates between the striped and unstriped cases in particular suggests that the metadata updates are done synchronously, with several high-latency exchanges between the MDS and the OSS or OSSes involved. This points to poor network and/or disk (most likely disk) latency, especially on the MDS, often due to extreme mis-setup of the MDS storage, the network, or both. IIRC there is a vast difference between most versions of Lustre and NFS as to write buffering on the client, and 'stat' is especially expensive and synchronous (a multi-node operation) on Lustre.

Do the usual basic checks just to establish a baseline:

* Bandwidth test between client and MDS, MDS and OSS, and client and OSS, using something like 'nuttcp'.
* Copy the '.tar' file to an OST on the OSS itself (not over the network) using 'dd bs=1M oflag=direct'.
* Copy the '.tar' file to Lustre from a client using 'dd bs=1M conv=fsync'.
* Create 9000 empty files in a newly created directory on the MDT, on the MDS itself.
* Create 9000 empty files in a newly created directory on one OST, on the OSS itself.
* Create 9000 empty files in a newly created directory from a Lustre client.
* While doing the above, check the IO rates on the MDS with 'iostat -xd 1'.
* Likewise, check the IO rates on the OSS with 'iostat -xd 1'.

I guess that you will have some non-amusing surprises.
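The checks above can be sketched as shell fragments. This is a sketch under my own assumptions (function name and all paths are illustrative); run the empty-file creator against the MDT, an OST, and a Lustre client directory in turn, and keep 'iostat' running on the servers while each test executes:

```shell
#!/bin/sh
# create_empty DIR N: create N empty files in a fresh directory --
# a pure-metadata workload, no data blocks written.
create_empty() {
    mkdir -p "$1"
    i=0
    while [ "$i" -lt "$2" ]; do
        : > "$1/empty.$i"    # open+close, nothing else
        i=$((i + 1))
    done
}

# Sequential-write baselines for the 126MB '.tar' file:
#   dd if=archive.tar of=/mnt/ost0/archive.tar bs=1M oflag=direct      # on the OSS, local
#   dd if=archive.tar of=/lustre/scratch/archive.tar bs=1M conv=fsync  # from a client
#
# Watch per-disk latency and utilization on the MDS/OSS meanwhile:
#   iostat -xd 1
#
# Example: time create_empty /lustre/scratch/$USER/emptytest 9000
```

Comparing the empty-file rate on the MDT locally against the rate from a client isolates how much of the cost is MDS disk latency versus client-to-server round trips.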