Hi all,

since our users have managed to write several TBs to Lustre by now, they sometimes would like to know what and how much there is in their directories. Is there any smarter way to find out than to do a "du -hs <dirname>" and wait 30 minutes for the 12 TB answer?

I've already told them to substitute "ls -l" with "find . -type f -exec ls -l {} \;", although I'm not too sure about that either.

Regards,
Thomas
Lundgren, Andrew
2008-Sep-04 16:43 UTC
[Lustre-discuss] Lustre directory sizes - fast "du"
We are having the same issue at the moment, so I am looking for a reply as well. One suggestion was to use lfs find, though initially we haven't seen that to be faster.

--
Andrew
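For reference, a minimal sketch of the bare "lfs find" usage being compared here; the mount point and directory are hypothetical:

    # lfs find walks the tree with Lustre-aware requests instead of
    # plain readdir + stat on every entry; this lists regular files
    lfs find /mnt/lustre/somedir -type f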
> Hi all, since our users have managed to write several TBs to
> Lustre by now, they sometimes would like to know what and how
> much there is in their directories. Is there any smarter way
> to find out than to do a "du -hs <dirname>" and wait 30 minutes
> for the 12 TB answer?

If you have any patches that speed up the fetching of (what are likely to be) millions of records from random places on a disk very quickly, and also reduce the latency of the associated network roundtrips, please let us know :-). Recent Lustre versions at least try to amortize the roundtrip cost by prefetching metadata, which may help.

Another way to help might be to put the MDTs on the lowest-latency disks you can find, in a system with a very large amount of RAM. It may be worth looking at recent flash SSDs for the MDTs, as they have very low latency.

> I've already told them to substitute "ls -l" with "find . -type f
> -exec ls -l {} \;", although I'm not too sure about that either.

That's crazy, even if you have directories with a really large number of files, which is crazy in itself. Lustre scales *data* fairly well; *metadata* scales a lot less well, because scaling metadata is a huge research problem.

I see a number of similarly crazy or much worse questions on the XFS mailing list too. The craziest ones come from the vast number of people who think that file systems are database managers and that files can be used as records. In some ways, things like Linux and Lustre or XFS make building large-scale storage projects too easy, bringing difficult issues down in cost and complexity to the level of many who can't quite realize the import of what they are doing.
On Sep 04, 2008 18:55 +0100, Peter Grandi wrote:
> > Hi all, since our users have managed to write several TBs to
> > Lustre by now, they sometimes would like to know what and how
> > much there is in their directories. Is there any smarter way
> > to find out than to do a "du -hs <dirname>" and wait 30 minutes
> > for the 12 TB answer?

One possibility would be to enable quotas, with limits for every user large enough that they won't hamper usage. This will track space usage for each user.

Depending on your coding skills, it might even be possible to change the quota code so that it only tracked space usage but didn't enforce usage limits. This would potentially reduce the overhead of quotas, because there would be no need for the OSTs to check the quota limits during IO.

> If you have any patches that speed up the fetching of (what are
> likely to be) millions of records from random places on a disk
> very quickly, and also reduce the latency of the associated
> network roundtrips, please let us know :-).

That is always our goal as well :-).

> I've already told them to substitute "ls -l" with "find . -type f
> -exec ls -l {} \;", although I'm not too sure about that either.

I don't think that will help at all. "ls" is a crazy bunch of code that does "stat" on the directory and all kinds of extra work. Possibly better would be:

    lfs find ${dir} -type f | xargs stat -c %b

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
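A minimal sketch of the quota-based accounting described above; the mount point and user name are hypothetical, and the exact option syntax varies between Lustre versions:

    # one-time scan to build the quota accounting files (can take a while)
    lfs quotacheck -ug /mnt/lustre
    # turn quota tracking on for users and groups
    lfs quotaon -ug /mnt/lustre
    # give a user limits far above anything reachable, so usage is
    # accounted but never enforced in practice (block limits are in kB)
    lfs setquota -u someuser -b 0 -B 20000000000 -i 0 -I 100000000 /mnt/lustre
    # read back the accounted usage instantly instead of waiting on du
    lfs quota -u someuser /mnt/lustre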
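To turn that pipeline into a du-style total, the block counts from stat can be summed; a sketch assuming the usual 512-byte st_blocks units and a hypothetical directory (filenames containing whitespace would need extra care with xargs):

    # sum the 512-byte blocks of every regular file and print a GB total
    lfs find /mnt/lustre/somedir -type f | xargs stat -c %b | \
        awk '{ sum += $1 } END { printf "%.1f GB\n", sum * 512 / (1024*1024*1024) }'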
Is "lfs find ${dir} -type f | xargs stat -c %b" faster than a regular Unix find? On Fri, Sep 5, 2008 at 11:59 PM, Andreas Dilger <adilger at sun.com> wrote:> On Sep 04, 2008 18:55 +0100, Peter Grandi wrote: >> > Hi all, since our users have managed to write several TBs to >> > Lustre by now, they sometimes would like to know what and how >> > much there is in their directories. Is there any smarter way >> > to find out than to do a "du -hs <dirname>" and wait for 30min >> > for the 12TB-answer ? > > One possibility would be to enable quotas with large limits for every > user that won''t hamper usage. This will track space usage for each > user. > > Depending on your coding skills it might even be possible to change > the quota code so that it only tracked space usage but didn''t enforce > usage limits. This would potentially reduce the overhead of quotas > because there is no need for OSTs to check the quota limits during IO. > >> If you have any patches that speed up the fetching of (what are >> likely to be) millions of records from random places on a disk >> very quickly, and also speed up the latency of the associated >> network roundtrips please let us know :-). > > That is always our goal as well :-). > >> I''ve already told them to substitute "ls -l" by "find -type f >> -exec ls -l {};", although I''m not too sure about that either. > > I don''t think that will help at all. "ls" is a crazy bunch of > code that does "stat" on the directory and all kinds of extra > work. Possibly better would be: > > lfs find ${dir} -type f | xargs stat -c %b > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss >