We have been struggling with our Lustre performance for some time now, especially with large directories. I recently did some informal benchmarking (on a live system, so I know the results are not scientifically valid) and noticed a huge drop in the performance of reads (stat operations) past 20k files in a single directory. I'm using bonnie++, disabling IO testing (-s 0) and just creating, reading, and deleting 40KB files in a single directory. I've done this for directory sizes of 2,000 to 40,000 files. Create performance is a flat line of ~150 files/sec across the board. Delete performance is all over the place, but no higher than 3,000 files/sec. The really interesting data point is read performance, which for these tests is just a stat of the file, not reading data. Starting with the smaller directories it is relatively consistent at just below 2,500 files/sec, but when I jump from 20,000 files to 30,000 files the performance drops to around 100 files/sec. We were assuming this was somewhat expected behavior and are in the process of trying to get our users to change their code. Then yesterday I was browsing the Lustre Operations Manual and found section 33.8, which says Lustre is tested with directories as large as 10 million files in a single directory and still gets lookups at a rate of 5,000 files/sec. That leaves me wondering two things: how can we get 5,000 files/sec for anything, and why is our performance dropping off so suddenly after 20k files?

Here is our setup:
All IO servers are Dell PowerEdge 2950s: two quad-core X5355 sockets @ 2.66GHz (8 cores total) and 16GB of RAM.
The data is on DDN S2A 9550s with an 8+2 RAID configuration, connected directly with 4Gb Fibre Channel.
They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp.

As a side note, the users' code is Parflow, developed at LLNL. The files are SILO files. We have as many as 1.4 million files in a single directory, and we now have half a billion files that we need to deal with in one way or another. The code has already been modified to split the files on newer runs into multiple subdirectories, but we're still dealing with tens of thousands of files in a single directory. The users have been able to run these data sets on Lustre systems at LLNL three orders of magnitude faster.

Thanks,
Mike Robbert
HPC & Networking Engineer
Colorado School of Mines
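P.S. In case anyone wants to reproduce this, the runs look roughly like the following — a sketch, not my exact script; the path is a placeholder, and bonnie++'s -n argument is the file count in multiples of 1024, followed by the max:min file size and a directory count:

    # Sweep directory sizes with the IO tests disabled (-s 0), so only the
    # file create/stat/delete phases run. $dir is a placeholder path on
    # the Lustre filesystem being tested.
    dir=/lustre/scratch/bonnie_test
    for size in 2 5 10 20 30 40; do      # roughly thousands of files
        ./bonnie++ -d $dir -s 0 -n $size:40000:40000:1
    done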
It will continue downward as the number of files in the directory increases. Interestingly, GPFS stat performance increased as the number of files increased. My tests were on 128 nodes * 8 processes/node * 10-500 files per process.

- Richard

On 9/10/10 11:11 AM, "Michael Robbert" <mrobbert at mines.edu> wrote:

> We have been struggling with our Lustre performance for some time now,
> especially with large directories. I recently did some informal benchmarking
> (on a live system so I know results are not scientifically valid) and noticed
> a huge drop in performance of reads (stat operations) past 20k files in a
> single directory. [...]

===================================================
Richard Hedges
Customer Support and Test - File Systems Project
Development Environment Group - Livermore Computing
Lawrence Livermore National Laboratory
7000 East Avenue, MS L-557
Livermore, CA 94551

v: (925) 423-2699
f: (925) 423-6961
E: richard-hedges at llnl.gov
Richard,
Are you talking about bonnie++ performance or Parflow performance? And doesn't this fly in the face of the Lustre Operations Manual, which seems to indicate that performance should be fine up to at least 10 million files in a single directory? How do you reconcile your results with the fact that there are users running at LLNL with up to 1.4 million files in a single directory?

Thanks,
Mike

On Sep 10, 2010, at 12:16 PM, Hedges, Richard M. wrote:

> It will continue downward as the number of files in the directory increases.
> Interestingly, GPFS stat performance increased as the number of files
> increased. My tests were on 128 nodes * 8 processes/node * 10-500 files
> per process.
>
> - Richard
> [...]
On 2010-09-10, at 12:11, Michael Robbert wrote:

> Create performance is a flat line of ~150 files/sec across the board. Delete
> performance is all over the place, but no higher than 3,000 files/sec...
> Then yesterday I was browsing the Lustre Operations Manual and found section
> 33.8 that says Lustre is tested with directories as large as 10 million
> files in a single directory and still get lookups at a rate of 5,000
> files/sec. That leaves me wondering 2 things. How can we get 5,000 files/sec
> for anything and why is our performance dropping off so suddenly after 20k
> files?
>
> Here is our setup:
> All IO servers are Dell PowerEdge 2950s: two quad-core X5355 sockets @
> 2.66GHz and 16GB of RAM.
> The data is on DDN S2A 9550s with 8+2 RAID configuration connected directly
> with 4Gb Fibre channel.

Are you using the DDN 9550s for the MDT? That would be a bad configuration, because they can only be configured with RAID-6, and it would explain why you are seeing such bad performance. For the MDT you always want to have RAID-1+0 storage. Potentially, for every 512-byte inode written to disk you need to write many times that much data inside the RAID-6 array to keep the parity correct. For large filesystems, sites have used 12 or 24 small SAS disks (15k RPM) in RAID-1+0 to get high IOPS performance for the MDT.

> We have as many as 1.4 million files in a single directory and we now have
> half a billion files that we need to deal with in one way or another.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
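P.S. To put rough numbers on the parity cost, here is a back-of-envelope sketch. It assumes a naive read-modify-write path with no controller write-back caching or full-stripe optimization, so the figures are illustrative only, not the 9550's real behaviour:

    # RAID-6 small-write penalty: read old data + P + Q, then write new
    # data + P + Q, i.e. ~6 disk IOs per small write vs ~2 for RAID-1+0
    # (data + mirror). Per-spindle IOPS is an assumed round number.
    spindle_iops=100
    echo "RAID-6:   ~$((spindle_iops / 6)) small random writes/s per spindle"
    echo "RAID-1+0: ~$((spindle_iops / 2)) small random writes/s per spindle"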
On Saturday, September 11, 2010, Andreas Dilger wrote:

> On 2010-09-10, at 12:11, Michael Robbert wrote:
> > [...]
> > Here is our setup:
> > All IO servers are Dell PowerEdge 2950s: two quad-core X5355 sockets @
> > 2.66GHz and 16GB of RAM. The data is on DDN S2A 9550s with 8+2 RAID
> > configuration connected directly with 4Gb Fibre channel.
>
> Are you using the DDN 9550s for the MDT? That would be a bad
> configuration, because they can only be configured with RAID-6, and would
> explain why you are seeing such bad performance. For the MDT you always
> want to have RAID-1+0 storage.

Unfortunately, we have failed to copy the scratch MDT in a reasonable time so far. Copying several hundred million files turned out to take ages ;) But I guess Mike did the benchmarks for the other filesystem with an EF3010.

> > We have as many as 1.4 million files in a single directory and we now
> > have half a billion files that we need to deal with in one way or
> > another.

Mike, is there a chance you can check what rate acp reports?

http://oss.oracle.com/~mason/acp/

Also, could you please send me your exact bonnie line or script? We could try to reproduce it on an idle test 9550 with a 6620 for metadata (the 6620 is slower for that than the EF3010).

Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
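P.S. For anyone else following along, the acp invocation would be something like the sketch below. The only flag mentioned in this thread is -v, and the paths are placeholders, so check acp's own usage output before relying on this:

    # Hypothetical acp run -- it copies a directory tree like cp, with -v
    # printing progress. Source and destination paths are placeholders.
    ./acp -v /lustre/scratch/big_dir /lustre/other_fs/big_dir_copy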
On 2010-09-10, at 17:32, Bernd Schubert wrote:

> Unfortunately, we have failed to copy the scratch MDT in a reasonable time
> so far. Copying several hundred million files turned out to take ages ;)
> But I guess Mike did the benchmarks for the other filesystem with an EF3010.

For a straight transfer of the MDT, it would probably be MUCH faster to do a straight "dd" of the filesystem over to the new LUN (assuming it is at least as large as the original LUN).

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
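P.S. Something along these lines — a sketch only: the device names are placeholders, the MDS must be stopped first, and as noted the target LUN has to be at least as large as the source:

    # Block-level MDT copy -- much faster than a file-level copy because
    # it streams the device sequentially instead of seeking per inode.
    umount /mnt/mdt                        # stop the MDS/MDT first
    dd if=/dev/old_mdt_lun of=/dev/new_mdt_lun bs=4M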
On Sep 10, 2010, at 5:32 PM, Bernd Schubert wrote:

> Unfortunately, we have failed to copy the scratch MDT in a reasonable time
> so far. Copying several hundred million files turned out to take ages ;)
> But I guess Mike did the benchmarks for the other filesystem with an EF3010.

The benchmarks listed above are for our scratch filesystem, whose MDT is on the 9550. I don't know why I didn't mention the benchmarks that I also ran on our home filesystem, whose MDT was recently moved to the EF3010 with RAID 1+0 on 6 SAS disks. (The other 6 disks in the EF3010 are waiting for when we can move the scratch MDT there.) Anyways, the benchmarks on home were actually worse. Create performance was about the same, but read performance was in the low hundreds.

The command line was:

    ./bonnie++ -d $dir -s 0 -n $size:40000:40000:1

where $dir was a directory on the filesystem being tested and $size was the number of files in thousands (5, 10, 20, 30).

A dd of the MDT wasn't possible because the original LUN was nearly 5TB (only 35GB used), but the new LUN is just over 1TB.

> Mike, is there a chance you can check what rate acp reports?
>
> http://oss.oracle.com/~mason/acp/

I have downloaded and compiled acp. I have started a copy of one of the 1.6-million-file directories. After 1 hour it is still reading files from a top-level directory with only 122k files and hasn't written anything. The only option used on the command line was -v, so I could see what it was doing.

Thanks,
Mike
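P.S. Since the LUN sizes rule out a raw dd, the file-level MDT backup procedure from the manual may be the fallback — roughly like the sketch below, with placeholder paths. The extended-attribute step is the part that cannot be skipped, because that is where Lustre keeps the striping information:

    # File-level MDT copy sketch: mount the MDT as ldiskfs (MDS stopped),
    # save the extended attributes, then tar up the files. Placeholders
    # throughout; verify against the manual for your Lustre version.
    mount -t ldiskfs /dev/old_mdt_lun /mnt/mdt_old
    cd /mnt/mdt_old
    getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
    tar czf /backup/mdt_files.tgz --sparse .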
> We have been struggling with our Lustre performance for some
> time now especially with large directories.

Are you assuming that Lustre has been designed for good performance with lots of (probably tiny) files in large directories?

> I recently did some informal benchmarking (on a live system so
> I know results are not scientifically valid) and noticed a huge
> drop in performance of reads (stat operations) past 20k files in
> a single directory.

Is a benchmark really needed to figure that out?

> I'm using bonnie++, disabling IO testing (-s 0) and just
> creating, reading, and deleting 40KB files in a single
> directory.

What do you think Bonnie++ is a benchmark of?

> [ ... ] The really interesting data point is read performance,
> which for these tests is just a stat of the file not reading
> data. Starting with the smaller directories it is relatively
> consistent at just below 2,500 files/sec, but when I jump from
> 20,000 files to 30,000 files the performance drops to around
> 100 files/sec.

Why is that surprising?

> [ ... ] are in the process of trying to get our users to
> change their code. [ ... ]

But as mentioned below, it is being changed in a way that will help, but not a lot.

> Then yesterday I was browsing the Lustre Operations Manual

Did you read it before designing and setting up your system? There are relevant bits of advice in 1.4.2.2 and 10.1.1-4, for example (some of them objectionable, such as recommending RAID6 for data storage without the necessary qualifications, at the very least).

> and found section 33.8 that says Lustre is tested with
> directories as large as 10 million files in a single directory

Why would "tested" imply "works real fast in every possible, including really stupid, setup"?

> and still get lookups at a rate of 5,000 files/sec.

What sort of "lookups" do you think they were talking about? On what sort of storage systems do you think you get 5,000 random metadata operations/s? Can you explain how to get 5,000 *random* metadata lookups/s from disks that can do 50-100 random IOPs each?

> That leaves me wondering 2 things. How can we get 5,000
> files/sec for anything and why is our performance dropping off
> so suddenly after 20k files?

Why do you need to wonder? Have you read about new amazing techniques like caching in RAM/flash and scaling via RAID? Have you read the extensive discussions of metadata and data performance in the Lustre docs?

> Here is our setup: All IO servers are Dell PowerEdge 2950s. 2
> quad-core sockets with X5355 @ 2.66GHz and 16GB of RAM. The data
> is on DDN S2A 9550s with 8+2 RAID configuration connected
> directly with 4Gb Fibre channel.

Why do you describe where the data is when you have so far talked only about the metadata? Do you have a good idea of the differences (and the different workloads, as described in the Lustre manual) between MDS/MDTs and OSSes/OSTs? Also, if you have a highly parallel program that deals with what look like millions of tiny files (which looks like an appalling misdesign to me), why do you run it on a RAID3 (of all things) storage system? If you are storing the metadata for Lustre on the same storage system as the data *and* it is a RAID3 setup, WHY WHY WHY? Why haven't you hired Sun/Oracle consultants to design and configure your metadata and data storage systems?

> They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp

Why are you running a very old version of Lustre (and on RHEL 4.5 of all things, but that is less relevant)? Are you running the servers in 32b or 64b mode?

> As a side note the users code is Parflow, developed at LLNL.
> The files are SILO files. We have as many as 1.4 million files
> in a single directory

Why hasn't LLNL hired consultants who understand the differences between file systems and DBMSes to help design ParFlow?

> and we now have half a billion files that we need to deal with
> in one way or another.

To me that means that the application is appallingly written (there are a lot of those about). Then perhaps your setup is entirely inappropriate for most types of workload, and even more so for metadata-intensive ones; and maybe Lustre was designed for optimal performance on large streaming workloads, so what looks to me like an appallingly misdesigned application works particularly badly in your case.

> The code has already been modified to split the files on newer
> runs into multiple subdirectories, but we're still dealing
> with tens of thousands of files in a single directory.

To me that's still appalling. There are very good reasons why file systems and DBMSes both exist, and they are not the same.

> The users have been able to run these data sets on Lustre
> systems at LLNL 3 orders of magnitude faster.

Do you think that LLNL have metadata storage and caches as weak as yours? Given how the application is "designed", would it suffer a colossal performance drop at LLNL too, on a suitably larger data set? Have you realized by now that Lustre performance is very, very anisotropic in the space of possible setups and applications?
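P.S. A quick sanity check of that 5,000/s figure, with purely illustrative numbers:

    # Spindles needed for 5,000 truly random metadata lookups/s with no
    # cache help -- illustrative arithmetic only.
    per_disk_iops=75          # midpoint of the 50-100 range above
    echo "spindles needed: ~$((5000 / per_disk_iops))"    # ~66 disks
    # In practice such rates come from the MDS caching inodes in RAM,
    # not from raw disk IOPS.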
Michael Robbert wrote:

> We have been struggling with our Lustre performance for some time now,
> especially with large directories. [...] Starting with the smaller
> directories it is relatively consistent at just below 2,500 files/sec, but
> when I jump from 20,000 files to 30,000 files the performance drops to
> around 100 files/sec.

Think small random RAID6 reads. Performance craters when you do this.

> [...] Then yesterday I was browsing the Lustre Operations Manual and found
> section 33.8 that says Lustre is tested with directories as large as 10
> million files in a single directory and still get lookups at a rate of
> 5,000 files/sec. That leaves me wondering 2 things. How can we get 5,000
> files/sec for anything and why is our performance dropping off so suddenly
> after 20k files?

Change your MDT to be on a different machine. A very fast RAID10. I've seen fast SAS 15k recommended, but they aren't the only options. What you want are very high random read IOPs.

> Here is our setup: All IO servers are Dell PowerEdge 2950s. [...] They are
> running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp

Hmmm... that's a RHEL5 kernel, not a RHEL4 kernel. Are you sure you have 4.5?

> As a side note the users code is Parflow, developed at LLNL. [...] The
> users have been able to run these data sets on Lustre systems at LLNL 3
> orders of magnitude faster.

This shouldn't be a problem for a well designed system.

Regards,
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
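P.S. For concreteness, standing up an MDT on a dedicated RAID10 LUN looks roughly like this. The device name, fsname, and mount point are placeholders, and the --mgs flag assumes the MGS lives with the MDT:

    # Format and start a dedicated MDT on a fast RAID10 LUN -- a sketch.
    mkfs.lustre --fsname=scratch --mdt --mgs /dev/raid10_mdt_lun
    mkdir -p /mnt/mdt
    mount -t lustre /dev/raid10_mdt_lun /mnt/mdt   # starts the MDS target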
Peter, can you comment on what you said here about RAID6? Are there Twiki or other entries somewhere about this?

> There are relevant bits of advice in 1.4.2.2 and 10.1.1-4, for example
> (some of them objectionable, such as recommending RAID6 for data storage
> without the necessary qualifications, at the very least).

Thanks,
bob
> > [ ... ] The really interesting data point is read performance,
> > which for these tests is just a stat of the file not reading
> > data. Starting with the smaller directories it is relatively
> > consistent at just below 2,500 files/sec, but when I jump from
> > 20,000 files to 30,000 files the performance drops to around
> > 100 files/sec.
>
> Why is that surprising?

No, with dirindex 30,000 files are not that much. In fact I could reproduce Mike's numbers also with smaller directory sizes. But I could bump it for a single node to a consistent 30,000 after increasing the LRU_SIZE. Now people might wonder why this matters if there is lru-auto-resize. Simple answer: several DDN customers, including CSM, have run into serious issues with lru-auto-resize enabled. Not all of those issues are resolved even in the latest Lustre releases. However, I definitely need to work on a patch to be able to disable/enable it on demand (so far each and every network reconnection resets it to the default, so something like a cron script is required on clients to set the value one wants to have).

> What sort of "lookups" do you think they were talking about?
>
> On what sort of storage systems do you think you get 5,000 random
> metadata operations/s?

Really large directories suffer from the htree dirindex implementation returning random inode numbers instead of sequential inode numbers for readdir(). And that is rather sub-optimal for cp/tar/'ls -l'/etc.

> > That leaves me wondering 2 things. How can we get 5,000
> > files/sec for anything and why is our performance dropping off
> > so suddenly after 20k files?
>
> Why do you need to wonder?

I would expect that performance drops off somewhere between 100K and 1 million files per directory, but not at 20,000 yet.

> > They are running RHEL 4.5, Lustre 1.6.7.2-ddn3, kernel
> > 2.6.18-128.7.1.el5.ddn1.l1.6.7.2.ddn3smp
>
> Why are you running a very old version of Lustre (and on RHEL 4.5
> of all things, but that is less relevant)?

1.6.7.2-ddnX is still maintained, and 1.8 also does not provide better metadata performance. Tests and new systems show that 1.8.3-ddn3.2 runs rather stable, and vanilla 1.8.4 also seems so far to be mostly fine, so we are starting to encourage people to update. However, from my personal point of view, 1.8.2 was a draw-back for stability compared to 1.8.1.1, and it took some time to find out all the issues. Some bugs CSM sometimes runs into are also not yet fixed in 1.8. Introducing possible and unknown new issues is mostly not an option for production systems.

> > As a side note the users code is Parflow, developed at LLNL.
> > The files are SILO files. We have as many as 1.4 million files
> > in a single directory
>
> Why hasn't LLNL hired consultants who understand the differences
> between file systems and DBMSes to help design ParFlow?

With all the knowledgeable people at LLNL, I have no idea how such an application ever could be written.

> > The users have been able to run these data sets on Lustre
> > systems at LLNL 3 orders of magnitude faster.
>
> Do you think that LLNL have metadata storage and caches as weak
> as yours?

I definitely know that LLNL was working on and pushing lru-resize into 1.6.5. That might explain why. Unfortunately, as I said before, that brought up some serious new issues not solved until now.

I also entirely agree that the application is not suitable for a Lustre filesystem, even if LLNL has found some workarounds.

Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks
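P.S. For anyone who wants to experiment with this, the client-side knob looks roughly like the following (a sketch; the value 30000 is just an example, and as noted above a reconnection resets it, so a cron job may be needed to reapply it):

    # Pin the client's DLM lock LRU to a fixed size; a non-zero value
    # also disables the automatic LRU resizing discussed above.
    lctl set_param ldlm.namespaces.*.lru_size=30000
    # On clients without lctl set_param, the same thing via /proc:
    # for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
    #     echo 30000 > $f
    # done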
On Sep 11, 2010, at 2:41 PM, Michael Robbert wrote:

> > Mike, is there a chance you can check what rate acp reports?
> >
> > http://oss.oracle.com/~mason/acp/
>
> I have downloaded and compiled acp. I have started a copy of one of the
> 1.6-million-file directories. After 1 hour it is still reading files from a
> top-level directory with only 122k files and hasn't written anything. The
> only option used on the command line was -v, so I could see what it was
> doing.

What exactly is it that we're trying to get out of acp? Yesterday one of my "tar pipe" copies finished earlier than expected. It finished while acp was running on another directory, which I know should have nothing to do with it; but then I started another copy yesterday and it finished by this morning (it should have taken 2 days). At some point in this process I realized that the write portion of acp appears not to be implemented, so all it does is read data. I am wondering if it is causing data to be cached at a faster rate than tar can read, and is therefore helping with the speed of my copying. On the other hand, processes that I've started today appear to be going just as slowly as before (maybe a little faster, 300-500 files per minute).

I'm also beginning to wonder how much the work of other users is affecting this. If that is the case, I can bring some of it to a halt, since some of it is from the users with this large data as they attempt to clean up their old files. I would like to know how I can monitor that. In the past I've seen the load average of the MDS go up to 20 or 30. It is only at about 5 right now. How high does it have to go before overall performance is affected? Or is that even an indicator I should be looking at?

I'm trying to read as much Lustre documentation as I can, mostly the Lustre Operations Manual and old mailing list entries, but most of it is about OSS/OST performance, and our problem seems to be only with the MDS/MDT. Any pointers to where I can learn more about what happens on the MDS? Especially anything about how it caches data.

Thanks,
Mike
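P.S. In case a concrete starting point helps: the per-operation MDS counters are one thing worth watching. The parameter paths below vary between Lustre versions, so treat them as guesses to adapt rather than exact names:

    # Dump the MDS per-operation stats (open/close/getattr/... counters);
    # the wildcard path is an assumption -- adjust for your version.
    lctl get_param mds.*.stats
    # Or poll continuously with llstat (interval in seconds); the /proc
    # path is likewise version-dependent.
    llstat -i 5 /proc/fs/lustre/mdt/MDS/mds/stats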