Hi Christoph--

Christoph Biardzki wrote:
>
> Observation #1:
> ---------------
>
> Run "iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile" on C.
> Run "ls -al" on /mnt/lustre on C at the same time.
>
> => "ls" will return only after iozone finished writing (which can take
> several minutes) Is this normal?

This is an exaggerated example of behaviour which is essentially normal
(although obviously not desirable). In Lustre 1.0.x, programs like ls
(anything which calls stat()) introduce a very hard barrier, to ensure
that they report a completely accurate file size. This barrier doesn't
wait for iozone to finish writing entirely, but it does wait for any
cached dirty data to flush, which by default can be quite a lot.

Try this, and see if it makes things better, even though it won't make
them perfect:

for i in /proc/fs/lustre/osc/OSC_*MNT_*/max_dirty_mb; do
    echo 4 > $i
done

This will reduce the amount of cached dirty data, which should improve
the response time. It should not affect throughput for the non-ls case,
and will probably be the default setting in Lustre 1.0.3. With some
experimentation, you may find that you can reduce it further.

We are in the process of removing this hard barrier, which is the
correct way to deal with this issue.

> Observation #2:
> ---------------
> Metadata performance from the client is still low; on the server it's fine:

Interesting. Let me see if someone here can reproduce that locally
first, because it will be much easier to debug. We have seen effects
like this before when the TCP stack flips into a mode which causes a
lot of extra latency.

-Phil
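A quick sanity check, assuming the same /proc layout as the snippet above
(everything else here is illustration only): the per-OSC value can be read
back before and after the change with

for i in /proc/fs/lustre/osc/OSC_*MNT_*/max_dirty_mb; do
    echo -n "$i: "; cat $i
done

which makes it easy to confirm which setting actually gives acceptable ls
latency on a given client.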
Hi Christoph--

Christoph Biardzki wrote:
>
>> Nevertheless, 1 create/s is not normal. Do you have the same problem
>> with the 1.0.1 packages? How are you creating these files?
>
> I just untar'ed a file. It only happened from the "client" to the
> "server", the Lustre filesystem mounted on the server (locally) is very
> fast. Strange. Let me try 1.0.1 and I'll report the results.

I think you will not have much better luck with 1.0.1, because this
sounds like an extreme case of a known deficiency in our locking
protocol. Does your tar file have a pretty deep directory tree?

Try an experiment for me, if you would: un-tar a file which is just one
level deep (i.e., just creates files and directories in the current
working directory). Is it much faster?

Our metadata locking protocol today favours this current working
directory load because it was simplest to implement and made our initial
customers happy. When you untar something with many directories, the
metadata locks bounce back and forth between the client and server, and
your performance goes way down.

Someone is working on this issue right now; a fix will appear in an
upcoming 1.0.x release, but I'm not certain exactly which.

Thanks--

-Phil
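A minimal sketch of the comparison Phil suggests, assuming /mnt/lustre is
the client mount; the file and directory counts are arbitrary, only the
relative timings matter:

# Build a flat tarball (everything in one directory) and a deep one
# (one file per nested directory level) somewhere outside Lustre.
mkdir -p /tmp/flat
for i in $(seq 1 100); do touch /tmp/flat/file-$i; done
tar cf /tmp/flat.tar -C /tmp flat

mkdir -p /tmp/deep; d=/tmp/deep
for i in $(seq 1 100); do d=$d/dir-$i; mkdir $d; touch $d/file-$i; done
tar cf /tmp/deep.tar -C /tmp deep

# Extract both on the Lustre client and compare.
time tar xf /tmp/flat.tar -C /mnt/lustre
time tar xf /tmp/deep.tar -C /mnt/lustre

If the flat extraction is dramatically faster, that points at the
cwd-favouring lock protocol Phil describes rather than at raw create
performance.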
Phil Schwan wrote:
> Try an experiment for me, if you would: un-tar a file which is just one
> level deep (i.e., just creates files and directories in the current
> working directory). Is it much faster?
>
> Our metadata locking protocol today favours this current working
> directory load because it was simplest to implement and made our initial
> customers happy. When you untar something with many directories, the
> metadata locks bounce back and forth between the client and server, and
> your performance goes way down.
>
> Someone is working on this issue right now; a fix will appear in an
> upcoming 1.0.x release, but I'm not certain exactly which.
>
> Thanks--
>
> -Phil

Hi Phil,

I just compiled 1.0.1 from source, patched and compiled a vanilla 2.4.20
kernel (set up the symlinks and ran "quilt push -av", right?) and
installed everything on my two-node client (C) - server (S) test setup.

Lustre version: b1_0-20031214053127-PRISTINE-.usr.src.linux-2.4.20

I used the example uml.sh script to configure - I changed the names of
the "client" and the "server" in uml.sh and called:

MDSDEV=/dev/sdb1 OSTDEV1=/dev/sdb2 MDSSIZE=900000 OSTSIZE=125000000 sh uml.sh

Observation #1:
---------------

Run "iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile" on C.
Run "ls -al" on /mnt/lustre on C at the same time.

=> "ls" will return only after iozone has finished writing (which can
take several minutes). Is this normal?

Observation #2:
---------------

Metadata performance from the client is still low; on the server it's fine:

client:/mnt/lustre # time for i in 1 2 3 4 5 6 7 8 9 10; do touch /mnt/lustre/file-$i ; done

real    0m1.235s
user    0m0.000s
sys     0m0.010s

server:/usr/lib/lustre/examples # time for i in 1 2 3 4 5 6 7 8 9 10; do touch /mnt/lustre/file2-$i ; done

real    0m0.101s
user    0m0.000s
sys     0m0.020s

I still have to try the pre-patched Lustre kernel...

- Christoph

--
Leibniz Rechenzentrum München (LRZ)          http://www.lrz.de
High Performance Systems Division
Barer Str. 21 - 80333 Munich - Germany
Tel. ++49-(0)89 / 289-28853, Room 1527
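For anyone trying to reproduce observation #1, a minimal sketch of running
the two commands concurrently on the client (the 30-second delay is
arbitrary, just enough for the write phase to build up cached dirty data):

iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile &
sleep 30
time ls -al /mnt/lustre    # blocks until the cached dirty data is flushed
wait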
Hi Phil,

Phil Schwan wrote:
>> disabling portals debug helps! Now I can see ~70 MB/s write throughput
>> from one client to one server (equipped with a four-disk RAID0 which
>> achieves around 100 MB/s write / 80 MB/s read for a 32 GB file). This is
>> pretty good!
>
> Not bad, although there is still an unusual bottleneck somewhere. Which
> benchmark are you using? If you run vmstat on the OSS and client, are
> you CPU-bound?

Server & client are dual Xeons at 2.8 GHz. The server is around 37% CPU,
the client ~25%. In production we'd use LSI FC disk arrays, so optimizing
for this test setup is not really useful. I'm perfectly happy with
performance now - but I had some problems with crashes on 1.0.0 and
that's why I tried CVS.

> The Lustre 1.0.2 release will have a more sensible default debugging
> level. Can you wait a week or so? Otherwise I think you would have to
> write a small script that sets the debug level afterwards with sysctl.

That's perfectly OK, I was just curious :)

>> BTW: I just tried the current CVS version - works, but metadata
>> performance is horrible (~1 file create/s) - is there a reason or some
>> other hidden option? :))
>
> It is very difficult to choose the correct version of the code from CVS,
> so we strongly discourage people from trying.
>
> Nevertheless, 1 create/s is not normal. Do you have the same problem
> with the 1.0.1 packages? How are you creating these files?

I just untar'ed a file. It only happened from the "client" to the
"server"; the Lustre filesystem mounted on the server (locally) is very
fast. Strange. Let me try 1.0.1 and I'll report the results.

Thanks!

- Christoph

--
Leibniz Rechenzentrum München (LRZ)          http://www.lrz.de
High Performance Systems Division
Barer Str. 21 - 80333 Munich - Germany
Tel. ++49-(0)89 / 289-28853, Room 1527
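Regarding the "small script that sets the debug level afterwards with
sysctl": a minimal sketch, assuming the portals debug mask is exposed as
the sysctl key portals.debug (the exact name and /proc path may differ
between releases, so check "sysctl -a | grep debug" on your tree first):

#!/bin/sh
# Disable portals debugging after the Lustre/portals modules are loaded.
# The sysctl name below is an assumption for the 1.0.x era.
sysctl -w portals.debug=0
# Equivalent /proc interface, if present on your release:
# echo 0 > /proc/sys/portals/debug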