Hi Christoph--

Christoph Biardzki wrote:
>
> Observation #1:
> ---------------
>
> Run "iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile" on C.
> Run "ls -al" on /mnt/lustre on C at the same time.
>
> => "ls" will return only after iozone finished writing (which can take
> several minutes) Is this normal?

This is an exaggerated example of behaviour which is essentially normal
(although obviously not desirable). In Lustre 1.0.x, programs like ls
(anything which calls stat()) introduce a very hard barrier, to ensure
that they report a completely accurate file size. This barrier doesn't
wait for iozone to finish writing entirely, but it does wait for any
cached dirty data to flush, which by default can be quite a lot.

Try this, and see if it makes things better, even though it won't make
them perfect:

for i in /proc/fs/lustre/osc/OSC_*MNT_*/max_dirty_mb; do
    echo 4 > $i
done

This will reduce the amount of cached dirty data, which should improve
the response time. It should not affect throughput for the non-ls case,
and will probably be the default setting in Lustre 1.0.3. With some
experimentation, you may find that you can reduce it further.

We are in the process of removing this hard barrier, which is the
correct way to deal with this issue.

> Observation #2:
> ---------------
> Metadata performance from the client is still low; on the server it's fine:

Interesting. Let me see if someone here can reproduce that locally
first, because it will be much easier to debug. We have seen effects
like this before when the TCP stack flips into a mode which causes a
lot of extra latency.

-Phil
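A quick sanity check, assuming the same /proc layout as the snippet above
(everything else here is illustration only): the per-OSC value can be read
back before and after the change with

for i in /proc/fs/lustre/osc/OSC_*MNT_*/max_dirty_mb; do
    echo -n "$i: "; cat $i
done

which makes it easy to confirm which setting actually gives acceptable ls
latency on a given client.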
Hi Christoph--

Christoph Biardzki wrote:
>
>> Nevertheless, 1 create/s is not normal. Do you have the same problem
>> with the 1.0.1 packages? How are you creating these files?
>
> I just untar'ed a file. It only happened from the "client" to the
> "server", the Lustre filesystem mounted on the server (locally) is very
> fast. Strange. Let me try 1.0.1 and I'll report the results.

I think you will not have much better luck with 1.0.1, because this
sounds like an extreme case of a known deficiency in our locking
protocol. Does your tar file have a pretty deep directory tree?

Try an experiment for me, if you would: un-tar a file which is just one
level deep (i.e., just creates files and directories in the current
working directory). Is it much faster?

Our metadata locking protocol today favours this current working
directory load because it was simplest to implement and made our initial
customers happy. When you untar something with many directories, the
metadata locks bounce back and forth between the client and server, and
your performance goes way down.

Someone is working on this issue right now; a fix will appear in an
upcoming 1.0.x release, but I'm not certain exactly which.

Thanks--

-Phil
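A minimal sketch of the comparison Phil suggests, assuming /mnt/lustre is
the client mount; the file and directory counts are arbitrary, only the
relative timings matter:

# Build a flat tarball (everything in one directory) and a deep one
# (one file per nested directory level) somewhere outside Lustre.
mkdir -p /tmp/flat
for i in $(seq 1 100); do touch /tmp/flat/file-$i; done
tar cf /tmp/flat.tar -C /tmp flat

mkdir -p /tmp/deep; d=/tmp/deep
for i in $(seq 1 100); do d=$d/dir-$i; mkdir $d; touch $d/file-$i; done
tar cf /tmp/deep.tar -C /tmp deep

# Extract both on the Lustre client and compare.
time tar xf /tmp/flat.tar -C /mnt/lustre
time tar xf /tmp/deep.tar -C /mnt/lustre

If the flat extraction is dramatically faster, that points at the
cwd-favouring lock protocol Phil describes rather than at raw create
performance.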
Phil Schwan wrote:
> Try an experiment for me, if you would: un-tar a file which is just one
> level deep (i.e., just creates files and directories in the current
> working directory). Is it much faster?
>
> Our metadata locking protocol today favours this current working
> directory load because it was simplest to implement and made our initial
> customers happy. When you untar something with many directories, the
> metadata locks bounce back and forth between the client and server, and
> your performance goes way down.
>
> Someone is working on this issue right now; a fix will appear in an
> upcoming 1.0.x release, but I'm not certain exactly which.
>
> Thanks--
>
> -Phil

Hi Phil,

I just compiled 1.0.1 from source, patched and compiled a vanilla 2.4.20
kernel (set up the symlinks and ran "quilt push -av", right?) and
installed everything on my two-node client (C) - server (S) test setup.

Lustre version: b1_0-20031214053127-PRISTINE-.usr.src.linux-2.4.20

I used the example uml.sh script to configure - I changed the names of
the "client" and the "server" in uml.sh and called:

MDSDEV=/dev/sdb1 OSTDEV1=/dev/sdb2 MDSSIZE=900000 OSTSIZE=125000000 sh uml.sh

Observation #1:
---------------

Run "iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile" on C.
Run "ls -al" on /mnt/lustre on C at the same time.

=> "ls" will return only after iozone has finished writing (which can
take several minutes). Is this normal?

Observation #2:
---------------

Metadata performance from the client is still low; on the server it's fine:

client:/mnt/lustre # time for i in 1 2 3 4 5 6 7 8 9 10; do touch /mnt/lustre/file-$i ; done

real    0m1.235s
user    0m0.000s
sys     0m0.010s

server:/usr/lib/lustre/examples # time for i in 1 2 3 4 5 6 7 8 9 10; do touch /mnt/lustre/file2-$i ; done

real    0m0.101s
user    0m0.000s
sys     0m0.020s

I still have to try the pre-patched Lustre kernel...

- Christoph

--
Leibniz Rechenzentrum München (LRZ)          http://www.lrz.de
High Performance Systems Division
Barer Str. 21 - 80333 Munich - Germany
Tel. ++49-(0)89 / 289-28853, Room 1527
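For anyone trying to reproduce observation #1, a minimal sketch of running
the two commands concurrently on the client (the 30-second delay is
arbitrary, just enough for the write phase to build up cached dirty data):

iozone -t1 -i0 -i1 -r1m -s10g -F /mnt/lustre/bigfile &
sleep 30
time ls -al /mnt/lustre    # blocks until the cached dirty data is flushed
wait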
Hi Phil,

Phil Schwan wrote:
>> disabling portals debug helps! Now I can see ~70 MB/s write throughput
>> from one client to one server (equipped with a four-disk RAID0 which
>> achieves around 100 MB/s write / 80 MB/s read for a 32 GB file). This is
>> pretty good!
>
> Not bad, although there is still an unusual bottleneck somewhere. Which
> benchmark are you using? If you run vmstat on the OSS and client, are
> you CPU-bound?

Server & client are dual Xeons at 2.8 GHz. The server is around 37% CPU,
the client ~25%. In production we'd use LSI FC disk arrays, so optimizing
for this test setup is not really useful. I'm perfectly happy with
performance now - but I had some problems with crashes on 1.0.0 and
that's why I tried CVS.

> The Lustre 1.0.2 release will have a more sensible default debugging
> level. Can you wait a week or so? Otherwise I think you would have to
> write a small script that sets the debug level afterwards with sysctl.

That's perfectly OK, I was just curious :)

>> BTW: I just tried the current CVS version - works, but metadata
>> performance is horrible (~1 file create/s) - is there a reason or some
>> other hidden option? :))
>
> It is very difficult to choose the correct version of the code from CVS,
> so we strongly discourage people from trying.
>
> Nevertheless, 1 create/s is not normal. Do you have the same problem
> with the 1.0.1 packages? How are you creating these files?

I just untar'ed a file. It only happened from the "client" to the
"server"; the Lustre filesystem mounted on the server (locally) is very
fast. Strange. Let me try 1.0.1 and I'll report the results.

Thanks!

- Christoph

--
Leibniz Rechenzentrum München (LRZ)          http://www.lrz.de
High Performance Systems Division
Barer Str. 21 - 80333 Munich - Germany
Tel. ++49-(0)89 / 289-28853, Room 1527
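Regarding the "small script that sets the debug level afterwards with
sysctl": a minimal sketch, assuming the portals debug mask is exposed as
the sysctl key portals.debug (the exact name and /proc path may differ
between releases, so check "sysctl -a | grep debug" on your tree first):

#!/bin/sh
# Disable portals debugging after the Lustre/portals modules are loaded.
# The sysctl name below is an assumption for the 1.0.x era.
sysctl -w portals.debug=0
# Equivalent /proc interface, if present on your release:
# echo 0 > /proc/sys/portals/debug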