thr3ads.net - Lustre discuss - [Lustre-discuss] NFS Performance [Apr 2008]

Dan

2008-Apr-15 19:16 UTC

[Lustre-discuss] NFS Performance

Hi,

With help from Oleg we got the right patches applied and NFS working
well.  Maximum performance was about 60 MB/sec.  Last week that dropped
to about 12.5 MB/sec and I cannot find a reason.  Lustre clients all
obtain 100+ MB/sec on GigE.  Each OST is good for 270 MB/sec.  When
mounting the client on one of the OSSs I get 230 MB/sec.  Seems the
speed is there.  How can NFS and Lustre be tuned better?

Current config for 1.6.4.3 is below:

1.  MGS/OSS w/ 4 OSTs - mgs_max_num_threads=32, ost_max_num_threads=64
2.  OSS w/ 6 OSTs - ost_max_num_threads=64
3.  20 Lustre clients - all perform well (GREAT Lustre developers!!!!
this system is amazing!)
4.  NFS server runs from a Lustre client machine for 12 to 15 MB/sec
max.
5.  NFS server from the MGS (client on MGS/OSS = bad, I know!) can get
20 to 30 MB/sec
    - this got 60+ MB/sec in the past.

bugs and patches applied:

14360 - 14006 is the only patch
14379 - patch 14007 only since 14008 is reversed by 14591
13371 - bug for the above mentioned 14591 patch

With these patches the system is stable unless I bump the OST or MGS
threads too high.  Performance doesn't seem to change much with any
tuning.  I've adjusted the client via /proc and the OSTs and MGS
via /etc/modprobe.conf.

Suggestions?

Thank you,

Dan
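
A quick arithmetic check on the figures in the message above may help narrow things down. The sketch below only converts the quoted MB/sec rates to Mbit/sec; the function name is made up for illustration. 12.5 MB/sec works out to the raw rate of a 100 Mbit/s link, so a NIC or switch port that renegotiated down from GigE is one thing to rule out. That is only a hypothesis: the thread does not confirm it, and the "12 to 15 MB/sec" range quoted in item 4 would exceed a 100 Mbit/s link.

```python
def mb_per_sec_to_mbit(mb_per_sec):
    """Convert a throughput in MB/sec (decimal megabytes) to Mbit/sec."""
    return mb_per_sec * 8


# Figures quoted in the message above.
reported = {
    "NFS now": 12.5,
    "NFS earlier": 60.0,
    "Lustre client on GigE": 100.0,
}

for label, rate in reported.items():
    print(f"{label}: {rate} MB/sec = {mb_per_sec_to_mbit(rate):.0f} Mbit/sec")
```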

Mark Seger

2008-Apr-15 19:39 UTC

[Lustre-discuss] NFS Performance

while I can't tell you how to tune nfs, I can tell you how to monitor
it.  With collectl - http://collectl.sourceforge.net/ - you should be
able to watch nfs, lustre and your network all at once, and maybe even
toss in cpu for good measure.

This is an example of the output (along with the appropriate switches).
I'm not doing anything over nfs, so those fields are all zero.

[root@cag-dl145-172 ~]# collectl -scnfl
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
   0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
   0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
   0   0 11362     29   2305  33719    2305   33721      0      0      0      0      0      0  0

there are lots of different options you can try, but again I'm not sure
what to look for.  changing the 'f' to 'F' lets you dig a little deeper
and look at the metadata ops, commits, and retransmissions.
[root@cag-dl145-172 ~]# collectl -scnFl
#<--------CPU--------><-----------Network----------><----NFS MetaOps----><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   meta commit retran  Reads KBRead Writes Ke
   0   0   121     43      0      4       0       2      0      0      0      0      0      0  0
   0   0   146    143      0      2       0       3      0      0      0      0      0      0  0

if you really want to see everything nfsstat might show, there are two
more formats, again based on the case of the 'f':
[root@cag-dl145-172 ~]# collectl -sf --verbose
# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS   UDP   TCP  TCPCONN  CALLS  BADAUTH  BADCLNT   READ  WRITE
    0     0     0        0      0        0        0      0      0

and my favorite when I haven't a clue what nfs is doing:
[root@cag-dl145-172 ~]# collectl -sF --verbose
# NFS V3 SERVER (/sec)
#NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

on the other hand, if you want to see how the lustre client's rpcs are
distributed across size buckets (in pages), there's always:
[root@cag-dl145-172 collectl]# ./collectl.pl -s l -OB
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

I haven't had much feedback on collectl and am always looking for
some.
btw - there are a lot more options than I just showed you, and if you
like timestamps, just append -oT to the commands.

that should give you a pretty good start...  8-)

-mark
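
For logging or comparing samples over time, the brief output shown above can be turned into named fields. This is only a minimal sketch, assuming each column header and each data row sits on a single line and that values are whole numbers, as in the -scnfl sample from this message; the function name is invented here and is not part of collectl.

```python
def parse_collectl_brief(lines):
    """Pair each data row with the column names from the last '#' header line."""
    columns = []
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            # Group banners start with '#<'; the column header line does not.
            if not text.startswith("#<"):
                columns = text.lstrip("#").split()
            continue
        fields = text.split()
        # Skip anything that is not a full row of integers (e.g. "waiting for ...").
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


sample = """\
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
   0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
   0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
   0   0 11362     29   2305  33719    2305   33721      0      0      0      0      0      0  0
"""

for row in parse_collectl_brief(sample.splitlines()):
    print(row["netKBi"], row["netKBo"], row["read"], row["write"])
```

Keying rows by column name makes it easy to line up the network, NFS and Lustre counters from the same one-second sample.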


Sridhar Basam

2008-Apr-15 19:58 UTC

[Lustre-discuss] NFS Performance

There isn't enough detail on the tests below for anyone to make
recommendations. Are you NFS-exporting a Lustre filesystem to a bunch of
NFS clients and then testing on those clients? What sort of workload are
you using for these tests? Is it reads, writes, or a mix of both?
More details on the hardware/software and the testing methods would help.

    Sridhar
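
One way to make the numbers reproducible is to report them from a fixed, simple workload. The following is only an illustrative sketch of a sequential write test, not the method used in this thread; the function name, file name, sizes, and target directory are all assumptions to adjust for the mount being tested.

```python
import os
import tempfile
import time


def timed_write(path, total_mib=64, block_kib=1024):
    """Sequentially write total_mib of zeros to path; return MB/sec (10**6 bytes)."""
    block = bytes(block_kib * 1024)
    blocks = (total_mib * 1024) // block_kib
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # count the time needed to push the data out
    elapsed = max(time.monotonic() - start, 1e-9)
    return blocks * len(block) / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the NFS mount under test.
    target_dir = tempfile.gettempdir()
    path = os.path.join(target_dir, "nfs_write_test.bin")
    try:
        print(f"sequential write: {timed_write(path):.1f} MB/sec")
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Running the same script on a native Lustre client and on an NFS client, and stating the file size and block size used, gives figures that can be compared directly.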


Dan Redig

2008-Apr-15 20:06 UTC

[Lustre-discuss] NFS Performance

Thanks Mark!  I just started using collectl last week.  I'll
investigate the options you suggested in a minute and see.

Dan



Mark Seger

2008-Apr-15 20:45 UTC

[Lustre-discuss] NFS Performance

> Thanks Mark!  I just started using collectl last week.  I'll
> investigate the options you suggested in a minute and see.

by all means do so and if you have any problems with the switches - you
don't want to know how much extra code there is in collectl just to deal
with all the lustre stats - just let me know.  Also be sure to let me
know if you encounter any operational problems too.  I'd only recently
gotten around to adding support for 1.6.4 and while I think it all works
the proof is in trying it in a lot of different configurations/environments.

-mark
