Hi,
I've created a small Lustre testbed which I'm using to do experiments. n32
holds the MGS and one OST. I get 39MB/s when writing to the disk from n32.
However, if I do the same experiment using dd on a mounted OST over RDMA
using o2ib, I get only 12.5MB/s! Why am I experiencing such a performance
loss when my IB interconnect can deliver over 650MB/s? I'm currently using
HCA firmware 3.2 for the 23108, but I'm not sure this is the issue. The
socknal on n32 is only using 4% of CPU when n31 is writing. How can I
troubleshoot this performance issue?
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca 0000:04:00.0: HCA FW version 3.2.0 is old (3.4.0 is current).
ib_mthca 0000:04:00.0: If you have problems, try updating your HCA FW.
n32:
/dev/sdb1             899M   41M  808M   5% /mnt/lustre/mgs
/dev/sdb2             9.9G  3.1G  6.4G  33% /mnt/lustre/sdb2
/dev/sdb3              30G  2.1G   27G   8% /mnt/lustre/sdb3
/dev/sdb4              27G  1.4G   25G   6% /mnt/lustre/sdb4
161.74.83.48@o2ib:/testfs
                       67G  6.5G   57G  11% /mnt/lustre/ost
n32:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 13.406 seconds, 39.1 MB/s
top - 14:20:19 up 1 day,  7:15,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 272 total,   1 running, 271 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  2.3%sy,  0.0%ni, 81.1%id, 12.5%wa,  1.2%hi,  3.0%si,  0.0%st
Mem:    515144k total,   508984k used,     6160k free,   349900k buffers
Swap:  1052216k total,      148k used,  1052068k free,    51280k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4003 root      15   0     0    0    0 S    5  0.0   7:19.17 socknal_sd00
 4358 root      15   0     0    0    0 S    0  0.0   0:06.08 ll_ost_io_04
 4364 root      15   0     0    0    0 S    0  0.0   0:06.28 ll_ost_io_10
 7412 root      16   0  2320 1136  768 R    0  0.2   0:00.21 top
    1 root      16   0   716  284  244 S    0  0.1   0:00.87 init
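One clue is already in this top output: socknal_sd00 is a scheduler thread of the TCP network driver (socklnd), so its CPU time on n32 while n31 writes suggests the traffic is travelling over TCP rather than o2ib. A minimal sketch for spotting such threads in a process listing follows; the function name `lnd_threads` is hypothetical, and the `kiblnd_` prefix for o2iblnd threads is an assumption.

```sh
#!/bin/sh
# Classify LNet network-driver (LND) kernel threads in a process listing.
# socknal_* threads belong to the TCP driver (socklnd).
# ASSUMPTION: o2iblnd threads are named kiblnd_*.
lnd_threads() {
    # Reads a listing on stdin; the command name is taken from the last
    # field, which works for both top output and `ps -eo comm`.
    awk '{ cmd = $NF }
         cmd ~ /^socknal/ { print "tcp (socklnd): " cmd }
         cmd ~ /^kiblnd/  { print "o2ib (o2iblnd): " cmd }'
}

# Example with two lines of the top output above; on a live node one
# could pipe `ps -eo comm` into lnd_threads instead.
printf '%s\n' \
    ' 4003 root 15 0 0 0 0 S 5 0.0 7:19.17 socknal_sd00' \
    ' 4358 root 15 0 0 0 0 S 0 0.0 0:06.08 ll_ost_io_04' |
    lnd_threads
```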
n31:/mnt/lustre/ost # df -h -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda6              33G  4.9G   29G  15% /
udev                  252M  136K  252M   1% /dev
161.74.83.48@o2ib:/testfs
                       67G  6.5G   57G  11% /mnt/lustre/ost
n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 41.9394 seconds, 12.5 MB/s
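The gap between the two runs is easier to see as numbers. The sketch below uses a hypothetical helper, `throughput_mb_s`, to recompute dd's figures from the byte count and elapsed time, using decimal megabytes as dd does. The client write over the mount ran at roughly a third of the local rate, and far below the 650MB/s the interconnect can deliver.

```sh
#!/bin/sh
# Recompute dd's throughput from bytes copied and elapsed seconds.
# dd reports decimal megabytes (10^6 bytes per MB), as its "524 MB" for
# 524288000 bytes shows.
throughput_mb_s() {
    # $1 = bytes copied, $2 = elapsed seconds; prints MB/s to one decimal
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

local_rate=$(throughput_mb_s 524288000 13.406)   # dd on n32, the server
client_rate=$(throughput_mb_s 524288000 41.9394) # dd on n31 over the mount
echo "n32: $local_rate MB/s, n31: $client_rate MB/s"
# How many times slower the client write was.
awk -v a="$local_rate" -v c="$client_rate" 'BEGIN { printf "ratio: %.1f\n", a / c }'
```

Applied to a run of 524288000 bytes in 12.6019 seconds, the same helper gives 41.6 MB/s, matching dd's own report.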
Thierry.
----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
This e-mail and its attachments are intended for the above named only
and may be confidential.  If they have come to you in error you must
not copy or show them to anyone, nor should you take any action based
on them, other than to notify the error by replying to the sender.
I'm now getting the same performance on a Lustre client as on the Lustre
server. I removed the tcp interface from the modprobe configuration and left
only o2ib, because for some reason Lustre was using the tcp interface for
communication.

n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 12.6019 seconds, 41.6 MB/s

Thierry.

On Thu, 28 Sep 2006, Thierry Delaitre wrote:
> I get 39MB/s when writing to the disk from n32. However, if i do the same
> experiments using dd on a mounted ost over rdma using o2ib i get 12.5MB/s!!!
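The fix amounts to making LNet load only the o2ib network. Assuming the networks were set with the lnet module's `networks` option in the modprobe configuration (the exact file and the original line are not given in the message), a small check like the hypothetical `check_o2ib_only` below can confirm that no tcp network remains.

```sh
#!/bin/sh
# ASSUMPTION: LNet networks are set in a modprobe file with a line such as
#   options lnet networks=o2ib0
# (file name and exact line are illustrative; the message does not give them).

# Print the value of networks= from any "options lnet" line in file $1.
lnet_networks() {
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$1"
}

# Succeed only if o2ib is configured and tcp is not.
check_o2ib_only() {
    nets=$(lnet_networks "$1")
    case "$nets" in
        *tcp*)  echo "tcp still configured: $nets"; return 1 ;;
        *o2ib*) echo "o2ib only: $nets"; return 0 ;;
        *)      echo "no lnet networks option found in $1"; return 2 ;;
    esac
}

# Example with illustrative "before" and "after" configurations.
conf=$(mktemp)
echo 'options lnet networks=tcp0(eth0),o2ib0' > "$conf"
check_o2ib_only "$conf" || true   # reports that tcp is still configured
echo 'options lnet networks=o2ib0' > "$conf"
check_o2ib_only "$conf"
rm -f "$conf"
```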
Thierry,
How did you get around your problems with o2iblnd/OFED symbol versions?
                Cheers,
                        Eric
---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: eeb@bartonsoftware.com|
---------------------------------------------------
 
> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com 
> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of 
> Thierry Delaitre
> Sent: 29 September 2006 10:09 AM
> To: lustre-discuss@clusterfs.com
> Subject: [Lustre-discuss] Re: performance issues
> 
> 
> I'm now getting same perf on a lustre client than on the lustre server.
> I removed the tcp interface from modprobe and left o2ib only as for some
> reason lustre was using the tcp interface for communication.
> 
> n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
> 512000+0 records in
> 512000+0 records out
> 524288000 bytes (524 MB) copied, 12.6019 seconds, 41.6 MB/s
> 
> Thierry.
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>
Eric,

On Fri, 29 Sep 2006, Eric Barton wrote:
> How did you get around your problems with o2iblnd/OFED symbol versions?

Three things:

- I think Lustre was using the include files of the IB module that comes
  natively with the Linux kernel. I removed
  /usr/src/linux-2.6.16.21-0.8/include/rdma and created a symlink from
  /usr/src/linux-2.6.16.21-0.8/drivers/infiniband to
  /usr/local/ofed/src/openib/drivers/infiniband.

- In addition to the issue above, I had to set mod_version to 'n' in the
  kernel as a workaround. Someone else is also experiencing this:
  https://mail.clusterfs.com/pipermail/lustre-discuss/2006-July/001834.html

- You have to compile Lustre against /usr/local/ofed/src/openib and not
  /usr/local/ofed/src/openib-1.1:

  mkdir -p /usr/local/ofed/src/openib/drivers/infiniband
  ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
  ./configure --with-o2ib=/usr/local/ofed/src/openib

Cheers,
Thierry.
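The tree preparation in the last item can be wrapped in a small helper so it is safe to rerun. This is a sketch only: `prepare_ofed_tree` is a hypothetical name, the paths come from the message, and it prints the configure command instead of running it. The optional kernel-source check reflects the first item (the kernel's own include/rdma headers being picked up); it only warns and deletes nothing.

```sh
#!/bin/sh
# Sketch of the OFED tree preparation described above, made safe to rerun.
# prepare_ofed_tree is a hypothetical helper; it prints the configure
# command rather than running it.
prepare_ofed_tree() {
    ofed=$1        # e.g. /usr/local/ofed/src/openib (not openib-1.1)
    ksrc=${2:-}    # optional kernel source, e.g. /usr/src/linux-2.6.16.21-0.8
    if [ ! -d "$ofed/include" ]; then
        echo "missing $ofed/include" >&2
        return 1
    fi
    # The kernel's own IB headers may be used instead of OFED's; warn only.
    if [ -n "$ksrc" ] && [ -d "$ksrc/include/rdma" ]; then
        echo "warning: $ksrc/include/rdma still present" >&2
    fi
    mkdir -p "$ofed/drivers/infiniband"
    # Same link as: ln -s $ofed/include $ofed/drivers/infiniband
    link="$ofed/drivers/infiniband/include"
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        ln -s "$ofed/include" "$link"
    fi
    echo "./configure --with-o2ib=$ofed"
}

# Usage (from the Lustre source directory):
#   prepare_ofed_tree /usr/local/ofed/src/openib /usr/src/linux-2.6.16.21-0.8
```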