Hi,
I've created a small Lustre testbed which I'm using for experiments. n32
holds the mgs and one ost. I get 39MB/s when writing to the disk from n32.
However, if I run the same dd test from n31 on the mounted ost over rdma
using o2ib, I get only 12.5MB/s. Why am I seeing this performance loss when
my ib interconnect can deliver over 650MB/s? I'm currently using HCA
firmware 3.2 for the 23108, but I'm not sure this is the issue. The socknal
thread on n32 is only using 4% of CPU while n31 is writing. How can I
troubleshoot this performance issue?
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca 0000:04:00.0: HCA FW version 3.2.0 is old (3.4.0 is current).
ib_mthca 0000:04:00.0: If you have problems, try updating your HCA FW.
n32:
/dev/sdb1 899M 41M 808M 5% /mnt/lustre/mgs
/dev/sdb2 9.9G 3.1G 6.4G 33% /mnt/lustre/sdb2
/dev/sdb3 30G 2.1G 27G 8% /mnt/lustre/sdb3
/dev/sdb4 27G 1.4G 25G 6% /mnt/lustre/sdb4
161.74.83.48@o2ib:/testfs
67G 6.5G 57G 11% /mnt/lustre/ost
n32:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 13.406 seconds, 39.1 MB/s
top - 14:20:19 up 1 day, 7:15, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 272 total, 1 running, 271 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 2.3%sy, 0.0%ni, 81.1%id, 12.5%wa, 1.2%hi, 3.0%si, 0.0%st
Mem: 515144k total, 508984k used, 6160k free, 349900k buffers
Swap: 1052216k total, 148k used, 1052068k free, 51280k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4003 root 15 0 0 0 0 S 5 0.0 7:19.17 socknal_sd00
4358 root 15 0 0 0 0 S 0 0.0 0:06.08 ll_ost_io_04
4364 root 15 0 0 0 0 S 0 0.0 0:06.28 ll_ost_io_10
7412 root 16 0 2320 1136 768 R 0 0.2 0:00.21 top
1 root 16 0 716 284 244 S 0 0.1 0:00.87 init
n31:/mnt/lustre/ost # df -h -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 33G 4.9G 29G 15% /
udev 252M 136K 252M 1% /dev
161.74.83.48@o2ib:/testfs
67G 6.5G 57G 11% /mnt/lustre/ost
n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 41.9394 seconds, 12.5 MB/s
Thierry.
----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
This e-mail and its attachments are intended for the above named only
and may be confidential. If they have come to you in error you must
not copy or show them to anyone, nor should you take any action based
on them, other than to notify the error by replying to the sender.
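As a quick sanity check on the two dd runs above, the reported rates can be recomputed from the byte count and elapsed time. This is only a sketch: the function names `mb_per_s` and `slowdown` are made up for illustration, and it assumes `awk` is available and that MB means 10^6 bytes, which is the unit dd prints.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre or dd).

# Throughput in MB/s, where 1 MB = 10^6 bytes as in dd's summary line.
# Usage: mb_per_s BYTES SECONDS
mb_per_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Ratio between two elapsed times for the same amount of data.
# Usage: slowdown SLOW_SECONDS FAST_SECONDS
slowdown() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}
```

With the figures from the thread, `mb_per_s 524288000 13.406` gives 39.1 and `mb_per_s 524288000 41.9394` gives 12.5, so the write from n31 is roughly three times slower than the write on n32 (`slowdown 41.9394 13.406` gives 3.1).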
I'm now getting the same performance on a Lustre client as on the Lustre
server. I removed the tcp interface from modprobe and left only o2ib,
because for some reason Lustre was using the tcp interface for communication.
n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 12.6019 seconds, 41.6 MB/s
Thierry.
On Thu, 28 Sep 2006, Thierry Delaitre wrote:
> However, if i do the same experiments using dd on a mounted ost over rdma
> using o2ib i get 12.5MB/s!!! Why am i experiencing a performance loss when
> my ib interconnect can deliver over 650MB/s ?
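The fix described above was to leave only o2ib in the lnet module options. The exact modprobe line is not shown in the thread, so the following is a sketch under assumptions: it supposes the options live in a file such as /etc/modprobe.conf (the path varies by distribution) in the form `options lnet networks=...`, and the helper name `lnet_networks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the lnet "networks" option
# found in a modprobe configuration file.
# The default path is an assumption; pass another file as $1 if needed.
lnet_networks() {
    conf="${1:-/etc/modprobe.conf}"
    # Match lines such as:  options lnet networks=tcp0,o2ib0
    # and print only the part after "networks=", without double quotes.
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*$/\1/p' "$conf" |
        tr -d '"'
}
```

If this prints something that still lists a tcp network next to o2ib, the setup resembles the one that was slow in this thread. On a running node, `lctl list_nids` shows the NIDs that LNET has actually brought up, which is a more direct check than reading the configuration file.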
Thierry,
How did you get around your problems with o2iblnd/OFED symbol versions?
Cheers,
Eric
---------------------------------------------------
|Eric Barton Barton Software |
|9 York Gardens Tel: +44 (117) 330 1575 |
|Clifton Mobile: +44 (7909) 680 356 |
|Bristol BS8 4LL Fax: call first |
|United Kingdom E-Mail: eeb@bartonsoftware.com|
---------------------------------------------------
> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com
> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of
> Thierry Delaitre
> Sent: 29 September 2006 10:09 AM
> To: lustre-discuss@clusterfs.com
> Subject: [Lustre-discuss] Re: performance issues
>
>
> I'm now getting same perf on a lustre client than on the lustre server. I
> removed the tcp interface from modprobe and left o2ib only as for some
> reason lustre was using the tcp interface for communication.
>
> n31:/mnt/lustre/ost # dd if=/dev/zero of=titi bs=1k count=500k
> 512000+0 records in
> 512000+0 records out
> 524288000 bytes (524 MB) copied, 12.6019 seconds, 41.6 MB/s
>
> Thierry.
Eric,
On Fri, 29 Sep 2006, Eric Barton wrote:
> How did you get around your problems with o2iblnd/OFED symbol versions?
3 things:
- I think lustre was using the include files of the IB module that comes
  natively with the linux kernel. I removed
  /usr/src/linux-2.6.16.21-0.8/include/rdma and created a symlink from
  /usr/src/linux-2.6.16.21-0.8/drivers/infiniband to
  /usr/local/ofed/src/openib/drivers/infiniband
- in addition to the issue above, I had to set mod_version to 'n' in the
  kernel as a workaround. Someone else is also experiencing this:
  https://mail.clusterfs.com/pipermail/lustre-discuss/2006-July/001834.html
- you have to compile lustre with /usr/local/ofed/src/openib and not
  /usr/local/ofed/src/openib-1.1
mkdir -p /usr/local/ofed/src/openib/drivers/infiniband
ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
./configure --with-o2ib=/usr/local/ofed/src/openib
Cheers,
Thierry.