Hi all,

I am testing with MXLND. My setup is very simple. I have one MGS/MDT,
one OST, and one client. All are running RHEL 5.3. The OST is using a
loopback device on the internal disk, which is capable of ~70 MB/s. I am
running 10 dd processes in parallel, each writing 1 GB, and MXLND should
be able to send the data at ~1,100 MB/s. The machine has 8 GB of memory.

Before the dds complete, free memory drops to about 32 MB. Everything
completes, but Lustre posts the following to dmesg:

Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write()) lustre-OST0000: slow i_mutex 30s
Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait()) lustre-OST0000: slow journal start 30s
Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write()) lustre-OST0000: slow commitrw commit 30s
Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0000: slow direct_io 30s
Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 4 previous similar messages

Should I be concerned or is this normal?

Scott
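For reference, the parallel dd workload described above can be reproduced
with something along these lines, assuming the client mounts the
filesystem at /mnt/lustre (the mount point and file names here are
illustrative, not taken from the message):

    # write 10 x 1 GB files in parallel from the Lustre client
    for i in $(seq 1 10); do
        dd if=/dev/zero of=/mnt/lustre/ddtest.$i bs=1M count=1024 &
    done
    wait    # wait for all dd processes to finish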
On Fri, 2009-08-28 at 15:00 -0400, Scott Atchley wrote:
> Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write()) lustre-OST0000: slow i_mutex 30s
> Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait()) lustre-OST0000: slow journal start 30s
> Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write()) lustre-OST0000: slow commitrw commit 30s
> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0000: slow direct_io 30s
> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 4 previous similar messages
>
> Should I be concerned or is this normal?

It means that I/Os are completing more slowly than Lustre would like,
which, as you can guess, means you are hammering the disk(s) too hard.
Try reducing the number of OST threads. Ideally you want those messages
to go away even when you are pushing the OSTs to capacity: just enough
OST threads to drive the disks to capacity, but no more. So measure,
reduce, measure. If the throughput is the same or better after reducing,
reduce further and measure again. Repeat until you have found the sweet
spot.

obdfilter-survey in the iokit automates this for you, running many tests
at different thread counts and letting you see where the sweet spot is
without all the iterating.

b.
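A rough sketch of what that looks like in practice; the obdfilter-survey
variables and the thread-count module parameter below are typical of the
1.8-era tools but may differ between versions, so treat them as
illustrative rather than definitive:

    # On the OSS: sweep I/O thread counts with obdfilter-survey from
    # the lustre-iokit, e.g. 1 to 32 threads against the local OSTs.
    thrlo=1 thrhi=32 nobjhi=2 size=1024 case=disk ./obdfilter-survey

    # Once a sweet spot is found, cap the number of OST service threads
    # via the ost module option (64 is only an example value), e.g. in
    # /etc/modprobe.conf before the OST is mounted:
    options ost oss_num_threads=64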
On Aug 28, 2009, at 3:08 PM, Brian J. Murrell wrote:

> On Fri, 2009-08-28 at 15:00 -0400, Scott Atchley wrote:
>> Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write()) lustre-OST0000: slow i_mutex 30s
>> Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait()) lustre-OST0000: slow journal start 30s
>> Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write()) lustre-OST0000: slow commitrw commit 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0000: slow direct_io 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 4 previous similar messages
>>
>> Should I be concerned or is this normal?
>
> It means that I/Os are completing more slowly than Lustre would like,
> which, as you can guess, means you are hammering the disk(s) too hard.
> Try reducing the number of OST threads. Ideally you want those messages
> to go away even when you are pushing the OSTs to capacity: just enough
> OST threads to drive the disks to capacity, but no more. So measure,
> reduce, measure. If the throughput is the same or better after reducing,
> reduce further and measure again. Repeat until you have found the sweet
> spot.
>
> obdfilter-survey in the iokit automates this for you, running many tests
> at different thread counts and letting you see where the sweet spot is
> without all the iterating.

Hi Brian,

Thanks for the description. Since I am mainly testing for correctness of
MXLND, I am not worried about hammering my test disk. I will keep this in
mind in case I get a big fat RAID this Christmas. ;-)

Scott
On Aug 28, 2009 15:00 -0400, Scott Atchley wrote:
> I am testing with MXLND. My setup is very simple. I have one MGS/MDT,
> one OST, and one client. All are running RHEL 5.3. The OST is using a
> loopback device on the internal disk, which is capable of ~70 MB/s.

Note that using a loopback device imposes a lot of extra overhead on the
OST. There is an extra data copy inside the driver, and there are two
levels of "transactions": one for the loopback ldiskfs filesystem, and
one for the underlying (presumably ext3) filesystem.

> I am running 10 dd processes in parallel, each writing 1 GB, and MXLND
> should be able to send the data at ~1,100 MB/s. The machine has 8 GB
> of memory. Before the dds complete, free memory drops to about 32 MB.
> Everything completes, but Lustre posts the following to dmesg:
>
> Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write()) lustre-OST0000: slow i_mutex 30s
> Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait()) lustre-OST0000: slow journal start 30s
> Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write()) lustre-OST0000: slow commitrw commit 30s
> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0000: slow direct_io 30s
> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 4 previous similar messages
>
> Should I be concerned or is this normal?

If you are just testing the MXLND, you could also use LNET Self Test
(LST) and avoid the OST entirely. Alternatively, to do Lustre RPC-level
testing you can use the echo server.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
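For anyone trying the LST route, a minimal session driven from a console
node looks roughly like the following; the NIDs, group names, and test
options are placeholders rather than anything from this thread, and the
exact syntax can vary by Lustre version:

    # load the self-test module on the console node and all test nodes
    modprobe lnet_selftest

    export LST_SESSION=$$
    lst new_session mxlnd_test
    lst add_group servers 10.0.0.1@mx0    # placeholder NID on the MX network
    lst add_group clients 10.0.0.2@mx0    # placeholder NID on the MX network
    lst add_batch bulk
    lst add_test --batch bulk --concurrency 8 --from clients --to servers \
        brw write size=1M
    lst run bulk
    lst stat clients servers              # Ctrl-C to stop the statistics
    lst end_session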
On Aug 28, 2009, at 3:46 PM, Andreas Dilger wrote:

> If you are just testing the MXLND, you could also use LNET Self Test
> (LST) and avoid the OST entirely. Alternatively, to do Lustre RPC-level
> testing you can use the echo server.

Hi Andreas,

Yes, I am just testing MXLND. I wanted to test what happens when machines
crash or reboot. LNET Self Test (LST) did not recover nicely, and I was
not sure whether that was a bug in LST or in MXLND. Using a real
filesystem did recover as expected.

Scott