Charland, Denis
2012-Feb-14 19:55 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
After performing some tests, I noticed that after removing the SD_IOSTATS functionality from the kernel, Lustre write performance was 4 times faster.

On a Lustre 1.8.7 client, I untarred a tarfile of a directory containing about 225,000 files (6.4GB). I used time to measure how long the operation took. I performed the tests with two Lustre 1.8.7 servers with identical hardware. One is running Fedora 12 with a 2.6.32-19 patched kernel and the other is running Fedora 7 with a 2.6.22-14 patched kernel.

With both servers, it took about 4 minutes to untar the tarfile when running the kernel without SD_IOSTATS, compared to about 16 minutes when running the kernel with SD_IOSTATS.

Is it normal for SD_IOSTATS to have such a big impact on Lustre write performance?

Denis Charland
UNIX Systems Administrator
National Research Council Canada
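The measurement described above (timing the extraction of a tarfile full of small files) can be approximated with a minimal Python sketch. This is an illustration only, not the procedure used in the original test, which ran tar under time. The function name timed_extract and any paths passed to it are hypothetical.

```python
import tarfile
import time


def timed_extract(archive_path, dest_dir):
    """Extract a tar archive and return (elapsed_seconds, regular_file_count).

    Hypothetical helper for illustration; the original test used the
    shell's time command around tar instead.
    """
    start = time.perf_counter()
    with tarfile.open(archive_path) as tar:
        # getmembers() scans the whole archive, so the count is exact.
        members = tar.getmembers()
        tar.extractall(dest_dir)
    elapsed = time.perf_counter() - start
    n_files = sum(1 for m in members if m.isfile())
    return elapsed, n_files
```

Calling timed_extract with an archive path and a destination directory on the Lustre mount returns the wall-clock time and the number of regular files, from which a files-per-second rate follows. Client-side caching can affect the result, so repeated runs are not necessarily comparable.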
Peter Grandi
2012-Feb-15 11:49 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
>>> On Tue, 14 Feb 2012 14:55:03 -0500, "Charland, Denis"
>>> <Denis.Charland at imi.cnrc-nrc.gc.ca> said:

> [ ... ] On a Lustre 1.8.7 client, I untarred a tarfile of a
> directory containing about 225,000 files (6.4GB).

Writing lots of small files is not the most appropriate test for Lustre. Also, it may make a big difference where the 'tar' file was stored, and in how many directories the files ended up.

> [ ... ] it took about 4 minutes to untar the tarfile when
> running the kernel without SD_IOSTATS compared to about 16
> minutes when running the kernel with SD_IOSTATS. [ ... ]

On the client or the OSS or MDS?

Anyhow, stats collection in itself should not make a difference. There may be something else involved. For example logging volume, which is known to make a significant difference. Perhaps also whether barriers (or 'sync') are enabled, which may conceivably be remotely related to 'SD_IOSTATS'. Possibly you have a stats collector program that results in a higher load on the MDS if 'SD_IOSTATS' is enabled.

But looking at your numbers, in the faster case you were able to commit ~950 inodes/s and 26 MB/s, and even with Lustre not being the best for small files, that's not so good.

Overall your report is vague and not very useful. Have you looked for example at 'vmstat' and 'iostat -dx' rates on the MDS/OSS hosts? Have you looked at the logs on both client and servers? Checked mount options, 'sync' and barriers?
Andreas Dilger
2012-Feb-15 19:06 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
On 2012-02-14, at 11:55 AM, Charland, Denis wrote:

> After performing some tests, I noticed that after removing the SD_IOSTATS functionality from the kernel, Lustre write performance was 4 times faster. On a Lustre 1.8.7 client, I untarred a tarfile of a directory containing about 225,000 files (6.4GB).

That is about 28kB/file, so it is causing a LOT of IO requests to the underlying disks.

> I used time to measure how long the operation took. I performed the tests with two Lustre 1.8.7 servers with identical hardware. One is running Fedora 12 with a 2.6.32-19 patched kernel and the other is running Fedora 7 with a 2.6.22-14 patched kernel.
>
> With both servers, it took about 4 minutes to untar the tarfile when running the kernel without SD_IOSTATS, compared to about 16 minutes when running the kernel with SD_IOSTATS.
>
> Is it normal for SD_IOSTATS to have such a big impact on Lustre write performance?

The sd_iostats code uses spin_lock_irqsave() to ensure consistent data structure access in interrupt context, so possibly this is causing the performance impact. The impact might be more severe if there are many cores on the server.

The sd_iostats patch is completely optional, and is no longer in the RHEL6 kernel patch series in the Lustre 2.1/2.2 releases.

Cheers, Andreas
--
Andreas Dilger
Whamcloud, Inc.
Principal Lustre Engineer
http://www.whamcloud.com/
Charland, Denis
2012-Feb-15 19:45 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
> The sd_iostats code uses spin_lock_irqsave() to ensure consistent data structure access in interrupt context, so
> possibly this is causing the performance impact. The impact might be more severe if there are many cores on the server.

Thanks Andreas, that's a more realistic explanation...

> The sd_iostats patch is completely optional, and is no longer in the RHEL6 kernel patch series in the Lustre
> 2.1/2.2 releases.

It has also been removed from the SLES11 kernel patch series in the Lustre 1.8.7 release, but for another reason (Bugzilla - Bug #23988).

Cheers,
Denis
Peter Grandi
2012-Feb-15 21:36 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
>> After performing some tests, I noticed that after removing
>> the SD_IOSTATS functionality from the kernel, Lustre write
>> performance was 4 times faster. On a Lustre 1.8.7
>> client, I untarred a tarfile of a directory containing about
>> 225,000 files (6.4GB).

> That is about 28kB/file, so it is causing a LOT of IO requests
> to the underlying disks.

Well, at least one per file, but I am more suspicious of the MDS when there is a lot of metadata traffic, and here there is a file creation and a file closure (with update of the file size) every 28KiB. The size of each write IO is small, and so is the size of each RPC, but that is reflected in the low overall transfer rate.

[ ... ]

>> With both servers, it took about 4 minutes to untar the
>> tarfile when running the kernel without SD_IOSTATS compared
>> to about 16 minutes when running the kernel with SD_IOSTATS.
>> Is it normal for SD_IOSTATS to have such a big impact on Lustre
>> write performance?

> The sd_iostats code uses spin_lock_irqsave() to ensure
> consistent data structure access in interrupt context, so
> possibly this is causing the performance impact. The impact
> might be more severe if there are many cores on the server.

It would be really unfortunate if it were so severe that it takes 3 times as long to do that as to do 28KiB of IO.

Note the transfer rate: 6.4GB in 4 minutes is around 25-26MiB/s, while in 16 minutes it reduces to around 6-7MiB/s. These are not so good figures in absolute terms, and more interestingly they imply around 1ms per file (~1,000 files created and written per second) vs. 4ms per file (~250 files/s), with an extra 3ms per file (allegedly) with 'SD_IOSTATS'.

Considering the lack of obviously useful information in the original post, I am skeptical that 'SD_IOSTATS' directly accounts for 3ms on top of 1ms of IO per file; that's why I suspect at most some collateral effect.
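The per-file figures quoted above can be reproduced with a short calculation. The sketch below assumes that "6.4GB" means 6.4e9 bytes and that the run times are exactly 4 and 16 minutes (the thread only says "about"); the function name per_file_stats is illustrative.

```python
def per_file_stats(total_bytes, n_files, seconds):
    """Derive per-file and aggregate rates from one timed run."""
    return {
        "files_per_s": n_files / seconds,
        "ms_per_file": 1000.0 * seconds / n_files,
        "mib_per_s": total_bytes / seconds / 2**20,
        "kb_per_file": total_bytes / n_files / 1000.0,
    }


TOTAL_BYTES = 6.4e9  # assumption: decimal gigabytes
N_FILES = 225_000    # "about 225,000 files" in the original post

fast = per_file_stats(TOTAL_BYTES, N_FILES, 4 * 60)   # without SD_IOSTATS
slow = per_file_stats(TOTAL_BYTES, N_FILES, 16 * 60)  # with SD_IOSTATS

# Extra time per file attributed (allegedly) to SD_IOSTATS.
extra_ms = slow["ms_per_file"] - fast["ms_per_file"]

for name, stats in (("fast", fast), ("slow", slow)):
    print(name, {k: round(v, 2) for k, v in stats.items()})
print("extra ms per file:", round(extra_ms, 2))
```

Under these assumptions the fast run works out to roughly 940 files/s, 1.07ms per file and 25.4MiB/s, the slow run to roughly 234 files/s, 4.27ms per file and 6.4MiB/s, and the difference to about 3.2ms per file, in line with the rounded figures in the message.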
Charland, Denis
2012-Feb-15 22:07 UTC
[Lustre-discuss] Lustre write performances 4 times faster after removing SD_IOSTATS from kernel
> Considering the lack of obviously useful information in the original post, I am skeptical that 'SD_IOSTATS' directly accounts for 3ms on
> top of 1ms of IO per file, that's why I suspect at most some collateral effect.

Peter, do your own tests and we'll see.