Hayes, Robert N
2009-May-22 16:58 UTC
[Lustre-discuss] Lustre performance differences between 1.6.4.3 and 1.6.5
We have been troubleshooting a major performance issue with our Lustre systems when switching from 1.6.4.3 to 1.6.5. Using a patchless client and running a single-threaded, single-file (50GB) dd copy, the 1.6.4.3 client wrote at up to 1.3GB/s. On the same system, changing only the client to 1.6.5, write speed dropped to 700MB/s.

We found one configuration option whose default differs between the two clients. In 1.6.4.3, checksum is disabled by default. In 1.6.5, checksum is enabled by default. Disabling checksum on the 1.6.5 client raised performance to 1.1GB/s.

Question: are there other option changes between 1.6.4.3 and 1.6.5 that might account for the remaining 200MB/s loss in performance?

Bob Hayes
System Administrator
Intel-SSG-DRD-DP
Office: 253-371-3040
Cell: 253-441-5482
e-mail: robert.n.hayes at Intel.Com
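For readers following the numbers, here is a minimal sketch of the arithmetic in the message above. It uses only the three throughput figures quoted there (treated as 1 GB/s = 1000 MB/s); the function and constant names are illustrative and not part of any Lustre tool.

```python
# Throughput figures quoted in the message, in MB/s (1 GB/s taken as 1000 MB/s).
BASELINE_1643 = 1300      # 1.6.4.3 client, checksum disabled by default
V165_CHECKSUM_ON = 700    # 1.6.5 client, checksum enabled by default
V165_CHECKSUM_OFF = 1100  # 1.6.5 client, checksum disabled by hand


def breakdown(baseline, checksum_on, checksum_off):
    """Split the observed slowdown into the part recovered by disabling
    checksums and the part that is still unexplained."""
    total_loss = baseline - checksum_on
    recovered = checksum_off - checksum_on
    remaining = baseline - checksum_off
    return {
        "total_loss": total_loss,
        "recovered": recovered,
        "remaining": remaining,
        "recovered_fraction": recovered / total_loss,
    }


if __name__ == "__main__":
    result = breakdown(BASELINE_1643, V165_CHECKSUM_ON, V165_CHECKSUM_OFF)
    for key, value in result.items():
        print(f"{key}: {value}")
```

On these figures, disabling checksums recovers 400MB/s of the 600MB/s drop, which leaves the 200MB/s the question asks about. The 1.3GB/s figure is a reported peak ("up to"), so the remaining gap may be smaller on a typical run.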
Andreas Dilger
2009-May-25 21:49 UTC
[Lustre-discuss] Lustre performance differences between 1.6.4.3 and 1.6.5
On May 22, 2009 09:58 -0700, Hayes, Robert N wrote:
> We have been troubleshooting a major performance issue with our
> Lustre systems when switching from 1.6.4.3 to 1.6.5. Using a
> patchless client and running a single thread, single file (50GB) dd
> copy, the 1.6.4.3 client wrote up to 1.3GB/s. Using the same system and
> changing only the client to 1.6.5, write speed went down to 700MB/s.
>
> We found one configuration option different by default between the
> two clients. In 1.6.4.3, checksum is disabled by default. In 1.6.5,
> checksum is enabled by default. Disabling checksum on the 1.6.5 client
> resulted in a performance increase to 1.1GB/s. Question - are there
> other option changes between 1.6.4.3 and 1.6.5 that might account for
> the remaining 200MB/s loss in performance?

I believe that the checksumming change is the major one. This mostly affects the single-threaded, high-bandwidth case, as you found out. With most HPC jobs the I/O is multi-threaded, so all of the cores on the client can contribute to the checksumming.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
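As a rough illustration of why a single writer is hit hardest, the sketch below estimates the checksum rate implied by the reported figures. It rests on an assumption the thread does not confirm: that on a single dd thread the checksum work runs strictly in series with the rest of the write path, so the per-byte times simply add. The function name is hypothetical.

```python
def implied_checksum_rate(rate_checksum_off, rate_checksum_on):
    """Checksum throughput (same units as the inputs) implied by two
    measured write rates, ASSUMING checksum time adds serially:

        1 / rate_on = 1 / rate_off + 1 / rate_checksum

    Solving for rate_checksum gives rate_on * rate_off / (rate_off - rate_on).
    """
    if rate_checksum_on >= rate_checksum_off:
        raise ValueError("enabling checksums did not slow the write path")
    return (rate_checksum_on * rate_checksum_off) / (
        rate_checksum_off - rate_checksum_on
    )


if __name__ == "__main__":
    # Figures from the thread, in MB/s: 1.1GB/s with checksums off,
    # 700MB/s with checksums on (1.6.5 client, single dd thread).
    print(implied_checksum_rate(1100, 700))
```

Under that serial assumption, the numbers correspond to one core checksumming at about 1.9GB/s. If the real write path overlaps checksumming with network transfer, the true per-core rate would be lower than this estimate. With several writer threads the same work is spread across cores, which is consistent with the point that multi-threaded I/O is much less affected.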