nic@cray.com
2007-Jan-19 10:48 UTC
[Lustre-devel] [Bug 11540] SLES10 + 1.4.9pre: obdfilter shows large write performance degradation
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11540

A couple of new data points:

- Mounting with barrier=0 seems to have gotten the write performance back.
- Under SLES9, we see the message "disabling barrier-based syncs" quite soon after either a Lustre or a regular ext3 mount. Given that, running under SLES10 with barrier=0 or the boot parameter barrier=off should not induce any extra data loss on hardware failures.
- Barriers are off by default in the vanilla Linus kernel; in the SLES10 kernel they are being turned on by a patch from Suse (more details available).
- We do know that running with barriers off makes it even more critical to run e2fsck after a storage hardware failure, meaning anything that would generate SCSI errors on Linux or result in the cache being lost.
- From Documentation/block/barriers.txt, we need to find out exactly which behavior we are seeing before we can start investigating why barriers are so slow for us. At first glance it seems that the SCSI midlayer doesn't support command tagging, so it is draining the whole request queue for these writes. More investigation is needed.
- We forgot to put the bug 11230 changes in place for the last round of obdfilter, so it seems that an lconf change to detect this kernel and make those changes is needed.

Finally, the big question:

- Is CFS comfortable with us running Lustre with barrier=0?
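
On the mount-option side of the lconf point above, here is a minimal sketch of forcing barrier=0 onto a comma-separated mount option string. The helper name with_barrier_disabled is hypothetical and is not part of lconf. Detecting the SLES10 kernel is left out, since how best to do that is still open.

```python
def with_barrier_disabled(mount_options):
    """Return mount_options with any existing barrier=... entry replaced
    by barrier=0 (hypothetical helper, not an existing lconf function)."""
    # Split the comma-separated option string and drop empty entries.
    opts = [o.strip() for o in mount_options.split(",") if o.strip()]
    # Remove any barrier setting that is already present.
    opts = [o for o in opts if not o.startswith("barrier=")]
    # Append the setting that restored write performance in our tests.
    opts.append("barrier=0")
    return ",".join(opts)


if __name__ == "__main__":
    print(with_barrier_disabled("rw,errors=remount-ro"))
    print(with_barrier_disabled("rw,barrier=1"))
```

The resulting string could then be passed as the -o argument when mounting the backing ext3 filesystem.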