nic@cray.com
2007-Jan-17 13:51 UTC
[Lustre-devel] [Bug 11540] SLES10 + 1.4.9pre: obdfilter shows large write performance degradation
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11540

I still owe you folks the obdfilter studies, but we have some tangential data that might prove interesting.

We have been chasing a slow NFS write problem in parallel with this issue -- namely, parallel compiles on an NFS-exported ext3 filesystem were taking nearly 10 minutes instead of 5 seconds. We were able to capture server-side stack traces during one of these slow periods, and it appears that all of the nfsd threads are stuck in log_wait_commit(), where each thread loops comparing its transaction id (tid) against journal->j_commit_sequence. My view is that they are waiting for their transaction ids to be committed to disk (a condensed sketch of that wait loop is at the end of this message).

That leaves kjournald as the only remaining suspect, and it appears to be stuck in sync_dirty_buffer(), called from journal_write_commit_record():

kjournald     D ffff810072dabbb8     0  2849      1          2866  2784 (L-TLB)
 ffff810072dabbb8 ffff810072dabb78 ffffffff8028e48a ffff81007c6f4340
 0000000000000561 ffffffff803ca180 ffff81007c6f4558 0000000000000008
 0000000000000086 0000000000000086
Call Trace:
 <ffffffff8028e48a>{scsi_request_fn+858}
 <ffffffff80356d25>{io_schedule+53}
 <ffffffff8017e1fb>{sync_buffer+59}
 <ffffffff80356f05>{__wait_on_bit+69}
 <ffffffff8017e1c0>{sync_buffer+0}
 <ffffffff8017e1c0>{sync_buffer+0}
 <ffffffff80357006>{out_of_line_wait_on_bit+198}
 <ffffffff80147b20>{wake_bit_function+0}
 <ffffffff80147b20>{wake_bit_function+0}
 <ffffffff8017e277>{__wait_on_buffer+39}
 <ffffffff80181be2>{sync_dirty_buffer+162}
 <ffffffff80205e2f>{journal_write_commit_record+159}
 <ffffffff8020b566>{journal_free_journal_head+22}
 <ffffffff8020b90e>{__journal_remove_journal_head+430}
 <ffffffff80206bb2>{journal_commit_transaction+3314}
 <ffffffff8013acac>{lock_timer_base+44}
 <ffffffff8013af3f>{try_to_del_timer_sync+95}
 <ffffffff8020948e>{kjournald+254}
 <ffffffff80147ae0>{autoremove_wake_function+0}
 <ffffffff80147ae0>{autoremove_wake_function+0}
 <ffffffff80209380>{commit_timeout+0}
 <ffffffff8010ce52>{child_rip+8}
 <ffffffff80209390>{kjournald+0}

Reading through journal_write_commit_record() turns up the error message we often see on SLES9:

    "JBD: barrier-based sync failed on %s - "
    "disabling barriers\n"

We noticed this message is NOT being printed under SLES10 (the second sketch below shows why it would go quiet). As a test, we mounted the ext3 filesystem with "-o barrier=0" to see if barriers were the issue, and lo and behold, the compiles are back down to 5 seconds.

What is not clear is (1) why barrier-based syncs are suddenly working on SLES10, and (2) why oh why they are so dang slow.

For the interested, here are sd_iostats snapshots of the disk during the barrier on/off test:

barrier on (SLOW):

eelc0-0c0s0n3:~ # cat /tmp/barrier_on_sdiostats
index:       0   snapshot_time:       386.275018

   size      reads      total     writes      total
    512          0          0          0          0
   1024          0          0          0          0
   2048          0          0          0          0
   4096          0          0      14911     119288
   8192          0          0       1699      30800
  16384          0          0        801      33288
  32768          0          0        821      71080
  65536          0          0        129      20336
 131072          0          0          4       1144
  total          0          0      18365     275936

barrier off (FAST):

eelc0-0c0s0n3:~ # cat /proc/scsi/sd_iostats/sda
index:       0   snapshot_time:        57.314699

   size      reads      total     writes      total
    512          0          0          0          0
   1024          0          0          0          0
   2048          0          0          0          0
   4096         11         88       5066      40528
   8192          1         24        399       7384
  16384          0          0       1575      67696
  32768          0          0        286      22608
  65536          0          0         86      14808
 131072          0          0         69      24728
 262144          0          0         74      50688
 524288          0          0         63      64512
  total         12        112       7618     292952

Note that with barriers off the disk also sees much larger writes (up to 512 KB), whereas the barrier-on run never got past 128 KB.

While this hasn't been tested on Lustre yet (lconf --mountfsoptions barrier=0), we could sure use some help diagnosing this.
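For reference, here is a condensed sketch of the loop the nfsd threads are parked in -- my reading of log_wait_commit() from fs/jbd/journal.c in the 2.6 kernels, trimmed of debug output, so treat it as a sketch rather than verbatim source:

/*
 * Each caller sleeps until kjournald advances j_commit_sequence past
 * the tid it is waiting on, so one slow commit stalls every nfsd
 * thread at once.
 */
int log_wait_commit(journal_t *journal, tid_t tid)
{
	int err = 0;

	spin_lock(&journal->j_state_lock);
	while (tid_gt(tid, journal->j_commit_sequence)) {
		/* poke kjournald, then sleep until a commit finishes */
		wake_up(&journal->j_wait_commit);
		spin_unlock(&journal->j_state_lock);
		wait_event(journal->j_wait_done_commit,
			   !tid_gt(tid, journal->j_commit_sequence));
		spin_lock(&journal->j_state_lock);
	}
	spin_unlock(&journal->j_state_lock);

	if (is_journal_aborted(journal))
		err = -EIO;
	return err;
}

Nothing here is wrong per se; the threads are simply serialized behind whatever sync_dirty_buffer() is doing over in kjournald.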
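And here is the barrier fallback inside journal_write_commit_record() (fs/jbd/commit.c), again condensed from my reading of the source rather than quoted verbatim. The warning only prints when the ordered write fails with -EOPNOTSUPP; on SLES10 the I/O stack evidently accepts the barrier, so JBD tags every commit record as ordered and kjournald waits out a full device cache flush each time:

	struct buffer_head *bh = jh2bh(descriptor);	/* the commit record */
	int barrier_done = 0;
	int ret;

	if (journal->j_flags & JFS_BARRIER) {
		set_buffer_ordered(bh);	/* commit block carries a barrier */
		barrier_done = 1;
	}
	ret = sync_dirty_buffer(bh);	/* <-- where kjournald sits above */

	if (barrier_done && ret == -EOPNOTSUPP) {
		char b[BDEVNAME_SIZE];

		/* the message we see on SLES9 but not on SLES10 */
		printk(KERN_WARNING
		       "JBD: barrier-based sync failed on %s - "
		       "disabling barriers\n", bdevname(journal->j_dev, b));

		journal->j_flags &= ~JFS_BARRIER;	/* never retried */

		/* retry the commit record as a plain write */
		clear_buffer_ordered(bh);
		set_buffer_dirty(bh);
		ret = sync_dirty_buffer(bh);
	}

So once a SLES9-era stack returned -EOPNOTSUPP, JBD silently fell back to plain writes for good, which would explain why we never noticed the cost of barriers until SLES10 started honoring them.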
I will be running the Lustre test later today to see if this is the source of our woes there as well.
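For anyone wanting to try the same experiment, the two knobs are just mount options (the device and mount point below are placeholders, not our actual config):

# ext3/NFS test: remount the exported filesystem without barriers
mount -o remount,barrier=0 /dev/sdXX /export

# Lustre test: pass the same option through lconf, keeping the rest
# of your usual lconf arguments
lconf --mountfsoptions barrier=0 ...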