Nathan Kroenert
2008-Feb-07 06:54 UTC
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
Hey all - I'm working on an interesting issue where I'm seeing ZFS being quite cranky about writing O_SYNC blocks. Bottom line is that I have a small test case that does essentially this:

    open file for writing -- O_SYNC
    loop {
        write() 8KB of random data
        print time taken to write data
    }

It's taking anywhere up to 80 seconds per 8KB block. When the 'problem' is not in evidence (and it's not always happening), I can do around 1200 O_SYNC writes per second...

It seems to be waiting here virtually all of the time:

    > 0t11021::pid2proc | ::print proc_t p_tlist | ::findstack -v
    stack pointer for thread 30171352960: 2a118052df1
    [ 000002a118052df1 cv_wait+0x38() ]
      000002a118052ea1 zil_commit+0x44(1, 6b50516, 193, 60005db66bc, 6b50570, 60005db6640)
      000002a118052f51 zfs_write+0x554(0, 14000, 2a1180539e8, 6000af22840, 2000, 2a1180539d8)
      000002a118053071 fop_write+0x20(304898cd100, 2a1180539d8, 10, 300a27a9e48, 0, 7b7462d0)
      000002a118053121 write+0x268(4, 8058, 60051a3d738, 2000, 113, 1)
      000002a118053221 dtrace_systrace_syscall32+0xac(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100)
      000002a1180532e1 syscall_trap32+0xcc(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100)

This is also evident in a dtrace of it, following the write in:

    <...>
    28  -> zil_commit
    28    -> cv_wait
    28      -> thread_lock
    28      <- thread_lock
    28      -> cv_block
    28        -> ts_sleep
    28        <- ts_sleep
    28        -> new_mstate
    28          -> cpu_update_pct
    28            -> cpu_grow
    28              -> cpu_decay
    28                -> exp_x
    28                <- exp_x
    28              <- cpu_decay
    28            <- cpu_grow
    28          <- cpu_update_pct
    28        <- new_mstate
    28        -> disp_lock_enter_high
    28        <- disp_lock_enter_high
    28        -> disp_lock_exit_high
    28        <- disp_lock_exit_high
    28      <- cv_block
    28      -> sleepq_insert
    28      <- sleepq_insert
    28      -> disp_lock_exit_nopreempt
    28      <- disp_lock_exit_nopreempt
    28      -> swtch
    28        -> disp
    28          -> disp_lock_enter
    28          <- disp_lock_enter
    28          -> disp_lock_exit
    28          <- disp_lock_exit
    28          -> disp_getwork
    28          <- disp_getwork
    28          -> restore_mstate
    28          <- restore_mstate
    28        <- disp
    28        -> pg_cmt_load
    28        <- pg_cmt_load
    28      <- swtch
    28      -> resume
    28        -> savectx
    28          -> schedctl_save
    28          <- schedctl_save
    28        <- savectx
    <...>

At this point, it waits for up to 80 seconds. I'm also seeing zil_commit() being called around 7-15 times per second.

For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a pinch of difference. :)

For what it's worth, this is a T2000, running Oracle, connected to an HDS 9990 (using 2Gb fibre), with 8KB record sizes for the Oracle filesystems, and I'm only seeing the issue on the ZFS filesystems that have the active Oracle tables on them. The O_SYNC test case is just trying to help me understand what's happening. The *real* problem is that Oracle is running like rubbish when it's trying to roll forward archive logs from another server. It's an almost 100% write workload. At the moment, it cannot even keep up with the other server's log creation rate, and it's barely doing anything. (The other box is quite different, so not really valid for direct comparison at this point.)

6513020 looked interesting for a while, but I already have 120011-14 and 127111-03 installed. I'm looking into the cache flush settings of the 9990 array to see if it's that killing me, but I'm also looking for any other ideas on what might be hurting me. I have also set zfs:zfs_nocacheflush = 1 in /etc/system.

The Oracle logs are on a separate zpool, and I'm not seeing the issue on those filesystems. The lockstats I have run are not yet all that interesting.

If anyone has ideas on specific incantations I should use, or some specific D, or anything else, I'd be most appreciative.

Cheers!
Nathan.
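For anyone wanting to reproduce this, a minimal standalone version of the test case sketched above might look like the following. This is a sketch only: the file name, the endless loop, and the use of gethrtime() are assumptions; only the 8KB O_SYNC writes of random data come from the description in the post.

    /*
     * Minimal O_SYNC write-latency test, per the description above.
     * Sketch only: file name and loop structure are assumptions.
     * Compile on Solaris with: cc -o osync_test osync_test.c
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>   /* gethrtime() on Solaris */
    #include <unistd.h>

    #define BLKSZ   8192    /* 8KB blocks, matching the test case */

    int
    main(void)
    {
            char buf[BLKSZ];
            hrtime_t t0, t1;
            int fd, i;

            /* Fill the buffer with random data, as in the original test. */
            for (i = 0; i < BLKSZ; i++)
                    buf[i] = rand() & 0xff;

            fd = open("osync.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);
            if (fd == -1) {
                    perror("open");
                    return (1);
            }

            for (;;) {
                    t0 = gethrtime();
                    if (write(fd, buf, BLKSZ) != BLKSZ) {
                            perror("write");
                            return (1);
                    }
                    t1 = gethrtime();
                    /* Report per-write latency in milliseconds. */
                    printf("%lld ms\n", (long long)(t1 - t0) / 1000000);
            }
            /* NOTREACHED */
    }

On a healthy pool each write should report low single-digit milliseconds; during the problem window the per-write figure is what blows out to tens of seconds.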
Robert Milkowski
2008-Feb-10 11:51 UTC
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
Hello Nathan,

Thursday, February 7, 2008, 6:54:39 AM, you wrote:

NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
NK> pinch of difference. :)

Have you exported and then imported the pool to get zil_disable into effect?

-- 
Best regards,
Robert Milkowski                          mailto:milek at task.gda.pl
                                          http://milek.blogspot.com
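For reference, the full cycle being suggested would look something like this (a sketch; the pool name "tank" is a placeholder):

    # mdb -kw
    > zil_disable/W0t1
    > $q
    # zpool export tank
    # zpool import tank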
Roch Bourbonnais
2008-Feb-15 10:51 UTC
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
On 10 Feb 2008, at 12:51, Robert Milkowski wrote:

> Hello Nathan,
>
> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>
> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
> NK> pinch of difference. :)
>
> Have you exported and then imported the pool to get zil_disable into
> effect?

I don't think export/import is required.

-r
Robert Milkowski
2008-Feb-20 22:03 UTC
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
Hello Roch,

Friday, February 15, 2008, 10:51:50 AM, you wrote:

RB> On 10 Feb 2008, at 12:51, Robert Milkowski wrote:
>> Hello Nathan,
>>
>> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>>
>> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
>> NK> pinch of difference. :)
>>
>> Have you exported and then imported the pool to get zil_disable into
>> effect?

RB> I don't think export/import is required.

You sure? I'm sure it's been required unless something changed.

-- 
Best regards,
Robert Milkowski                          mailto:milek at task.gda.pl
                                          http://milek.blogspot.com
Roch Bourbonnais
2008-Feb-20 23:32 UTC
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
On 20 Feb 2008, at 23:03, Robert Milkowski wrote:

> Hello Roch,
>
> Friday, February 15, 2008, 10:51:50 AM, you wrote:
>
> RB> On 10 Feb 2008, at 12:51, Robert Milkowski wrote:
>
>>> Hello Nathan,
>>>
>>> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>>>
>>> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
>>> NK> pinch of difference. :)
>>>
>>> Have you exported and then imported the pool to get zil_disable into
>>> effect?
>
> RB> I don't think export/import is required.
>
> You sure? I'm sure it's been required unless something changed.

Sorry, my mistake, I was thinking of zfs_nocacheflush. You're absolutely right.

-r

Nit: a pool export/import will see the modified zil_disable value, but an unmount/mount cycle of an individual filesystem is sufficient to get the effect.
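For the record, that lighter-weight cycle would be something like the following (the pool and filesystem names are placeholders):

    # zfs umount tank/oradata
    # zfs mount tank/oradata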