HI, Just getting my feet wet with zfs. I set up a test system (Sunblade 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI disks) and was trying to write a large file (10 GB) to the array to see how it performed. I configured the raid using raidz. During the write, I saw the disk access lights come on, but I noticed a peculiar behavior. The system would write to the disk, but then pause for a few seconds, then contineu, then pause for a few seconds. I saw the same behavior when I made a smaller raidz using 4x36 GB scsi drives in a different enclosure. Since I''m new to zfs, and realize that I''m probably missing something, I was hoping somebody might help shed some light on my problem. Thanks! This message posted from opensolaris.org
Hi Bob, Looks like : 6415647 Sequential writing is jumping http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647 -r ____________________________________________________________________________________ Roch Bourbonnais Sun Microsystems, Icnc-Grenoble Senior Performance Analyst 180, Avenue De L''Europe, 38330, Montbonnot Saint Martin, France Performance & Availability Engineering http://icncweb.france/~rbourbon http://blogs.sun.com/roller/page/roch Roch.Bourbonnais at Sun.Com (+33).4.76.18.83.20 Bob Evans writes: > HI, > > Just getting my feet wet with zfs. I set up a test system (Sunblade > 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI > disks) and was trying to write a large file (10 GB) to the array to > see how it performed. I configured the raid using raidz. > > During the write, I saw the disk access lights come on, but I noticed > a peculiar behavior. The system would write to the disk, but then > pause for a few seconds, then contineu, then pause for a few seconds. > > I saw the same behavior when I made a smaller raidz using 4x36 GB scsi > drives in a different enclosure. > > Since I''m new to zfs, and realize that I''m probably missing something, > I was hoping somebody might help shed some light on my problem. > > Thanks! > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
James C. McPherson
2006-Aug-14 15:38 UTC
[zfs-discuss] Significant "pauses" during zfs writes
Bob Evans wrote:> Just getting my feet wet with zfs. I set up a test system (Sunblade > 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI > disks) and was trying to write a large file (10 GB) to the array to > see how it performed. I configured the raid using raidz. > > During the write, I saw the disk access lights come on, but I noticed > a peculiar behavior. The system would write to the disk, but then > pause for a few seconds, then contineu, then pause for a few seconds. > > > I saw the same behavior when I made a smaller raidz using 4x36 GB > scsi drives in a different enclosure. > > Since I''m new to zfs, and realize that I''m probably missing > something, I was hoping somebody might help shed some light on my > problem.Hi Bob, I''m pretty sure that''s not a problem that you''re seeing, just ZFS'' normal behaviour. Writes are coalesced as much as possible, so the "pauses" that you observed are most likely going to be the system waiting for suitable IOs to be gathered up and sent out to your storage. If you want to examine this a bit more then might I suggest the DTrace Toolkit''s iosnoop utility. best regards, James C. McPherson
Yes James is right this is normal behaviour. Unless the writes are synchronous (O_DSYNC) or explicitely flushed (fsync()) then they are batched up, written out and committed as a transaction every txg_time (5 seconds). Neil. James C. McPherson wrote:> Bob Evans wrote: > >> Just getting my feet wet with zfs. I set up a test system (Sunblade >> 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI >> disks) and was trying to write a large file (10 GB) to the array to >> see how it performed. I configured the raid using raidz. >> >> During the write, I saw the disk access lights come on, but I noticed >> a peculiar behavior. The system would write to the disk, but then >> pause for a few seconds, then contineu, then pause for a few seconds. >> >> >> I saw the same behavior when I made a smaller raidz using 4x36 GB >> scsi drives in a different enclosure. >> >> Since I''m new to zfs, and realize that I''m probably missing >> something, I was hoping somebody might help shed some light on my >> problem. > > > Hi Bob, > I''m pretty sure that''s not a problem that you''re seeing, just > ZFS'' normal behaviour. Writes are coalesced as much as possible, > so the "pauses" that you observed are most likely going to be > the system waiting for suitable IOs to be gathered up and sent > out to your storage. > > If you want to examine this a bit more then might I suggest the > DTrace Toolkit''s iosnoop utility. > > > best regards, > James C. McPherson > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss-- Neil
Neil Perrin writes: > Yes James is right this is normal behaviour. Unless the writes are > synchronous (O_DSYNC) or explicitely flushed (fsync()) then they > are batched up, written out and committed as a transaction > every txg_time (5 seconds). > > Neil. > > James C. McPherson wrote: > > Bob Evans wrote: > > > >> Just getting my feet wet with zfs. I set up a test system (Sunblade > >> 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI > >> disks) and was trying to write a large file (10 GB) to the array to > >> see how it performed. I configured the raid using raidz. > >> > >> During the write, I saw the disk access lights come on, but I noticed > >> a peculiar behavior. The system would write to the disk, but then > >> pause for a few seconds, then contineu, then pause for a few seconds. > >> > >> > >> I saw the same behavior when I made a smaller raidz using 4x36 GB > >> scsi drives in a different enclosure. > >> > >> Since I''m new to zfs, and realize that I''m probably missing > >> something, I was hoping somebody might help shed some light on my > >> problem. > > > > > > Hi Bob, > > I''m pretty sure that''s not a problem that you''re seeing, just > > ZFS'' normal behaviour. Writes are coalesced as much as possible, > > so the "pauses" that you observed are most likely going to be > > the system waiting for suitable IOs to be gathered up and sent > > out to your storage. > > > > If you want to examine this a bit more then might I suggest the > > DTrace Toolkit''s iosnoop utility. > > > > > > best regards, > > James C. McPherson > > _______________________________________________ > > zfs-discuss mailing list > > zfs-discuss at opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > -- > > Neil > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss I also would agree if the application is burning 100% of a CPU. But if the application is being throttled at times _and_ the storage is itself not 100% exercised then I believe something is wrong and we have that anomalous jumpyness. So Bob, is the application burning a full CPU ? -r
I''m starting simple, there is no app. I have a 10GB file (called foo) on the internal FC drive, I did a zfs create raidz bar .... then ran "cp foo /bar/", so there is no cpu activity due to an app. As a test case, this took 7 min 30 sec to copy to the zfs partition. I removed the pool, formatted a single drive, and then tried to copy the same file to a single drive that was nfs mounted. This only took 5 min 30 sec. Watching disk activity, it seems like I get 2-4 seconds of writes, followed by 2-4 seconds of inactivity. Bob This message posted from opensolaris.org
Bob Evans writes: > I''m starting simple, there is no app. > > I have a 10GB file (called foo) on the internal FC drive, I did a zfs create raidz bar .... > then ran "cp foo /bar/", so there is no cpu activity due to an app. > > As a test case, this took 7 min 30 sec to copy to the zfs partition. I removed the pool, formatted > a single drive, and then tried to copy the same file to a single drive that was nfs mounted. > This only took 5 min 30 sec. > > Watching disk activity, it seems like I get 2-4 seconds of writes, followed by 2-4 seconds of inactivity. > > Bob That''s points to the jumpiness issue. BTW, cp(1) is your application ;-) If it''s not 100% on CPU then it''s not your throttle, the storage is and _that_ should be busy 100% of the time. -r > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
As added information, top reports that "cp" is using about 25% of the single cpu. There are no other apps running. Bob This message posted from opensolaris.org
One last tidbit, for what it is worth. Rather than watch top, I ran xcpustate. It seems that just as the writes pause, the cpu looks like it hits 100% (or very close), then it falls back down to its lower level. I''m still getting used to Solaris 10 as well, so if you have a DTrace script you''d recommend me running, please feel free to suggest it. This message posted from opensolaris.org
Bob Evans writes: > One last tidbit, for what it is worth. Rather than watch top, I ran > xcpustate. It seems that just as the writes pause, the cpu looks like > it hits 100% (or very close), then it falls back down to its lower > level. > > I''m still getting used to Solaris 10 as well, so if you have a DTrace > script you''d recommend me running, please feel free to suggest it. > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss I had overlooked that this could be a single CPU system. For such a test the potential limiting factors can be either of: 1. storage. 2. the single cp thread burning a CPU. 3. total system CPU overload. You''ve already indicated that storage is not 100% busy. The cp(1) thread is not burning a CPU. And it looks like the system is not CPU saturated, right ? I''d argue that the conjunction of those 3 things somewhat points to ''a problem''. -r
Nathan Kroenert
2006-Aug-14 23:09 UTC
[zfs-discuss] Significant "pauses" during zfs writes
Hey, Bob - It might be worth exploring where your data stream for the writes was coming from. Moreover, it might be worth exploring how fast it was filling up caches for writing. Were you delivering enough data to keep the disks busy 100% of the time? I have been tricked by this before... :) Nathan. On Tue, 2006-08-15 at 01:38, James C. McPherson wrote:> Bob Evans wrote: > > Just getting my feet wet with zfs. I set up a test system (Sunblade > > 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI > > disks) and was trying to write a large file (10 GB) to the array to > > see how it performed. I configured the raid using raidz. > > > > During the write, I saw the disk access lights come on, but I noticed > > a peculiar behavior. The system would write to the disk, but then > > pause for a few seconds, then contineu, then pause for a few seconds. > > > > > > I saw the same behavior when I made a smaller raidz using 4x36 GB > > scsi drives in a different enclosure. > > > > Since I''m new to zfs, and realize that I''m probably missing > > something, I was hoping somebody might help shed some light on my > > problem. > > Hi Bob, > I''m pretty sure that''s not a problem that you''re seeing, just > ZFS'' normal behaviour. Writes are coalesced as much as possible, > so the "pauses" that you observed are most likely going to be > the system waiting for suitable IOs to be gathered up and sent > out to your storage. > > If you want to examine this a bit more then might I suggest the > DTrace Toolkit''s iosnoop utility. > > > best regards, > James C. McPherson > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss--
Michael Schuster
2006-Aug-22 09:40 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
IHAC who is using a very similar test (cp -pr /zpool1/Studio11 /zpool1/Studio11.copy) and is seeing behaviour similar to what we''ve seen described here; BUT since he''s using a single-CPU box (SunBlade 1500) and has a single disk in his pool, every time the CPU goes into "100%-mode", interactive usability is reduced to zero for 1-2 seconds. This is of course undesirable. + Do we have any bug IDs or RFEs addressing this? + Is there any tuning the customer can try (he''s willing to do anything with the machine, apart from donating it to me :-)? (we''ve already tried "echo txg_time/W1 | mdb -kw", with no improvement - on the contrary, machine returned to "usable" state hardly at all). + has customer (and I) missed a constraint about when/where to deploy zfs? thx Michael This message posted from opensolaris.org
Michael Schuster writes: > IHAC who is using a very similar test (cp -pr /zpool1/Studio11 > /zpool1/Studio11.copy) and is seeing behaviour similar to what we''ve > seen described here; BUT since he''s using a single-CPU box (SunBlade > 1500) and has a single disk in his pool, every time the CPU goes into > "100%-mode", interactive usability is reduced to zero for 1-2 > seconds. This is of course undesirable. > > + Do we have any bug IDs or RFEs addressing this? > + Is there any tuning the customer can try (he''s willing to do anything with the machine, apart from donating it to me :-)? (we''ve already tried "echo txg_time/W1 | mdb -kw", with no improvement - on the contrary, machine returned to "usable" state hardly at all). > + has customer (and I) missed a constraint about when/where to deploy zfs? > > thx > Michael > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss What is the desired behavior ? The workload (cp with compression) overloads the single CPU. Response time is bound to be affected. I am not sure how we could get compression to run at a low priority level, holding up write syscalls. I suggest to Buy a second CPU or turn off compression. With even with a second CPU, once compression becomes MT-hot, he might well suffer again. Maybe we need compression=on|off|iflightcpuload -r
Michael Schuster - Sun Microsystems
2006-Aug-22 11:30 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
Roch wrote:> Michael Schuster writes: > > IHAC who is using a very similar test (cp -pr /zpool1/Studio11 > > /zpool1/Studio11.copy) and is seeing behaviour similar to what we''ve > > seen described here; BUT since he''s using a single-CPU box (SunBlade > > 1500) and has a single disk in his pool, every time the CPU goes into > > "100%-mode", interactive usability is reduced to zero for 1-2 > > seconds. This is of course undesirable. > > > > + Do we have any bug IDs or RFEs addressing this? > > + Is there any tuning the customer can try (he''s willing to do anything with the machine, apart from donating it to me :-)? (we''ve already tried "echo txg_time/W1 | mdb -kw", with no improvement - on the contrary, machine returned to "usable" state hardly at all). > > + has customer (and I) missed a constraint about when/where to deploy zfs? > > > > thx > > Michael > > > > > > This message posted from opensolaris.org > > _______________________________________________ > > zfs-discuss mailing list > > zfs-discuss at opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > What is the desired behavior ?a system that doesn''t freeze every few seconds just because someone is busy on a zpool.> The workload (cp with compression) overloads the single > CPU. Response time is bound to be affected.I wasn''t aware that compression was involved here. Does raidz imply compression? I thought not. also note: my customer is NOT using compression.> > I am not sure how we could get compression to run at a low > priority level, holding up write syscalls. > > I suggest to Buy a second CPU or turn off compression.see above - compression is off (I checked :-) Michael -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see ''Recursion''
Michael Schuster - Sun Microsystems writes: > Roch wrote: > > Michael Schuster writes: > > > IHAC who is using a very similar test (cp -pr /zpool1/Studio11 > > > /zpool1/Studio11.copy) and is seeing behaviour similar to what we''ve > > > seen described here; BUT since he''s using a single-CPU box (SunBlade > > > 1500) and has a single disk in his pool, every time the CPU goes into > > > "100%-mode", interactive usability is reduced to zero for 1-2 > > > seconds. This is of course undesirable. > > > > > > + Do we have any bug IDs or RFEs addressing this? > > > + Is there any tuning the customer can try (he''s willing to do anything with the machine, apart from donating it to me :-)? (we''ve already tried "echo txg_time/W1 | mdb -kw", with no improvement - on the contrary, machine returned to "usable" state hardly at all). > > > + has customer (and I) missed a constraint about when/where to deploy zfs? > > > > > > thx > > > Michael > > > > > > > > > This message posted from opensolaris.org > > > _______________________________________________ > > > zfs-discuss mailing list > > > zfs-discuss at opensolaris.org > > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > > > > > What is the desired behavior ? > > a system that doesn''t freeze every few seconds just because someone is busy > on a zpool. > > > The workload (cp with compression) overloads the single > > CPU. Response time is bound to be affected. > > I wasn''t aware that compression was involved here. Does raidz imply > compression? I thought not. > > also note: my customer is NOT using compression. > > > > > I am not sure how we could get compression to run at a low > > priority level, holding up write syscalls. > > > > I suggest to Buy a second CPU or turn off compression. > > see above - compression is off (I checked :-) > > Michael > -- > Michael Schuster +49 89 46008-2974 / x62974 > visit the online support center: http://www.sun.com/osc/ > > Recursion, n.: see ''Recursion'' Ok lets get a profile then: dtrace -n ''profile-10ms{@[stack(20)]=count()} END{trunc(@,20)}'' -r
Michael Schuster - Sun Microsystems
2006-Aug-23 10:49 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
Roch wrote:> Ok lets get a profile then: > > dtrace -n ''profile-10ms{@[stack(20)]=count()} END{trunc(@,20)}''I sent this output offline to Roch, here''s the essential ones and (first) his reply:> So it looks like this: > > 6421427 netra x1 slagged by NFS over ZFS leading to long spins in the ATA driver code" > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427 > > I also wonder if this fix below will change radically the code but > I don''t really know this though... > > 6430966 SATA NCQ support is needed in SATA framework and hba driversSUNW,UltraSPARC-IIIi`drv_usecwait+0x24 uata`ata_wait+0x90 uata`acersb_get_intr_status+0x38 uata`ata_set_feature+0x148 uata`ata_disk_start+0x120 uata`ata_hba_start+0xbc uata`ghd_waitq_process_and_mutex_hold+0x6c uata`ghd_waitq_process_and_mutex_exit+0x4 uata`ghd_transport+0x134 uata`ata_disk_tran_start+0x104 dad`dcdstart+0x2dc dad`dcdstrategy+0x3d0 dad`dcdioctl+0xa74 zfs`vdev_disk_io_start+0xa0 zfs`vdev_config_sync+0xe8 zfs`spa_sync+0x238 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 697 SUNW,UltraSPARC-IIIi`drv_usecwait+0x24 uata`ata_wait+0x90 uata`acersb_get_intr_status+0x38 uata`ata_set_feature+0x148 uata`ata_disk_start+0x120 uata`ata_hba_start+0xbc uata`ghd_waitq_process_and_mutex_hold+0x6c uata`ghd_waitq_process_and_mutex_exit+0x4 uata`ghd_transport+0x134 uata`ata_disk_tran_start+0x104 dad`dcdstart+0x2dc dad`dcdstrategy+0x3d0 dad`dcdioctl+0xa74 zfs`vdev_disk_io_start+0xa0 zfs`zio_wait+0xc zfs`vdev_config_sync+0x260 zfs`spa_sync+0x238 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 847 SUNW,UltraSPARC-IIIi`drv_usecwait+0x24 uata`ata_wait+0x90 uata`acersb_get_intr_status+0x38 uata`ata_set_feature+0x148 uata`ata_disk_start+0x160 uata`ata_hba_start+0xbc uata`ghd_waitq_process_and_mutex_hold+0x6c uata`ghd_waitq_process_and_mutex_exit+0x4 uata`ghd_transport+0x134 uata`ata_disk_tran_start+0x104 dad`dcdstart+0x2dc dad`dcdstrategy+0x3d0 zfs`vdev_uberblock_sync+0xf4 zfs`vdev_uberblock_sync_tree+0x88 zfs`vdev_config_sync+0x218 zfs`spa_sync+0x238 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 897 unix`idle+0x24 unix`thread_start+0x4 968 Thx to all Michael -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see ''Recursion''
Robert Milkowski
2006-Aug-26 01:23 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
Hello Michael, Wednesday, August 23, 2006, 12:49:28 PM, you wrote: MSSM> Roch wrote: MSSM> I sent this output offline to Roch, here''s the essential ones and (first) MSSM> his reply:>> So it looks like this: >> >> 6421427 netra x1 slagged by NFS over ZFS leading to long spins in the ATA driver code" >> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427 >>Is there any workarounds? Or maybe some code not yeyt integrated to try? -- Best regards, Robert mailto:rmilkowski at task.gda.pl http://milek.blogspot.com
Michael Schuster - Sun Microsystems
2006-Aug-28 07:59 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
Robert Milkowski wrote:> Hello Michael, > > Wednesday, August 23, 2006, 12:49:28 PM, you wrote: > > MSSM> Roch wrote: > > > MSSM> I sent this output offline to Roch, here''s the essential ones and (first) > MSSM> his reply: > >>> So it looks like this: >>> >>> 6421427 netra x1 slagged by NFS over ZFS leading to long spins in the ATA driver code" >>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427 >>> > > Is there any workarounds? > Or maybe some code not yeyt integrated to try?not that I know of - Roch may be better informed than me though. cheers Michael -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see ''Recursion''
George Wilson
2006-Aug-28 11:48 UTC
[zfs-discuss] Re: Significant "pauses" during zfs writes
A fix for this should be integrated shortly. Thanks, George Michael Schuster - Sun Microsystems wrote:> Robert Milkowski wrote: >> Hello Michael, >> >> Wednesday, August 23, 2006, 12:49:28 PM, you wrote: >> >> MSSM> Roch wrote: >> >> >> MSSM> I sent this output offline to Roch, here''s the essential ones >> and (first) >> MSSM> his reply: >> >>>> So it looks like this: >>>> >>>> 6421427 netra x1 slagged by NFS over ZFS leading to long spins in >>>> the ATA driver code" >>>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427 >>>> >> >> Is there any workarounds? >> Or maybe some code not yeyt integrated to try? > > not that I know of - Roch may be better informed than me though. > > cheers > Michael