Hi, I have a low-power server with three drives in it, like so:

matt at vault:~$ zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 588M in 0h3m with 0 errors on Fri Jan 7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

I'm running netatalk file sharing for mac, and using it as a time machine backup server for my mac laptop.

When files are copying to the server, I often see periods of a minute or so where network traffic stops. I'm convinced that there's some bottleneck on the storage side of things, because when this happens I can still ping the machine, and if I have an ssh window open I can still see output from a `top` command running smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that command stalls. When it comes good, everything comes good: file copies across the network continue, etc.

If I have an ssh terminal session open and run `iostat -nv 5`, I see something like this:

                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0  2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0  1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0  2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0  2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0  1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0     0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0  6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   55.4    0.0  7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
    0.2  126.0    8.6  9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
    0.0  120.8    0.0  9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   57.0  153.6  7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
    0.2  108.4   12.8  6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
    0.2  104.8    5.2  6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The stall occurs when the drive c8t1d0 is 100% waiting and doing only slow i/o, typically writing about 2MB/s. However, the other drive is all zeros... doing nothing.

The drives are:
c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
c8t1d0 - Samsung Silencer - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550

I've installed smartmon and done a short and long test on both drives, all resulting in no errors found.

I expect that the c8t0d0 WD Green is the lemon here, and for some reason is getting stuck in periods where it can write no faster than about 2MB/s. Does this sound right?

Secondly, I wonder why the whole file system seems to hang up at this time. Surely if the other drive is doing nothing, a web page could be served by reading from the available drive (c8t1d0) while the slow drive (c8t0d0) is stuck writing slowly.

I have 4GB RAM in the box, and it's not doing much other than running apache httpd and netatalk.

Thanks for any input,
Matt
--
This message posted from opensolaris.org
matt.connolly.au at gmail.com said:
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
>     0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
>     0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
>     0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
>     0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
>     0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
>     0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
> . . .

matt.connolly.au at gmail.com said:
> I expect that the c8t0d0 WD Green is the lemon here and for some reason is
> getting stuck in periods where it can write no faster than about 2MB/s. Does
> this sound right?

No, it's the opposite. The drive sitting at 100%-busy, c8t1d0, while the other drive is idle, is the sick one. It's slower than the other, has 9.0 operations waiting (queued) to finish. The other one is idle because it has already finished the write activity and is waiting for the slow one in the mirror to catch up.

If you run "iostat -xn" without the interval argument, i.e. so it prints out only one set of stats, you'll see the average performance of the drives since last reboot. If the "asvc_t" figure is significantly larger for one drive than the other, that's a way to identify the one which has been slower over the long term.

> Secondly, what I wonder is why it is that the whole file system seems to hang
> up at this time. Surely if the other drive is doing nothing, a web page can
> be served by reading from the available drive (c8t1d0) while the slow drive
> (c8t0d0) is stuck writing slow.

The available drive is c8t0d0 in this case. However, if ZFS is in the middle of a txg (ZFS transaction) commit, it cannot safely do much with the pool until that commit finishes. You can see that ZFS only lets 10 operations accumulate per drive (used to be 35), i.e. 9.0 in the "wait" column and 1.0 in the "actv" column, so it's kinda stuck until the drive gets its work done.

Maybe the drive is failing, or maybe it's one of those with large sectors that are not properly aligned with the on-disk partitions.

Regards,
Marion
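To make that concrete, here is roughly what a one-shot "iostat -xn" comparison might look like. The figures below are invented, purely to illustrate the shape of the output; only the relative values matter:

    $ iostat -xn
                        extended device statistics
        r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
        3.1   58.2  102.4 5120.3  0.0  0.1    0.5    1.9   1   4 c8t0d0
        3.0   57.9  101.8 5118.6  2.2  0.4   35.1   18.4  22  31 c8t1d0

With since-boot averages like these, the per-operation service time (asvc_t) and the queueing columns (wait, wsvc_t, %w) on c8t1d0 are roughly ten times those of its mirror partner, which is the long-term signature of one slow drive rather than a momentary hiccup.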
Thanks, Marion. (I actually got the drive labels mixed up in the original post... I edited it on the forum page: http://opensolaris.org/jive/thread.jspa?messageID=511057#511057 )

My suspicion was the same: the drive doing the slow i/o is the problem. I managed to confirm that by taking the other drive offline (c8t0d0, the Samsung), and the same stalls and slow i/o occurred. After putting the drive online (and letting the resilver complete) I took the slow drive (c8t1d0, the Western Digital Green) offline, and the system ran very nicely.

It is a 4k sector drive, but I thought zfs recognised those drives and didn't need any special configuration...?
--
This message posted from opensolaris.org
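For anyone who wants to repeat that isolation test, it boils down to a handful of zpool commands. This is only a sketch; the device names are the ones in this pool, and each resilver must be allowed to finish before the next drive is taken offline:

    # take one side of the mirror out of service and run the workload
    zpool offline rpool c8t0d0s0
    # ... copy files over the network, watch iostat -xn 5 ...
    zpool online rpool c8t0d0s0
    zpool status rpool        # wait here until the resilver reports completion

    # then repeat with the other side of the mirror
    zpool offline rpool c8t1d0s0
    # ... same workload ...
    zpool online rpool c8t1d0s0

If the stalls follow one particular drive, as they did here, that drive (or its cable/port) is the prime suspect.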
matt.connolly.au at gmail.com said:
> After putting the drive online (and letting the resilver complete) I took the
> slow drive (c8t1d0 western digital green) offline and the system ran very
> nicely.
>
> It is a 4k sector drive, but I thought zfs recognised those drives and didn't
> need any special configuration...?

That's a nice confirmation of the cost of not doing anything special (:-). I hear the problem may be due to 4k drives which report themselves as 512b drives, for boot/BIOS compatibility reasons. I've also seen various ways to force 4k alignment, to check what the "ashift" value is in your pool's drives, etc. Googling "solaris zfs 4k sector align" will lead the way.

Regards,
Marion
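A quick way to see what ZFS assumed when the pool was created is to look at the ashift of the top-level vdev. From memory, the pool configuration dump includes it, along these lines (ashift: 9 means 512-byte allocation units, ashift: 12 means 4k):

    $ zdb -C rpool | grep ashift
                ashift: 9

A pool built with ashift: 9 on a drive whose physical sectors are really 4k keeps paying the misalignment penalty until the vdev is recreated, which is why the "force 4k alignment" recipes exist.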
Observation below...

On Feb 4, 2011, at 7:10 PM, Matt Connolly wrote:

> Hi, I have a low-power server with three drives in it, like so:
>
> matt at vault:~$ zpool status
>   pool: rpool
>  state: ONLINE
>   scan: resilvered 588M in 0h3m with 0 errors on Fri Jan 7 07:38:06 2011
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c8t1d0s0  ONLINE       0     0     0
>             c8t0d0s0  ONLINE       0     0     0
>         cache
>           c12d0s0     ONLINE       0     0     0
>
> errors: No known data errors
>
> I'm running netatalk file sharing for mac, and using it as a time machine backup server for my mac laptop.
>
> When files are copying to the server, I often see periods of a minute or so where network traffic stops. I'm convinced that there's some bottleneck on the storage side of things, because when this happens I can still ping the machine, and if I have an ssh window open I can still see output from a `top` command running smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that command stalls. When it comes good, everything comes good: file copies across the network continue, etc.
>
> If I have an ssh terminal session open and run `iostat -nv 5`, I see something like this:
>
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
>     0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
>     0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
>     0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
>     0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
>     0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
>     0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
> . . .
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
>     0.0    1.4    0.0     0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
>     0.0   49.0    0.0  6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   55.4    0.0  7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
>     0.2  126.0    8.6  9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
>     0.0  120.8    0.0  9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.2   57.0  153.6  7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
>     0.2  108.4   12.8  6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
>     0.2  104.8    5.2  6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The queues are building in the HBA (wait, wsvc_t, %w), not at the disk (actv, asvc_t, %b). Changing the disk might not help. Changing the controller might help immensely.

> The stall occurs when the drive c8t1d0 is 100% waiting and doing only slow i/o, typically writing about 2MB/s. However, the other drive is all zeros... doing nothing.
>
> The drives are:
> c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
> c8t1d0 - Samsung Silencer - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550
>
> I've installed smartmon and done a short and long test on both drives, all resulting in no errors found.

smartmon doesn't know anything about controllers. What sort of controller is it?
 -- richard

> I expect that the c8t0d0 WD Green is the lemon here, and for some reason is getting stuck in periods where it can write no faster than about 2MB/s. Does this sound right?
>
> Secondly, I wonder why the whole file system seems to hang up at this time. Surely if the other drive is doing nothing, a web page could be served by reading from the available drive (c8t1d0) while the slow drive (c8t0d0) is stuck writing slowly.
>
> I have 4GB RAM in the box, and it's not doing much other than running apache httpd and netatalk.
>
> Thanks for any input,
> Matt
> --
> This message posted from opensolaris.org
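To answer the controller question on a stock Solaris/OpenSolaris install, a couple of standard commands will show which driver the SATA ports are attached to. The driver names mentioned below (ahci, nv_sata, pci-ide) are only examples; the actual output depends on the board:

    $ cfgadm -al                              # sata0/0, sata0/1, ... list the framework ports and attached disks
    $ prtconf -D | egrep -i 'sata|ahci|ide'   # shows the controller node and the driver bound to it

If the disks show up under a plain IDE/legacy driver rather than an AHCI/SATA framework driver, command queueing behaviour (and therefore how the "wait" queue drains) can differ noticeably.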
> It is a 4k sector drive, but I thought zfs recognised those drives and didn't
> need any special configuration...?

4k drives are a big problem for ZFS; much has been posted/written about it. Basically, if the 4k drives report 512 byte blocks, as they almost all do, then ZFS does not detect and configure the pool correctly. If the drive actually reports the real 4k block size, ZFS handles this very nicely.

So the problem/fault is drives misreporting the real block size, to maintain compatibility with other OSs etc, and not really with ZFS.

cheers
Andy.
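To make the cost concrete: with the drive advertising 512-byte sectors, ZFS will happily issue writes aligned only to 512-byte boundaries, and the drive firmware then has to read the surrounding 4KiB physical sector, merge the small write into it, and write the whole sector back. Every small or misaligned write becomes a read-modify-write cycle plus extra rotational latency, which would be consistent with a drive that collapses to a couple of MB/s under this kind of load. The sector size the drive reports to the OS shows up in the label summary, for example:

    $ prtvtoc /dev/rdsk/c8t1d0s0 | grep 'bytes/sector'
    *     512 bytes/sector

The real 4096-byte physical sector stays hidden behind the drive's 512-byte emulation, which is exactly the misreporting described above.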
Thanks Richard - interesting...

The c8 controller is the motherboard SATA controller on an Intel D510 motherboard.

I've read over the man page for iostat again, and I don't see anything in there that makes a distinction between the controller and the device.

If it is the controller, would it make sense that the problem affects only one drive and not the other? It still smells of a drive issue to me.

Since the controller is on the motherboard and difficult to replace, I'll replace the drive shortly and see how it goes.

Nonetheless, I still find it odd that the whole io system effectively hangs up when one drive's queue fills up. Since the purpose of a mirror is to continue operating in the case of one drive's failure, I find it frustrating that the system slows right down so much because one drive's i/o queue is full.
--
This message posted from opensolaris.org
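When the replacement disk goes in, the swap itself is a single zpool command; the new device name below (c8t2d0) is only a placeholder for whatever the new drive actually enumerates as. Because rpool is a root pool, the new disk will also need an SMI label with a slice 0, and a boot block installed (installgrub on x86), before it can be booted from:

    # resilver the mirror onto the new disk
    zpool replace rpool c8t1d0s0 c8t2d0s0
    zpool status rpool          # watch resilver progress until it completes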
On Wed, February 9, 2011 04:51, Matt Connolly wrote:
> Nonetheless, I still find it odd that the whole io system effectively
> hangs up when one drive's queue fills up. Since the purpose of a mirror is
> to continue operating in the case of one drive's failure, I find it
> frustrating that the system slows right down so much because one drive's
> i/o queue is full.

I see what you're saying. But I don't think mirror systems really try to handle asymmetric performance. They either treat the drives equivalently, or else they decide one of them is "broken" and don't use it at all.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
On Feb 9, 2011, at 2:51 AM, Matt Connolly wrote:
> Thanks Richard - interesting...
>
> The c8 controller is the motherboard SATA controller on an Intel D510 motherboard.
>
> I've read over the man page for iostat again, and I don't see anything in there that makes a distinction between the controller and the device.
>
> If it is the controller, would it make sense that the problem affects only one drive and not the other? It still smells of a drive issue to me.

It might be a drive issue. It might be a controller issue. It might be a cable issue. The reason for the slowness is not necessarily visible to the OS, beyond the two queues shown in iostat.

> Since the controller is on the motherboard and difficult to replace, I'll replace the drive shortly and see how it goes.

The hardware might be fine. At this point, given the data you've shared, it is not possible to identify the root cause. We can only show where the slowdown is, and you can look more closely at the suspect components.

Lately, I've spent a lot of time with LSIutil and I am really impressed with the ability to identify hardware issues on all data paths. Is there a similar utility for Intel controllers?

> Nonetheless, I still find it odd that the whole io system effectively hangs up when one drive's queue fills up. Since the purpose of a mirror is to continue operating in the case of one drive's failure, I find it frustrating that the system slows right down so much because one drive's i/o queue is full.

Slow != failed, for some definition of slow.
 -- richard
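Besides SMART, the OS keeps its own record of disk and transport problems in the FMA error logs; timeouts, resets and retries on either the drive or the controller side tend to show up there even when nothing has been formally faulted. Worth a look before buying hardware:

    $ fmdump -e        # one-line summary of error telemetry (ereports) in the error log
    $ fmdump -eV       # full detail for each event: error class, device path, timestamps
    $ fmadm faulty     # anything FMA has actually diagnosed as faulty

A steady stream of transport-class ereports would point at the cable/controller path, while media or device errors would point back at the drive itself.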
> matt at vault:~$ zpool status
>   pool: rpool
>  state: ONLINE
>   scan: resilvered 588M in 0h3m with 0 errors on Fri Jan 7 07:38:06 2011
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c8t1d0s0  ONLINE       0     0     0
>             c8t0d0s0  ONLINE       0     0     0
>         cache
>           c12d0s0     ONLINE       0     0     0
>
> errors: No known data errors

<snip/>

> The stall occurs when the drive c8t1d0 is 100% waiting and doing only
> slow i/o, typically writing about 2MB/s. However, the other drive is
> all zeros... doing nothing.
>
> The drives are:
> c8t0d0 - Western Digital Green -
> SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
> c8t1d0 - Samsung Silencer -
> SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550
>
> I've installed smartmon and done a short and long test on both drives,
> all resulting in no errors found.

Just a hunch, but try running iostat -e or -E to see the error statistics. http://karlsbakk.net/iostat-overview.sh will give you a nice overview if you're on a wide terminal. The times I've seen one drive slow down a pool, it has always been because of errors on that drive.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
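For reference, the per-device error counters from the -E form look roughly like this (illustrative output for a healthy drive; the vendor, size and serial fields are elided here):

    $ iostat -En c8t1d0
    c8t1d0    Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Vendor: ...  Product: ...  Revision: ...  Serial No: ...
    Size: ... bytes
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 0 Predictive Failure Analysis: 0

Growing Transport Errors usually indicate the cable or controller path, while Hard/Media errors point at the drive itself; all zeros with the stalls still happening would fit the picture of a slow but error-free 512-byte-emulating drive.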