Nathan Kroenert
2007-Aug-27 00:12 UTC
[zfs-discuss] NV_65 AMD64 - ZFS seems to write fast and slow to a single spindle
Hey all -

Just saw something really weird. I have been playing with my box for a little while now, and just noticed something whilst checking how fast / slow my IDE ports were on a newish motherboard...

I had been copying around an image. Not a particularly large one - a 500M ISO... I had been observing the read speed off disk, and write speed to disk. When reading from one disk and writing to another, I was seeing about 60MB/s and all was as expected.

But, then, I thought I'd do one more run, and copied the *same* image as my last run... Of course, the image was in memory, so I expected there would be no reads and lots of writes. What I saw was lots of not very impressive speed (cmdk1 is the target):

                 extended device statistics
device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   35.3    0.0  4514.1 32.4  2.0  975.3 100 100
                 extended device statistics
device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   36.7    0.0  4697.6 32.4  2.0  936.8 100 100
                 extended device statistics
device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   36.8    0.0  4650.6 32.4  2.0  935.1 100 100
                 extended device statistics
device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0     0.0  0.1  0.0    0.0   0   0
cmdk1     0.0   37.5    0.0  4424.9 32.3  2.0  913.8 100 100

So my target disk, which is owned exclusively by ZFS, was apparently flat out writing at 4.4MB/s. At the time it goes bad, the svc_t jumps from about 125ms to 950ms. Ouch!!

On closer inspection, I see that -

- The cp returns almost immediately. (somewhat expected)
- ZFS starts writing at about 60MB/s, but only for about 2 seconds. (This is changeable. Sometimes, it writes the whole image at the slower rate.)
- The write rate then drops back to 4 - 5MB/s.
- CPU usage is only 8%.
- I still have 1.5GB of 4GB free memory. (Though I *am* running Xen at the moment. Not sure if that matters.)
- If I kick off a second copy to a different filename whilst the first is running, it does not get any faster.
- If I kick off a write to a raw zvol on the same pool, the write rate to the disk jumps back up to the expected 60MB/s, but drops again as soon as it's completed the write to the raw zvol... So, it seems it's not the disk itself.
- The zpool *has* been exported and imported this boot. Not sure if that matters either.
- I had a hunch that memory availability might be playing a part, so I forced a whole heap to be freed up with a honking big malloc and walk of the pages. I freed up 3GB (box has 4GB total) and it seems that I start to see the problem much more frequently as I get to about 1.5GB free.
- It's not entirely predictable. Sometimes, it'll write at 50-60MB/s for up to 8 or so seconds, and others, it'll only write fast for a burst right at the start, then take quite some time to write out the rest.

It's almost as if we are being throttled on the rate at which we can push data through the ZFS in-memory cache when writing previously read and written data. Or something equally bogus, like me expecting that ZFS would write as fast as it can all the time, which I guess might be an invalid assumption?

Now: This is running NV_65 with the Xen bits from back then. Not sure if that really matters.
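As a rough sanity check on the iostat samples above (a sketch, not a diagnosis), the per-sample arithmetic can be worked through in a few lines of Python. It assumes that svc_t in this iostat format is the average total response time including time spent waiting in the queue, so that Little's law gives svc_t ≈ (wait + actv) / (w/s). The function name `analyse` and the `SAMPLES` list are made up for illustration; the numbers are copied from the cmdk1 rows.

```python
# Samples copied from the cmdk1 rows of the iostat output:
# (w/s, kw/s, wait, actv, svc_t in ms)
SAMPLES = [
    (35.3, 4514.1, 32.4, 2.0, 975.3),
    (36.7, 4697.6, 32.4, 2.0, 936.8),
    (36.8, 4650.6, 32.4, 2.0, 935.1),
    (37.5, 4424.9, 32.3, 2.0, 913.8),
]


def analyse(w_per_s, kw_per_s, wait, actv, svc_t_ms):
    """Derive average write size, throughput and a predicted response time.

    Assumption: svc_t covers queue wait plus active service, so by
    Little's law it should be close to (wait + actv) / (w/s).
    """
    avg_write_kb = kw_per_s / w_per_s              # KB per write request
    mb_per_s = kw_per_s / 1024.0                   # throughput in MB/s
    predicted_ms = (wait + actv) / w_per_s * 1000  # queue length / arrival rate
    return avg_write_kb, mb_per_s, predicted_ms


for sample in SAMPLES:
    avg_kb, mb_s, predicted = analyse(*sample)
    print(
        f"avg write {avg_kb:6.1f} KB  throughput {mb_s:4.2f} MB/s  "
        f"predicted svc_t {predicted:6.1f} ms  reported {sample[4]:6.1f} ms"
    )
```

If that assumption holds, the samples are internally consistent: roughly 36 writes per second of somewhere around 118 to 128 KB each, with about 34 requests queued or active, which by itself accounts for a response time of nearly a second. That would suggest most of the 950ms is queueing behind a slow trickle of writes rather than any single I/O taking that long.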
It does not seem that the disk is having problems -

beaker:/disk2/crap # zpool status
  pool: disk2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        disk2       ONLINE       0     0     0
          c3d0      ONLINE       0     0     0

errors: No known data errors

  pool: zfs
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          c1d0s7    ONLINE       0     0     0

errors: No known data errors

c1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3320620AS Revision: Serial No: 6QF Size: 320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3320620AS Revision: Serial No: 6QF Size: 320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t1d0 Soft Errors: 4 Hard Errors: 302 Transport Errors: 0

If anyone within SWAN on the ZFS team wanted to take a look at this box and see if it's a new bug or just me being a bonehead and not understanding what I'm seeing, please respond to me directly, and I can provide access. (I'll make an effort not to reboot the box just in case it's only this boot that sees the problems.)

Nathan. :)