Joe Little
2006-Apr-19 00:27 UTC
[zfs-discuss] asymptotic scrubbing / problematic zfs reporting
I had an Adaptec Serial ATA RAID 21610SA card for a large 16-drive raid-z ZFS pool. I've been syncing data to it nightly, and most nights the system is responsive but a "zpool status" or a "df" hangs at the ZFS volume. There are never any log messages (is there any ZFS logging?) and I have to hard reset the system to get it back. After bringing it up, I can do a "zpool status" and no errors are ever reported. That's the first problem.

To see if there was something amiss with the pool, I kicked off a "zpool scrub" on the volume. After many hours, the estimated time remaining fluctuates from 5 hours down to close to 4, and the progress goes from less than 1% complete to 14% complete. When I regularly check on it, it seems to go back in time, to a point with more minutes to go and _less_ completion progress on the scrub. After 4 hours and hitting the mid-teens progress multiple times, I'm at this now:

root at sram:~ # zpool status
  pool: shares
 state: ONLINE
 scrub: scrub in progress, 7.06% done, 4h38m to go
config:

        NAME         STATE     READ WRITE CKSUM
        shares       ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0

errors: No known data errors

Does a scrub go on indefinitely?
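For anyone watching a scrub the same way, a minimal polling sketch (the pool name is taken from the output above; the interval is arbitrary):

    # Poll scrub progress every 5 minutes; pool name from the output above.
    while true; do
        date
        zpool status shares | grep scrub
        sleep 300
    done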
Joe Little
2006-Apr-19 05:26 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Replying to the group after discovering one thing: I had my snapshot schedule in place, doing hourly, daily, monthly, etc.

Well, it appears that a snapshot across multiple zfs volumes within a pool is likely what was resetting the scrub. After disabling the snapshots, the scrub proceeded without further regressions in progress, and is now busily sitting at 100%. Is it expected behavior for snapshots to affect scrubbing, or is this a bug?

Also, a lot of I/O still occurs even though it's 100% done -- and it's been that way for quite some time.

zpool status:

 pool: shares
 state: ONLINE
 scrub: scrub in progress, 100.00% done, 0h0m to go

zpool iostat:

shares      1.12T  2.51T    554      0  67.2M      0
shares      1.12T  2.51T    988      0  57.6M      0
shares      1.12T  2.51T  1.78K      0  25.0M      0

Second, I'm surprised the system has stayed up after all that I/O compared to my previous instability. It leads me to suspect either writes (I disabled the card's write cache) or that snapshots themselves on 1.4TB of data are somehow triggering the catatonic state of the ZFS subsystem in a non-deterministic way. I'll enable the write cache on the Adaptec card tomorrow to rule that out.

On 4/18/06, Joe Little <jmlittle at gmail.com> wrote:
> I had an Adaptec Serial ATA RAID 21610SA card for a large 16-drive
> raid-z ZFS pool. I've been syncing data to it nightly, and most nights
> the system is responsive but a "zpool status" or a "df" hangs at the
> ZFS volume. There are never any log messages (is there any ZFS
> logging?) and I have to hard reset the system to get it back. After
> bringing it up, I can do a "zpool status" and no errors are ever
> reported. That's the first problem.
>
> To see if there was something amiss with the pool, I kicked off a
> "zpool scrub" on the volume. After many hours, the estimated time
> remaining fluctuates from 5 hours down to close to 4, and the progress
> goes from less than 1% complete to 14% complete. When I regularly
> check on it, it seems to go back in time, to a point with more minutes
> to go and _less_ completion progress on the scrub. After 4 hours and
> hitting the mid-teens progress multiple times, I'm at this now:
>
> root at sram:~ # zpool status
>   pool: shares
>  state: ONLINE
>  scrub: scrub in progress, 7.06% done, 4h38m to go
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         shares       ONLINE       0     0     0
>           raidz      ONLINE       0     0     0
>             c2t0d0   ONLINE       0     0     0
>             c2t1d0   ONLINE       0     0     0
>             c2t2d0   ONLINE       0     0     0
>             c2t3d0   ONLINE       0     0     0
>             c2t4d0   ONLINE       0     0     0
>             c2t5d0   ONLINE       0     0     0
>             c2t6d0   ONLINE       0     0     0
>             c2t7d0   ONLINE       0     0     0
>           raidz      ONLINE       0     0     0
>             c2t8d0   ONLINE       0     0     0
>             c2t9d0   ONLINE       0     0     0
>             c2t10d0  ONLINE       0     0     0
>             c2t11d0  ONLINE       0     0     0
>             c2t12d0  ONLINE       0     0     0
>             c2t13d0  ONLINE       0     0     0
>             c2t14d0  ONLINE       0     0     0
>             c2t15d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> Does a scrub go on indefinitely?
>
Niclas Sodergard
2006-Apr-19 07:44 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On 4/19/06, Joe Little <jmlittle at gmail.com> wrote:
> Replying to the group after discovering one thing: I had my snapshot
> schedule in place, doing hourly, daily, monthly, etc.
>
> Well, it appears that a snapshot across multiple zfs volumes within a
> pool is likely what was resetting the scrub. After disabling the
> snapshots, the scrub proceeded without further regressions in
> progress, and is now busily sitting at 100%. Is it expected behavior
> for snapshots to affect scrubbing, or is this a bug?

I have seen exactly the same problem, but until your email I hadn't been able to figure out what was going on. I could run a scrub for 24 hours and it wouldn't go past 0.4%. I'm doing snapshots every 15 minutes on a number of filesystems in that pool (307 snapshots at the moment), and when I disabled the cron jobs that take care of this, scrubbing started to work perfectly.

 pool: data
 state: ONLINE
 scrub: scrub in progress, 16.33% done, 1h9m to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s6  ONLINE       0     0     0
            c0d0s6  ONLINE       0     0     0

errors: No known data errors

Thanks a lot for the fix. It definitely looks like a bug in ZFS. I guess in the meantime I will make my snapshot scripts a bit smarter so that they check whether the zpool is being scrubbed at the moment.

cheers,
Nickus
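A minimal sketch of the kind of scrub-aware snapshot job Nickus describes, assuming illustrative pool and filesystem names (this is not his actual script):

    #!/bin/sh
    # Skip the periodic snapshot while a scrub or resilver is in progress,
    # since creating a snapshot restarts it. Names below are illustrative.
    POOL=data
    FS=data/home

    if zpool status "$POOL" | grep "in progress" > /dev/null; then
        exit 0
    fi

    zfs snapshot "$FS@auto-`date +%Y%m%d%H%M`"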
Niclas Sodergard
2006-Apr-19 07:47 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> I have seen exactly the same problem, but until your email I hadn't
> been able to figure out what was going on. I could run a scrub for 24
> hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> minutes on a number of filesystems in that pool (307 snapshots at the
> moment), and when I disabled the cron jobs that take care of this,
> scrubbing started to work perfectly.

I also delete the oldest snapshot every 15 minutes, so there are both multiple snapshot and multiple destroy operations every 15 minutes.

cheers,
Nickus
Joe Little
2006-Apr-19 13:34 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Same here. I use the posted hourly/daily/monthly snapshot script that was given as an example (a rough sketch of such a rotation appears after this message). After the first 24 hours, every hourly snapshot taken generally also deletes one.

On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.
>
> cheers,
> Nickus
>
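A rough sketch of the hourly part of such a rotation, with an illustrative filesystem name and retention count (this is not the actual posted script):

    #!/bin/sh
    # Take an hourly snapshot, then destroy hourly snapshots beyond the
    # newest 24. Filesystem name and retention count are illustrative.
    FS=shares/data
    KEEP=24

    zfs snapshot "$FS@hourly-`date +%Y%m%d%H%M`"

    # Hourly snapshot names sort oldest-first because of the timestamp.
    COUNT=`zfs list -H -t snapshot -o name | grep "^$FS@hourly-" | wc -l`
    EXTRA=`expr $COUNT - $KEEP`
    if [ "$EXTRA" -gt 0 ]; then
        zfs list -H -t snapshot -o name | grep "^$FS@hourly-" | sort | \
            head -$EXTRA | xargs -n 1 zfs destroy
    fi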
Joe Little
2006-Apr-19 13:37 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
List: well, after another 8 hours at 100% done, the scrub never completed. So I did a "zpool scrub -s shares" to stop it. That command seems to wedge and can't be killed.

On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.
>
> cheers,
> Nickus
>
Eric Schrock
2006-Apr-19 16:05 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Wed, Apr 19, 2006 at 09:47:36AM +0200, Niclas Sodergard wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.

This is a known bug:

6343667 need itinerary so interrupted scrub/resilver doesn't have to start over

Basically, what happens is that the current traversal code can't handle snapshot creation or deletion. Rather than preventing snapshots from being taken during a resilver, we opted to work around this by restarting a resilver (or scrub) every time a snapshot is taken or deleted. This is not cool - if you are taking snapshots on a regular basis (as you are), then your resilver will never complete. Jeff has a fix for this in the works. Hopefully it will arrive soon.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
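The restart behavior Eric describes is easy to see directly; a quick check, using hypothetical pool and filesystem names:

    # Start a scrub, let it run a while, then take a snapshot and watch
    # the reported progress fall back toward zero.
    zpool scrub tank
    sleep 600
    zpool status tank | grep scrub      # note the "% done" figure
    zfs snapshot tank/fs@probe
    zpool status tank | grep scrub      # progress has restarted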
Joe Little
2006-Apr-20 16:03 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Back to the problems with the underlying ZFS dying after snapshots are made... I ran all night fine, but this morning I checked on its status:

root at sram:~ # zpool status
  pool: shares
 state: ONLINE
 scrub: scrub stopped with 0 errors on Wed Apr 19 09:39:03 2006
config:

        NAME         STATE     READ WRITE CKSUM
        shares       ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0

errors: No known data errors

root at sram:~ # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/dsk/c0d0s0       57123983    345905  56206839   1% /
swap                   3433504       468   3433036   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1
                      57123983    345905  56206839   1% /lib/libc.so.1
swap                   3433036         0   3433036   0% /tmp
swap                   3433056        20   3433036   1% /var/run
shares              2641118782        28 2641118754   1% /shares

(And there it wedges.) So zpool says everything is fine, but the zfs volumes are toast and the system needs to be hard reset to get it back. Any zfs command will wedge.

On 4/19/06, Eric Schrock <eric.schrock at sun.com> wrote:
> On Wed, Apr 19, 2006 at 09:47:36AM +0200, Niclas Sodergard wrote:
> > On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> >
> > > I have seen exactly the same problem, but until your email I hadn't
> > > been able to figure out what was going on. I could run a scrub for 24
> > > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > > minutes on a number of filesystems in that pool (307 snapshots at the
> > > moment), and when I disabled the cron jobs that take care of this,
> > > scrubbing started to work perfectly.
> >
> > I also delete the oldest snapshot every 15 minutes, so there are both
> > multiple snapshot and multiple destroy operations every 15 minutes.
>
> This is a known bug:
>
> 6343667 need itinerary so interrupted scrub/resilver doesn't have to start over
>
> Basically, what happens is that the current traversal code can't handle
> snapshot creation or deletion. Rather than preventing snapshots from
> being taken during a resilver, we opted to work around this by
> restarting a resilver (or scrub) every time a snapshot is taken or
> deleted. This is not cool - if you are taking snapshots on a regular
> basis (as you are), then your resilver will never complete. Jeff has a
> fix for this in the works. Hopefully it will arrive soon.
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
>
Eric Schrock
2006-Apr-20 16:09 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Thu, Apr 20, 2006 at 09:03:39AM -0700, Joe Little wrote:
>
> (And there it wedges.) So zpool says everything is fine, but the zfs
> volumes are toast and the system needs to be hard reset to get it
> back. Any zfs command will wedge.
>

What build is this? Prior to build 36, scrubbing/resilvering wasn't throttled at all, and would issue I/Os as fast as it could traverse metadata. If this was all in-core, it could issue thousands of near-simultaneous I/Os, which would give the appearance of a "hung" system. If you're running pre-36 bits, I'd recommend upgrading first.

If you're running 36 or later, and can get the system into this state, can you do a "reboot -d" to get a crash dump (which will show up in /var/crash/<hostname>) and provide a pointer to it? If you don't have a publicly available FTP/website, just email me privately and I can give you instructions on how to send it to Sun.

Alternatively, you could send the thread list output, which would be a reasonable start. On a live system in this state, run 'mdb -k', and then send the output of:

> ::walk thread | ::findstack

As well as:

> ::pgrep zfs | ::walk thread | ::findstack
> ::pgrep zpool | ::walk thread | ::findstack

Thanks.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
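For capturing that output to files rather than working in an interactive mdb session, the same dcmds can be piped into mdb, for example:

    # Collect the requested thread stacks into files (run as root).
    echo "::walk thread | ::findstack" | mdb -k > all-threads.txt
    echo "::pgrep zfs | ::walk thread | ::findstack" | mdb -k > zfs-threads.txt
    echo "::pgrep zpool | ::walk thread | ::findstack" | mdb -k > zpool-threads.txt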
Joe Little
2006-Apr-20 22:39 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Well, I'll bring it back up as Solaris soon. It was Nexenta a4 (B36-based). I wanted to isolate whether the Adaptec 16-port SATA card was bad, so I went to Linux and tried to kill it with a software raid/lvm/xfs combo and bonnie. It appears to be dying there as well, but with DMA problems. The system was perfectly stable with the 3ware cards that I ordered with it originally, but sadly, neither the LSI 300-8x nor the 3ware series is supported by Solaris -- just this Adaptec and the Marvell cards (which are in turn not supported by Linux). Sad state of affairs in the SATA large-port-count device world.

On 4/20/06, Eric Schrock <eric.schrock at sun.com> wrote:
> On Thu, Apr 20, 2006 at 09:03:39AM -0700, Joe Little wrote:
> >
> > (And there it wedges.) So zpool says everything is fine, but the zfs
> > volumes are toast and the system needs to be hard reset to get it
> > back. Any zfs command will wedge.
> >
>
> What build is this? Prior to build 36, scrubbing/resilvering wasn't
> throttled at all, and would issue I/Os as fast as it could traverse
> metadata. If this was all in-core, it could issue thousands of
> near-simultaneous I/Os, which would give the appearance of a "hung"
> system. If you're running pre-36 bits, I'd recommend upgrading first.
>
> If you're running 36 or later, and can get the system into this state,
> can you do a "reboot -d" to get a crash dump (which will show up in
> /var/crash/<hostname>) and provide a pointer to it? If you don't have a
> publicly available FTP/website, just email me privately and I can give
> you instructions on how to send it to Sun.
>
> Alternatively, you could send the thread list output, which would be a
> reasonable start. On a live system in this state, run 'mdb -k', and
> then send the output of:
>
> > ::walk thread | ::findstack
>
> As well as:
>
> > ::pgrep zfs | ::walk thread | ::findstack
> > ::pgrep zpool | ::walk thread | ::findstack
>
> Thanks.
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
>
Al Hopper
2006-Apr-21 00:30 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Thu, 20 Apr 2006, Joe Little wrote:
> Well, I'll bring it back up as Solaris soon. It was Nexenta a4
> (B36-based). I wanted to isolate whether the Adaptec 16-port SATA card
> was bad, so I went to Linux and tried to kill it with a software
> raid/lvm/xfs combo and bonnie. It appears to be dying there as well,
> but with DMA problems. The system was perfectly stable with the 3ware
> cards that I ordered with it originally, but sadly, neither the LSI
> 300-8x nor the 3ware series is supported by Solaris -- just this
> Adaptec and

The LSI 300-8x and 150-4 are supported by the Sun amr driver [1]. All you have to do is add the appropriate entries to /etc/driver_aliases (an illustrative sketch follows this message) -- unless you wish to boot off them, in which case you have a little more work to do. Email me off-list if you have any questions.

> the Marvell cards (which are in turn not supported by Linux). Sad state
> of affairs in the SATA large-port-count device world.

Everyone who has been around Solaris x86 knows that lack of drivers is a critical issue. The (closed) marvell driver works well with the SuperMicro 8-port (dumb/newer version) controller card. This card/driver and ZFS make a killer combo - and in raidz "mode", they beat the living daylights out of RAID5 performance on the LSI hardware RAID cards. But FYI, the LSI cards don't support JBOD operation. This is unfortunate, because I have one system running on a 150-4 and I was hoping to *upgrade* it to ZFS when Update 2 ships.

Just so you know where my biases lie - I refer to Adaptec hardware as AdaptKrap hardware. :)

..... snip .....

[1] With full credit going to Chad Leigh on the solaris on intel list.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
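As a rough illustration of the driver_aliases approach Al describes (the PCI alias below is a placeholder, not a real ID; take the exact vendor/device string for your card from prtconf -pv before binding anything):

    # Illustrative only: bind an LSI MegaRAID SATA card to the existing
    # amr driver. Replace "pciXXXX,YYYY" with the alias reported for the
    # card by prtconf -pv.
    update_drv -a -i '"pciXXXX,YYYY"' amr

    # The manual equivalent is to append a line like
    #     amr "pciXXXX,YYYY"
    # to /etc/driver_aliases and then run "devfsadm -i amr".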