Joe Little
2006-Apr-19 00:27 UTC
[zfs-discuss] asymptotic scrubbing / problematic zfs reporting
I had an Adaptec Serial ATA RAID 21610SA card for a large 16-drive raid-z ZFS pool. I've been syncing data to it nightly, and most nights the system is responsive but a "zpool status" or a "df" hangs at the ZFS volume. There are never any log messages (is there any ZFS logging?) and I have to hard reset the system to get it back. After bringing it up, I can do a "zpool status" and no errors are ever reported. That's the first problem.

To see if there was something amiss with the pool, I kicked off a "zpool scrub" on the volume. After many hours, the estimated time remaining fluctuates from 5 hours down to close to 4, and the progress goes from less than 1% complete to 14% complete. When I regularly check on it, it seems to go back in time, to a point with more minutes to go and _less_ completion progress on the scrub. After 4 hours and hitting the mid-teens progress multiple times, I'm at this now:

root at sram:~ # zpool status
  pool: shares
 state: ONLINE
 scrub: scrub in progress, 7.06% done, 4h38m to go
config:

        NAME         STATE     READ WRITE CKSUM
        shares       ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0

errors: No known data errors

Does a scrub go on indefinitely?
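For anyone watching a scrub the same way, a minimal polling sketch (the pool name is taken from the output above; the interval is arbitrary):

    # Poll scrub progress every 5 minutes; pool name from the output above.
    while true; do
        date
        zpool status shares | grep scrub
        sleep 300
    done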
Joe Little
2006-Apr-19 05:26 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Replying to the group after discovering one thing: I had my snapshot schedule in place, doing hourly, daily, monthly, etc.

Well, it appears that a snapshot across multiple zfs volumes within a pool is likely what was resetting the scrub. After disabling the snapshots, the scrub proceeded without further regressions in progress, and is now busily sitting at 100%. Is it expected behavior for snapshots to affect scrubbing, or is this a bug?

Also, a lot of I/O still occurs even though it's 100% done -- and it's been that way for quite some time.

zpool status:

 pool: shares
 state: ONLINE
 scrub: scrub in progress, 100.00% done, 0h0m to go

zpool iostat:

shares      1.12T  2.51T    554      0  67.2M      0
shares      1.12T  2.51T    988      0  57.6M      0
shares      1.12T  2.51T  1.78K      0  25.0M      0

Second, I'm surprised the system has stayed up after all that I/O compared to my previous instability. It leads me to suspect either writes (I disabled the card's write cache) or that snapshots themselves on 1.4TB of data are somehow triggering the catatonic state of the ZFS subsystem in a non-deterministic way. I'll enable the write cache on the Adaptec card tomorrow to rule that out.

On 4/18/06, Joe Little <jmlittle at gmail.com> wrote:
> I had an Adaptec Serial ATA RAID 21610SA card for a large 16-drive
> raid-z ZFS pool. I've been syncing data to it nightly, and most nights
> the system is responsive but a "zpool status" or a "df" hangs at the
> ZFS volume. There are never any log messages (is there any ZFS
> logging?) and I have to hard reset the system to get it back. After
> bringing it up, I can do a "zpool status" and no errors are ever
> reported. That's the first problem.
>
> To see if there was something amiss with the pool, I kicked off a
> "zpool scrub" on the volume. After many hours, the estimated time
> remaining fluctuates from 5 hours down to close to 4, and the progress
> goes from less than 1% complete to 14% complete. When I regularly
> check on it, it seems to go back in time, to a point with more minutes
> to go and _less_ completion progress on the scrub. After 4 hours and
> hitting the mid-teens progress multiple times, I'm at this now:
>
> root at sram:~ # zpool status
>   pool: shares
>  state: ONLINE
>  scrub: scrub in progress, 7.06% done, 4h38m to go
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         shares       ONLINE       0     0     0
>           raidz      ONLINE       0     0     0
>             c2t0d0   ONLINE       0     0     0
>             c2t1d0   ONLINE       0     0     0
>             c2t2d0   ONLINE       0     0     0
>             c2t3d0   ONLINE       0     0     0
>             c2t4d0   ONLINE       0     0     0
>             c2t5d0   ONLINE       0     0     0
>             c2t6d0   ONLINE       0     0     0
>             c2t7d0   ONLINE       0     0     0
>           raidz      ONLINE       0     0     0
>             c2t8d0   ONLINE       0     0     0
>             c2t9d0   ONLINE       0     0     0
>             c2t10d0  ONLINE       0     0     0
>             c2t11d0  ONLINE       0     0     0
>             c2t12d0  ONLINE       0     0     0
>             c2t13d0  ONLINE       0     0     0
>             c2t14d0  ONLINE       0     0     0
>             c2t15d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> Does a scrub go on indefinitely?
>
Niclas Sodergard
2006-Apr-19 07:44 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On 4/19/06, Joe Little <jmlittle at gmail.com> wrote:
> Replying to the group after discovering one thing: I had my snapshot
> schedule in place, doing hourly, daily, monthly, etc.
>
> Well, it appears that a snapshot across multiple zfs volumes within a
> pool is likely what was resetting the scrub. After disabling the
> snapshots, the scrub proceeded without further regressions in
> progress, and is now busily sitting at 100%. Is it expected behavior
> for snapshots to affect scrubbing, or is this a bug?

I have seen exactly the same problem, but until your email I hadn't been able to figure out what was going on. I could run a scrub for 24 hours and it wouldn't go past 0.4%. I'm doing snapshots every 15 minutes on a number of filesystems in that pool (307 snapshots at the moment), and when I disabled the cron jobs that take care of this, scrubbing started to work perfectly.

 pool: data
 state: ONLINE
 scrub: scrub in progress, 16.33% done, 1h9m to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s6  ONLINE       0     0     0
            c0d0s6  ONLINE       0     0     0

errors: No known data errors

Thanks a lot for the fix. It definitely looks like a bug in ZFS. I guess in the meantime I will make my snapshot scripts a bit smarter so that they check whether the zpool is being scrubbed at the moment.

cheers,
Nickus
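A minimal sketch of the kind of scrub-aware snapshot job Nickus describes, assuming illustrative pool and filesystem names (this is not his actual script):

    #!/bin/sh
    # Skip the periodic snapshot while a scrub or resilver is in progress,
    # since creating a snapshot restarts it. Names below are illustrative.
    POOL=data
    FS=data/home

    if zpool status "$POOL" | grep "in progress" > /dev/null; then
        exit 0
    fi

    zfs snapshot "$FS@auto-`date +%Y%m%d%H%M`"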
Niclas Sodergard
2006-Apr-19 07:47 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> I have seen exactly the same problem, but until your email I hadn't
> been able to figure out what was going on. I could run a scrub for 24
> hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> minutes on a number of filesystems in that pool (307 snapshots at the
> moment), and when I disabled the cron jobs that take care of this,
> scrubbing started to work perfectly.

I also delete the oldest snapshot every 15 minutes, so there are both multiple snapshot and multiple destroy operations every 15 minutes.

cheers,
Nickus
Joe Little
2006-Apr-19 13:34 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Same here. I use the posted hourly/daily/monthly snapshot script that was given as an example (a rough sketch of such a rotation appears after this message). After the first 24 hours, every hourly snapshot taken generally also deletes one.

On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.
>
> cheers,
> Nickus
>
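A rough sketch of the hourly part of such a rotation, with an illustrative filesystem name and retention count (this is not the actual posted script):

    #!/bin/sh
    # Take an hourly snapshot, then destroy hourly snapshots beyond the
    # newest 24. Filesystem name and retention count are illustrative.
    FS=shares/data
    KEEP=24

    zfs snapshot "$FS@hourly-`date +%Y%m%d%H%M`"

    # Hourly snapshot names sort oldest-first because of the timestamp.
    COUNT=`zfs list -H -t snapshot -o name | grep "^$FS@hourly-" | wc -l`
    EXTRA=`expr $COUNT - $KEEP`
    if [ "$EXTRA" -gt 0 ]; then
        zfs list -H -t snapshot -o name | grep "^$FS@hourly-" | sort | \
            head -$EXTRA | xargs -n 1 zfs destroy
    fi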
Joe Little
2006-Apr-19 13:37 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
List: well, after another 8 hours at 100% done, the scrub never completed. So I did a "zpool scrub -s shares" to stop it. That command seems to wedge and can't be killed.

On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.
>
> cheers,
> Nickus
>
Eric Schrock
2006-Apr-19 16:05 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Wed, Apr 19, 2006 at 09:47:36AM +0200, Niclas Sodergard wrote:
> On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
>
> > I have seen exactly the same problem, but until your email I hadn't
> > been able to figure out what was going on. I could run a scrub for 24
> > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > minutes on a number of filesystems in that pool (307 snapshots at the
> > moment), and when I disabled the cron jobs that take care of this,
> > scrubbing started to work perfectly.
>
> I also delete the oldest snapshot every 15 minutes, so there are both
> multiple snapshot and multiple destroy operations every 15 minutes.

This is a known bug:

6343667 need itinerary so interrupted scrub/resilver doesn't have to start over

Basically, what happens is that the current traversal code can't handle snapshot creation or deletion. Rather than preventing snapshots from being taken during a resilver, we opted to work around this by restarting a resilver (or scrub) every time a snapshot is taken or deleted. This is not cool - if you are taking snapshots on a regular basis (as you are), then your resilver will never complete. Jeff has a fix for this in the works. Hopefully it will arrive soon.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
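The restart behavior Eric describes is easy to see directly; a quick check, using hypothetical pool and filesystem names:

    # Start a scrub, let it run a while, then take a snapshot and watch
    # the reported progress fall back toward zero.
    zpool scrub tank
    sleep 600
    zpool status tank | grep scrub      # note the "% done" figure
    zfs snapshot tank/fs@probe
    zpool status tank | grep scrub      # progress has restarted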
Joe Little
2006-Apr-20 16:03 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Back to the problems with the underlying ZFS dying after snapshots are made... I ran all night fine, but this morning I checked on its status:

root at sram:~ # zpool status
  pool: shares
 state: ONLINE
 scrub: scrub stopped with 0 errors on Wed Apr 19 09:39:03 2006
config:

        NAME         STATE     READ WRITE CKSUM
        shares       ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0

errors: No known data errors

root at sram:~ # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/dsk/c0d0s0       57123983    345905  56206839   1% /
swap                   3433504       468   3433036   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1
                      57123983    345905  56206839   1% /lib/libc.so.1
swap                   3433036         0   3433036   0% /tmp
swap                   3433056        20   3433036   1% /var/run
shares              2641118782        28 2641118754   1% /shares

(And there it wedges.) So zpool says everything is fine, but the zfs volumes are toast and the system needs to be hard reset to get it back. Any zfs command will wedge.

On 4/19/06, Eric Schrock <eric.schrock at sun.com> wrote:
> On Wed, Apr 19, 2006 at 09:47:36AM +0200, Niclas Sodergard wrote:
> > On 4/19/06, Niclas Sodergard <nickus at gmail.com> wrote:
> >
> > > I have seen exactly the same problem, but until your email I hadn't
> > > been able to figure out what was going on. I could run a scrub for 24
> > > hours and it wouldn't go past 0.4%. I'm doing snapshots every 15
> > > minutes on a number of filesystems in that pool (307 snapshots at the
> > > moment), and when I disabled the cron jobs that take care of this,
> > > scrubbing started to work perfectly.
> >
> > I also delete the oldest snapshot every 15 minutes, so there are both
> > multiple snapshot and multiple destroy operations every 15 minutes.
>
> This is a known bug:
>
> 6343667 need itinerary so interrupted scrub/resilver doesn't have to start over
>
> Basically, what happens is that the current traversal code can't handle
> snapshot creation or deletion. Rather than preventing snapshots from
> being taken during a resilver, we opted to work around this by
> restarting a resilver (or scrub) every time a snapshot is taken or
> deleted. This is not cool - if you are taking snapshots on a regular
> basis (as you are), then your resilver will never complete. Jeff has a
> fix for this in the works. Hopefully it will arrive soon.
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
>
Eric Schrock
2006-Apr-20 16:09 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Thu, Apr 20, 2006 at 09:03:39AM -0700, Joe Little wrote:
>
> (And there it wedges.) So zpool says everything is fine, but the zfs
> volumes are toast and the system needs to be hard reset to get it
> back. Any zfs command will wedge.
>

What build is this? Prior to build 36, scrubbing/resilvering wasn't throttled at all, and would issue I/Os as fast as it could traverse metadata. If this was all in-core, it could issue thousands of near-simultaneous I/Os, which would give the appearance of a "hung" system. If you're running pre-36 bits, I'd recommend upgrading first.

If you're running 36 or later, and can get the system into this state, can you do a "reboot -d" to get a crash dump (which will show up in /var/crash/<hostname>) and provide a pointer to it? If you don't have a publicly available FTP/website, just email me privately and I can give you instructions on how to send it to Sun.

Alternatively, you could send the thread list output, which would be a reasonable start. On a live system in this state, run 'mdb -k', and then send the output of:

> ::walk thread | ::findstack

As well as:

> ::pgrep zfs | ::walk thread | ::findstack
> ::pgrep zpool | ::walk thread | ::findstack

Thanks.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
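For capturing that output to files rather than working in an interactive mdb session, the same dcmds can be piped into mdb, for example:

    # Collect the requested thread stacks into files (run as root).
    echo "::walk thread | ::findstack" | mdb -k > all-threads.txt
    echo "::pgrep zfs | ::walk thread | ::findstack" | mdb -k > zfs-threads.txt
    echo "::pgrep zpool | ::walk thread | ::findstack" | mdb -k > zpool-threads.txt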
Joe Little
2006-Apr-20 22:39 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
Well, I'll bring it back up as Solaris soon. It was Nexenta a4 (B36-based). I wanted to isolate whether the Adaptec 16-port SATA card was bad, so I went to Linux and tried to kill it with a software raid/lvm/xfs combo and bonnie. It appears to be dying there as well, but with DMA problems. The system was perfectly stable with the 3ware cards that I ordered with it originally, but sadly, neither the LSI 300-8x nor the 3ware series is supported by Solaris -- just this Adaptec and the Marvell cards (which are in turn not supported by Linux). Sad state of affairs in the SATA large-port-count device world.

On 4/20/06, Eric Schrock <eric.schrock at sun.com> wrote:
> On Thu, Apr 20, 2006 at 09:03:39AM -0700, Joe Little wrote:
> >
> > (And there it wedges.) So zpool says everything is fine, but the zfs
> > volumes are toast and the system needs to be hard reset to get it
> > back. Any zfs command will wedge.
> >
>
> What build is this? Prior to build 36, scrubbing/resilvering wasn't
> throttled at all, and would issue I/Os as fast as it could traverse
> metadata. If this was all in-core, it could issue thousands of
> near-simultaneous I/Os, which would give the appearance of a "hung"
> system. If you're running pre-36 bits, I'd recommend upgrading first.
>
> If you're running 36 or later, and can get the system into this state,
> can you do a "reboot -d" to get a crash dump (which will show up in
> /var/crash/<hostname>) and provide a pointer to it? If you don't have a
> publicly available FTP/website, just email me privately and I can give
> you instructions on how to send it to Sun.
>
> Alternatively, you could send the thread list output, which would be a
> reasonable start. On a live system in this state, run 'mdb -k', and
> then send the output of:
>
> > ::walk thread | ::findstack
>
> As well as:
>
> > ::pgrep zfs | ::walk thread | ::findstack
> > ::pgrep zpool | ::walk thread | ::findstack
>
> Thanks.
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
>
Al Hopper
2006-Apr-21 00:30 UTC
[zfs-discuss] Re: asymptotic scrubbing / problematic zfs reporting
On Thu, 20 Apr 2006, Joe Little wrote:
> Well, I'll bring it back up as Solaris soon. It was Nexenta a4
> (B36-based). I wanted to isolate whether the Adaptec 16-port SATA card
> was bad, so I went to Linux and tried to kill it with a software
> raid/lvm/xfs combo and bonnie. It appears to be dying there as well,
> but with DMA problems. The system was perfectly stable with the 3ware
> cards that I ordered with it originally, but sadly, neither the LSI
> 300-8x nor the 3ware series is supported by Solaris -- just this
> Adaptec and

The LSI 300-8x and 150-4 are supported by the Sun amr driver [1]. All you have to do is add the appropriate entries to /etc/driver_aliases (an illustrative sketch follows this message) -- unless you wish to boot off them, in which case you have a little more work to do. Email me off-list if you have any questions.

> the Marvell cards (which are in turn not supported by Linux). Sad state
> of affairs in the SATA large-port-count device world.

Everyone who has been around Solaris x86 knows that lack of drivers is a critical issue. The (closed) marvell driver works well with the SuperMicro 8-port (dumb/newer version) controller card. This card/driver and ZFS make a killer combo - and in raidz "mode", they beat the living daylights out of RAID5 performance on the LSI hardware RAID cards. But FYI, the LSI cards don't support JBOD operation. This is unfortunate, because I have one system running on a 150-4 and I was hoping to *upgrade* it to ZFS when Update 2 ships.

Just so you know where my biases lie - I refer to Adaptec hardware as AdaptKrap hardware. :)

..... snip .....

[1] With full credit going to Chad Leigh on the solaris on intel list.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
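As a rough illustration of the driver_aliases approach Al describes (the PCI alias below is a placeholder, not a real ID; take the exact vendor/device string for your card from prtconf -pv before binding anything):

    # Illustrative only: bind an LSI MegaRAID SATA card to the existing
    # amr driver. Replace "pciXXXX,YYYY" with the alias reported for the
    # card by prtconf -pv.
    update_drv -a -i '"pciXXXX,YYYY"' amr

    # The manual equivalent is to append a line like
    #     amr "pciXXXX,YYYY"
    # to /etc/driver_aliases and then run "devfsadm -i amr".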