thr3ads.net - freebsd stable - zfs resilver keeps restarting [Sep 2016]

Marc UBM Bocklet

2016-Sep-18 13:09 UTC

zfs resilver keeps restarting

Hi all,

due to two bad cables, I had two drives drop from my striped raidz2
pool (built on top of geli encrypted drives). I replaced one of the
drives before I realized that the cabling was at fault - that's the
drive which is being replaced in the output of zpool status below.

I have just installed the new cables and all sata errors are gone.
However, the resilver of the pool keeps restarting.

I see no errors in /var/log/messages, but zpool history -i says:

2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:20 [txg:1219509] scan done complete=0
2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:49 [txg:1219513] scan done complete=0
2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:19 [txg:1219517] scan done complete=0
2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:45 [txg:1219521] scan done complete=0
2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391

I assume that "scan done complete=0" means that the resilver didn't
finish?
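
To put a number on the restarts, the history records can be counted with a few lines of Python. This is only a sketch: it assumes every record looks like the ones in the zpool history -i output in this message (timestamp, txg, then "scan setup" or "scan done complete=N"), that records may be wrapped across lines by the mail client, and that complete=0 marks a scan that ended without finishing, as guessed above. The names scan_events and count_aborted are made up for illustration.

```python
import re
from datetime import datetime

# One record: timestamp, txg, then "scan setup ..." or "scan done complete=N".
# The layout is inferred from the output in this thread, not from a format spec.
RECORD = re.compile(
    r"(\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) \[txg:(\d+)\] "
    r"scan (setup|done)(?: complete=(\d+))?"
)

def scan_events(text):
    """Return (time, txg, kind, complete) tuples for every scan record."""
    flat = " ".join(text.split())  # undo line wrapping added by the mailer
    events = []
    for stamp, txg, kind, complete in RECORD.findall(flat):
        when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")
        events.append((when, int(txg), kind, int(complete) if complete else None))
    return events

def count_aborted(events):
    """Count 'scan done' records that report complete=0."""
    return sum(1 for _, _, kind, done in events if kind == "done" and done == 0)

# First four records from the post, wrapped the way they arrived by mail.
SAMPLE = """\
2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3
maxtxg=1219391 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3
maxtxg=1219391 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
"""

if __name__ == "__main__":
    events = scan_events(SAMPLE)
    print("aborted scans:", count_aborted(events))
    for (start, _, _, _), (end, _, _, _) in zip(events[0::2], events[1::2]):
        print("scan ran for", (end - start).total_seconds(), "seconds")
```

Feeding it the full output of zpool history -i shows how many scans were abandoned and how long each one ran before being set up again.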

pool layout is the following:

 pool: pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool
will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Sep 18 14:51:39 2016
        235G scanned out of 9.81T at 830M/s, 3h21m to go
        13.2M resilvered, 2.34% done
config:

        NAME                        STATE     READ WRITE CKSUM
        pool                        DEGRADED     0     0     0
          raidz2-0                  ONLINE       0     0     0
            da6.eli                 ONLINE       0     0     0
            da7.eli                 ONLINE       0     0     0
            ada1.eli                ONLINE       0     0     0
            ada2.eli                ONLINE       0     0     0
            da10.eli                ONLINE       0     0     2
            da11.eli                ONLINE       0     0     0
            da12.eli                ONLINE       0     0     0
            da13.eli                ONLINE       0     0     0
          raidz2-1                  DEGRADED     0     0     0
            da0.eli                 ONLINE       0     0     0
            da1.eli                 ONLINE       0     0     0
            da2.eli                 ONLINE       0     0     1  (resilvering)
            replacing-3             DEGRADED     0     0     1
              10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
              da4.eli               ONLINE       0     0     0  (resilvering)
            da3.eli                 ONLINE       0     0     0
            da5.eli                 ONLINE       0     0     0
            da8.eli                 ONLINE       0     0     0
            da9.eli                 ONLINE       0     0     0

errors: No known data errors

system is
FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633: Mon Sep 15 22:34:05 CEST 2014 root@xxx:/usr/obj/usr/src/sys/xxx  amd64

controller is
SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]

Drives are connected via four four-port sata cables.

Should I upgrade to 10.3-RELEASE, or did I make some sort of
configuration error / overlook something?

Thanks in advance!

Cheers,
Marc

Alan Somers

2016-Sep-18 16:05 UTC

Resilver will start over any time there's new damage.  In your case,
with two failed drives, resilver should've begun after you replaced
the first drive, and restarted after you replaced the second.  Have
you seen it restart more than that?  If so, keep an eye on the error
counters in "zpool status"; they might give you a clue.  You could
also raise the loglevel of devd to "info" in /etc/syslog.conf and see
what gets logged to /var/log/devd.log.  That will tell you if drives are
dropping out and automatically rejoining the pool, for example.

-Alan
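
Watching the error counters by hand gets tedious on a 16-drive pool, so here is a rough Python sketch that picks out the rows of "zpool status" with a non-zero READ, WRITE or CKSUM value. It assumes the column layout shown in the original post (name, state, three numeric counters, optional trailing note) and does not handle abbreviated counts such as "1.2K". The function name devices_with_errors is hypothetical.

```python
import re

# A config row as printed by `zpool status`: name, state, READ, WRITE, CKSUM,
# optionally followed by a note such as "(resilvering)". The header row does
# not match because its counter columns are not numeric.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")

def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a non-zero counter."""
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, _state, *counts = match.groups()
        counts = tuple(int(c) for c in counts)
        if any(counts):
            flagged[name] = counts
    return flagged

# A few rows from the pool layout in the original post.
SAMPLE = """\
        NAME                        STATE     READ WRITE CKSUM
        pool                        DEGRADED     0     0     0
          raidz2-0                  ONLINE       0     0     0
            da10.eli                ONLINE       0     0     2
          raidz2-1                  DEGRADED     0     0     0
            da2.eli                 ONLINE       0     0     1  (resilvering)
            replacing-3             DEGRADED     0     0     1
"""

if __name__ == "__main__":
    for name, (rd, wr, ck) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={rd} write={wr} cksum={ck}")
```

Running it against two snapshots of "zpool status" taken a few minutes apart and comparing the results would show which counters are still climbing while the resilver keeps restarting.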
