Steven Hartland
2019-Apr-20 20:56 UTC
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
Thanks for the extra info. The next question would be: have you ruled out
that the corruption exists before the disk is removed?

It would be interesting to add a zpool scrub to confirm this isn't the case
before the disk removal is attempted.

    Regards
    Steve

On 20/04/2019 18:35, Karl Denninger wrote:
>
> On 4/20/2019 10:50, Steven Hartland wrote:
>> Have you eliminated geli as a possible source?
> No; I could conceivably do so by re-creating another backup volume set
> without geli-encrypting the drives, but I do not have an extra set of
> drives of the capacity required lying around to do that.  I would have
> to do it with lower-capacity disks, which I can attempt if you think
> it would help.  I *do* have open slots in the drive backplane to set
> up a second "test" unit of this sort.  For reasons below it will take
> at least a couple of weeks to get good data on whether the problem
> exists without geli, however.
>>
>> I've just set up an old server which has an LSI 2008 running old FW
>> (11.0), so I was going to have a go at reproducing this.
>>
>> Apart from the disconnect steps below, is there anything else needed,
>> e.g. a read/write workload during the disconnect?
>
> Yes.  An attempt to recreate this on my sandbox machine using smaller
> disks (WD RE-320s) and a decent amount of read/write activity (tens to
> ~100 gigabytes) on a root mirror of three disks with one taken offline
> did not succeed.  It *reliably* appears, however, on my backup volumes
> with every drive swap.  The sandbox machine is physically identical
> other than the physical disks; both are Xeons with ECC RAM in them.
>
> The only operational difference is that the backup volume sets have a
> *lot* of data written to them via zfs send | zfs recv over the
> intervening period, whereas with "ordinary" I/O activity (which was
> the case on my sandbox) the I/O pattern is materially different.  The
> root pool on the sandbox where I tried to reproduce it synthetically
> *is* using geli (in fact it boots native-encrypted.)
>
> The "ordinary" resilver on a disk swap typically covers ~2-3Tb and is
> a ~6-8 hour process.
>
> The usual process for the backup pool looks like this:
>
> Have 2 of the 3 physical disks mounted; the third is in the bank vault.
>
> Over the space of a week, the backup script is run daily.  It first
> imports the pool and then, for each zfs filesystem it is backing up
> (which is not all of them; I have a few volatile ones that I don't
> care if I lose, such as object directories for builds and such, plus
> some that are R/O data sets that are backed up separately), it does:
>
> If there is no "...@zfs-base":  zfs snapshot -r ...@zfs-base; zfs send
> -R ...@zfs-base | zfs receive -Fuvd $BACKUP
>
> else
>
> zfs rename -r ...@zfs-base ...@zfs-old
> zfs snapshot -r ...@zfs-base
>
> zfs send -RI ...@zfs-old ...@zfs-base | zfs recv -Fudv $BACKUP
>
> .... if ok then zfs destroy -vr ...@zfs-old, otherwise print a
> complaint and stop.
>
> When all are complete it then does a "zpool export backup" to detach
> the pool in order to reduce the risk of "stupid root user" (me) accidents.
>
> In short, I send an incremental of the changes since the last backup,
> which in many cases includes a bunch of automatic snapshots that are
> taken on a frequent basis out of cron.  Typically there are a week's
> worth of these that accumulate between swaps of the disk to the vault,
> and the offline'd disk remains that way for a week.
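A minimal sh sketch of the per-filesystem loop described above might look like the following; the pool name, the dataset list, and the snapshot names are placeholders rather than the actual script:

#!/bin/sh
# Sketch only -- BACKUP and FILESYSTEMS are placeholders for the real pool
# and for the list of datasets the script walks.
BACKUP=backup
FILESYSTEMS="zroot/home zroot/var"

zpool import $BACKUP || exit 1

for fs in $FILESYSTEMS; do
    if ! zfs list -t snapshot "${fs}@zfs-base" > /dev/null 2>&1; then
        # First pass: full replication stream
        zfs snapshot -r "${fs}@zfs-base"
        zfs send -R "${fs}@zfs-base" | zfs receive -Fuvd $BACKUP || exit 1
    else
        # Roll the base snapshot forward and send only the delta
        zfs rename -r "${fs}@zfs-base" "${fs}@zfs-old"
        zfs snapshot -r "${fs}@zfs-base"
        if zfs send -RI "${fs}@zfs-old" "${fs}@zfs-base" | zfs recv -Fudv $BACKUP; then
            zfs destroy -vr "${fs}@zfs-old"
        else
            echo "send/recv of ${fs} failed; keeping @zfs-old" >&2
            exit 1
        fi
    fi
done

zpool export $BACKUP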
> I also wait for the zpool destroy on each of the targets to drain
> before continuing, as not doing so back in the 9 and 10.x days was a
> good way to stimulate an instant panic on re-import the next day due
> to kernel stack page exhaustion if the previous operation destroyed
> hundreds of gigabytes of snapshots (which does routinely happen, as
> part of the backed-up data is Macrium images from PCs, so when a new
> month comes around the PC's backup routine removes a huge amount of
> old data from the filesystem.)
>
> Trying to simulate the checksum errors in a few hours' time thus far
> has failed.  But every time I swap the disks on a weekly basis I get a
> handful of checksum errors on the scrub.  If I export and re-import
> the backup mirror after that, the counters are zeroed -- the checksum
> error count does *not* remain across an export/import cycle, although
> the "scrub repaired" line remains.
>
> For example, after the scrub completed this morning I exported the
> pool (the script expects the pool exported before it begins) and ran
> the backup.  When it was complete:
>
> root@NewFS:~/backup-zfs # zpool status backup
>   pool: backup
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 188K in 0 days 09:40:18 with 0 errors on Sat Apr 20 08:45:09 2019
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         backup                    DEGRADED     0     0     0
>           mirror-0                DEGRADED     0     0     0
>             gpt/backup61.eli      ONLINE       0     0     0
>             gpt/backup62-1.eli    ONLINE       0     0     0
>             13282812295755460479  OFFLINE      0     0     0  was /dev/gpt/backup62-2.eli
>
> errors: No known data errors
>
> It knows it fixed the checksums, but the error count is zero -- I did
> NOT "zpool clear".
>
> This may have been present in 11.2; I didn't run that long enough in
> this environment to know.  It definitely was *not* present in 11.1 and
> before; the same data structure and script for backups has been in use
> for a very long time without any changes, and this first appeared when
> I upgraded from 11.1 to 12.0 on this specific machine, with the exact
> same physical disks being used for over a year (they're currently 6Tb
> units; the last change-out for those was ~1.5 years ago when I went
> from 4Tb to 6Tb volumes.)  I have both HGST-NAS and He-Enterprise
> disks in the rotation and both show identical behavior, so it doesn't
> appear to be related to a firmware problem in one disk vs. the other
> (e.g. firmware that fails to flush the on-drive cache before going to
> standby even though it was told to.)
>
>>
>> mps0: <Avago Technologies (LSI) SAS2008> port 0xe000-0xe0ff mem
>> 0xfaf3c000-0xfaf3ffff,0xfaf40000-0xfaf7ffff irq 26 at device 0.0 on pci3
>> mps0: Firmware: 11.00.00.00, Driver: 21.02.00.00-fbsd
>> mps0: IOCCapabilities:
>> 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>
>>
>>     Regards
>>     Steve
>>
>> On 20/04/2019 15:39, Karl Denninger wrote:
>>> I can confirm that 20.00.07.00 does *not* stop this.
>>> The previous write/scrub on this device was on 20.00.07.00.  It was
>>> swapped back in from the vault yesterday, resilvered without incident,
>>> but a scrub says....
>>>
>>> root@NewFS:/home/karl # zpool status backup
>>>   pool: backup
>>>  state: DEGRADED
>>> status: One or more devices has experienced an unrecoverable error.  An
>>>         attempt was made to correct the error.  Applications are
>>>         unaffected.
>>> action: Determine if the device needs to be replaced, and clear the errors
>>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>>    see: http://illumos.org/msg/ZFS-8000-9P
>>>   scan: scrub repaired 188K in 0 days 09:40:18 with 0 errors on Sat Apr 20 08:45:09 2019
>>> config:
>>>
>>>         NAME                      STATE     READ WRITE CKSUM
>>>         backup                    DEGRADED     0     0     0
>>>           mirror-0                DEGRADED     0     0     0
>>>             gpt/backup61.eli      ONLINE       0     0     0
>>>             gpt/backup62-1.eli    ONLINE       0     0    47
>>>             13282812295755460479  OFFLINE      0     0     0  was /dev/gpt/backup62-2.eli
>>>
>>> errors: No known data errors
>>>
>>> So this is firmware-invariant (at least between 19.00.00.00 and
>>> 20.00.07.00); the issue persists.
>>>
>>> Again, in my instance these devices are never removed "unsolicited", so
>>> there can't be (or at least shouldn't be able to be) unflushed data in
>>> the device or kernel cache.  The procedure is and remains:
>>>
>>> zpool offline .....
>>> geli detach .....
>>> camcontrol standby ...
>>>
>>> Wait a few seconds for the spindle to spin down.
>>>
>>> Remove disk.
>>>
>>> Then of course on the other side, after insertion and the kernel has
>>> reported "finding" the device:
>>>
>>> geli attach ...
>>> zpool online ....
>>>
>>> Wait...
>>>
>>> If this is a boogered TXG that's held in the metadata for the
>>> "offline"'d device (maybe "off by one"?), that's potentially bad in
>>> that if there is an unknown failure in the other mirror component the
>>> resilver will complete but data has been irrevocably destroyed.
>>>
>>> Granted, this is a very low-probability scenario (the area where the
>>> bad checksums are has to be where the corruption hits, and it has to
>>> happen between the resilver and access to that data.)  Those are long
>>> odds, but nonetheless a window of "you're hosed" does appear to exist.
>>>
>>
> --
> Karl Denninger
> karl at denninger.net <mailto:karl at denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/
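For reference, the swap procedure quoted above written out as a minimal sh sketch. The pool name and GPT labels are taken from the status output in the thread; the da device number and the sleep interval are placeholders:

#!/bin/sh
# Sketch of the weekly member swap; OUT_DEV is a placeholder for the
# underlying da device of the member being pulled.
POOL=backup
OUT_LABEL=gpt/backup62-1.eli     # member headed to the vault
IN_LABEL=gpt/backup62-2.eli      # member coming back from the vault
OUT_DEV=da4                      # placeholder device name

# Removal side
zpool offline $POOL $OUT_LABEL
geli detach $OUT_LABEL
camcontrol standby $OUT_DEV
sleep 15                         # give the spindle time to stop before pulling it

# ... physically swap the disks; wait for the kernel to report the new device ...

# Insertion side
geli attach gpt/backup62-2       # keying/passphrase as the provider was initialized
zpool online $POOL $IN_LABEL     # kicks off the resilver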
Karl Denninger
2019-Apr-20 21:26 UTC
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
No; I can, but of course that's another ~8 hour (overnight) delay between
swaps.  That's not a bad idea, however....

On 4/20/2019 15:56, Steven Hartland wrote:
> Thanks for the extra info; the next question would be: have you ruled out
> that the corruption exists before the disk is removed?
>
> Would be interesting to add a zpool scrub to confirm this isn't the
> case before the disk removal is attempted.
>
>     Regards
>     Steve
>
> [snip -- the rest of the quoted thread repeats the previous message verbatim]
-- 
Karl Denninger
karl at denninger.net <mailto:karl at denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
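The scrub-before-removal check suggested above, and agreed to here at the cost of roughly another overnight run, could be scripted along these lines; the pool name and the polling interval are placeholders:

#!/bin/sh
# Sketch: scrub the pool and wait for completion before any member is
# offlined and pulled, so that any pre-existing corruption shows up first.
POOL=backup

zpool scrub $POOL

# Poll until the scrub finishes (roughly an 8-10 hour wait on this pool).
while zpool status $POOL | grep -q "scrub in progress"; do
    sleep 600
done

# Inspect the CKSUM column and the "scrub repaired" line before pulling a
# disk; note that "errors: No known data errors" by itself is not enough,
# since the status output earlier in the thread shows repaired checksum
# errors with that line still present.
zpool status -v $POOL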
Karl Denninger
2019-Apr-28 22:01 UTC
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20) [[UPDATE w/more tests]]
On 4/20/2019 15:56, Steven Hartland wrote:
> Thanks for the extra info; the next question would be: have you ruled out
> that the corruption exists before the disk is removed?
>
> On 20/04/2019 18:35, Karl Denninger wrote:
>> On 4/20/2019 10:50, Steven Hartland wrote:
>>> Have you eliminated geli as a possible source?
>> No; I could conceivably do so by re-creating another backup volume
>> set without geli-encrypting the drives, but I do not have an extra
>> set of drives of the capacity required lying around to do that.  [snip]

Ok, following up on this with more data....

First step taken was to create a *second* backup pool (I have the
backplane slots open, fortunately) with three different disks but *no
encryption.*  I ran both side-by-side for several days, with the
*unencrypted* one operating with one disk detached and offline (pulled
physically), just as I do normally.  Then I swapped the two using the
same paradigm.

The difference was *dramatic* -- the resilver did *not* scan the entire
disk; it only copied the changed blocks and was finished FAST.  A
subsequent scrub came up 100% clean.

Next I put THOSE disks in the vault (so as to make sure I didn't get
hosed if something went wrong) and re-initialized the pool in question,
leaving only the geli setup alone (in other words, I zpool destroy'd the
pool and then re-created it with all three disks connected and
geli-attached.)  The purpose of doing this was to eliminate the
possibility of old corruption somewhere, or of some sort of problem from
multiple in-place "zpool upgrade" commands spanning years.  Then I ran a
base backup to initialize all three volumes, detached one and yanked it
out of the backplane, as would be usual, leaving the other two online
and operating.

I ran backups as usual for most of last week after doing this, with the
61.eli and 62-1.eli volumes online, and 62-2 physically out of the
backplane.

Today I swapped them again as I usually do (e.g. offline 62-1, geli
detach, camcontrol standby and then yank it -- then insert the 62-2
volume, geli attach and zpool online) and this is happening:

[\u@NewFS /home/karl]# zpool status backup
  pool: backup
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr 28 12:57:47 2019
        2.48T scanned at 202M/s, 1.89T issued at 154M/s, 3.27T total
        1.89T resilvered, 57.70% done, 0 days 02:37:14 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        backup                    DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            gpt/backup61.eli      ONLINE       0     0     0
            11295390187305954877  OFFLINE      0     0     0  was /dev/gpt/backup62-1.eli
            gpt/backup62-2.eli    ONLINE       0     0     0

errors: No known data errors

The "3.27T" number is accurate (by "zpool list") for the space in use.
There is not a snowball's chance in Hades that anywhere near 1.89T of
that data (thus far, and it ain't done, as you can see!) was modified
between when all three disks were online and when the 62-2.eli volume
was swapped back in for 62-1.eli.  No possible way.  Maybe some
100-200Gb of data has been touched across the backed-up filesystems in
the last three-ish days, but there's just flat-out no way it's more than
that; this would imply an entropy of well over 50% of the writeable data
on this box in less than a week!  That's NOT possible.  Further, it's
not 100%; it shows 2.48T scanned but 1.89T actually written to the other
drive.

So something is definitely foooged here, and it does appear that geli is
involved in it.  Whatever is foooging zfs, the resilver process thinks
it has to recopy MOST (but not all!) of the blocks in use, it appears,
from the 61.eli volume to the 62-2.eli volume.

The question is what would lead ZFS to think it has to do that -- it
clearly DOES NOT, as a *much* smaller percentage of the total TXG set on
61.eli was modified while 62-2.eli was offline and 62-1.eli was online.

Again I note that on 11.1 and previous this resilver was a rapid
operation; whatever was actually changed got copied, but the system
never copied *nearly everything* on a resilver, including data that had
not been changed at all, on a mirrored set.  Obviously on a raidz volume
you have to go through the entire data structure, because parity has to
be recomputed and blocks regenerated, but on a mirror only the changed
TXGs need to be looked at and copied.  TXGs that were on both disks at
the time the second one was taken offline do not need to be touched *at
all*, since they're already on the target!

What's going on here?

-- 
Karl Denninger
karl at denninger.net <mailto:karl at denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
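A rough sketch of the geli-versus-plain comparison described in this update, for anyone who wants to try reproducing it; all labels, the keyfile path, and the write workload are placeholders, and the geli providers are assumed to have been "geli init"ed beforehand:

#!/bin/sh
# Sketch: one small plain mirror and one geli-backed mirror, offline a member
# of each, push writes, bring the members back and compare how much the
# resilver actually copies.

zpool create testplain mirror gpt/test-a gpt/test-b

geli attach -k /root/test.key gpt/test-c
geli attach -k /root/test.key gpt/test-d
zpool create testgeli mirror gpt/test-c.eli gpt/test-d.eli

# Take one side of each mirror offline, as in the weekly rotation
zpool offline testplain gpt/test-b
zpool offline testgeli gpt/test-d.eli

# ... generate a realistic amount of traffic here, ideally zfs send | zfs recv
# streams rather than plain file writes, since that is the workload that
# reproduces the problem on the real backup pool ...

# Bring the members back and watch the resilver
zpool online testplain gpt/test-b
zpool online testgeli gpt/test-d.eli
zpool status testplain testgeli   # compare "scanned"/"issued" with what was written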