Hi,
I recently received reports from two users who experienced corrupted
raid-z pools with ZFS-FUSE, and I'm having trouble reproducing the
problem or even figuring out what the cause is.
One of the users experienced corruption simply by rebooting the system:
> # zpool status
>   pool: media
>  state: FAULTED
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         media       UNAVAIL      0     0     0  insufficient replicas
>           raidz1    UNAVAIL      0     0     0  corrupted data
>             sda     ONLINE       0     0     0
>             sdb     ONLINE       0     0     0
>             sdc     ONLINE       0     0     0
>             sdd     ONLINE       0     0     0
At first I thought it was a problem of the device names being renamed
(caused by a different order of disk detection on boot), but I believe
in that case ZFS would report the drives as UNAVAIL rather than ONLINE.
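If it helps with debugging, the vdev labels can also be dumped directly
with zdb (assuming it behaves under zfs-fuse), to compare the device paths
and GUIDs stored on disk with the current device names, e.g.:

  # zdb -l /dev/sda

The label output should show the pool GUID and the path each child vdev
was last seen at, so a simple device rename ought to be easy to rule out.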
Anyway, exporting and re-importing didn't work:
> # zpool import
>   pool: media
>     id: 18446744072804078091
>  state: FAULTED
> action: The pool cannot be imported due to damaged devices or data.
> config:
>
>         media       UNAVAIL  insufficient replicas
>           raidz1    UNAVAIL  corrupted data
>             sda     ONLINE
>             sdb     ONLINE
>             sdc     ONLINE
>             sdd     ONLINE
Another user experienced a similar problem, but under different circumstances:
he had a raid-z pool with two drives and, while the system was idle, he
removed one of them. zfs-fuse doesn't notice that a drive has been removed
until it tries to read from or write to the device, so "zpool status" still
showed the drive as online. Anyway, after a slightly confusing sequence of
events (replugging the drive, zfs-fuse crashing(?!), and some other
weirdness), the end result was the same:
>   pool: pool
>  state: UNAVAIL
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         pool        UNAVAIL      0     0     0  insufficient replicas
>           raidz1    UNAVAIL      0     0     0  corrupted data
>             sdc2    ONLINE       0     0     0
>             sdd2    ONLINE       0     0     0
I tried to reproduce this, but couldn't. When I remove a USB drive from a
raid-z pool, zfs-fuse correctly reports READ/WRITE failures. I also tried
killing zfs-fuse, changing the order of the drives and then starting
zfs-fuse again, but after exporting and importing, the pool was never
corrupted (although it found checksum errors on the drive that had been
unplugged, of course).
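Roughly, the drive-reordering test looked like the following sequence (the
pool name "tank" is just a placeholder):

  # pkill zfs-fuse                  (stop the daemon)
    ... swap the order of the drives / replug them ...
  # zfs-fuse                        (start the daemon again)
  # zpool export tank
  # zpool import tank
  # zpool status -v tank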
Something that might be useful to know: zfs-fuse treats each block device
as if it were a normal file and calls fsync() on the file descriptor when
necessary (as in vdev_file.c), but this only guarantees that the kernel
buffers are flushed; it doesn't actually send a flush command to the disk
(unfortunately, there's no DKIOCFLUSHWRITECACHE ioctl equivalent on Linux).
Anyway, the possibility that this is the problem seems very remote to me
(and it wouldn't explain the second case).
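For reference, the flush path basically boils down to something like this
(a simplified sketch only; the function and names are illustrative, not
the actual zfs-fuse code):

#include <unistd.h>
#include <errno.h>

/*
 * When ZFS issues a DKIOCFLUSHWRITECACHE-style flush to a vdev backed
 * by a file descriptor, all we can do on Linux is fsync().  That pushes
 * the kernel's dirty buffers for this fd down to the device, but it
 * doesn't guarantee the drive's own write cache has been written out.
 */
static int
vdev_fd_flush_writecache(int fd)
{
        if (fsync(fd) == -1)
                return (errno);
        return (0);
}

So in the worst case a power cut or reset could lose writes that ZFS
believes are already stable, but as I said, that seems unlikely to be
what happened here.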
Do you have any idea what the problem could be, or how I can determine the
cause? I'm stuck at this point, and the first user seems to have lost
280 GB of data (he didn't have a backup)...
Regards,
Ricardo Correia