Good morning,
I am seeing file corruption on a ZFS file system in a two-node cluster. The
file system holds the data file of a VirtualBox Windows guest instance. It is
placed in one resource group together with the GDS scripts that manage the
virtual machine's startup and probe:
clresourcegroup create vb1
clresource create -t SUNW.HAStoragePlus \
-g vb1 \
-p Zpools=vb1 \
-p AffinityOn=True vb1-storage
clresource create -g vb1 -t SUNW.gds \
[..]
-p stop_signal=9 -p Failover_enabled=true \
-p Resource_dependencies=vb1-storage vb1-vms
After some days of operation (and many failovers), the virtual disk data file
is corrupted and the ZFS file system no longer mounts:
Oct 23 09:56:08 siegfried EVENT-TIME: Thu Oct 23 09:56:08 CEST 2008
Oct 23 09:56:08 siegfried PLATFORM: PowerEdge 1850, CSN: 9Z7MV1J, HOSTNAME: siegfried
Oct 23 09:56:08 siegfried SOURCE: zfs-diagnosis, REV: 1.0
Oct 23 09:56:08 siegfried EVENT-ID: 3e0a4051-cd05-cce8-b0bb-c4c165cc4fcc
Oct 23 09:56:08 siegfried DESC: The number of checksum errors associated with a ZFS device
Oct 23 09:56:08 siegfried exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Oct 23 09:56:08 siegfried AUTO-RESPONSE: The device has been marked as degraded. An attempt
Oct 23 09:56:08 siegfried will be made to activate a hot spare if available.
Oct 23 09:56:08 siegfried IMPACT: Fault tolerance of the pool may be compromised.
Oct 23 09:56:08 siegfried REC-ACTION: Run 'zpool status -x' and replace the bad device.
# zpool status -xv
pool: vb1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
        NAME                                     STATE   READ WRITE CKSUM
        vb1                                      ONLINE     0     0     0
          c4t600D0230000000000088824BC4228807d0  ONLINE     0     0     0
errors: Permanent errors have been detected in the following files:
/vb1/vb1/vhd/vb1_vhd1.vdi
SunOS Version: 5.11 snv_97 i86pc i386 i86pc
ClusterExpress Version: 08/20/2008 (build from source)
Storage: SAN Luns via scsi_vhci
Any suggestions?
Best wishes,
Armin
--
This message posted from opensolaris.org
Robert Milkowski
2008-Oct-23 14:21 UTC
[zfs-discuss] (ZFS) file corruption with HAStoragePlus
Hello Armin,
Thursday, October 23, 2008, 10:13:23 AM, you wrote:
AO> Good morning,
AO> i experience file corruption on a zfs in a two node Cluster. The
AO> Filesystem holds the datafile of a VirtualBox windows-guest
AO> instance. It is placed in one resourcegroup together with the
AO> gds-scripts which manage the virtual-machine startup and probe:
AO> clresourcegroup create vb1
AO> clresource create -t SUNW.HAStoragePlus \
AO> -g vb1 \
AO> -p Zpools=vb1 \
AO> -p AffinityOn=True vb1-storage
AO> clresource create -g vb1 -t SUNW.gds \
AO> [..]
AO> -p stop_signal=9 -p Failover_enabled=true \
AO> -p Resource_dependencies=vb1-storage vb1-vms
AO> After some days of operations (and many failovers) the
AO> virtual-disk-datafile is corrupted and the zfs does not mount any more:
AO> Oct 23 09:56:08 siegfried EVENT-TIME: Thu Oct 23 09:56:08 CEST 2008
AO> Oct 23 09:56:08 siegfried PLATFORM: PowerEdge 1850, CSN: 9Z7MV1J, HOSTNAME: siegfried
AO> Oct 23 09:56:08 siegfried SOURCE: zfs-diagnosis, REV: 1.0
AO> Oct 23 09:56:08 siegfried EVENT-ID: 3e0a4051-cd05-cce8-b0bb-c4c165cc4fcc
AO> Oct 23 09:56:08 siegfried DESC: The number of checksum errors associated with a ZFS device
AO> Oct 23 09:56:08 siegfried exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
AO> Oct 23 09:56:08 siegfried AUTO-RESPONSE: The device has been marked as degraded. An attempt
AO> Oct 23 09:56:08 siegfried will be made to activate a hot spare if available.
AO> Oct 23 09:56:08 siegfried IMPACT: Fault tolerance of the pool may be compromised.
AO> Oct 23 09:56:08 siegfried REC-ACTION: Run 'zpool status -x' and replace the bad device.
AO> # zpool status -xv
AO> pool: vb1
AO> state: ONLINE
AO> status: One or more devices has experienced an error resulting in data
AO> corruption. Applications may be affected.
AO> action: Restore the file in question if possible. Otherwise restore the
AO> entire pool from backup.
AO> see: http://www.sun.com/msg/ZFS-8000-8A
AO> scrub: none requested
AO> config:
AO>         NAME                                     STATE   READ WRITE CKSUM
AO>         vb1                                      ONLINE     0     0     0
AO>           c4t600D0230000000000088824BC4228807d0  ONLINE     0     0     0
AO> errors: Permanent errors have been detected in the following files:
AO> /vb1/vb1/vhd/vb1_vhd1.vdi
AO> SunOS Version: 5.11 snv_97 i86pc i386 i86pc
AO> ClusterExpress Version: 08/20/2008 (build from source)
AO> Storage: SAN Luns via scsi_vhci
AO> Any suggestions?
If you can, try to give the pool some redundancy at the ZFS level (a
mirror?). It looks like your controller, array, or something else in the
storage path corrupted some data.
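
A minimal sketch of what that could look like, assuming a second LUN can be
presented to both nodes. The pool and existing device names are taken from the
zpool status output above; NEWDEV is a hypothetical placeholder, and the run
wrapper is only an illustration that prints each command unless RUN is set.
The commands would have to be issued on the node that currently has the pool
imported.

```shell
#!/bin/sh
# Sketch only, not a tested procedure for this cluster.
POOL=vb1
EXISTING=c4t600D0230000000000088824BC4228807d0
NEWDEV=cXtYdZ   # hypothetical second LUN, replace with a real device name

# Print the command by default; execute it only when RUN is set (e.g. RUN=1).
run() {
    if [ -n "${RUN:-}" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Attach a second device to the existing one, forming a two-way mirror.
run zpool attach "$POOL" "$EXISTING" "$NEWDEV"
# Check resilver progress before relying on the mirror.
run zpool status "$POOL"
# Re-read every block and verify it against its checksum.
run zpool scrub "$POOL"
# List files that still have permanent errors.
run zpool status -v "$POOL"
```

With a mirror, ZFS can repair a block that fails its checksum from the other
side. With a single device it can only detect the error. A mirror will not
bring back the .vdi file that is already damaged, so that file still has to be
restored, as the action line in the status output says.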
--
Best regards,
Robert mailto:milek at task.gda.pl
http://milek.blogspot.com