thr3ads.net - zfs discuss - [zfs-discuss] (ZFS) file corruption with HAStoragePlus [Oct 2008]

Armin Ollig

2008-Oct-23 09:13 UTC

[zfs-discuss] (ZFS) file corruption with HAStoragePlus

Good morning,

I am experiencing file corruption on a ZFS file system in a two-node cluster.
The file system holds the data file of a VirtualBox Windows guest instance. It
is placed in one resource group together with the GDS scripts that manage the
virtual machine's startup and probing:

clresourcegroup create vb1 

clresource create -t SUNW.HAStoragePlus \
-g vb1 \
-p Zpools=vb1 \
-p AffinityOn=True vb1-storage 

clresource create -g vb1 -t SUNW.gds \
 [..]
-p stop_signal=9 -p Failover_enabled=true \
-p Resource_dependencies=vb1-storage vb1-vms

After some days of operation (and many failovers), the virtual disk data file
is corrupted and the ZFS file system no longer mounts:

Oct 23 09:56:08 siegfried EVENT-TIME: Thu Oct 23 09:56:08 CEST 2008
Oct 23 09:56:08 siegfried PLATFORM: PowerEdge 1850, CSN: 9Z7MV1J, HOSTNAME: siegfried
Oct 23 09:56:08 siegfried SOURCE: zfs-diagnosis, REV: 1.0
Oct 23 09:56:08 siegfried EVENT-ID: 3e0a4051-cd05-cce8-b0bb-c4c165cc4fcc
Oct 23 09:56:08 siegfried DESC: The number of checksum errors associated with a ZFS device
Oct 23 09:56:08 siegfried exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Oct 23 09:56:08 siegfried AUTO-RESPONSE: The device has been marked as degraded.  An attempt
Oct 23 09:56:08 siegfried will be made to activate a hot spare if available.
Oct 23 09:56:08 siegfried IMPACT: Fault tolerance of the pool may be compromised.
Oct 23 09:56:08 siegfried REC-ACTION: Run 'zpool status -x' and replace the bad device.

# zpool status -xv
  pool: vb1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
        NAME                                     STATE     READ WRITE CKSUM
        vb1                                      ONLINE       0     0     0
          c4t600D0230000000000088824BC4228807d0  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        /vb1/vb1/vhd/vb1_vhd1.vdi


SunOS Version: 5.11 snv_97 i86pc i386 i86pc
ClusterExpress Version: 08/20/2008 (build from source)
Storage: SAN LUNs via scsi_vhci

Any suggestions?
Best wishes,
 Armin
--
This message posted from opensolaris.org

Robert Milkowski

2008-Oct-23 14:21 UTC

[zfs-discuss] (ZFS) file corruption with HAStoragePlus

Hello Armin,

Thursday, October 23, 2008, 10:13:23 AM, you wrote:

AO> Good morning,

AO> I am experiencing file corruption on a ZFS file system in a two-node
AO> cluster. The file system holds the data file of a VirtualBox Windows
AO> guest instance. It is placed in one resource group together with the
AO> GDS scripts that manage the virtual machine's startup and probing:

AO> clresourcegroup create vb1 

AO> clresource create -t SUNW.HAStoragePlus \
AO> -g vb1 \
AO> -p Zpools=vb1 \
AO> -p AffinityOn=True vb1-storage 

AO> clresource create -g vb1 -t SUNW.gds \
AO>  [..]
AO> -p stop_signal=9 -p Failover_enabled=true \
AO> -p Resource_dependencies=vb1-storage vb1-vms

AO> After some days of operation (and many failovers), the virtual disk
AO> data file is corrupted and the ZFS file system no longer mounts:

AO> Oct 23 09:56:08 siegfried EVENT-TIME: Thu Oct 23 09:56:08 CEST 2008
AO> Oct 23 09:56:08 siegfried PLATFORM: PowerEdge 1850, CSN: 9Z7MV1J, HOSTNAME: siegfried
AO> Oct 23 09:56:08 siegfried SOURCE: zfs-diagnosis, REV: 1.0
AO> Oct 23 09:56:08 siegfried EVENT-ID: 3e0a4051-cd05-cce8-b0bb-c4c165cc4fcc
AO> Oct 23 09:56:08 siegfried DESC: The number of checksum errors associated with a ZFS device
AO> Oct 23 09:56:08 siegfried exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.
AO> Oct 23 09:56:08 siegfried AUTO-RESPONSE: The device has been marked as degraded.  An attempt
AO> Oct 23 09:56:08 siegfried will be made to activate a hot spare if available.
AO> Oct 23 09:56:08 siegfried IMPACT: Fault tolerance of the pool may be compromised.
AO> Oct 23 09:56:08 siegfried REC-ACTION: Run 'zpool status -x' and replace the bad device.

AO> # zpool status -xv
AO>   pool: vb1
AO>  state: ONLINE
AO> status: One or more devices has experienced an error resulting in data
AO>         corruption.  Applications may be affected.
AO> action: Restore the file in question if possible.  Otherwise restore the
AO>         entire pool from backup.
AO>    see: http://www.sun.com/msg/ZFS-8000-8A
AO>  scrub: none requested
AO> config:
AO>         NAME                                     STATE     READ WRITE CKSUM
AO>         vb1                                      ONLINE       0     0     0
AO>           c4t600D0230000000000088824BC4228807d0  ONLINE       0     0     0
AO> errors: Permanent errors have been detected in the following files:
AO>         /vb1/vb1/vhd/vb1_vhd1.vdi


AO> SunOS Version: 5.11 snv_97 i86pc i386 i86pc
AO> ClusterExpress Version: 08/20/2008 (build from source)
AO> Storage: SAN LUNs via scsi_vhci

AO> Any suggestions?

If you can, try to get some kind of redundancy provided by ZFS itself
(a mirror?). It looks like your controller, array, or something else in
the storage path corrupted some data.

-- 
Best regards,
 Robert                            mailto:milek at task.gda.pl
                                       http://milek.blogspot.com
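
For readers following along, the mirror suggestion above could be sketched
roughly as below. This is only an illustration under assumptions: the pool
and existing device names are taken from the zpool status output in the
original post, while NEW_DEV is a hypothetical placeholder for a second LUN
that the thread never names. The run() helper and DRY_RUN variable are also
made up for this sketch; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
POOL=vb1
EXISTING_DEV=c4t600D0230000000000088824BC4228807d0   # from the zpool status output in the thread
NEW_DEV=${NEW_DEV:-cXtYdZ}   # hypothetical placeholder: a second LUN, at least as large
DRY_RUN=${DRY_RUN:-1}

# Helper (an assumption of this sketch): echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn the single-device pool into a two-way mirror.
run zpool attach "$POOL" "$EXISTING_DEV" "$NEW_DEV"
# Once resilvering has finished (check zpool status), verify all checksums.
run zpool scrub "$POOL"
# List any files that still have permanent errors.
run zpool status -v "$POOL"
```

With a single-device pool ZFS can detect checksum errors but has no second
copy to repair from, which is why a mirror helps with future corruption. It
will most likely not recover the already damaged vb1_vhd1.vdi; that file
would still have to be restored from a backup, as the zpool status output
says.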
