Is there any flaw in the process below? The customer asked: Sun Cluster with each zpool composed of 1 LUN (yes, they have been advised to use a redundant config instead). They do not export the pool to the other host; instead they use BCV to make a mirror of the LUN. They then split the mirror and import the LUN/zpool onto a machine that is not even part of the cluster - the backup server. Most of the time the import seems to work, but maybe 10-15% of the time it panics the system with "bad checksum". The customer does this procedure on 9 LUNs, 2 times a day. They have been doing the same thing with VxFS/VxVM for some time without any issue. They were recommended to run a scrub on a regular basis.

I have also provided a list of things to check that have the potential to cause checksum errors:

1- Exporting LUNs to two different hosts and creating a zpool on them. I have seen this at one customer where one host had a UFS file system on the same LUN that another host was using in its zpool.
2- Accessing a LUN that is under ZFS control by other means (e.g. dd of=/dev/..emcpower11c) can cause corrupted data.
3- Mistakenly adding the same device under different names to a zpool. EMC PowerPath and Sun multipathing (MPxIO) can have multiple device names pointing to the same device.
4- Importing the device without exporting it first.
5- Bad hardware, or storage/controller bugs.
6- ZFS is not cluster aware, which means one should use clustering software when sharing a zpool across multiple hosts. A poor man's cluster is not supported!
7- LUNs exported to ZFS that are RAID-5 types. See these URLs about RAID-5 issues:
   http://blogs.sun.com/bonwick/entry/raid_z
   http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

Consider reading the article from Jeff Bonwick: http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data

Amer.

Analysis:

SolarisCAT(vmcore.1/10V)> stat
core file:      /cores/dir31/66015657/vmcore.1
user:           Cores User (cores:911)
release:        5.10 (64-bit)
version:        Generic_127127-11
machine:        sun4v
node name:      bansai
domain:         gov.edmonton.ab.ca
hw_provider:    Sun_Microsystems
system type:    SUNW,SPARC-Enterprise-T5220 (UltraSPARC-T2)
hostid:         84ac5f08
dump_conflags:  0x10000 (DUMP_KERNEL) on /dev/dsk/c1t0d0s1(62.8G)
time of crash:  Sat Jul 19 22:42:55 MDT 2008 (core is 33 days old)
age of system:  1 days 6 hours 4 minutes 47.86 seconds
panic CPU:      56 (64 CPUs, 31.8G memory)
panic string:   ZFS: bad checksum (read on <unknown> off 0: zio 300743cec40 [L0 SPA space map] 1000L/a00P DVA[0]=<0:484a15000:a00> DVA[1]=<0:1df9054a00:a00> fletcher4 lzjb BE contiguous birth=17010763 fill=1 cksum=8

zio involved:

SolarisCAT(vmcore.1/10V)> sdump 300743cec40 zio_t io_spa,io_type,io_error
  io_spa = 0x300ee32c4c0
  io_type = 1 (ZIO_TYPE_READ)
  io_error = 0x32   <<<

zpool that had blocks with checksum errors:

A block read on the file system that was using ZFS pool "sapcrp" had a checksum error. The zio involved had io_error 50 (errno 50, 0x32 hex):

  #define EBADE 50 /* invalid exchange */

ZFS checksum errors (ECKSUM) are reported as the EBADE errno. From the source code:

  /*
   * We'll take the unused errno 'EBADE' (from the Convergent graveyard)
   * to indicate checksum errors.
   */
  #define ECKSUM  EBADE

Because of ZFS's end-to-end checksumming, the checksum of the data that was read was computed and compared to the stored value; the two should be the same if the data is good. Since the checksums differed, ZFS concluded that the data is corrupted. If the storage pool had been set up in a ZFS-redundant configuration (mirror or raidz), ZFS could have gone to the mirror/parity, read a good value, and self-corrected (healed) the bad side.
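As a practical aside on the scrub recommendation above, here is a minimal sketch of how to check whether ZFS has already logged checksum errors against this pool and how to force a full verification pass. The pool name sapcrp is taken from this case; run the commands on whichever host currently has the pool imported:

  # zpool status -v sapcrp    <- per-vdev READ/WRITE/CKSUM counters, plus any files with unrecoverable errors
  # zpool scrub sapcrp        <- reads and verifies the checksum of every allocated block in the pool
  # zpool status -v sapcrp    <- re-check once the scrub completes
  # fmdump -eV                <- FMA error reports, including ZFS checksum ereports, with timestamps

Keep in mind that on a pool with no ZFS-level redundancy a scrub can only detect corruption; it cannot repair it.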
Unfortunately, this pool is configured in a non-redundant fashion as far as ZFS is concerned. No redundant configuration (mirror or raidz) is used, so the checksum error left no good copy of the data, and that resulted in the panic. With redundant vdevs configured, ZFS is able to heal the data by reading a copy of the block with a good checksum. ZFS version 2 has metadata replication, so pools built from multiple vdevs with raidz(2) or mirror are more resilient to these failures because the metadata can be replicated. Also, a pool can contain multiple raidz or raidz2 vdevs by striping across raidz vdev groups. One can create 4 raidz groups out of 16 drives and then stripe across the four raidz groups. Each raidz group can handle one error or one disk failure, which means 4 such errors can be handled across the 4 raidz groups. With striping we are also increasing I/O bandwidth. There is no replication with hardware RAID LUNs (EMC) because only one vdev is exported to ZFS. It is recommended, if possible, to create multiple simple hardware LUNs, export them to ZFS, and then configure ZFS to create raidz groups and stripe across those groups. Using this strategy, you get the benefit of the hardware RAID box providing a large cache for faster updates, plus ZFS's ability to heal the data on-the-fly with multiple vdevs under its control.

> 0x300ee32c4c0::spa
ADDR                 STATE NAME
00000300ee32c4c0    ACTIVE sapcrp

> 0x300ee32c4c0::spa -v
ADDR                 STATE NAME
00000300ee32c4c0    ACTIVE sapcrp

    ADDR             STATE     AUX          DESCRIPTION
    000006005ebcdac0 HEALTHY   -
    0000030015400fc0 HEALTHY   -            /dev/dsk/c6t6006048000018772084654574F333445d0s0

> 0x300ee32c4c0::spa -cv
ADDR                 STATE NAME
00000300ee32c4c0    ACTIVE sapcrp (none)

    ADDR             STATE     AUX          DESCRIPTION
    000006005ebcdac0 HEALTHY   -
    0000030015400fc0 HEALTHY   -            /dev/dsk/c6t6006048000018772084654574F333445d0s0

> 0x300ee32c4c0::spa -e
ADDR                 STATE NAME
00000300ee32c4c0    ACTIVE sapcrp

    ADDR             STATE     AUX          DESCRIPTION
    000006005ebcdac0 HEALTHY   -
                       READ     WRITE    FREE   CLAIM   IOCTL
               OPS        0         0       0       0       0
             BYTES        0         0       0       0       0
             EREAD        0
            EWRITE        0
            ECKSUM        0
    0000030015400fc0 HEALTHY   -            /dev/dsk/c6t6006048000018772084654574F333445d0s0
                       READ     WRITE    FREE   CLAIM   IOCTL
               OPS     0x88      0x18       0       0       0
             BYTES 0x841c00   0x10a00       0       0       0
             EREAD        0
            EWRITE        0
            ECKSUM      0x4

Note the ECKSUM count of 0x4 on the leaf vdev: ZFS has already recorded four checksum errors against the single device backing this pool.

Device:
c6t6006048000018772084654574F333445d0s0 -> ../../devices/scsi_vhci/ssd@g6006048000018772084654574f333445
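To make the 16-drive recommendation above concrete, here is a rough sketch of the suggested layout, assuming 16 simple (non-RAID) LUNs are presented to the host as c7t0d0 through c7t15d0; the device names and the pool name sapcrp2 are hypothetical placeholders, substitute the real ones:

  # zpool create sapcrp2 \
      raidz c7t0d0  c7t1d0  c7t2d0  c7t3d0 \
      raidz c7t4d0  c7t5d0  c7t6d0  c7t7d0 \
      raidz c7t8d0  c7t9d0  c7t10d0 c7t11d0 \
      raidz c7t12d0 c7t13d0 c7t14d0 c7t15d0
  # zpool status sapcrp2

ZFS stripes writes across the four raidz top-level vdevs automatically, and each group can lose (or return bad data from) one device while still allowing ZFS to reconstruct and self-heal the affected blocks.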