Warren Strange
2010-Sep-12 18:05 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
I posted the following to the VirtualBox forum. I would be interested in finding out if anyone else has ever seen zpool corruption with VirtualBox as a host on OpenSolaris:

-----------------------------------------
I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.

I have experienced 6-7 instances of my zpool getting corrupted. I am wondering if anyone else has ever seen this before.

This is on a mirrored zpool, using drives from two different manufacturers (i.e. it is very unlikely both drives would fail at the same time, with the same blocks going bad). I initially thought I might have a memory problem, which could explain the simultaneous disk failures. After running memory diagnostics for 24 hours with no errors reported, I am beginning to suspect it might be something else.

I am using shared folders from the guest, mounted at guest boot time.

Is it possible that the Solaris vboxsf shared folder kernel driver is causing corruption? Being in the kernel, would it allow bypassing of the normal ZFS integrity mechanisms? Or is it possible there is some locking issue or race condition that triggers the corruption?

Anecdotally, when I see the corruption the sequence of events seems to be:

- dmesg reports various vbox drivers being loaded (normal - just loading the drivers)
- The guest boots and gets just past the grub boot screen to the initial Red Hat boot screen.
- The guest hangs and never finishes booting.
- zpool status -v reports corrupted files. The files are on the zpool containing the shared folders and the VirtualBox images.

Thoughts?
Jeff Savit
2010-Sep-12 19:07 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Hi Warren,

This may not help much, except perhaps as a way to eliminate possible causes, but I ran b134 with VirtualBox and guests on ZFS for quite a long time without any such symptoms. My pool is a simple, unmirrored one, so the difference may be there. I used shared folders without incident. Guests include Linux (several distros, including RH), Windows, Solaris, BSD.

--Jeff

On 09/12/10 11:05 AM, Warren Strange wrote:
> I posted the following to the VirtualBox forum. I would be interested in finding out if anyone else has ever seen zpool corruption with VirtualBox as a host on OpenSolaris:
> [...]
> Thoughts?

--
Jeff Savit | Principal Sales Consultant
Phone: 602.824.6275
Email: jeff.savit at oracle.com | Blog: http://blogs.sun.com/jsavit
Oracle North America Commercial Hardware
Operating Environments & Infrastructure S/W Pillar
2355 E Camelback Rd | Phoenix, AZ 85016
Richard Elling
2010-Sep-12 21:04 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
On Sep 12, 2010, at 11:05 AM, Warren Strange wrote:

> I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.
>
> I have experienced 6-7 instances of my zpool getting corrupted. [...]
>
> This is on a mirrored zpool - using drives from two different manufacturers (i.e. it is very unlikely both drives would fail at the same time, with the same blocks going bad). I initially thought I might have a memory problem - which could explain the simultaneous disk failures. After running memory diagnostics for 24 hours with no errors reported, I am beginning to suspect it might be something else.

So we are clear, you are running VirtualBox on ZFS, rather than ZFS on VirtualBox?

> I am using shared folders from the guest - mounted at guest boot up time.
>
> Is it possible that the Solaris vboxsf shared folder kernel driver is causing corruption? Being in the kernel, would it allow bypassing of the normal zfs integrity mechanisms? Or is it possible there is some locking issue or race condition that triggers the corruption?
> [...]
> Thoughts?

Bad power supply, HBA, cables, or other common cause. To help you determine the sort of corruption, for mirrored pools FMA will record the nature of the discrepancies.

    fmdump -eV

will show a checksum error and the associated bitmap comparisons.
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com
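For reference, the checksum ereports Richard refers to can be narrowed down by event class when reading the FMA error log. A minimal sketch, assuming the pool is named tank as in the output later in the thread (the class name and pool name are taken from that output, not from Richard's post):

    # show only ZFS checksum ereports, in full detail
    fmdump -eV -c ereport.fs.zfs.checksum

    # one-line-per-event summary, useful for spotting when the errors occur
    fmdump -e -c ereport.fs.zfs.checksum

    # cross-check against the pool's own error counters and list of damaged files
    zpool status -v tank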
Warren Strange
2010-Sep-12 21:56 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
> So we are clear, you are running VirtualBox on ZFS,
> rather than ZFS on VirtualBox?

Correct

> Bad power supply, HBA, cables, or other common cause.
> To help you determine the sort of corruption, for mirrored pools FMA will record
> the nature of the discrepancies.
>     fmdump -eV
> will show a checksum error and the associated bitmap comparisons.

Below are the errors reported from the two disks. Not sure if anything looks suspicious (other than the obvious checksum error):

Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf3cb5e110f2c88ec
                vdev = 0x961d9b28c1440020
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x961d9b28c1440020
        vdev_type = disk
        vdev_path = /dev/dsk/c8t5d0s0
        vdev_devid = id1,sd@SATA_____WDC_WD15EADS-00P_____WD-WCAVU0351361/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x20000
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f5a

Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf3cb5e110f2c88ec
                vdev = 0x969570b704d5bff1
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x969570b704d5bff1
        vdev_type = disk
        vdev_path = /dev/dsk/c8t4d0s0
        vdev_devid = id1,sd@SATA_____ST31500341AS________________9VS3B4CP/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x20000
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f24
Richard Elling
2010-Sep-12 22:31 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Comments below...

On Sep 12, 2010, at 2:56 PM, Warren Strange wrote:

>> So we are clear, you are running VirtualBox on ZFS,
>> rather than ZFS on VirtualBox?
>
> Correct
>
>> Bad power supply, HBA, cables, or other common cause.
>> To help you determine the sort of corruption, for mirrored pools FMA will record
>> the nature of the discrepancies.
>>     fmdump -eV
>> will show a checksum error and the associated bitmap comparisons.
>
> Below are the errors reported from the two disks. Not sure if anything looks suspicious (other than the obvious checksum error)
>
> [...]

In the case where one side of the mirror is corrupted and the other is correct, you will be shown the difference between the two, in the form of an abbreviated bitmap.

In this case, the data on each side of the mirror is the same, with a large degree of confidence. So the source of the corruption is likely to be the same -- some common component: CPU, RAM, HBA, I/O path, etc. You can rule out the disks as suspects.

With some additional experiments you can determine if the corruption occurred during the write or the read.
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
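One way to run the kind of follow-up experiment mentioned above is to force the data to be read again and see whether the checksum errors are stable. This is only a sketch under assumptions: the file path below is illustrative, and the interpretation is a common rule of thumb rather than something stated in the thread. If the same blocks keep failing their checksums on re-read and on a scrub, the bad data is on disk and was corrupted on the write path; if errors move around or do not reproduce, the read path (for example flaky RAM) is the more likely culprit.

    # re-read a file that zpool status -v flagged (path is illustrative only)
    dd if=/tank/vbox/guest-disk.vdi of=/dev/null bs=1024k

    # re-verify every block on both sides of the mirror
    zpool scrub tank
    zpool status -v tank    # persistent CKSUM counts point at on-disk (write-time) corruption

    # check whether new checksum ereports were generated by the re-read
    fmdump -e -c ereport.fs.zfs.checksum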
Warren Strange
2010-Sep-20 17:00 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Just following up...

I reran memtest diagnostics and let it run overnight again. This time I did see some memory errors - which would be the most likely explanation for the errors I am seeing. Faulty hardware strikes again.

Thanks to all for the advice.

Warren

> In this case, the data on each side of the mirror is the same, with a large degree of
> confidence. So the source of the corruption is likely to be the same -- some common
> component: CPU, RAM, HBA, I/O path, etc. You can rule out the disks as suspects.
>
> With some additional experiments you can determine if the corruption occurred during
> the write or the read.
>  -- richard
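After the faulty memory is replaced, the usual cleanup (a generic ZFS sketch, not advice given in the thread) is to clear the pool's error counters and run a scrub so every block is re-verified; since both sides of this mirror hold the same bad data, any files still listed by zpool status -v afterwards would need to be restored from backup.

    # once the bad RAM has been swapped out:
    zpool clear tank        # reset the pool's error counters
    zpool scrub tank        # re-read and verify every block in the pool

    # anything still listed here could not be repaired and must come from backup
    zpool status -v tank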
Orvar Korvar
2010-Sep-22 14:11 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Now this is a testament to the power of ZFS. Only ZFS is sensitive enough to report these errors to you. Had you run another filesystem, you would never have been notified that your data was slowly being corrupted by faulty hardware. :o)