Warren Strange
2010-Sep-12 18:05 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
I posted the following to the VirtualBox forum. I would be interested in finding out if anyone else has ever seen zpool corruption with VirtualBox as a host on OpenSolaris:

-----------------------------------------
I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.

I have experienced 6-7 instances of my zpool getting corrupted. I am wondering if anyone else has ever seen this before.

This is on a mirrored zpool, using drives from two different manufacturers (i.e. it is very unlikely both drives would fail at the same time, with the same blocks going bad). I initially thought I might have a memory problem, which could explain the simultaneous disk failures. After running memory diagnostics for 24 hours with no errors reported, I am beginning to suspect it might be something else.

I am using shared folders from the guest, mounted at guest boot time.

Is it possible that the Solaris vboxsf shared folder kernel driver is causing corruption? Being in the kernel, would it allow bypassing of the normal ZFS integrity mechanisms? Or is it possible there is some locking issue or race condition that triggers the corruption?

Anecdotally, when I see the corruption the sequence of events seems to be:

- dmesg reports various vbox drivers being loaded (normal - just loading the drivers)
- The guest boots and gets just past the grub boot screen to the initial Red Hat boot screen.
- The guest hangs and never finishes booting.
- zpool status -v reports corrupted files. The files are on the zpool containing the shared folders and the VirtualBox images.

Thoughts?
Jeff Savit
2010-Sep-12 19:07 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Hi Warren,

This may not help much, except perhaps as a way to eliminate possible causes, but I ran b134 with VirtualBox and guests on ZFS for quite a long time without any such symptoms. My pool is a simple, unmirrored one, so the difference may be there. I used shared folders without incident. Guests include Linux (several distros, including RH), Windows, Solaris, BSD.

--Jeff

On 09/12/10 11:05 AM, Warren Strange wrote:
> I posted the following to the VirtualBox forum. I would be interested in finding out if anyone else has ever seen zpool corruption with VirtualBox as a host on OpenSolaris:
> [...]
> Thoughts?

--
Jeff Savit | Principal Sales Consultant
Phone: 602.824.6275
Email: jeff.savit at oracle.com | Blog: http://blogs.sun.com/jsavit
Oracle North America Commercial Hardware
Operating Environments & Infrastructure S/W Pillar
2355 E Camelback Rd | Phoenix, AZ 85016
Richard Elling
2010-Sep-12 21:04 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
On Sep 12, 2010, at 11:05 AM, Warren Strange wrote:

> I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.
>
> I have experienced 6-7 instances of my zpool getting corrupted. [...]
>
> This is on a mirrored zpool - using drives from two different manufacturers (i.e. it is very unlikely both drives would fail at the same time, with the same blocks going bad). I initially thought I might have a memory problem - which could explain the simultaneous disk failures. After running memory diagnostics for 24 hours with no errors reported, I am beginning to suspect it might be something else.

So we are clear, you are running VirtualBox on ZFS, rather than ZFS on VirtualBox?

> I am using shared folders from the guest - mounted at guest boot up time.
>
> Is it possible that the Solaris vboxsf shared folder kernel driver is causing corruption? Being in the kernel, would it allow bypassing of the normal zfs integrity mechanisms? Or is it possible there is some locking issue or race condition that triggers the corruption?
> [...]
> Thoughts?

Bad power supply, HBA, cables, or other common cause. To help you determine the sort of corruption, for mirrored pools FMA will record the nature of the discrepancies.

    fmdump -eV

will show a checksum error and the associated bitmap comparisons.
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com
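For reference, the checksum ereports Richard refers to can be narrowed down by event class when reading the FMA error log. A minimal sketch, assuming the pool is named tank as in the output later in the thread (the class name and pool name are taken from that output, not from Richard's post):

    # show only ZFS checksum ereports, in full detail
    fmdump -eV -c ereport.fs.zfs.checksum

    # one-line-per-event summary, useful for spotting when the errors occur
    fmdump -e -c ereport.fs.zfs.checksum

    # cross-check against the pool's own error counters and list of damaged files
    zpool status -v tank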
Warren Strange
2010-Sep-12 21:56 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
> So we are clear, you are running VirtualBox on ZFS,
> rather than ZFS on VirtualBox?

Correct

> Bad power supply, HBA, cables, or other common cause.
> To help you determine the sort of corruption, for mirrored pools FMA will record
> the nature of the discrepancies.
>     fmdump -eV
> will show a checksum error and the associated bitmap comparisons.

Below are the errors reported from the two disks. Not sure if anything looks suspicious (other than the obvious checksum error):

Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf3cb5e110f2c88ec
                vdev = 0x961d9b28c1440020
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x961d9b28c1440020
        vdev_type = disk
        vdev_path = /dev/dsk/c8t5d0s0
        vdev_devid = id1,sd@SATA_____WDC_WD15EADS-00P_____WD-WCAVU0351361/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x20000
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f5a

Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf3cb5e110f2c88ec
                vdev = 0x969570b704d5bff1
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x969570b704d5bff1
        vdev_type = disk
        vdev_path = /dev/dsk/c8t4d0s0
        vdev_devid = id1,sd@SATA_____ST31500341AS________________9VS3B4CP/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x20000
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f24
Richard Elling
2010-Sep-12 22:31 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Comments below...

On Sep 12, 2010, at 2:56 PM, Warren Strange wrote:

>> So we are clear, you are running VirtualBox on ZFS,
>> rather than ZFS on VirtualBox?
>
> Correct
>
>> Bad power supply, HBA, cables, or other common cause.
>> To help you determine the sort of corruption, for mirrored pools FMA will record
>> the nature of the discrepancies.
>>     fmdump -eV
>> will show a checksum error and the associated bitmap comparisons.
>
> Below are the errors reported from the two disks. Not sure if anything looks suspicious (other than the obvious checksum error)
>
> [...]

In the case where one side of the mirror is corrupted and the other is correct, you will be shown the difference between the two, in the form of an abbreviated bitmap.

In this case, the data on each side of the mirror is the same, with a large degree of confidence. So the source of the corruption is likely to be the same -- some common component: CPU, RAM, HBA, I/O path, etc. You can rule out the disks as suspects.

With some additional experiments you can determine if the corruption occurred during the write or the read.
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
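One way to run the kind of follow-up experiment mentioned above is to force the data to be read again and see whether the checksum errors are stable. This is only a sketch under assumptions: the file path below is illustrative, and the interpretation is a common rule of thumb rather than something stated in the thread. If the same blocks keep failing their checksums on re-read and on a scrub, the bad data is on disk and was corrupted on the write path; if errors move around or do not reproduce, the read path (for example flaky RAM) is the more likely culprit.

    # re-read a file that zpool status -v flagged (path is illustrative only)
    dd if=/tank/vbox/guest-disk.vdi of=/dev/null bs=1024k

    # re-verify every block on both sides of the mirror
    zpool scrub tank
    zpool status -v tank    # persistent CKSUM counts point at on-disk (write-time) corruption

    # check whether new checksum ereports were generated by the re-read
    fmdump -e -c ereport.fs.zfs.checksum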
Warren Strange
2010-Sep-20 17:00 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Just following up...

I reran memtest diagnostics and let it run overnight again. This time I did see some memory errors - which would be the most likely explanation for the errors I am seeing. Faulty hardware strikes again.

Thanks to all for the advice.

Warren

> In this case, the data on each side of the mirror is the same, with a large degree of
> confidence. So the source of the corruption is likely to be the same -- some common
> component: CPU, RAM, HBA, I/O path, etc. You can rule out the disks as suspects.
>
> With some additional experiments you can determine if the corruption occurred during
> the write or the read.
>  -- richard
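After the faulty memory is replaced, the usual cleanup (a generic ZFS sketch, not advice given in the thread) is to clear the pool's error counters and run a scrub so every block is re-verified; since both sides of this mirror hold the same bad data, any files still listed by zpool status -v afterwards would need to be restored from backup.

    # once the bad RAM has been swapped out:
    zpool clear tank        # reset the pool's error counters
    zpool scrub tank        # re-read and verify every block in the pool

    # anything still listed here could not be repaired and must come from backup
    zpool status -v tank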
Orvar Korvar
2010-Sep-22 14:11 UTC
[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?
Now this is a testament to the power of ZFS. Only ZFS is sensitive enough to report these errors to you. Had you run another filesystem, you would never have been notified that your data was slowly being corrupted by faulty hardware. :o)