thr3ads.net - Xen devel - iSCSI connection corrupts Xen block devices. [Mar 2013]


Dr. Greg Wettstein

2013-Mar-26 06:26 UTC

iSCSI connection corrupts Xen block devices.

Hi, hope the week has started out well for everyone.

This report may be in the FWIW department, since there may be a
fundamental reason why this doesn't work.  We elected to report this
to the Xen community since we thought any behavior which corrupted
disk images needed to at least be reported.

We are maintaining the Xen-SAN release, which provides hotplug
functionality to allow Xen guests to participate as first-class
entities in either an iSCSI or fibre-channel storage network.  We were
preparing for a second release and ran across behavior which appears
to cause Xen guest block devices to be corrupted.

Relevant VM/OS versions are as follows:

	dom0:	3.4.35
	domU:	3.4.35
	Xen:	4.2.1

The test environment is a domU VM running SCST (2.2.0) to export a
block device via iSCSI.  An iSCSI connection is initiated from dom0 to
the target VM.  The iSCSI block device has a VM system image on it.
I/O can be done from dom0 to the guest without any apparent issues;
i.e., mounting the filesystem and reading and writing to it.

The problem occurs when a second VM is started which uses the iSCSI
based block device as its root filesystem.  The VM starts and
functions normally, and I/O can be done without any issues from inside
the VM.  When the VM is shut down and the iSCSI connection is closed,
the block device is instantly corrupted.

The corruption isn't subtle, with the beginning of the block device
being overwritten with what appears to be generic contents of the
filesystem.  The corruption doesn't occur when the VM shuts down, only
when the iSCSI connection is closed.

If the iSCSI target VM is run on a separate physical dom0 host,
everything functions normally.  So the corruption is definitely linked
to both VMs being run on the same physical dom0 instance.

The problem occurs regardless of the type of device backend used for
the domU block device exported by SCST.  The behavior has been
verified with blktap, image over loop and qdisk.  The problem also
occurs when either FILEIO or BLOCKIO is used for the SCST virtual
disk.

As I said at the outset, exposing a device to blkback twice may be
something it was never designed to do.  That being said, using VMs for
this type of testing certainly makes sense and the behavior is
unexpected.

Let us know if there are any questions or if additional testing is
needed.

Have a good remainder of the week.

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Laugh now but you won't be laughing when we find you laying on the
 side of the road dead."
                                -- Betty Wettstein
                                   At the Lake

Roger Pau Monné

2013-Mar-26 09:14 UTC


Re: iSCSI connection corrupts Xen block devices.

On 26/03/13 07:26, Dr. Greg Wettstein wrote:
> Hi, hope the week has started out well for everyone.
> 
> This report may be in the FWIW department since there may be a
> fundamental reason why this doesn't work.  We elected to report this
> to the Xen community since we thought any behavior which corrupted
> disk images needed to at least be reported.
Hello Greg,

I noticed this some time ago as well. The cause of this bug is that we
pass granted pages to netback, and the grant copy operation fails when
netback tries to perform it. I've sent a clumsy patch that solves the
problem, but it involves an additional memcpy in order to avoid passing
the granted page to netback:

http://lists.xen.org/archives/html/xen-devel/2013-01/msg00717.html

The best solution I can think of is storing the grant frame reference
somewhere in the p2m table, and then using that reference instead of the
mfn when performing the grant copy operation.
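The failure Roger describes can be illustrated with a small toy model. It is a sketch under stated assumptions, not Xen code: it assumes the hypervisor only honours a copy whose source is named by a raw machine frame (MFN) when the requesting domain owns that frame, and all frame numbers, domain IDs and method names are hypothetical. It contrasts three paths: handing the granted page directly to netback, the memcpy bounce used by the patch linked above, and copying via the original grant reference as Roger suggests (where that reference would be stored, e.g. the p2m table, is not modelled).

```python
# Toy model only: this is NOT Xen's grant-table code. Frame numbers, domain IDs
# and method names are hypothetical; the point is the ownership check.

DOM0, TARGET_VM, GUEST_VM = 0, 1, 2   # dom0, SCST target VM, VM booted from iSCSI


class ToyHypervisor:
    def __init__(self):
        self.owner = {}    # mfn -> domid that owns the machine frame
        self.memory = {}   # mfn -> page contents
        self.grants = {}   # (granting domid, gref) -> mfn

    def alloc(self, domid, mfn, data=b""):
        self.owner[mfn] = domid
        self.memory[mfn] = data

    def grant_access(self, domid, gref, mfn):
        self.grants[(domid, gref)] = mfn

    def copy_from_mfn(self, src_dom, src_mfn, dst_mfn):
        """Source named by raw MFN: only allowed if src_dom owns the frame."""
        if self.owner.get(src_mfn) != src_dom:
            return False                  # refused; destination left untouched
        self.memory[dst_mfn] = self.memory[src_mfn]
        return True

    def copy_from_gref(self, granting_dom, gref, dst_mfn):
        """Source named by (granting domain, grant reference) instead of MFN."""
        mfn = self.grants.get((granting_dom, gref))
        if mfn is None or self.owner.get(mfn) != granting_dom:
            return False
        self.memory[dst_mfn] = self.memory[mfn]
        return True


hv = ToyHypervisor()
hv.alloc(GUEST_VM, 0x100, b"guest block data")  # I/O page the guest granted to blkback
hv.grant_access(GUEST_VM, 7, 0x100)
hv.alloc(TARGET_VM, 0x200, b"stale contents")   # netback's destination in the target VM
hv.alloc(DOM0, 0x300)                           # dom0-owned bounce page

# 1. Pass the granted page straight to netback: dom0 names a frame it does not own.
direct_ok = hv.copy_from_mfn(DOM0, 0x100, 0x200)
after_direct = hv.memory[0x200]                 # destination was never written

# 2. The posted workaround: memcpy into a dom0-owned page, then copy from that.
hv.memory[0x300] = hv.memory[0x100]
bounce_ok = hv.copy_from_mfn(DOM0, 0x300, 0x200)

# 3. The suggested direction: copy using the original grant reference.
hv.memory[0x200] = b"stale contents"
gref_ok = hv.copy_from_gref(GUEST_VM, 7, 0x200)
```

In this model the direct path is refused and leaves the target page stale, while both the bounce copy and the grant-reference copy deliver the guest's data. The bounce path pays for an extra copy per page, which is the cost Roger mentions.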

Regards, Roger.

Konrad Rzeszutek Wilk

2013-Mar-26 13:13 UTC


Re: iSCSI connection corrupts Xen block devices.

On Tue, Mar 26, 2013 at 01:26:24AM -0500, Dr. Greg Wettstein wrote:
> Hi, hope the week has started out well for everyone.
> 
> This report may be in the FWIW department since there may be a
> fundamental reason why this doesn't work.  We elected to report this
> to the Xen community since we thought any behavior which corrupted
> disk images needed to at least be reported.
You are hitting an issue that Roger hit as well. That is, the
m2p override mechanism can only handle one override for a PFN, not
many.

Here is the relevant discussion:
http://lists.xen.org/archives/html/xen-devel/2013-01/msg00748.html
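The limitation Konrad describes can be sketched with a toy lookup table. This is an illustration, not the kernel's m2p override code: it assumes a single override slot keyed by machine frame, and the class name, method names and frame numbers are hypothetical. Mapping the same foreign frame a second time replaces the first entry, and tearing down the second mapping then leaves the still-live first mapping with no override at all.

```python
# Toy model only, NOT the Linux m2p override implementation: one override slot
# per machine frame. Class/method names and frame numbers are hypothetical.


class SingleSlotOverride:
    def __init__(self):
        self.slot = {}   # mfn -> pfn of the local page currently mapping it

    def add(self, mfn, pfn):
        self.slot[mfn] = pfn           # a second add silently replaces the first

    def remove(self, mfn, pfn):
        if self.slot.get(mfn) == pfn:  # only drop the entry if it is ours
            del self.slot[mfn]

    def lookup(self, mfn):
        return self.slot.get(mfn)      # None: no override recorded for this frame


m2p = SingleSlotOverride()
m2p.add(0x100, pfn=10)                 # first local mapping of the foreign frame
m2p.add(0x100, pfn=20)                 # second mapping of the same frame
after_second_add = m2p.lookup(0x100)   # 20: the pfn 10 mapping is no longer visible
m2p.remove(0x100, pfn=20)              # tear down the second mapping
after_remove = m2p.lookup(0x100)       # None, although pfn 10 is still mapped
```

Replacing the single slot with a per-frame list would let the toy keep both mappings; it is only meant to show why "one override, not many" matters when a page passes through more than one backend.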

Dr. Greg Wettstein

2013-Mar-28 06:23 UTC


Re: iSCSI connection corrupts Xen block devices.

On Mar 26, 10:14am, Roger Pau Monné wrote:
} Subject: Re: [Xen-devel] iSCSI connection corrupts Xen block devices.
> On 26/03/13 07:26, Dr. Greg Wettstein wrote:
> > Hi, hope the week has started out well for everyone.
> > 
> > This report may be in the FWIW department since there may be a
> > fundamental reason why this doesn't work.  We elected to report this
> > to the Xen community since we thought any behavior which corrupted
> > disk images needed to at least be reported.
> Hello Greg,
Hi Roger, thanks for taking time to respond.  A thank you also to
Konrad for his reply.
> I've also noticed this some time ago, the cause of this bug is that
> we pass granted pages to netback, and when trying to perform the
> grant copy operation it fails. I've sent a clumsy patch that solved
> the problem, but it involves additional memcpy in order to avoid
> passing the granted page to netback:
>
> http://lists.xen.org/archives/html/xen-devel/2013-01/msg00717.html
>
> The best solution I can think of is storing the grant frame
> reference somewhere in the p2m table, and then using that reference
> instead of the mfn when performing the grant copy operation.
So it is definitely an issue of "If it hurts, then don't do that."

I had the sense that expecting things to work right when pages were
transiting multiple instances and types of backends may have been
optimistic.  It is somewhat of an edge-case application, so it would
seem reasonable to wait on wading into a fix until all the persistent
grant work has been completed.

On the other hand, it may be worth tossing something into a FAQ
someplace, since it does violate the principle of least surprise.
> Regards, Roger.
Thanks for the clarifications, have a good weekend.

Greg

}-- End of excerpt from Roger Pau Monné
As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"We know that communication is a problem, but the company is not going
 to discuss it with the employees."
                                -- Switching supervisor
                                   AT&T Long Lines Division
