Hi, hope the week has started out well for everyone.

This report may be in the FWIW department since there may be a
fundamental reason why this doesn't work. We elected to report this
to the Xen community since we thought any behavior which corrupted
disk images needed to at least be reported.

We are maintaining the Xen-SAN release, which provides hotplug
functionality to allow Xen guests to participate as first class
entities in either an iSCSI or fibre-channel storage network. We were
preparing for a second release and ran across behavior which appears
to cause Xen guest block devices to be corrupted.

Relevant VM/OS versions are as follows:

	dom0: 3.4.35
	domU: 3.4.35
	Xen:  4.2.1

The test environment is a running domU VM which uses SCST (2.2.0) to
export a block device via iSCSI. An iSCSI connection is initiated
from dom0 to the target VM. The iSCSI block device has a VM system
image on it. I/O can be done from dom0 to the guest without any
apparent issues; i.e., mounting the filesystem and reading and
writing to it.

The problem occurs when a second VM is started which uses the iSCSI
based block device as its root filesystem. The VM starts and
functions normally, and I/O can be done without any issues from
inside the VM. When the VM is shut down and the iSCSI connection is
closed, the block device is instantly corrupted.

The corruption isn't subtle, with the beginning of the block device
being overwritten with what appears to be generic contents of the
filesystem. The corruption doesn't occur when the VM shuts down, only
when the iSCSI connection is closed.

If the iSCSI VM target server is run on a separate physical dom0 host,
everything functions normally. So the corruption is definitely linked
to both VMs being run on the same physical dom0 instance.

The problem occurs regardless of the type of device backend which is
used for the domU block device exported by SCST. The behavior has
been verified with blktap, image over loop and qdisk. The problem
also occurs when either FILEIO or BLOCKIO is used for the SCST
virtual disk.

As I said at the outset, exposing a device to blkback twice may be
something it was never designed to do. That being said, using VMs for
this type of testing certainly makes sense and the behavior is
unexpected.

Let us know if there are any questions or if additional testing is
needed.

Have a good remainder of the week.

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Laugh now but you won't be laughing when we find you laying on the
 side of the road dead."
                                -- Betty Wettstein
                                   At the Lake
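
[Editor's sketch] The report says the damage shows up at the beginning of the block device once the iSCSI connection is closed. One way to confirm that on a test system is to hash the leading region of the backing store before the test and again afterwards. The Python below is only a minimal sketch under assumptions, not something taken from the report: the names `head_digest` and `head_changed`, the 1 MiB default length, and whatever path is passed in are all hypothetical.

```python
import hashlib


def head_digest(path, length=1024 * 1024):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`.

    `path` is a hypothetical device node or image file chosen by the tester.
    """
    with open(path, "rb") as dev:
        return hashlib.sha256(dev.read(length)).hexdigest()


def head_changed(before, after):
    """True when two digests taken at different times do not match."""
    return before != after
```

A tester would record `head_digest(...)` on the backing store with the guest shut down, close the iSCSI connection, and take a second digest. Reads of a block device can be served from the page cache, so the second reading is more trustworthy if it is taken after the cache has been flushed or from the target side.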
On 26/03/13 07:26, Dr. Greg Wettstein wrote:
> Hi, hope the week has started out well for everyone.
>
> This report may be in the FWIW department since there may be a
> fundamental reason why this doesn't work. We elected to report this
> to the Xen community since we thought any behavior which corrupted
> disk images needed to at least be reported.

Hello Greg,

I've also noticed this some time ago. The cause of this bug is that we
pass granted pages to netback, and when trying to perform the grant
copy operation it fails. I sent a clumsy patch that solved the
problem, but it involves an additional memcpy in order to avoid passing
the granted page to netback:

http://lists.xen.org/archives/html/xen-devel/2013-01/msg00717.html

The best solution I can think of is storing the grant frame reference
somewhere in the p2m table, and then using that reference instead of
the mfn when performing the grant copy operation.

Regards, Roger.
Konrad Rzeszutek Wilk
2013-Mar-26 13:13 UTC
Re: iSCSI connection corrupts Xen block devices.
On Tue, Mar 26, 2013 at 01:26:24AM -0500, Dr. Greg Wettstein wrote:
> Hi, hope the week has started out well for everyone.
>
> This report may be in the FWIW department since there may be a
> fundamental reason why this doesn't work. We elected to report this
> to the Xen community since we thought any behavior which corrupted
> disk images needed to at least be reported.

You are hitting an issue that Roger hit as well. That is, the m2p
override mechanism can only handle one override per PFN, not many.

Here is the relevant discussion:
http://lists.xen.org/archives/html/xen-devel/2013-01/msg00748.html

> We are maintaining the Xen-SAN release which provides hotplug
> functionality to allow Xen guests to participate as first class
> entities in either an iSCSI or fibre-channel storage network. We were
> preparing for a second release and ran across behavior which appears
> to cause Xen guest block devices to be corrupted.
>
> Relevant VM/OS versions are as follows:
>
> 	dom0: 3.4.35
> 	domU: 3.4.35
> 	Xen:  4.2.1
>
> The test environment is a domU VM running which is using SCST
> (2.2.0) to export a block device via iSCSI. An iSCSI connection is
> initiated from dom0 to the target VM. The iSCSI block device has a VM
> system image on it. I/O can be done from dom0 to the guest without
> any apparent issues; ie, mounting the filesystem and reading and
> writing to it.
>
> The problem occurs when a second VM is started which uses the iSCSI
> based block device as its root filesystem. The VM starts and
> functions normally, I/O can be done without any issues from inside the
> VM. When the VM is shutdown and the iSCSI connection is closed the
> block device is instantly corrupted.
>
> The corruption isn't subtle with the beginning of the block device
> being over-written with what appears to be generic contents of the
> filesystem. The corruption doesn't occur when the VM shuts down, only
> when the iSCSI connection is closed.
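
[Editor's sketch] Konrad's point, that the override mechanism holds only one override per PFN, can be pictured with a toy table that has a single slot per key. This is a deliberately simplified Python model and not the kernel's m2p code: the class name, the method names, the owner labels and the frame numbers are all hypothetical, chosen only to show why a second user of the same page disturbs the first.

```python
class SingleOverrideTable:
    """Toy model: at most one override per PFN (not the real kernel code)."""

    def __init__(self):
        self._overrides = {}  # pfn -> (owner, foreign_mfn)

    def add_override(self, pfn, owner, foreign_mfn):
        # A second override for the same PFN replaces the first one.
        # The displaced entry is returned so the caller can see the loss.
        previous = self._overrides.get(pfn)
        self._overrides[pfn] = (owner, foreign_mfn)
        return previous

    def remove_override(self, pfn):
        # Removal empties the only slot, even if an earlier user of the
        # page still expects its own override to be present.
        return self._overrides.pop(pfn, None)

    def lookup(self, pfn):
        return self._overrides.get(pfn)


# Hypothetical walk-through: two backends touch the same page.
table = SingleOverrideTable()
table.add_override(0x1000, "blkback", 0xAAAA)
lost = table.add_override(0x1000, "netback", 0xBBBB)  # displaces blkback
table.remove_override(0x1000)                         # second user finishes
remaining = table.lookup(0x1000)                      # nothing left for blkback
```

In this model the first user's entry is gone after the second user adds and then removes its own, which is the kind of bookkeeping loss a one-override-per-PFN design cannot avoid when a page passes through two backends.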
On Mar 26, 10:14am, Roger Pau Monné wrote:
} Subject: Re: [Xen-devel] iSCSI connection corrupts Xen block devices.

> On 26/03/13 07:26, Dr. Greg Wettstein wrote:
> > Hi, hope the week has started out well for everyone.
> >
> > This report may be in the FWIW department since there may be a
> > fundamental reason why this doesn't work. We elected to report this
> > to the Xen community since we thought any behavior which corrupted
> > disk images needed to at least be reported.

> Hello Greg,

Hi Roger, thanks for taking time to respond. A thank you also to
Konrad for his reply.

> I've also noticed this some time ago, the cause of this bug is that
> we pass granted pages to netback, and when trying to perform the
> grant copy operation it fails. I've sent a clumsy patch that solved
> the problem, but it involves additional memcpy in order to avoid
> passing the granted page to netback:
>
> http://lists.xen.org/archives/html/xen-devel/2013-01/msg00717.html
>
> The best solution I can think of is storing the grant frame
> reference somewhere in the p2m table, and then using that reference
> instead of the mfn when performing the grant copy operation.

So it is definitely an issue of, "If it hurts then don't do that". I
had the sense that expecting things to work right when pages were
transiting multiple instances and types of backends may have been
optimistic.

It is somewhat of an edge case application, so it would seem
reasonable to wait on wading into a fix until all the persistent grant
work has been completed. On the other hand, it may be worth tossing
something into a FAQ someplace since it does violate the concept of
least surprise.

> Regards, Roger.

Thanks for the clarifications, have a good weekend.

Greg

}-- End of excerpt from Roger Pau Monné

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"We know that communication is a problem, but the company is not going
 to discuss it with the employees."
                                -- Switching supervisor
                                   AT&T Long Lines Division