Hi,

just some information: I ran into big trouble running Windows HVM VMs with
the GPLPV drivers on LUNs from a storage system (in my case an EMC CX4).

What happened:

After e.g. Windows Updates, the VMs were rendered unbootable.

What causes this:

My assumption is: Windows boots from the qemu-emulated device until, at
some point, it switches over from the qemu device to the PV system device.
qemu goes through the dom0 page cache, but with the PV driver the LUN is
accessed directly.

So it can happen that after a reboot the dom0 caches still hold blocks from
the previous boot while the LUN already contains different data. This leads
to curious crashes.

My solution to avoid that:

Dropping the caches with "echo 1 > /proc/sys/vm/drop_caches".

This could also be added to the xm definition files, as they are just
Python:

os.system('echo 1 > /proc/sys/vm/drop_caches')

I already had a similar problem with paravirtualized Linux VMs on a Red Hat
system with external LUNs: pygrub showed old boot entries, different from
what the VM actually had. Same reason; dropping the caches helped there as
well. There is currently a bug open in Red Hat's bug tracker, bug #466681.
They are working on direct I/O for at least pygrub.

Sincerely,
Klaus
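To make the workaround concrete, here is a minimal sketch of how the
drop_caches call could sit at the top of an xm config file. The domain name
and disk path are placeholders, not Klaus's actual setup:

    import os

    # xm config files are executed as Python before the domain is built,
    # so this flushes the dom0 page cache before qemu opens the LUN.
    os.system('echo 1 > /proc/sys/vm/drop_caches')

    name = "winvm"                           # hypothetical domain name
    disk = ['phy:/dev/mapper/lun0,hda,w']    # hypothetical LUN mapping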
On Mon, Oct 26, 2009 at 03:29:03PM +0100, Klaus Steinberger wrote:
> Hi,
>
> just some information: I ran into big trouble running Windows HVM VMs
> with the GPLPV drivers on LUNs from a storage system (in my case an EMC
> CX4).
> [...]
> There is currently a bug open in Red Hat's bug tracker, bug #466681.
> They are working on direct I/O for at least pygrub.

Do you use phy:, file: or tap:aio: ?

-- Pasi
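For context, the three prefixes Pasi asks about select different dom0 disk
backends. An illustrative disk stanza, with hypothetical image paths:

    # phy: hands a raw block device to blkback in the dom0 kernel
    disk = ['phy:/dev/mapper/lun0,hda,w']

    # file: serves a disk image through a loopback device, so all I/O
    # passes through the dom0 page cache
    # disk = ['file:/var/lib/xen/images/winvm.img,hda,w']

    # tap:aio: serves the image from the blktap userspace daemon using
    # asynchronous I/O
    # disk = ['tap:aio:/var/lib/xen/images/winvm.img,hda,w']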
> Do you use phy:, file: or tap:aio: ?

phy:

Sincerely,
Klaus
On Tue, Oct 27, 2009 at 07:35:22AM +0100, Klaus Steinberger wrote:
> > Do you use phy:, file: or tap:aio: ?
>
> phy:

Ok. Hmm.. that should be safe.

The pygrub problem is most probably a different issue, since with pygrub
you have two different programs accessing the same storage: the domU with,
for example, the phy: driver, and the dom0 userspace pygrub using normal
non-direct calls, which causes the problem.

Red Hat fixed pygrub by making it use O_DIRECT to bypass the dom0 kernel
caches.

What's your dom0 OS/kernel/Xen? RHEL 5.4?

-- Pasi
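The essence of that fix, as a minimal sketch in modern Python (the device
path is a placeholder; the real patch lives inside pygrub itself): open the
device with O_DIRECT so reads come from the LUN rather than the dom0 page
cache. O_DIRECT requires block-aligned buffers, hence the mmap:

    import mmap
    import os

    dev = '/dev/mapper/lun0'        # placeholder device path
    fd = os.open(dev, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page-aligned, which satisfies the O_DIRECT
        # alignment requirement for both buffer address and length.
        buf = mmap.mmap(-1, 4096)
        os.readv(fd, [buf])         # this read bypasses the page cache
    finally:
        os.close(fd)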
Pasi Kärkkäinen schrieb:
> On Tue, Oct 27, 2009 at 07:35:22AM +0100, Klaus Steinberger wrote:
> [...]
> What's your dom0 OS/kernel/Xen? RHEL 5.4?

Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1

Sincerely,
Klaus
On Tue, Oct 27, 2009 at 10:38:15AM +0100, Klaus Steinberger wrote:
> Pasi Kärkkäinen schrieb:
> > What's your dom0 OS/kernel/Xen? RHEL 5.4?
>
> Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1

Ok. Try updating to 5.4 and see if it helps..

What version of GPLPV?

Maybe James has some thoughts about this.. is it possible that the qemu
IDE devices use cached data from the dom0 kernel cache while GPLPV does
direct I/O?

James: in case you didn't read the original email, Klaus gets corrupted
data after a Windows update + reboot.

-- Pasi
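The hazard Pasi is describing can be sketched outside Xen entirely. A toy
demonstration, assuming a scratch block device you can safely scribble on
(never a live LUN): a buffered read populates the page cache, a direct
write changes the device behind it, and a second buffered read may still be
served the stale block. The open(2) man page warns that mixing buffered and
direct I/O like this gives undefined results, which is exactly the point:

    import mmap
    import os

    dev = '/dev/loop0'    # hypothetical scratch device, NOT a live LUN

    # Buffered read: pulls the first block into the page cache.
    with open(dev, 'rb') as f:
        before = f.read(512)

    # Direct write: changes the device behind the cache's back.
    fd = os.open(dev, os.O_WRONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, 512)      # page-aligned, as O_DIRECT requires
    buf.write(b'X' * 512)
    os.writev(fd, [buf])
    os.close(fd)

    # Buffered re-read: may still return the stale cached block.
    with open(dev, 'rb') as f:
        after = f.read(512)

    print(before == after)  # True means the cache served stale data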
> just some information:
>
> I ran into big trouble running Windows HVM VMs with the GPLPV drivers on
> LUNs from a storage system (in my case an EMC CX4).

I'm not familiar with the CX4... is it iSCSI or AoE or hyperSCSI or
something?

> What happened:
>
> After e.g. Windows Updates, the VMs were rendered unbootable.
> [...]

That is strange... when GPLPV is running as it should, I don't think any
writes are done before GPLPV takes over. If there are, they go via the
int13h BIOS interface, which writes a flag saying the boot has started;
that flag is what lets the next boot present the "last boot was not
successful... safe mode?" menu.

Strange things do happen when you involve loopback or kpartx, though, as I
found out recently. I use losetup with an offset to create a loopback
device so I can mount my NTFS volume under Linux (you break a lot of
systems when you do driver development :). If I forget to delete the
loopback device, all sorts of strange things happen, although I've never
seen complete corruption before. kpartx (which uses the device mapper)
gives the same problems.

FWIW, I do almost all of my testing on phy: mapped LVM volumes.

James
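A sketch of the loopback workflow James describes, with assumed paths and
an assumed offset (old-style partition tables put the first partition at
sector 63):

    import subprocess

    image = '/dev/vg0/winvm'   # placeholder volume holding the guest disk
    offset = 63 * 512          # byte offset of the first partition

    # Map the NTFS partition inside the guest disk to a loop device.
    subprocess.check_call(['losetup', '-o', str(offset),
                           '/dev/loop0', image])
    subprocess.check_call(['mount', '-t', 'ntfs',
                           '/dev/loop0', '/mnt/guest'])

    # ... inspect or repair the guest filesystem ...

    # The teardown James warns about forgetting: a lingering loop device
    # is a second, independently cached path to the same blocks.
    subprocess.check_call(['umount', '/mnt/guest'])
    subprocess.check_call(['losetup', '-d', '/dev/loop0'])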
Pasi Kärkkäinen schrieb:
> On Tue, Oct 27, 2009 at 10:38:15AM +0100, Klaus Steinberger wrote:
> > Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1
>
> Ok. Try updating to 5.4 and see if it helps..
>
> What version of GPLPV?

0.10.0.98

Sincerely,
Klaus
Hi James,

> I'm not familiar with the CX4... is it iSCSI or AoE or hyperSCSI or
> something?

Fibre Channel, but it also supports iSCSI. For these LUNs I use Fibre
Channel.

> That is strange... when GPLPV is running as it should, I don't think any
> writes are done before GPLPV takes over. [...]

I think it's not the writes: the reads served from the dom0 caches are the
problem, since they are inconsistent with the data on the LUN. That's
especially problematic if filesystem metadata is read from the cache but
differs from what is on disk; subsequent writes then probably land in the
wrong places.

> FWIW, I do almost all of my testing on phy: mapped LVM volumes.

Sure, I have never seen the problem with LVM. But on a RHEL cluster I can't
use LVM snapshots (they don't work with clustered LVM), so I want to use
the LUN handling capabilities of the CX4 (snapshots, remote mirroring to a
second CX4, and so on).

Sincerely,
Klaus