Hi,

just some information: I ran into big trouble running Windows HVM VMs with
the GPLPV drivers on LUNs from a storage system (in my case an EMC CX4).

What happened:

After e.g. Windows Updates, the VMs were rendered unbootable.

What causes this:

My assumption is: Windows boots from the qemu-emulated device until, at
some point, it switches over from the qemu device to the PV system device.
qemu goes through the dom0 page cache, but with the PV driver the LUN is
accessed directly.

So it can happen that after a reboot the dom0 caches still hold blocks from
the previous boot while the LUN already contains different data. This leads
to curious crashes.

My solution to avoid that:

Dropping the caches with "echo 1 > /proc/sys/vm/drop_caches".

This could also be added to the xm definition files, as they are just
Python:

os.system('echo 1 > /proc/sys/vm/drop_caches')

I already had a similar problem with paravirtualized Linux VMs on a Red Hat
system with external LUNs: pygrub showed old boot entries, different from
what the VM actually had. Same reason; dropping the caches helped there as
well. There is currently a bug open in Red Hat's bug tracker, bug #466681.
They are working on direct I/O for at least pygrub.

Sincerely,
Klaus
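To make the workaround concrete, here is a minimal sketch of how the
drop_caches call could sit at the top of an xm config file. The domain name
and disk path are placeholders, not Klaus's actual setup:

    import os

    # xm config files are executed as Python before the domain is built,
    # so this flushes the dom0 page cache before qemu opens the LUN.
    os.system('echo 1 > /proc/sys/vm/drop_caches')

    name = "winvm"                           # hypothetical domain name
    disk = ['phy:/dev/mapper/lun0,hda,w']    # hypothetical LUN mapping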
On Mon, Oct 26, 2009 at 03:29:03PM +0100, Klaus Steinberger wrote:
> Hi,
>
> just some information: I ran into big trouble running Windows HVM VMs
> with the GPLPV drivers on LUNs from a storage system (in my case an EMC
> CX4).
> [...]
> There is currently a bug open in Red Hat's bug tracker, bug #466681.
> They are working on direct I/O for at least pygrub.

Do you use phy:, file: or tap:aio: ?

-- Pasi
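For context, the three prefixes Pasi asks about select different dom0 disk
backends. An illustrative disk stanza, with hypothetical image paths:

    # phy: hands a raw block device to blkback in the dom0 kernel
    disk = ['phy:/dev/mapper/lun0,hda,w']

    # file: serves a disk image through a loopback device, so all I/O
    # passes through the dom0 page cache
    # disk = ['file:/var/lib/xen/images/winvm.img,hda,w']

    # tap:aio: serves the image from the blktap userspace daemon using
    # asynchronous I/O
    # disk = ['tap:aio:/var/lib/xen/images/winvm.img,hda,w']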
> Do you use phy:, file: or tap:aio: ?

phy:

Sincerely,
Klaus
On Tue, Oct 27, 2009 at 07:35:22AM +0100, Klaus Steinberger wrote:
> > Do you use phy:, file: or tap:aio: ?
>
> phy:

Ok. Hmm.. that should be safe.

The pygrub problem is most probably a different issue, since with pygrub
you have two different programs accessing the same storage: the domU with,
for example, the phy: driver, and the dom0 userspace pygrub using normal
non-direct calls, which causes the problem.

Red Hat fixed pygrub by making it use O_DIRECT to bypass the dom0 kernel
caches.

What's your dom0 OS/kernel/Xen? RHEL 5.4?

-- Pasi
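The essence of that fix, as a minimal sketch in modern Python (the device
path is a placeholder; the real patch lives inside pygrub itself): open the
device with O_DIRECT so reads come from the LUN rather than the dom0 page
cache. O_DIRECT requires block-aligned buffers, hence the mmap:

    import mmap
    import os

    dev = '/dev/mapper/lun0'        # placeholder device path
    fd = os.open(dev, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page-aligned, which satisfies the O_DIRECT
        # alignment requirement for both buffer address and length.
        buf = mmap.mmap(-1, 4096)
        os.readv(fd, [buf])         # this read bypasses the page cache
    finally:
        os.close(fd)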
Pasi Kärkkäinen schrieb:
> On Tue, Oct 27, 2009 at 07:35:22AM +0100, Klaus Steinberger wrote:
> [...]
> What's your dom0 OS/kernel/Xen? RHEL 5.4?

Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1

Sincerely,
Klaus
On Tue, Oct 27, 2009 at 10:38:15AM +0100, Klaus Steinberger wrote:
> Pasi Kärkkäinen schrieb:
> > What's your dom0 OS/kernel/Xen? RHEL 5.4?
>
> Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1

Ok. Try updating to 5.4 and see if it helps..

What version of GPLPV?

Maybe James has some thoughts about this.. is it possible that the qemu
IDE devices use cached data from the dom0 kernel cache while GPLPV does
direct I/O?

James: in case you didn't read the original email, Klaus gets corrupted
data after a Windows update + reboot.

-- Pasi
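The hazard Pasi is describing can be sketched outside Xen entirely. A toy
demonstration, assuming a scratch block device you can safely scribble on
(never a live LUN): a buffered read populates the page cache, a direct
write changes the device behind it, and a second buffered read may still be
served the stale block. The open(2) man page warns that mixing buffered and
direct I/O like this gives undefined results, which is exactly the point:

    import mmap
    import os

    dev = '/dev/loop0'    # hypothetical scratch device, NOT a live LUN

    # Buffered read: pulls the first block into the page cache.
    with open(dev, 'rb') as f:
        before = f.read(512)

    # Direct write: changes the device behind the cache's back.
    fd = os.open(dev, os.O_WRONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, 512)      # page-aligned, as O_DIRECT requires
    buf.write(b'X' * 512)
    os.writev(fd, [buf])
    os.close(fd)

    # Buffered re-read: may still return the stale cached block.
    with open(dev, 'rb') as f:
        after = f.read(512)

    print(before == after)  # True means the cache served stale data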
> just some information:
>
> I ran into big trouble running Windows HVM VMs with the GPLPV drivers on
> LUNs from a storage system (in my case an EMC CX4).

I'm not familiar with the CX4... is it iSCSI or AoE or hyperSCSI or
something?

> What happened:
>
> After e.g. Windows Updates, the VMs were rendered unbootable.
> [...]

That is strange... when GPLPV is running as it should, I don't think any
writes are done before GPLPV takes over. If there are, they go via the
int13h BIOS interface, which writes a flag saying the boot has started;
that flag is what lets the next boot present the "last boot was not
successful... safe mode?" menu.

Strange things do happen when you involve loopback or kpartx, though, as I
found out recently. I use losetup with an offset to create a loopback
device so I can mount my NTFS volume under Linux (you break a lot of
systems when you do driver development :). If I forget to delete the
loopback device, all sorts of strange things happen, although I've never
seen complete corruption before. kpartx (which uses the device mapper)
gives the same problems.

FWIW, I do almost all of my testing on phy: mapped LVM volumes.

James
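A sketch of the loopback workflow James describes, with assumed paths and
an assumed offset (old-style partition tables put the first partition at
sector 63):

    import subprocess

    image = '/dev/vg0/winvm'   # placeholder volume holding the guest disk
    offset = 63 * 512          # byte offset of the first partition

    # Map the NTFS partition inside the guest disk to a loop device.
    subprocess.check_call(['losetup', '-o', str(offset),
                           '/dev/loop0', image])
    subprocess.check_call(['mount', '-t', 'ntfs',
                           '/dev/loop0', '/mnt/guest'])

    # ... inspect or repair the guest filesystem ...

    # The teardown James warns about forgetting: a lingering loop device
    # is a second, independently cached path to the same blocks.
    subprocess.check_call(['umount', '/mnt/guest'])
    subprocess.check_call(['losetup', '-d', '/dev/loop0'])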
Pasi Kärkkäinen schrieb:
> On Tue, Oct 27, 2009 at 10:38:15AM +0100, Klaus Steinberger wrote:
> > Scientific Linux 5.3 (same as RHEL 5.3), kernel-2.6.18-128.7.1
>
> Ok. Try updating to 5.4 and see if it helps..
>
> What version of GPLPV?

0.10.0.98

Sincerely,
Klaus
Hi James,

> I'm not familiar with the CX4... is it iSCSI or AoE or hyperSCSI or
> something?

Fibre Channel, but it also supports iSCSI. For these LUNs I use Fibre
Channel.

> That is strange... when GPLPV is running as it should, I don't think any
> writes are done before GPLPV takes over. [...]

I think it's not the writes: the reads served from the dom0 caches are the
problem, since they are inconsistent with the data on the LUN. That's
especially problematic if filesystem metadata is read from the cache but
differs from what is on disk; subsequent writes then probably land in the
wrong places.

> FWIW, I do almost all of my testing on phy: mapped LVM volumes.

Sure, I have never seen the problem with LVM. But on a RHEL cluster I can't
use LVM snapshots (they don't work with clustered LVM), so I want to use
the LUN handling capabilities of the CX4 (snapshots, remote mirroring to a
second CX4, and so on).

Sincerely,
Klaus