Adam Wilbraham
2009-Jun-08 11:02 UTC
[Xen-users] Major corruption of Windows HVM disks - Xen 3.4
I've come back into work after the weekend and found that all three of our Windows HVMs have suffered massive hard disk corruption. They had all been running rock solid on Xen 3.2.1 for the past 9 months or so, but I upgraded their host to Xen 3.4 last week.

I noticed on Friday morning that all 3 had randomly rebooted on Thursday evening, but tried not to think too much of it. Unfortunately, as I say, I have come in this morning and they are all in various states of disrepair.

One VM is claiming ntloader.exe is not on the disk (presumably along with a number of other files). Another had crashed, so I started it back up; it ran chkdsk, reported lots of corrupt and missing files, rebooted, and is now BSODing on boot. The third boots but has an event log full of SQL Server errors about file corruption. As such I'm having to restore all three from backups, which is not ideal.

Has anyone experienced anything similar to this? Is it likely to be a problem with qemu rather than Xen? Either way, I am left with no choice but to roll back to a previous version of Xen, as I cannot risk this happening again.

Thanks,

-- 
Adam Wilbraham - Systems Administrator
TechnoPhobia Limited
The Workstation
15 Paternoster Row
SHEFFIELD
England S1 2BX
t: +44 (0)114 2212123
f: +44 (0)114 2212124
e: adam.wilbraham@technophobia.com
w: http://www.technophobia.com/

Registered in England and Wales Company No. 3063669
VAT registration No. 598 7858 42
ISO 9001:2000 Accredited Company No. 21227
ISO 14001:2004 Accredited Company No. E997
ISO 27001:2005 (BS7799) Accredited Company No. IS 508906
Investor in People Certified No. 101507

The contents of this email are confidential to the addressee and are intended solely for the recipients use. If you are not the addressee, you have received this email in error. Any disclosure, copying, distribution or action taken in reliance on it is prohibited and may be unlawful. Any opinions expressed in this email are those of the author personally and not TechnoPhobia Limited who do not accept responsibility for the contents of the message. All email communications, in and out of TechnoPhobia, are recorded for monitoring purposes.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
James Harper
2009-Jun-08 12:49 UTC
RE: [Xen-users] Major corruption of Windows HVM disks - Xen 3.4
> I've come back into work after the weekend and noticed that all (3) of
> our Windows HVMs have had massive issues with corruption of the hard
> disks. These have all been running rock solid on Xen 3.2.1 for the past
> 9 months or so but I upgraded their host to Xen 3.4 last week.
>
> I noticed on Friday morning that all 3 had randomly rebooted on
> Thursday evening but tried not to think too much of it. Unfortunately
> as I say I have come in this morning and they are all in various states
> of disrepair.
>
> One VM is claiming ntloader.exe is not on the disk (presumably with a
> number of other files), the other had crashed so I started it back up,
> it ran checkdsk before running through lots of corrupt and missing
> files, rebooting and is now BSODing on boot and the third boots but has
> an event log full of SQL Server errors talking about file corruption. As
> such I'm having to restore all three from backups which is not ideal.
>
> Has anyone experienced anything similar to this? Is it likely to be a
> problem with qemu rather than Xen? Either way, I am left with no choice
> than to roll back to a previous version of Xen as I cannot risk this
> happening again.
>

Have you established that the virtual disks are definitely corrupt, as opposed to something having gone wrong in Dom0 that makes them seem corrupt? I can't think what would cause that situation to arise, though.

Are you using my GPLPV drivers? If so, then I would really like to hear more about what went wrong so I can look into it and make sure it isn't a problem with the drivers, although for all 3 DomUs to fail simultaneously like that, it would be unlikely to be a DomU-side problem.

In my experience, the qemu drivers are unfortunately prone to this sort of thing on unclean shutdowns, although I would have thought less so with 3.4 than with 3.2... I have seen it before under 3.1, with Dom0 running out of memory and firing up the OOM killer (snmpd memory leak), although the worst I've seen was a corrupt Exchange database that restored without further problems.

Did you upgrade the Dom0 kernel when you upgraded Xen?

Good luck with the restores.

James
Adam Wilbraham
2009-Jun-08 13:04 UTC
Re: [Xen-users] Major corruption of Windows HVM disks - Xen 3.4
On 08/06/09 13:49, James Harper wrote:
> Have you established that the virtual disks are definitely corrupt, as
> opposed to something gone wrong in Dom0 that makes them seem corrupt. I
> can't think what would cause that situation to arise though.

The first thing I did was to try to bring up all 3 VMs on the host that they are replicated to using DRBD. This host is still using Xen 3.2.1 as I haven't upgraded it yet; I was waiting to see how things went before doing both.

> Are you using my GPLPV drivers? If so, then I would really like to hear
> more about what went wrong so I can look into it and make sure it isn't
> a problem with the drivers, although for all 3 domU's to fail
> simultaneously like that it would be unlikely to be a DomU side problem.

A couple of the VMs do have an older version of the GPLPV drivers installed, but they were not active. I experienced problems with live migration when using your drivers, so I had to stick with pure HVM. I will be trying them out again in the near future, as they appear to have progressed well over the last few months, but as I say they are currently disabled, so I would be 99% sure that they aren't to blame.

> In my experience, the qemu drivers are prone to this sort of thing on
> unclean shutdowns unfortunately, although I would have thought less so
> with 3.4 than with 3.2... I have seen it before under 3.1 with Dom0
> running out of memory and firing up the OOM killer (snmpd memory leak),
> although the worst I've seen was a corrupt Exchange database that
> restored without further problems.

The corruption did seem pretty bad on all 3, but I am indeed thinking qemu may have been part of the problem here. I'm just about to start trawling the logs to see if there is anything useful in there.

> Did you upgrade the Dom0 kernel when you upgraded xen?

No, I'm still using 2.6.18-6, which is a standard Debian Etch kernel. I contemplated moving to the custom-compiled one, but everything I read on the list suggested that the Etch one would be fine. Can you see any benefit in upgrading the dom0 kernel too?

Like I say, this entire system has been rock solid on 3.2.1 since it was built in late 2008. The only reason for the upgrade was that it was running a 32-bit hypervisor and we needed more memory; I just thought bringing the base platform up to date would be a good idea at the same time.

> Good luck with the restores.

Thankfully everything is back up and running now. Backups are a great thing when you actually take them!

-- 
Adam Wilbraham - Systems Administrator
TechnoPhobia Limited
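
For anyone following the same trail, the Dom0 log trawl for OOM-killer activity discussed in this thread could start from something like the sketch below. It is only a rough illustration, not a tool used by either poster: the function name `find_oom_lines` is hypothetical, and the two search patterns are an assumption about typical kernel OOM wording that may need adjusting for a given Dom0 kernel.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed patterns: common kernel OOM-killer wording. Adjust for your kernel.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line that looks like an OOM event."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOM_PATTERN.search(line)
    ]


# Only touches the filesystem when log paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        # errors="replace" keeps the scan going past stray non-UTF-8 bytes.
        with open(path, errors="replace") as handle:
            for number, text in find_oom_lines(handle):
                print(f"{path}:{number}: {text}")
```

Run it with one or more Dom0 log files as arguments (which files hold kernel messages depends on the syslog configuration). A hit around the time of the unexpected guest reboots would support the Dom0 memory theory; no hits does not rule out other causes.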