James Harper
2007-Jun-01 02:43 UTC
[Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
I've just had a virtual Windows Small Business Server that I was testing suddenly refuse to boot (black screen almost immediately). It was apparently working fine and then suddenly wouldn't boot. When I boot from the CD into the Windows recovery console, chkdsk tells me "The volume appears to contain one or more unrecoverable problems", and no more information than that. It looks like I can see the files on the hard disk, but when I try to create a directory or something I get "access denied".

It seems that some sort of fairly major block device corruption has occurred, which isn't a good result for the testing I'm doing.

I did notice that something seemed to segfault and then I couldn't connect via VNC at about the time the corruption would have occurred...

I'm using a version of Xen which I think slightly pre-dates the 3.1.0 release, on an AMD64 system which has not shown any other indication of problems apart from this one incidence of corruption.

Are there any known block device corruption issues slightly before the 3.1.0 release that have been subsequently fixed? I guess it's worth building a newer version, but I'd like to know that there was definitely a known problem and it's definitely fixed...

Thanks

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
James Harper
2007-Jun-01 06:42 UTC
[Xen-users] RE: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
Oops... I meant to post this to xen-users.

I have gotten the server back up and running again by making the disk accessible to another DomU and running a chkdsk on it that way. It found a single corrupt index, repaired it, and everything is now fine. The corruption was probably limited to a single block, and may have just been caused by me shutting down the domain at exactly the wrong moment.

Can anyone comment on what write-back caching might be going on that could cause these sorts of problems?

Thanks

James

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Keir Fraser
2007-Jun-01 08:17 UTC
Re: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
What type of block-device backend were you using? Since it's a Windows guest, presumably there are no PV drivers involved? Was the process that crashed qemu-dm?

-- Keir
Keir Fraser
2007-Jun-01 08:32 UTC
[Xen-users] Re: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
On 1/6/07 07:42, "James Harper" <james.harper@bendigoit.com.au> wrote:

> I have gotten the server back up and running again by making the disk
> accessible to another DomU and running a chkdsk on it that way. It found
> a single corrupt index, repaired it, and everything is now fine. The
> corruption was probably limited to a single block, and may have just
> been caused by me shutting down the domain at exactly the wrong moment.
>
> Can anyone comment on what write-back caching might be going on that
> could cause these sorts of problems?

If you did an unclean shutdown of the guest, and/or qemu-dm crashed, then that could easily explain small-scale block-device corruption. So the question is: what events originally caused the unclean shutdown?

-- Keir
James Harper
2007-Jun-01 09:47 UTC
RE: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
I believe it was qemu, but I accidentally purged all my qemu-dm.[0-9]+.log files and now can't find any record of it.

As per a more recent follow-up message, while the corruption was enough to stop Windows booting, and the recovery console couldn't fix it, running chkdsk from another domain fixed the problem in about 30 seconds, and there appears to be no actual lost data. So I'm guessing that the problem was limited to a single cluster (or whatever NTFS calls its allocation units).

Thanks

James
James Harper
2007-Jun-02 13:31 UTC
[Xen-users] RE: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
> If you did an unclean shutdown of the guest, and/or qemu-dm crashed then
> that could easily explain small-scale block-device corruption. So the
> question is: what events originally caused the unclean shutdown?

Qemu-dm just crashed again:

Jun 2 23:18:40 dev kernel: qemu-dm[28963]: segfault at 00007fff8d08cd40 rip 0000000000409b25 rsp 00007fff8cf61710 error 4

If I can supply any info to make those numbers meaningful then let me know.

I'll do some more testing, but I think it is to do with being a bit sloppy about connecting and disconnecting VNC clients. I'm using the TightVNC Java client so that connecting to the VNC console is just a matter of loading a web page.

I'll follow up shortly if I can reproduce it.

Thanks

James
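For reference, the numbers in a kernel segfault line like the one above can be pulled apart roughly as follows. This is only a sketch, assuming the usual x86-64 message layout and the standard x86 page-fault error-code bits; the helper names `parse_segfault_line` and `decode_page_fault_error` are made up for illustration and are not part of Xen or qemu.

```python
import re

# Matches the x86-64 kernel segfault message layout seen in the log line above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)


def decode_page_fault_error(code):
    """Decode the x86 page-fault error code bits (hypothetical helper)."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 4 else "kernel",
        # bit 4: fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 16),
    }


def parse_segfault_line(line):
    """Return the fields of a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),  # instruction pointer at the fault
        "rsp": int(m.group("rsp"), 16),  # stack pointer at the fault
        "error": decode_page_fault_error(int(m.group("error"), 16)),
    }
```

Under those assumptions, "error 4" reads as a user-mode read of a page that was not present. The rip value is the faulting instruction address, which could in principle be mapped to a source line with a tool such as addr2line, provided the qemu-dm binary was built with debug symbols.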
James Harper
2007-Jun-02 13:38 UTC
RE: [Xen-users] RE: [Xen-devel] Any known block device corruption problems in 3.1.0 up to release?
> I'll follow up shortly if I can reproduce it.

I can reliably reproduce the problem by hitting the browser refresh on the Java VNC client. Sometimes I have to hit it twice, but most of the time a single browser refresh is all that's required. Hitting the disconnect button in the Java console and then reconnecting always seems to work.

My best guess at this point is some sort of race condition in the TCP handling code in the VNC server... although I can't reproduce it outside the web client (I'm running over a very slow link though).

James
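One way to try reproducing this outside the web client might be a small script that connects to the VNC port, reads the 12-byte RFB version banner, and then drops the connection without finishing the handshake. This is a rough sketch only: it assumes that an abrupt client disconnect approximates what a browser refresh does to the Java client, which has not been confirmed, and the function name, host, and port below are placeholders.

```python
import socket


def abrupt_reconnects(host, port, attempts=10, timeout=5.0):
    """Repeatedly connect, read the RFB version banner, and drop the connection.

    Returns the banners received. Stops early if a connection attempt fails,
    which may indicate that the server side has gone away.
    """
    banners = []
    for _ in range(attempts):
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            break  # server no longer accepting connections
        with sock:
            banner = b""
            # An RFB server opens with a 12-byte version string, e.g. b"RFB 003.008\n".
            while len(banner) < 12:
                chunk = sock.recv(12 - len(banner))
                if not chunk:
                    break
                banner += chunk
            banners.append(banner)
        # Leaving the with-block closes the socket mid-handshake.
    return banners
```

A call such as `abrupt_reconnects("127.0.0.1", 5900)` (address and port are hypothetical) returning fewer banners than requested would suggest the server stopped accepting connections partway through.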