Josip Rodin
2010-Mar-31 07:33 UTC
[Xen-devel] domU oom -> xvda1 read-only without any notice?
Hi,

Two days ago a resource-intensive process caused one of my new 2.6.32 domUs to crash and burn, so yesterday I recompiled the kernel with debug options and retried it. In the early hours of today, after a few hours of running the same process, the kernel noticed an OOM again and started killing Apache and PostgreSQL.

Unfortunately what also happened was that the root partition (/dev/xvda1) was somehow marked read-only. I tried to remount it from within domU, but I just got:

% sudo mount -o remount,rw /
mount: block device /dev/xvda1 is write-protected, mounting read-only

There are no messages in the kernel log on either the .32 domU or the .26 dom0. What do I do, other than shutdown and re-create? How does one manually 'talk' to blkback to see what's up?

--
2. That which causes joy or happiness.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2010-Mar-31 07:43 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On 31/03/2010 08:33, "Josip Rodin" <joy@entuzijast.net> wrote:

> % sudo mount -o remount,rw /
> mount: block device /dev/xvda1 is write-protected, mounting read-only
>
> There are no messages in the kernel log on either the .32 domU or the .26
> dom0. What do I do, other than shutdown and re-create? How does one manually
> 'talk' to blkback to see what's up?

Blkback communicates this kind of info to blkfront via a node in xenstore named 'info'. Bit 2 of this numeric field indicates if a virtual disc is read-only. I think it's more likely that the domU has internally confused itself, rather than being told to mount read-only by dom0 -- the flags get probed when the virtual disc first appears, and I don't think they would get probed again after that anyway without a full disc hot-unplug/replug.

 -- Keir
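As a rough illustration of the 'info' field described above, here is a minimal sketch that decodes such a value. The flag constants follow the VDISK_* definitions in Xen's public blkif header; the function name and the returned dictionary keys are made up for this example, and how the number is obtained (for instance by reading the backend's 'info' node with a xenstore tool) is left out because the exact path depends on the domain and device IDs.

```python
# Flag values as defined by the VDISK_* constants in Xen's public
# io/blkif.h header. "Bit 2" in the message above is the 0x4 flag.
VDISK_CDROM = 0x1
VDISK_REMOVABLE = 0x2
VDISK_READONLY = 0x4


def decode_vbd_info(info: int) -> dict:
    """Split the numeric 'info' value into named boolean flags.

    The function and key names are hypothetical; only the bit values
    come from the Xen header.
    """
    return {
        "cdrom": bool(info & VDISK_CDROM),
        "removable": bool(info & VDISK_REMOVABLE),
        "read_only": bool(info & VDISK_READONLY),
    }


if __name__ == "__main__":
    # A value of 0 would mean an ordinary writable disc.
    for value in (0, 4, 5):
        print(value, decode_vbd_info(value))
```

If the backend had really told the frontend that the disc is read-only, the decoded 'read_only' flag would be true for the value found in xenstore.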
Josip Rodin
2010-Mar-31 07:44 UTC
[Xen-devel] Re: domU oom -> xvda1 read-only without any notice?
On Wed, Mar 31, 2010 at 09:33:40AM +0200, joy wrote:

> There are no messages in the kernel log on either the .32 domU or the .26
> dom0. What do I do, other than shutdown and re-create? How does one manually
> 'talk' to blkback to see what's up?

On the dom0 I found /sys/module/blkbk/parameters/{debug_lvl,log_stats}. Echoing 1 into the latter got me:

[1053818.198356] blkback.1.hda1: oo 22161 | rd 143219463 | wr 70692159 | br 0
[1053818.556417] blkback.1.hda2: oo 45794 | rd 41644778 | wr -2072774999 | br 0
[1053819.957581] blkback.2.hda1: oo 5277 | rd 35391 | wr 2396186 | br 0
[1053828.277147] blkback.1.hda1: oo 0 | rd 275 | wr 592 | br 0
[1053828.802488] blkback.1.hda2: oo 0 | rd 50 | wr 22689 | br 0
[1053835.086327] blkback.2.hda1: oo 0 | rd 0 | wr 85 | br 0

And so on. Another attempt at remount didn't provoke any reaction, and I had also set debug_lvl.
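The negative 'wr' value in the second stats line looks like a counter that wrapped around. The sketch below parses lines in the format shown above and reinterprets negative values as unsigned 32-bit numbers. It assumes the counters are signed 32-bit integers and that the two fields in the 'blkback.N.dev' name are a domain ID and a device name; neither is confirmed in this thread, and the helper names are hypothetical.

```python
import re

# Matches the stats lines quoted above, e.g.
# "[...] blkback.1.hda2: oo 45794 | rd 41644778 | wr -2072774999 | br 0"
STATS_RE = re.compile(
    r"blkback\.(?P<domid>\d+)\.(?P<dev>\S+): "
    r"oo\s+(?P<oo>-?\d+)\s*\|\s*rd\s+(?P<rd>-?\d+)\s*\|\s*"
    r"wr\s+(?P<wr>-?\d+)\s*\|\s*br\s+(?P<br>-?\d+)"
)


def as_unsigned32(value: int) -> int:
    """Reinterpret a (possibly negative) 32-bit signed value as unsigned."""
    return value & 0xFFFFFFFF


def parse_stats_line(line: str):
    """Return a dict of counters for one stats line, or None if no match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    result = {"domid": int(match["domid"]), "dev": match["dev"]}
    for key in ("oo", "rd", "wr", "br"):
        # Assumption: a negative number is a wrapped 32-bit counter.
        result[key] = as_unsigned32(int(match[key]))
    return result


if __name__ == "__main__":
    sample = ("[1053818.556417] blkback.1.hda2: oo 45794 | rd 41644778 "
              "| wr -2072774999 | br 0")
    print(parse_stats_line(sample))
```

Under that assumption the wrapped 'wr' value corresponds to 2222192297, so the odd number would be a display artifact rather than a sign of the read-only problem.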
Josip Rodin
2010-Mar-31 07:49 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, Mar 31, 2010 at 08:43:07AM +0100, Keir Fraser wrote:

> Blkback communicates this kind of info to blkfront via a node in xenstore
> named 'info'. Bit 2 of this numeric field indicates if a virtual disc is
> read-only. I think it's more likely that the domU has internally confused
> itself, rather than being told to mount read-only by dom0 -- the flags get
> probed when the virtual disc first appears, and I don't think would get
> probed again after that anyway without a full disc hot-unplug/replug.

OK, thanks, so is there any way I could examine xen-blkfront then? :) I looked around and only found funny bits like:

% cat /sys/module/xen_blkfront/drivers/xen:vbd/vbd-51713/block/xvda1/ro
0
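The sysfs check above can be scripted. This is a minimal sketch that reads a block device's 'ro' attribute and turns it into a boolean. The default directory /sys/class/block is an assumption about where the attribute is exposed on a given kernel (the message above reaches the same file through the xen_blkfront driver directory), and the function name is hypothetical.

```python
from pathlib import Path


def block_device_is_ro(dev: str, sysfs_root: str = "/sys/class/block") -> bool:
    """Return True if the block layer reports the device as read-only.

    Reads <sysfs_root>/<dev>/ro, which contains "0" or "1".
    The default sysfs_root is an assumption; pass another directory
    if the attribute lives elsewhere on your system.
    """
    flag = Path(sysfs_root, dev, "ro").read_text().strip()
    return flag == "1"


if __name__ == "__main__":
    try:
        print("xvda1 read-only:", block_device_is_ro("xvda1"))
    except OSError as exc:
        # The device may not exist on the machine running this sketch.
        print("could not read the ro attribute:", exc)
```

A result of False while the remount still fails, as in this thread, suggests the write protection is being decided somewhere other than this flag.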
Keir Fraser
2010-Mar-31 08:45 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On 31/03/2010 08:49, "Josip Rodin" <joy@entuzijast.net> wrote:

> OK, thanks, so is there any way I could examine xen-blkfront then? :)
> I looked around and only found funny bits like:
>
> % cat /sys/module/xen_blkfront/drivers/xen:vbd/vbd-51713/block/xvda1/ro
> 0

AFAIK that means that the read-only-ness is not being propagated up to the domU block layer by the xen_blkfront driver. I'm not sure exactly what other possibilities there are, but if the kernel has been OOMing processes then perhaps you're in a runlevel or mode, or have even hit a kernel bug, in which rootfs is forced read-only for other reasons? There have been bugs around OOM in the past, and really it's a kernel path that's obviously best avoided! Any idea why the OOM occurred in the first place?

 -- Keir
Josip Rodin
2010-Mar-31 08:52 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, Mar 31, 2010 at 09:45:01AM +0100, Keir Fraser wrote:

> There have been bugs around OOM in the past, and really it's a kernel path
> that's obviously best avoided! Any idea why the OOM occurred in the first
> place?

Not really. It's prompted by a skipfish (web security scanner) run on the web server on the same machine, and that server has PHP that talks to the database. It does nicely for a few hours, then it bursts into flames.

The 2.6.26 domU kernel reliably died completely in that situation, it's that SMP bug with the old forward-ported Xen patches. The new kernel survived some OOM situations, and obviously some others not so much :)

This is the first time I saw it break blkfront, so I figured it may be useful to report early in case I can gather some more debug info now.

If nobody has any immediate suggestions as to what to do, I'm going to restart it now because we need the machine in a not-so-useless state. But I'll probably reproduce the same problem tonight anyway.
Ian Campbell
2010-Mar-31 08:54 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, 2010-03-31 at 09:45 +0100, Keir Fraser wrote:

> I'm not sure exactly what other possibilities there are, but if the
> kernel has been OOMing processes then perhaps you're in a runlevel or
> mode, or even a kernel bug, in which rootfs is forced read-only for
> other reasons?

Root filesystems are often mounted with the "errors=remount-ro" option (is it the Debian default?). So it's possible this is a domU decision to go read-only, but the question remains as to what happened in the domU to trigger this decision. Can an OOM do that? There might be something earlier in the domU dmesg?

Ian.
Josip Rodin
2010-Mar-31 09:01 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, Mar 31, 2010 at 09:54:59AM +0100, Ian Campbell wrote:

> Root filesystems are often mounted with the "errors=remount-ro" option
> (is it the Debian default?). So it's possible this is a domU decision to
> go read-only, but the question remains as to what happened in the domU
> to trigger this decision, can an OOM do that? There might be something
> earlier in the domU dmesg?

Yes, it's the default, but it doesn't explain why I can't go *back* to rw. remount-ro should be functionally equivalent to mount -o remount,ro /, but something made the system think that the block device is read-only.

And, there's nothing out of the OOM context in dmesg, as in, nothing mentions xvda explicitly, but here's the entire kernel log from the last boot to the end. (Hopefully the list allows gzipped attachments, it's a lot of text.)
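To see which of these options is actually in effect for the root filesystem, one can look at the mount table. Below is a minimal sketch that parses text in the /proc/mounts format and reports whether a mount point is currently read-only and whether it carries errors=remount-ro. The function name and the sample line are made up for illustration; they are not taken from the affected machine.

```python
def mount_options(mounts_text: str, mountpoint: str = "/"):
    """Return the option list of the last entry for `mountpoint`, or None.

    Expects text in the /proc/mounts format:
    device mountpoint fstype options dump pass
    The last matching entry is used because later mounts on the same
    mount point shadow earlier ones (for example a 'rootfs' line).
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
    return options


if __name__ == "__main__":
    # Hypothetical sample; on a real system read /proc/mounts instead.
    sample = (
        "rootfs / rootfs rw 0 0\n"
        "/dev/xvda1 / ext3 ro,errors=remount-ro,data=ordered 0 0\n"
    )
    opts = mount_options(sample)
    print("read-only mount:", "ro" in opts)
    print("remount-ro on error:", "errors=remount-ro" in opts)
```

This only shows the filesystem-level state. It does not explain a refusal to remount read-write, which is the open question in this message.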
Josip Rodin
2010-Mar-31 09:17 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, Mar 31, 2010 at 10:52:45AM +0200, Josip Rodin wrote:

> I'm going to restart it now because we need the machine in a
> not-so-useless state.

After the shutdown and create:

[...]
[ 0.488419] blkfront: xvda1: barriers enabled
done.
Begin: Mounting root file system ...
Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[ 0.670172] EXT3-fs: INFO: recovery required on readonly filesystem.
[ 0.670186] EXT3-fs: write access will be enabled during recovery.
[ 9.996563] kjournald starting. Commit interval 5 seconds
[ 9.996577] EXT3-fs warning (device xvda1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
[ 9.996585] EXT3-fs warning (device xvda1): ext3_clear_journal_err: Marking fs in need of filesystem check.
[ 9.996991] EXT3-fs: recovery complete.
[ 9.997318] EXT3-fs: mounted filesystem with ordered data mode.
[...]
Will now check root file system:fsck 1.41.3 (12-Oct-2008)
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/xvda1
lastovo-root contains a file system with errors, check forced.
Deleted inode 1704618 has zero dtime. FIXED.
lastovo-root: ***** REBOOT LINUX *****
lastovo-root: 1046964/3276800 files (1.1% non-contiguous), 10147312/13107200 blocks
fsck died with exit status 3
failed!
The file system check corrected errors on the root partition but requested that the system be restarted.
failed!
The system will be restarted in 5 seconds. (warning).
Will now restart.

Then it booted fine.

I examined the graphs of the machine from the time of the incident, and it seems that everything was fine until around 2:30 when a large network operation started: Legato nsrexecd was backing it up. Its remote logs say it transferred 4 GB of data in around 11 minutes, and finished successfully. At that point, the graphs on the machine recorded a huge spike in both Apache and PostgreSQL connections, and then soon after the whole thing went AWOL.

If necessary I can attach the entire graph snapshot, which also includes the approximated state of /proc/interrupts and /proc/meminfo.
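The "exit status 3" in the boot log above is a bitmask rather than a plain error code. A minimal sketch of decoding it, using the exit code bits documented in the fsck(8) manual page (the function name is hypothetical):

```python
# Exit code bits as documented in the fsck(8) manual page.
FSCK_BITS = [
    (1, "filesystem errors corrected"),
    (2, "system should be rebooted"),
    (4, "filesystem errors left uncorrected"),
    (8, "operational error"),
    (16, "usage or syntax error"),
    (32, "checking canceled by user request"),
    (128, "shared-library error"),
]


def decode_fsck_status(status: int) -> list:
    """Return the meanings of all bits set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [meaning for bit, meaning in FSCK_BITS if status & bit]


if __name__ == "__main__":
    # The status reported in the boot log above.
    print(decode_fsck_status(3))
```

Status 3 is therefore bits 1 and 2 together, which matches the boot message that errors were corrected and a restart was requested.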
Ian Campbell
2010-Mar-31 10:40 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, 2010-03-31 at 10:01 +0100, Josip Rodin wrote:

> Yes, it's the default, but it doesn't explain why I can't go *back* to rw.
> remount-ro should be functionally equivalent to mount -o remount,ro /, but
> something made the system think that the block device is read-only.

I'm not sure about that. It's possible that whatever the original error was, it caused the system to also mark the underlying block device as ro at the same time as the remount,ro, and therefore going back to rw is not possible.

Ian.
Ian Campbell
2010-Mar-31 10:46 UTC
Re: [Xen-devel] domU oom -> xvda1 read-only without any notice?
On Wed, 2010-03-31 at 10:17 +0100, Josip Rodin wrote:

Is it possible that this (which I presume is a flag set in the filesystem metadata):

> [ 9.996577] EXT3-fs warning (device xvda1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
> [ 9.996585] EXT3-fs warning (device xvda1): ext3_clear_journal_err: Marking fs in need of filesystem check.

would also have caused the filesystem to reject attempts to remount,rw until you rebooted and fsck ran?

I took a look at your logs (other mail) and I can't see any mention of an IO error on xvda either -- I must admit I'm pretty perplexed :-(

Ian.
Ferenc Wagner
2010-Apr-01 14:09 UTC
[Xen-devel] Re: domU oom -> xvda1 read-only without any notice?
Ian Campbell <Ian.Campbell@citrix.com> writes:

> I took a look at your logs (other mail) and I can't see any mention of
> an IO error on xvda either -- I must admit I'm pretty perplexed :-(

OTOH you shouldn't be surprised about not finding anything in the logs about the filesystem being read-only. :) Dmesg would have a bigger chance of containing some traces, or network logs, if present.

--
Regards,
Feri.
Ian Campbell
2010-Apr-01 14:11 UTC
Re: [Xen-devel] Re: domU oom -> xvda1 read-only without any notice?
On Thu, 2010-04-01 at 15:09 +0100, Ferenc Wagner wrote:

> OTOH you shouldn't be surprised about not finding anything in the logs
> about the filesystem being read-only. :)

Doh, yes, right ;)

Ian.