Jeremy Fitzhardinge
2010-Oct-21 00:04 UTC
[Xen-devel] linux-next regression: IO errors with ext4 and xen-blkfront
Hi,

When doing some regression testing with Xen on linux-next, I'm finding that my domains are failing to get through the boot sequence due to IO errors:

  Remounting root filesystem in read-write mode: EXT4-fs (dm-0): re-mounted. Opts: (null)
  [ OK ]
  Mounting local filesystems: EXT3-fs: barriers not enabled
  kjournald starting. Commit interval 5 seconds
  EXT3-fs (xvda1): using internal journal
  EXT3-fs (xvda1): mounted filesystem with writeback data mode
  SELinux: initialized (dev xvda1, type ext3), uses xattr
  SELinux: initialized (dev xenfs, type xenfs), uses genfs_contexts
  [ OK ]
  Enabling local filesystem quotas: [ OK ]
  Enabling /etc/fstab swaps: Adding 917500k swap on /dev/mapper/vg_f1364-lv_swap. Priority:-1 extents:1 across:917500k
  [ OK ]
  SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
  Entering non-interactive startup
  Starting monitoring for VG vg_f1364: 2 logical volume(s) in volume group "vg_f1364" monitored
  [ OK ]
  ip6tables: Applying firewall rules: [ OK ]
  iptables: Applying firewall rules: [ OK ]
  Bringing up loopback interface: [ OK ]
  Bringing up interface eth0:
  Determining IP information for eth0... done.
  [ OK ]
  Starting auditd: [ OK ]
  end_request: I/O error, dev xvda, sector 0
  end_request: I/O error, dev xvda, sector 0
  end_request: I/O error, dev xvda, sector 9675936
  Aborting journal on device dm-0-8.
  Starting portreserve: EXT4-fs error (device dm-0): ext4_journal_start_sb:259: Detected aborted journal
  EXT4-fs (dm-0): Remounting filesystem read-only
  [ OK ]
  Starting system logger: EXT4-fs (dm-0): error count: 4
  EXT4-fs (dm-0): initial error at 1286479997: ext4_journal_start_sb:251
  EXT4-fs (dm-0): last error at 1287618175: ext4_journal_start_sb:259

I haven't tried to bisect this yet (which will be awkward because linux-next has also introduced various Xen boot-crashing bugs), but I wonder if you have any thoughts about what may be happening here. I guess an obvious candidate is the barrier changes in the storage subsystem, but I still get the same errors if I mount root with barrier=0. Current linux-2.6 mainline is fine, so the problem is in some of the patches targeted at the next merge window.

Thanks,
J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Oct-21 00:09 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On 10/20/2010 05:04 PM, Jeremy Fitzhardinge wrote:
> Hi,
>
> When doing some regression testing with Xen on linux-next, I'm finding
> that my domains are failing to get through the boot sequence due to IO
> errors:
>
> [boot log snipped: end_request I/O errors on xvda and the aborted ext4
> journal, quoted in full in the previous message]
>
> I haven't tried to bisect this yet (which will be awkward because
> linux-next has also introduced various Xen boot-crashing bugs), but I
> wonder if you have any thoughts about what may be happening here. I
> guess an obvious candidate is the barrier changes in the storage
> subsystem, but I still get the same errors if I mount root with barrier=0.

Hm. I get the same errors, but the system boots to a login prompt rather than hanging at that point above, and seems generally happy. So perhaps barriers are the key.

> Current linux-2.6 mainline is fine, so the problem is in some of the
> patches targeted at the next merge window.

Thanks,
J
Jens Axboe
2010-Oct-22 08:18 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On 2010-10-21 02:09, Jeremy Fitzhardinge wrote:
> On 10/20/2010 05:04 PM, Jeremy Fitzhardinge wrote:
>> Hi,
>>
>> When doing some regression testing with Xen on linux-next, I'm finding
>> that my domains are failing to get through the boot sequence due to IO
>> errors:
>>
>> [boot log snipped]
>>
>> I haven't tried to bisect this yet (which will be awkward because
>> linux-next has also introduced various Xen boot-crashing bugs), but I
>> wonder if you have any thoughts about what may be happening here. I
>> guess an obvious candidate is the barrier changes in the storage
>> subsystem, but I still get the same errors if I mount root with barrier=0.
>
> Hm. I get the same errors, but the system boots to a login prompt rather
> than hanging at that point above, and seems generally happy. So perhaps
> barriers are the key.

To test that theory, can you try and pull the two other main bits of the pending block patches and see if it works?

  git://git.kernel.dk/linux-2.6-block.git for-2.6.37/core
  git://git.kernel.dk/linux-2.6-block.git for-2.6.37/drivers

and if that works, then pull

  git://git.kernel.dk/linux-2.6-block.git for-2.6.37/barrier

and see how that fares.

--
Jens Axboe
Christoph Hellwig
2010-Oct-22 08:29 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
In the barriers tree Xen claims to support flushes, but it doesn't. It never handles REQ_FLUSH requests. Try commenting out the

  blk_queue_flush(info->rq, info->feature_flush);

call and things should improve. I still need to hear back from the Xen folks how to actually implement a cache flush - they only implement a barrier-write primitive, which can never express an empty cache flush. Up to current kernels that meant it would implement barrier writes with content correctly and silently ignore empty barriers, leading to very interesting data integrity bugs. From 2.6.37 onwards it simply won't work any more at all, which is at least consistent (modulo the bug of actually claiming to support flushes).
Jens Axboe
2010-Oct-22 08:54 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On 2010-10-22 10:29, Christoph Hellwig wrote:
> In the barriers tree Xen claims to support flushes, but it doesn't.
> It never handles REQ_FLUSH requests. Try commenting out the
>
>   blk_queue_flush(info->rq, info->feature_flush);
>
> call and things should improve. I still need to hear back from the Xen
> folks how to actually implement a cache flush - they only implement
> a barrier-write primitive, which can never express an empty cache
> flush. Up to current kernels that meant it would implement barrier
> writes with content correctly and silently ignore empty barriers,
> leading to very interesting data integrity bugs. From 2.6.37 onwards
> it simply won't work any more at all, which is at least consistent
> (modulo the bug of actually claiming to support flushes).

So how about we just disable barriers for Xen atm? I would really, really like to push that branch out as well now, since I'll be travelling for most of the merge window this time.

--
Jens Axboe
Christoph Hellwig
2010-Oct-22 08:56 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Fri, Oct 22, 2010 at 10:54:54AM +0200, Jens Axboe wrote:
> So how about we just disable barriers for Xen atm? I would really, really
> like to push that branch out as well now, since I'll be travelling for
> most of the merge window this time.

Yes, that's what removing/commenting out that line does.
Jens Axboe
2010-Oct-22 08:57 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On 2010-10-22 10:56, Christoph Hellwig wrote:
> On Fri, Oct 22, 2010 at 10:54:54AM +0200, Jens Axboe wrote:
>> So how about we just disable barriers for Xen atm? I would really, really
>> like to push that branch out as well now, since I'll be travelling for
>> most of the merge window this time.
>
> Yes, that's what removing/commenting out that line does.

Certainly, but I meant in the barrier branch for submission. If it doesn't do empty flushes to begin with, that should be fixed up before being enabled in any case. I'll disable barrier support in xen-blkfront.c for now.

--
Jens Axboe
Christoph Hellwig
2010-Oct-22 09:20 UTC
[Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Fri, Oct 22, 2010 at 10:57:40AM +0200, Jens Axboe wrote:
> On 2010-10-22 10:56, Christoph Hellwig wrote:
>> Yes, that's what removing/commenting out that line does.
>
> Certainly, but I meant in the barrier branch for submission. If
> it doesn't do empty flushes to begin with, that should be fixed
> up before being enabled in any case.

Yes, it should have been disabled long ago. I had a long discussion with them about it when they introduced the even more buggy barriers-by-tag mode for .36, but they simply ignored it.

> I'll disable barrier support in xen-blkfront.c for now.

Thanks.
Konrad Rzeszutek Wilk
2010-Oct-25 18:26 UTC
Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Fri, Oct 22, 2010 at 04:29:16AM -0400, Christoph Hellwig wrote:
> In the barriers tree Xen claims to support flushes, but it doesn't.
> It never handles REQ_FLUSH requests. Try commenting out the
>
>   blk_queue_flush(info->rq, info->feature_flush);
>
> call and things should improve. I still need to hear back from the Xen
> folks how to actually implement a cache flush - they only implement

I think we just blindly assume that we can pass the request to the backend. And if the backend is running under an ancient version (2.6.18), the behavior would be quite different.

Perhaps we should negotiate with the backend whether it runs under a kernel with the new barrier support? And if so, then enable them? If the backend says it has no idea what we are talking about, then disable the barrier support? How does that sound? (Adding Daniel to this email thread, as he has much more experience than I do.)

Daniel, what about the "use tagged queuing for barriers" patch you wrote some time ago? Is it applicable to this issue?

> a barrier-write primitive, which can never express an empty cache
> flush. Up to current kernels that meant it would implement barrier
> writes with content correctly and silently ignore empty barriers,
> leading to very interesting data integrity bugs. From 2.6.37 onwards
> it simply won't work any more at all, which is at least consistent
> (modulo the bug of actually claiming to support flushes).
Christoph Hellwig
2010-Oct-25 18:47 UTC
Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Mon, Oct 25, 2010 at 02:26:30PM -0400, Konrad Rzeszutek Wilk wrote:
> I think we just blindly assume that we can pass the request
> to the backend. And if the backend is running under an ancient
> version (2.6.18), the behavior would be quite different.

I don't think this has much to do with the backend. Xen never implemented empty barriers correctly. This has been a bug since day one, although before no one noticed because the cruft in the old barrier code made them look like they succeeded without actually succeeding. With the new barrier code you do get an error back for them - and you get them more often, because cache flushes aka empty barriers are the only thing we send now.

The right fix is to add a cache flush command to the protocol, which will do the right thing for all guests. In fact I read on a NetBSD list that they had to add exactly that command to get their cache flushes to work, so it must exist for some versions of the backends.
Konrad Rzeszutek Wilk
2010-Oct-25 19:05 UTC
Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Mon, Oct 25, 2010 at 02:47:56PM -0400, Christoph Hellwig wrote:
> On Mon, Oct 25, 2010 at 02:26:30PM -0400, Konrad Rzeszutek Wilk wrote:
>> I think we just blindly assume that we can pass the request
>> to the backend. And if the backend is running under an ancient
>> version (2.6.18), the behavior would be quite different.
>
> I don't think this has much to do with the backend. Xen never
> implemented empty barriers correctly. This has been a bug since day
> one, although before no one noticed because the cruft in the old
> barrier code made them look like they succeeded without actually
> succeeding. With the new barrier code you do get an error back for
> them - and you get them more often, because cache flushes aka
> empty barriers are the only thing we send now.
>
> The right fix is to add a cache flush command to the protocol, which
> will do the right thing for all guests. In fact I read on a NetBSD
> list that they had to add exactly that command to get their cache
> flushes to work, so it must exist for some versions of the backends.

Ok, thank you for the pointer.

Daniel, you are the resident expert, what do you say?

Jens, for 2.6.37 is the patch disabling write barrier support in xen-blkfront the way to do it? Or, if we came up with a patch now, would it potentially make it into 2.6.37-rcX? (I don't know if the fix for this would qualify as a bug or a regression, since it looks to be adding a new command.) And Christoph suggests that this has been broken in v2.6.36, v2.6.35, etc., so that would definitely put it outside the regression definition.
Daniel Stodden
2010-Oct-26 12:49 UTC
Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
On Mon, 2010-10-25 at 15:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 25, 2010 at 02:47:56PM -0400, Christoph Hellwig wrote:
>> [...]
>> The right fix is to add a cache flush command to the protocol, which
>> will do the right thing for all guests. In fact I read on a NetBSD
>> list that they had to add exactly that command to get their cache
>> flushes to work, so it must exist for some versions of the backends.
>
> Ok, thank you for the pointer.
>
> Daniel, you are the resident expert, what do you say?
>
> Jens, for 2.6.37 is the patch disabling write barrier support
> in xen-blkfront the way to do it?

This thread is not just about a single command, it's two entirely different models. Let's try to approach it like this:

I don't see the point in adding a dedicated command for the above. You want the backend to issue a cache flush. As far as the current ring model is concerned, you can express this as an empty barrier write, or you can add a dedicated op (which is an empty request with a fancier name). That's fairly boring - bugginess in how Linux drivers / kernel versions realize this, whether in front- or backend, aside.

Next, go on and make discussions more entertaining by redefining your use of the term 'barrier' to mean 'cache flush'. I think that marked the end of the previous thread; I've seen discussions like this. That is, you remove the ordering constraint, which is what differentiates barriers from mere cache flushes.

The crux is moving to a model where an ordered write requires a queue drain by the guest. That's somewhat more low-level, and for many disks more realistic, but it's also awkward for a virtualization layer compared to ordered/durable writes. One thing it gets you is more latency from stalling the request stream, then extra events to kick things off again (ok, not that the difference is huge).

The more general reason why I'd be reluctant to move from barriers to a caching/flushing/non-ordering disk model is questions like: why would a frontend even want to know if a disk is cached, or have to assume so? Letting the backend alone deal with it is less overhead across different guest systems, gets enforced in the right place, and avoids a rathole full of compat headaches later on.

The barrier model is relatively straightforward to implement, even when it doesn't map to the backend queue anymore. The backend will need to translate to queue draining and cache flushes as needed by the device. That's a state machine, but a small one, and not exactly a new idea.

Furthermore: if the backend ever gets to deal with that entire cache write durability thing *properly*, we need synchronization across backend groups sharing a common physical layer anyway, to schedule and merge barrier points etc. That's a bigger state machine, but it derives from the one above. From there on, any effort spent on trying to 'simplify' things by imposing explicit drain/flush on frontends will look rather embarrassing.

Unless Xen is just a fancy way to run Linux on Linux on a flat partition, I'd rather like to see the barrier model stay, blkback fixed, and frontend cache flushes mapped to empty barriers. In the long run, the simpler model is the least expensive one.

Daniel

> Or, if we came up with a patch now, would it potentially make it into
> 2.6.37-rcX? (I don't know if the fix for this would qualify as a bug
> or a regression, since it looks to be adding a new command.) And
> Christoph suggests that this has been broken in v2.6.36, v2.6.35,
> etc., so that would definitely put it outside the regression definition.
Christoph Hellwig
2010-Oct-27 10:23 UTC
Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
I'm really not interested in getting into this flamewar again. If you want to make Xen block devices work reliably, you need to implement a cache flush primitive in the driver. If your cache flush primitive also enforces ordering, that's fine for data integrity, but it won't help your performance.

Note that currently the _driver_ does not implement cache flushes correctly, which is what started this thread and the previous flamewar. If you can fix it using the existing primitive with just driver changes, that's fine - but according to

  http://mail-index.netbsd.org/port-xen/2010/09/24/msg006274.html

at least the NetBSD people didn't think so.

For details on the implementation refer to the Documentation/block/writeback_cache_control.txt file in the kernel tree; for reasons why we got rid of barriers with their synchronization semantics, refer to various threads on -fsdevel and lkml during the past couple of months (search your favourite archive for barriers).