Richard W.M. Jones
2012-Oct-08 14:16 UTC
Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
I'm tracking this bug here:

https://bugzilla.redhat.com/show_bug.cgi?id=863978

Since approx. last week I'm seeing lots of failures in btrfs. The common factor seems to be that the filesystem is created (mkfs.btrfs /dev/sda1) and then it is immediately used -- e.g. mounted, or some btrfs subtool is run on it. There is no pause or sync between the operations.

Typical errors include:

  mkfs.btrfs /dev/sda1
  mount -o /dev/sda1 /sysroot/
  [ 96.384211] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
  [ 96.385314] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
  [ 96.394158] btrfs: disk space caching is enabled
  [ 96.428656] btrfs: failed to recover relocation
  [ 96.437190] btrfs: open_ctree failed

and:

  btrfsck /dev/sda1
  Check tree block failed, want=139264, have=0
  Check tree block failed, want=139264, have=0
  Check tree block failed, want=139264, have=0
  read block failed check_tree_block
  Couldn't read chunk root

(There are plenty of others, see the above bug link)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
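When comparing many failing build logs, it can help to tally the btrfsck failures mechanically. The following is only a minimal sketch, assuming Python 3 and assuming btrfsck prints the failure in exactly the `Check tree block failed, want=N, have=M` form quoted above. The name `summarize_check_failures` is hypothetical and is not part of btrfs-progs.

```python
import re
from collections import Counter

# Matches the btrfsck line format quoted in the report above (an assumption:
# other btrfs-progs versions may word this differently).
CHECK_RE = re.compile(r"Check tree block failed, want=(\d+), have=(\d+)")


def summarize_check_failures(text):
    """Count each (want, have) pair found in btrfsck output."""
    return Counter((int(want), int(have)) for want, have in CHECK_RE.findall(text))


if __name__ == "__main__":
    sample = (
        "Check tree block failed, want=139264, have=0\n"
        "Check tree block failed, want=139264, have=0\n"
        "Check tree block failed, want=139264, have=0\n"
        "read block failed check_tree_block\n"
    )
    for (want, have), count in summarize_check_failures(sample).items():
        print(f"want={want} have={have} seen {count} time(s)")
```

Grouping by the (want, have) pair makes it easy to see whether every failure points at the same block, as it does in the excerpt above.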
Chris Mason
2012-Oct-08 14:27 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
>
> I'm tracking this bug here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=863978
>
> Since approx. last week I'm seeing lots of failures in btrfs. The
> common factor seems to be that the filesystem is created (mkfs.btrfs
> /dev/sda1) and then it is immediately used -- eg. mounted or some
> btrfs subtool is run on it. There is no pause or sync between the
> operations.

This was a problem on older btrfs-progs, but this commit:

btrfs-progs-0.19.20120817git043a639-1.fc19.i686

(043a639) has long had the fixes to flush things after mkfs. Is there any chance the guest you're testing had an ancient progs on it?

-chris
Richard W.M. Jones
2012-Oct-08 14:57 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> This was a problem on older btrfs-progs, but this commit:
>
> btrfs-progs-0.19.20120817git043a639-1.fc19.i686
>
> (043a639) has long had the fixes to flush things after mkfs. Is there
> any chance the guest you're testing had an ancient progs on it?

We have a couple of guests where this fails. One has btrfs-progs-0.19.20120817git043a639-1.fc19.i686. The other has btrfs-progs-0.19-20.fc18, which appears to be based on btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream patches.

What is the commit which we need? I can't see anything related to this in the btrfs-progs git log.

I should note this was all working fine until very recently (under 5 days ago). Nothing has changed in btrfs-progs in Fedora for a few months. Could this be related to a kernel change?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
Chris Mason
2012-Oct-08 15:04 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 08:57:30AM -0600, Richard W.M. Jones wrote:
> We have a couple of guests where this fails. One has
> btrfs-progs-0.19.20120817git043a639-1.fc19.i686. The other has
> btrfs-progs-0.19-20.fc18 which appears to be based on
> btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
> patches.
>
> What is the commit which we need? I can't see anything related to
> this in the btrfs-progs git log.

Sorry, I was remembering wrong. I fixed this up in the kernel by running invalidate_bdev during mount. I just double checked and the invalidates look right, so something strange must be going on.

If it is possible to reproduce this reliably, could you please check and see if syncs do fix it? We saw this often with xfstests in the past, but haven't seen it since the invalidates were added.

-chris
Richard W.M. Jones
2012-Oct-08 15:15 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 11:04:19AM -0400, Chris Mason wrote:
> Sorry, I was remembering wrong. I fixed this up in the kernel by
> running invalidate_bdev during mount. I just double checked and the
> invalidates look right, so something strange must be going on.
>
> If it is possible to reproduce this reliably, could you please check and
> see if syncs do fix it? We saw this often with xfstests in the past,
> but haven't seen it since the invalidates were added.

Unfortunately I'm struggling to reproduce this outside of our build system (Koji). I will keep you informed if I do manage to reproduce it locally. Adding fsync /dev/sda1 was also my first instinct :-)

Rich.
Chris Mason
2012-Oct-08 15:18 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 09:15:14AM -0600, Richard W.M. Jones wrote:
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji). I will keep you informed if I do manage to reproduce
> it locally. Adding fsync /dev/sda1 was also my first instinct :-)

When we saw this during xfstests, the fsync wasn't sufficient. It was really pretty maddening and the invalidate was a nuke-it-from-orbit style solution.

The kernel side of the invalidate may have changed, so your first instinct of a kernel change is probably right.

-chris
David Sterba
2012-Oct-08 16:42 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji). I will keep you informed if I do manage to reproduce
> it locally. Adding fsync /dev/sda1 was also my first instinct :-)

Have you updated the VM/guest related packages recently? This may be a bug in the VM drivers.

david
Richard W.M. Jones
2012-Oct-08 17:01 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 06:42:27PM +0200, David Sterba wrote:
> On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> > Unfortunately I'm struggling to reproduce this outside of our build
> > system (Koji). I will keep you informed if I do manage to reproduce
> > it locally. Adding fsync /dev/sda1 was also my first instinct :-)
>
> Have you updated the VM/guest related packages recently? This may be a
> bug in the VM drivers.

qemu hasn't been updated for over a week. However I'm having a hard time understanding how even a change to qemu's caching would in any way affect only btrfs and nothing else. The libguestfs test suite is extremely comprehensive and tests many other filesystems, and none of them are failing.

These guests are built on the fly from the latest packages in Fedora, so any other package might be the cause, but it seems like the kernel is the most likely candidate.

Rich.
Richard W.M. Jones
2012-Oct-08 21:22 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji). I will keep you informed if I do manage to reproduce
> it locally. Adding fsync /dev/sda1 was also my first instinct :-)

I have now reproduced this bug locally.

Adding sync() + fsync of each /dev/sd* device after the mkfs command does appear to fix the problem.

However it's a little bit difficult to know for sure because I might just be changing the timing of things by adding these calls.

Rich.
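The "sync() + fsync of each device" step described above can be sketched roughly as follows. This is an illustration only, assuming Python 3.3+ on Linux; `flush_device` and `sync_and_flush` are hypothetical names, and this is not the code that libguestfs actually runs. As the message itself says, such calls may simply be changing the timing rather than fixing the underlying problem.

```python
import os


def flush_device(path):
    """Open a file or block device read-only and fsync it.

    On Linux, fsync() on a block device file descriptor asks the kernel to
    write out dirty cached data for that device.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_and_flush(paths):
    """Call sync() once, then fsync each path; return the paths flushed."""
    os.sync()  # system-wide request to write out dirty buffers
    flushed = []
    for path in paths:
        flush_device(path)
        flushed.append(path)
    return flushed


if __name__ == "__main__":
    import glob

    # Hypothetical usage after mkfs: flush every whole-disk /dev/sd? node.
    # Opening block devices normally requires root.
    print(sync_and_flush(sorted(glob.glob("/dev/sd?"))))
```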
Chris Mason
2012-Oct-09 00:00 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 03:22:30PM -0600, Richard W.M. Jones wrote:
>
> I have now reproduced this bug locally.
>
> Adding sync() + fsync of each /dev/sd* device after the mkfs command
> does appear to fix the problem.
>
> However it's a little bit difficult to know for sure because I might
> just be changing the timing of things by adding these calls.

Ok, what's a rough idea of the mainline git equiv of the buggy kernel?

-chris
Richard W.M. Jones
2012-Oct-09 07:20 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> On Mon, Oct 08, 2012 at 03:22:30PM -0600, Richard W.M. Jones wrote:
> >
> > I have now reproduced this bug locally.
> >
> > Adding sync() + fsync of each /dev/sd* device after the mkfs command
> > does appear to fix the problem.
> >
> > However it's a little bit difficult to know for sure because I might
> > just be changing the timing of things by adding these calls.
>
> Ok, what's a rough idea of the mainline git equiv of the buggy kernel?

On my local machine, I'm reproducing this with what Fedora calls 3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very serious bug in this kernel: http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )

In Fedora we apply several patches on top, but none of them appear as if they would affect btrfs or sync/invalidate paths:

http://pkgs.fedoraproject.org/cgit/kernel.git/tree/

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
Richard W.M. Jones
2012-Oct-09 07:33 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
>
> On my local machine, I'm reproducing this with what Fedora calls
> 3.7.0-0.rc0.git2.4.fc19.x86_64

OK, that's not very helpful is it :-) AFAIK it should be possible to reproduce this with Linus's git kernel, but I haven't proven that yet.

Rich.
David Sterba
2012-Oct-09 09:00 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Tue, Oct 09, 2012 at 08:33:57AM +0100, Richard W.M. Jones wrote:
> On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
> >
> > On my local machine, I'm reproducing this with what Fedora calls
> > 3.7.0-0.rc0.git2.4.fc19.x86_64
>
> OK, that's not very helpful is it :-) AFAIK it should be possible
> to reproduce this with Linus's git kernel, but I haven't proven
> that yet.

Found the same error message in my logs with master+next:

Oct 8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
Oct 8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
Oct 8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
Oct 8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed

There are some xfstests that triggered the related bug with stale data, I'm investigating further.

david
David Sterba
2012-Oct-09 09:16 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> On my local machine, I'm reproducing this with what Fedora calls
> 3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very
> serious bug in this kernel:
> http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )

And it's going to be fixed

http://marc.info/?l=linux-fsdevel&m=134973493227376&w=2

david
Richard W.M. Jones
2012-Oct-09 09:26 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Tue, Oct 09, 2012 at 11:16:57AM +0200, David Sterba wrote:
> On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > On my local machine, I'm reproducing this with what Fedora calls
> > 3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very
> > serious bug in this kernel:
> > http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )
>
> And it's going to be fixed
>
> http://marc.info/?l=linux-fsdevel&m=134973493227376&w=2

And this one too ...

http://marc.info/?l=linux-fsdevel&m=134977414011004&w=2

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
Richard W.M. Jones
2012-Oct-10 11:49 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Mon, Oct 08, 2012 at 10:22:30PM +0100, Richard W.M. Jones wrote:
> Adding sync() + fsync of each /dev/sd* device after the mkfs command
> does appear to fix the problem.
>
> However it's a little bit difficult to know for sure because I might
> just be changing the timing of things by adding these calls.

An update:

Although doing the sync + fsync certainly makes the bug much much rarer, it does not entirely eliminate it. I have now seen one case where this still happened (log below).

If there's anything else you'd like me to test, including kernel patches, just let me know.

Rich.

Extract from the full log at:
http://kojipkgs.fedoraproject.org//work/tasks/7186/4577186/build.log

modprobe btrfs
[ 15.823412] Btrfs loaded
grep ^[[:space:]]*btrfs$ /proc/filesystems
mkfs.btrfs /dev/sda1 /dev/sdb1
[ 16.740868] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 1 /dev/sda1
[ 16.743227] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 1 /dev/sda1
[ 17.446334] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 2 transid 3 /dev/sdb1
fsync /dev/sda
fsync /dev/sdb
fsync /dev/sdc
fsync /dev/sdd
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.39 seconds
guestfsd: main_loop: new request, len 0x40
mount -o /dev/sda1 /sysroot/
[ 17.838747] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 2 transid 4 /dev/sdb1
[ 17.917277] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[ 18.084520] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[ 18.132447] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[ 18.176566] btrfs: disk space caching is enabled
[ 18.200901] btrfs bad tree block start 0 135168
[ 18.213456] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)
Chris Mason
2012-Oct-10 12:38 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Tue, Oct 09, 2012 at 03:00:12AM -0600, David Sterba wrote:
> Found the same error message in my logs with master+next:
>
> Oct 8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
> Oct 8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
> Oct 8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
> Oct 8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed
>
> There are some xfstests that triggered the related bug with stale data,
> I'm investigating further.

Check your progs, this commit was updated to continue instead of break.

https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa

The original commit triggered those errors during 204.

-chris
Richard W.M. Jones
2012-Oct-10 19:38 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote:
> Check your progs, this commit was updated to continue instead of break.
>
> https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa
>
> The original commit triggered those errors during 204.

It does seem as if adding that commit to btrfs-progs fixes the original bug I was reporting. As before, my test isn't very reliable, so I cannot be 100% sure. I will continue running tests.

Rich.
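With an intermittent failure like this, a run of clean passes is only probabilistic evidence that a fix works. A rough sketch of the arithmetic, assuming each test run fails independently with some per-run probability p before the fix: the chance of n clean runs in a row is (1 - p)^n, so n must satisfy (1 - p)^n <= 1 - confidence. The failure rate is not stated anywhere in this thread; the 0.1 below is purely illustrative, and `runs_needed` is a hypothetical helper name.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that n consecutive passes would occur by chance with
    probability at most (1 - confidence) if the bug were still present.

    Assumes independent runs with a fixed per-run failure probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Illustrative only: if roughly 1 run in 10 failed before the fix.
    print(runs_needed(0.1))  # prints 29
```

The rarer the original failure, the more clean runs are needed before a fix can be trusted, which is why a workaround that only makes the bug "much rarer" is hard to tell apart from a real fix.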
Chris Mason
2012-Oct-10 19:41 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Wed, Oct 10, 2012 at 01:38:53PM -0600, Richard W.M. Jones wrote:> On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote: > > On Tue, Oct 09, 2012 at 03:00:12AM -0600, David Sterba wrote: > > > On Tue, Oct 09, 2012 at 08:33:57AM +0100, Richard W.M. Jones wrote: > > > > On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote: > > > > > On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote: > > > > > > Ok, what''s a rough idea of the mainline git equiv of the buggy kernel? > > > > > > > > > > On my local machine, I''m reproducing this with what Fedora calls > > > > > 3.7.0-0.rc0.git2.4.fc19.x86_64 > > > > > > > > OK, that''s not very helpful is it :-) AFAIK it should be possible > > > > to reproduce this with Linus''s git kernel, but I haven''t proven > > > > that yet. > > > > > > Found the same error message in my logs with master+next: > > > > > > Oct 8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9 > > > Oct 8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled > > > Oct 8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation > > > Oct 8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed > > > > > > There are some xfstests that triggered the related bug with stale data, > > > I''m investigating further. > > > > Check your progs, this commit was updated to continue instead of break. > > > > https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa > > > > The original commit triggered those errors during 204. > > It does seem as if adding that commit to btrfs-progs fixes the > original bug I was reporting. As before, my test isn''t very reliable, > so I cannot be 100% sure. I will continue running tests.I didn''t mention that one earlier because the git commit id in your progs version string never had the buggy commit. 
-chris
Richard W.M. Jones
2012-Oct-10 19:46 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Wed, Oct 10, 2012 at 03:41:13PM -0400, Chris Mason wrote:
> On Wed, Oct 10, 2012 at 01:38:53PM -0600, Richard W.M. Jones wrote:
> > On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote:
> > > On Tue, Oct 09, 2012 at 03:00:12AM -0600, David Sterba wrote:
> > > > On Tue, Oct 09, 2012 at 08:33:57AM +0100, Richard W.M. Jones wrote:
> > > > > On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > > > > > On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > > > > > > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
> > > > > >
> > > > > > On my local machine, I'm reproducing this with what Fedora calls
> > > > > > 3.7.0-0.rc0.git2.4.fc19.x86_64
> > > > >
> > > > > OK, that's not very helpful is it :-) AFAIK it should be possible
> > > > > to reproduce this with Linus's git kernel, but I haven't proven
> > > > > that yet.
> > > >
> > > > Found the same error message in my logs with master+next:
> > > >
> > > > Oct 8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
> > > > Oct 8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
> > > > Oct 8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
> > > > Oct 8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed
> > > >
> > > > There are some xfstests that triggered the related bug with stale data,
> > > > I'm investigating further.
> > >
> > > Check your progs, this commit was updated to continue instead of break.
> > >
> > > https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa
> > >
> > > The original commit triggered those errors during 204.
> >
> > It does seem as if adding that commit to btrfs-progs fixes the
> > original bug I was reporting. As before, my test isn't very reliable,
> > so I cannot be 100% sure. I will continue running tests.
>
> I didn't mention that one earlier because the git commit id in your
> progs version string never had the buggy commit.

The git commit we have in Fedora is 043a639, which is earlier than this
commit. In any case it doesn't matter because I've suggested in the
Fedora bug that we upgrade to the latest git.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
Richard W.M. Jones
2012-Oct-11 07:28 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
Well the bad news is that the bug happened again overnight, even
though we were definitely using btrfs-progs with the 6eba90029 patch
added, _and_ it was doing a sync + fsync between the mkfs and the
mount.

Here is the log:

modprobe btrfs
[ 15.716610] Btrfs loaded
grep ^[[:space:]]*btrfs$ /proc/filesystems
mkfs.btrfs /dev/sda1 /dev/sdb1
[ 16.656467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[ 16.657467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[ 17.227381] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 3 /dev/sdb1
fsync /dev/sda
fsync /dev/sdb
fsync /dev/sdc
fsync /dev/sdd
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.22 seconds
guestfsd: main_loop: new request, len 0x40
mount -o /dev/sda1 /sysroot/
[ 17.512337] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[ 17.758300] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 4 /dev/sdb1
[ 17.857285] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[ 17.893279] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[ 17.909277] btrfs: disk space caching is enabled
[ 17.943272] btrfs bad tree block start 0 135168
[ 17.955270] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)

http://kojipkgs.fedoraproject.org//work/tasks/9775/4579775/build.log

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw
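For readers following the thread, here is a minimal sketch of what "a sync + fsync between the mkfs and the mount" could look like. It is not the libguestfs daemon's code; the helper names `flush_device` and `flush_all` are hypothetical, and the sketch assumes a Linux system where calling fsync on a read-only file descriptor is allowed.

```python
import os


def flush_device(path):
    """Open `path` read-only and fsync it, asking the kernel to write
    any dirty cached data for that file or block device to storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def flush_all(paths):
    """Issue a global sync, then fsync each path in turn.

    Returns the list of paths that were flushed, in order."""
    os.sync()
    flushed = []
    for path in paths:
        flush_device(path)
        flushed.append(path)
    return flushed
```

In the failing run above, the equivalent step is the four "fsync /dev/sd*" lines between mkfs.btrfs and mount, so a flush of this kind did not prevent the failure.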
Chris Mason
2012-Oct-11 11:26 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Thu, Oct 11, 2012 at 01:28:21AM -0600, Richard W.M. Jones wrote:
> Well the bad news is that the bug happened again overnight, even
> though we were definitely using btrfs-progs with the 6eba90029 patch
> added, _and_ it was doing a sync + fsync between the mkfs and the
> mount.

This is good just because it makes the most sense. The only thing worse
than a bug is a bug that disappears for the wrong reasons ;)

>
> Here is the log:
> [ 17.943272] btrfs bad tree block start 0 135168
> [ 17.955270] btrfs: open_ctree failed

This is also good because it really points to the invalidate. You've
got zeros where we wrote 135168, and pretty much the only way to get
zeros on a disk block is if the kernel did a memset. Sure some app
could have written the zeros there, but that block offset is unlikely to
get allocated as a data block by the other filesystems.

So, I'll go back to the invalidate code ;)

-chris
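As an illustration of the "bad tree block start 0 135168" message discussed above, here is a rough sketch of the kind of check involved. It assumes the on-disk tree block header begins with a 32-byte checksum and a 16-byte fsid, followed by the block's own logical address (bytenr) as a little-endian 64-bit integer; the function names are hypothetical and this is not the kernel's implementation.

```python
import struct

# Assumed layout of the start of a btrfs tree block header:
# 32-byte checksum, 16-byte fsid, then bytenr as a little-endian u64.
BYTENR_OFFSET = 32 + 16


def tree_block_start(block):
    """Return the bytenr a tree block records for itself."""
    (have,) = struct.unpack_from("<Q", block, BYTENR_OFFSET)
    return have


def check_tree_block(block, want):
    """Compare the recorded bytenr against the address we read from.

    Returns (ok, have) so a caller can report both numbers."""
    have = tree_block_start(block)
    return have == want, have
```

Under that assumption, a block that reads back as all zeros yields have=0 for any expected address, which matches the pair of numbers in the log (0 found, 135168 expected).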
Richard W.M. Jones
2012-Oct-29 14:52 UTC
Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
On Thu, Oct 11, 2012 at 07:26:28AM -0400, Chris Mason wrote:
> On Thu, Oct 11, 2012 at 01:28:21AM -0600, Richard W.M. Jones wrote:
> > Well the bad news is that the bug happened again overnight, even
> > though we were definitely using btrfs-progs with the 6eba90029 patch
> > added, _and_ it was doing a sync + fsync between the mkfs and the
> > mount.
>
> This is good just because it makes the most sense. The only thing worse
> than a bug is a bug that disappears for the wrong reasons ;)
>
> >
> > Here is the log:
> > [ 17.943272] btrfs bad tree block start 0 135168
> > [ 17.955270] btrfs: open_ctree failed
>
> This is also good because it really points to the invalidate. You've
> got zeros where we wrote 135168, and pretty much the only way to get
> zeros on a disk block is if the kernel did a memset. Sure some app
> could have written the zeros there, but that block offset is unlikely to
> get allocated as a data block by the other filesystems.
>
> So, I'll go back to the invalidate code ;)

Any luck on this? It's still happening in the latest kernels. If
there's anything / patch you want me to try, let me know.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora