Btrfs has been broken for me for ages. I first reported it on this list 5 months ago[1]. Below is a very simple reproducer that anyone can run.

*NB* before you run this, adjust /dev/sda & /dev/sda1 to point to an unused block device!

----------------------------------------------------------------------
#!/bin/sh -
set -e
while true; do
    parted -s -- /dev/sda mklabel msdos
    parted -s -- /dev/sda mkpart primary 64s -64s
    wipefs -a /dev/sda1
    mkfs.btrfs --label TEST /dev/sda1
    mount /dev/sda1 /sysroot
    touch /sysroot/foo
    mkdir /sysroot/bar
    umount /sysroot
done
----------------------------------------------------------------------

On the latest 3.8.0 kernel, this fails immediately (at the mount), and on 3.7.x it usually fails after a very few iterations. I see a variety of errors, but the latest kernel error is:

[ 8.474934] device label ROOT devid 1 transid 2 /dev/sda2
[ 8.570619] device label ROOT devid 1 transid 2 /dev/sda2
[ 8.581891] btrfs: disk space caching is enabled
[ 8.594146] btrfs bad tree block start 0 4194304
[ 8.595144] btrfs: failed to read tree root on sda2
[ 8.605308] btrfs: open_ctree failed

I would really like btrfs to work. What can I do?

Rich.

[1] http://article.gmane.org/gmane.comp.file-systems.btrfs/20257

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 12, 2013 at 11:54:49AM -0700, Richard W.M. Jones wrote:
> Btrfs has been broken for me for ages. [...]
>
> I would really like btrfs to work. What can I do?

Hi Rich,

Can you try the btrfs-progs raid56-experimental branch? It has this patch, which was fixing things for me:

https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commit;h=8fe354744cd7b5c4f7a3314dcdbb5095192a032f

I'm not 100% sure I've reproduced your exact problem, but I hope this is it.

-chris
On Tue, Feb 12, 2013 at 11:54:49AM -0700, Richard W.M. Jones wrote:
> Btrfs has been broken for me for ages. [...]
>
> I would really like btrfs to work. What can I do?

Been running this in a loop for 20 minutes with no issues. Is this in a virt guest or something?

Thanks,

Josef
On Tue, Feb 12, 2013 at 06:54:49PM +0000, Richard W.M. Jones wrote:
> Btrfs has been broken for me for ages. I first reported it on this
> list 5 months ago[1]. Below is a very simple reproducer that anyone
> can run.

The very simple reproducer doesn't fail over here on bare hardware for me.

  # dmesg | grep -c 'device label TEST'
  808

(keeps it spinning)

> *NB* before you run this, adjust /dev/sda & /dev/sda1 to point to an
> unused block device!

What block devices are you using? I ask because how the device implements caching affects this test.

The theory, as I understand it, is that btrfs is issuing bio reads that don't see the cached writes from mkfs.

You'd never see this bug on loopback because it serves bio reads from the cache that mkfs wrote to.

I'm not seeing it on hardware because the filemap_write_and_wait() that btrfs does in the kernel on mount syncs the cached writes. The bio reads then get the mkfs data from disk.

If your block device has a cache that isn't synced when btrfs calls this, you'd see this problem... but that'd be strange indeed. Chris' test patch to sync from mkfs would probably help, but you'd still see the problem if, say, you just wrote a btrfs image to the block device and reasonably expected the kernel to find it on mount.

So it'd be nice to know what devices you're using. Maybe some nutty in-guest virt passthrough thing?

- z
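Zach's theory comes down to whether mkfs's writes are still sitting in the block device's page cache when the kernel reads the device with bios. To illustrate one possible meaning of "sync from mkfs", here is a minimal C sketch. It assumes a tool that writes through an ordinary buffered descriptor on the device. The helper names write_and_flush() and read_matches() are hypothetical, and this is a guess at the general shape of such a fix, not the contents of Chris's btrfs-progs patch. fsync() pushes the dirty pages out to the device. On a block device, the sketch also issues the BLKFLSBUF ioctl, which needs CAP_SYS_ADMIN, to drop the cached buffers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mount.h>   /* BLKFLSBUF */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd without clobbering the errno of an earlier failure. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/*
 * Hypothetical mkfs-style helper: write len bytes at offset off, fsync so the
 * dirty page-cache pages reach the device, and, if path is a block device,
 * ask the kernel to drop its cached buffers (BLKFLSBUF).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_and_flush(const char *path, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL || len == 0);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            if (n == 0)
                errno = EIO;        /* no progress: treat as an I/O error */
            return fail_close(fd);
        }
        done += (size_t)n;
    }

    if (fsync(fd) < 0)              /* push cached writes out to the device */
        return fail_close(fd);

    struct stat st;
    if (fstat(fd, &st) < 0)
        return fail_close(fd);
    if (S_ISBLK(st.st_mode) && ioctl(fd, BLKFLSBUF, 0) < 0)
        return fail_close(fd);      /* could not drop the bdev's cached buffers */

    return close(fd);
}

/* Read back len bytes at off through a fresh descriptor and compare.
 * Returns 1 on match, 0 on mismatch or short read, -1 on error. */
static int read_matches(const char *path, const void *expect, size_t len, off_t off)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *got = malloc(len ? len : 1);
    if (got == NULL)
        return fail_close(fd);
    ssize_t n = pread(fd, got, len, off);
    int ok = (n == (ssize_t)len) && memcmp(got, expect, len) == 0;
    free(got);
    close(fd);
    return ok;
}
```

As Zach points out, a helper like this only helps when the writer is a cooperating mkfs. An image copied onto the device by some other tool would never pass through it.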
On Tue, 12 Feb 2013 18:54:49 +0000
"Richard W.M. Jones" <rjones@redhat.com> wrote:

> Btrfs has been broken for me for ages. I first reported it on this
> list 5 months ago[1]. Below is a very simple reproducer that anyone
> can run.
>
> *NB* before you run this, adjust /dev/sda & /dev/sda1 to point to an
> unused block device!

I might be pointing out the obvious here, but where does /dev/sda2 come from?

Some more ideas for you to try:

  dd if=/dev/zero of=/dev/sda bs=1M count=1
  blockdev --rereadpt /dev/sda

and then proceed again with your test loop.

> ----------------------------------------------------------------------
> #!/bin/sh -
> set -e
> while true; do
>     parted -s -- /dev/sda mklabel msdos
>     parted -s -- /dev/sda mkpart primary 64s -64s
>     wipefs -a /dev/sda1
>     mkfs.btrfs --label TEST /dev/sda1
>     mount /dev/sda1 /sysroot
      ^^^^^^^^^^^^^^^^^^^^^^ sda1
>     touch /sysroot/foo
>     mkdir /sysroot/bar
>     umount /sysroot
> done
> ----------------------------------------------------------------------
>
> On the latest 3.8.0 kernel, this fails immediately (at the mount), and
> on 3.7.x it usually fails after a very few iterations. I see a
> variety of errors, but the latest kernel error is:
>
> [ 8.474934] device label ROOT devid 1 transid 2 /dev/sda2
> [ 8.570619] device label ROOT devid 1 transid 2 /dev/sda2
                                                       ^^^^ sda2?
> [ 8.581891] btrfs: disk space caching is enabled
> [ 8.594146] btrfs bad tree block start 0 4194304
> [ 8.595144] btrfs: failed to read tree root on sda2
> [ 8.605308] btrfs: open_ctree failed
> [...]

-- 
With respect,
Roman
On Tue, Feb 12, 2013 at 02:16:37PM -0500, Josef Bacik wrote:
> On Tue, Feb 12, 2013 at 11:54:49AM -0700, Richard W.M. Jones wrote:
> > Btrfs has been broken for me for ages. [...]
> >
> > I would really like btrfs to work. What can I do?
>
> Been running this in a loop for 20 minutes with no issues, is this in a virt
> guest or something? Thanks,

Yes, this is inside a very recent KVM (qemu 1.3.0), using virtio-scsi as the backing disk.

Rich.
On Tue, Feb 12, 2013 at 11:44:56AM -0800, Zach Brown wrote:
> [...]
> So it'd be nice to know what devices you're using. Maybe some nutty
> in-guest virt passthrough thing?

It's virtio-scsi in a KVM (qemu 1.3.0) guest.

Rich.
On Tue, Feb 12, 2013 at 02:05:35PM -0700, Richard W.M. Jones wrote:
>
> Yes, this is inside a very recent KVM (qemu 1.3.0), using virtio-scsi
> as the backing disk.

Ok, can you please run this on your virtio device file? It will overwrite the first 256K, so don't do this on a file you care about.

  gcc -Wall -o vtest vtest.c

  ./vtest /dev/xxx

I've attached vtest.c and gzip'd it just to make sure no mailers mess with my pretty code.

-chris
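Chris's vtest.c is not reproduced in this thread. For readers without the attachment, here is a hypothetical check in the same spirit. It is not his program. Each round writes a pattern that changes with the seed through a buffered descriptor and calls fsync. It then reads the data back through a second descriptor, optionally opened with O_DIRECT so the read does not come from the page cache, and counts the mismatched bytes. check_once() and fill_pattern() are illustrative names. The 4096-byte O_DIRECT alignment is an assumption about the device's logical block size. Like vtest, it overwrites the start of the target, so point it only at a scratch device or file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096  /* assumed logical block alignment for O_DIRECT */

/* Fill buf with a pattern that changes with seed, so stale data shows up. */
static void fill_pattern(unsigned char *buf, size_t len, unsigned seed)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(seed + i * 7u);
}

/*
 * One round of a hypothetical cache-coherence check (not Chris's vtest.c):
 * write a fresh pattern at offset 0 through a buffered descriptor, fsync it,
 * then read it back through a second descriptor (optionally O_DIRECT) and
 * count mismatching bytes. Returns the mismatch count, or -1 on error.
 */
static long check_once(const char *path, size_t len, unsigned seed, int use_direct)
{
    assert(len > 0 && (!use_direct || len % DIO_ALIGN == 0));

    unsigned char *want = NULL, *got = NULL;
    long bad = -1;
    int wfd = -1, rfd = -1;

    /* Aligned buffers so the same code works with O_DIRECT. */
    if (posix_memalign((void **)&want, DIO_ALIGN, len) != 0 ||
        posix_memalign((void **)&got, DIO_ALIGN, len) != 0)
        goto out;
    fill_pattern(want, len, seed);
    memset(got, 0, len);

    wfd = open(path, O_WRONLY);
    if (wfd < 0 || pwrite(wfd, want, len, 0) != (ssize_t)len || fsync(wfd) != 0)
        goto out;

    rfd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (rfd < 0 || pread(rfd, got, len, 0) != (ssize_t)len)
        goto out;

    bad = 0;
    for (size_t i = 0; i < len; i++)
        bad += want[i] != got[i];
out:
    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    free(want);
    free(got);
    return bad;
}
```

On a scratch virtio disk, calling check_once() in a loop with an incrementing seed and use_direct = 1 roughly approximates an endless write/read-back test. A nonzero return would mean the direct read saw stale data.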
On Tue, Feb 12, 2013 at 04:42:25PM -0500, Chris Mason wrote:
> On Tue, Feb 12, 2013 at 02:05:35PM -0700, Richard W.M. Jones wrote:
> >
> > Yes, this is inside a very recent KVM (qemu 1.3.0), using virtio-scsi
> > as the backing disk.
>
> Ok, can you please run this on your virtio device file? It will
> overwrite the first 256K, so don't do this on a file you care about.
>
>   gcc -Wall -o vtest vtest.c
>
>   ./vtest /dev/xxx
>
> I've attached vtest.c and gzip'd it just to make sure no mailers mess
> with my pretty code.

The output of this is:

  writing to /dev/sda

and nothing else. Since it seems to be intended to be an infinite loop, I left it running for five minutes before killing it.

Will try the btrfsprogs patch next.

Rich.
On Wed, Feb 13, 2013 at 11:00:33AM +0000, Richard W.M. Jones wrote:
> Will try the btrfsprogs patch next.

I applied this patch:

https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff_plain;h=8fe354744cd7b5c4f7a3314dcdbb5095192a032f

to the version of btrfs-progs in Fedora Rawhide (currently "0.20.rc1.20121017git91d9eec"). Then I ran the simple reproducer, and the full libguestfs test suite on two machines.

This does appear to fix the problem for me, but only on Rawhide.

On Fedora 18, which has an older kernel, the patch does not fix the problem (same errors as before).

Rawhide kernel: kernel-3.8.0-0.rc7.git0.1.fc19.x86_64

Fedora 18 kernel: kernel-3.7.6-201.fc18.x86_64

Anyway, it's an improvement, so I'll make sure the patch is added to the Rawhide btrfs-progs package.

Rich.
On Wed, Feb 13, 2013 at 06:10:44AM -0700, Richard W.M. Jones wrote:
> [...]
> This does appear to fix the problem for me, but only on Rawhide.
>
> On Fedora 18 which has an older kernel, the patch does not fix the
> problem (same errors as before).
>
> Rawhide kernel: kernel-3.8.0-0.rc7.git0.1.fc19.x86_64
>
> Fedora 18 kernel: kernel-3.7.6-201.fc18.x86_64
>
> Anyway, it's an improvement so I'll make sure the patch is added to
> the Rawhide btrfs-progs package.

Ok, the patch is more of a bandaid, but between running my vtest program and the patch helping, we're clearly not clearing caches properly (somehow). I'll take another stab at fixing this on the kernel side.

-chris
On Wed, Feb 13, 2013 at 06:31:37AM -0700, Chris Mason wrote:
> [...]
> Ok, the patch is more of a bandaid, but between running my vtest program
> and the patch helping, we're clearly not clearing caches properly
> (somehow). I'll take another stab at fixing this on the kernel side.

Looks like the real problem is that the udev event to register a new btrfs filesystem races in while we are still in mkfs. If you disable the udev rule, the problem goes away (at least for me).

It's not a udev bug, though: our btrfs scanning function is improperly calling set_blocksize during a simple scan. That's only legal when we are in mount and can't be racing with writes to the device.

Dave Sterba cooked up a fix for that; I'm testing it here.

-chris
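Chris's diagnosis is that the scan triggered by udev should only read the device, but the kernel scan function also called set_blocksize. The userspace sketch below shows what a purely read-only scan needs. It preads the primary superblock and extracts the label (the field behind the "device label TEST" lines in dmesg) without changing any device state. The offsets are assumptions taken from the btrfs on-disk format and should be checked against btrfs-progs' ctree.h: primary superblock at 64 KiB, 8-byte magic "_BHRfS_M" at byte 0x40, and a 256-byte label at byte 0x12b. probe_btrfs_label() is a hypothetical name. This is not the kernel fix Dave Sterba wrote.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed btrfs on-disk constants; verify before relying on them. */
#define SUPER_OFFSET 65536   /* primary superblock at 64 KiB */
#define SUPER_SIZE   4096
#define MAGIC_OFFSET 0x40    /* 8-byte magic inside the superblock */
#define LABEL_OFFSET 0x12b   /* 256-byte label inside the superblock */
#define LABEL_SIZE   256

static const char btrfs_magic[8] = {'_', 'B', 'H', 'R', 'f', 'S', '_', 'M'};

/*
 * Hypothetical read-only probe: pread the superblock, check the magic and
 * copy the label. It only reads; it never changes device settings such as
 * the block size. Returns 1 if a btrfs superblock was found, 0 if not,
 * -1 on error.
 */
static int probe_btrfs_label(const char *path, char label[LABEL_SIZE + 1])
{
    assert(path != NULL && label != NULL);

    unsigned char sb[SUPER_SIZE];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, sb, sizeof sb, SUPER_OFFSET);
    close(fd);
    if (n < 0)
        return -1;
    if (n != (ssize_t)sizeof sb ||
        memcmp(sb + MAGIC_OFFSET, btrfs_magic, sizeof btrfs_magic) != 0)
        return 0;                      /* too short or no magic: not btrfs */

    memcpy(label, sb + LABEL_OFFSET, LABEL_SIZE);
    label[LABEL_SIZE] = '\0';          /* on-disk label may fill all 256 bytes */
    return 1;
}
```

On a real system you would call it as probe_btrfs_label("/dev/sda1", label). It needs only read access and leaves the device untouched, so it cannot interfere with an mkfs writing to the same device.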