Hi all,
and thanks a lot for your work.

Well, I'm using my home with BTRFS. It's an Ext4 converted to BTRFS
via btrfs-convert.
Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
vanilla 2.6.38 and vanilla 2.6.39.
If I use Linus' git tree, BTRFS ooops at mount.
So I bisected using kernel version 2.6.39 + latest for-linus branch.
Bisect complains about this commit:

581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
Author: Li Zefan <lizf@cn.fujitsu.com>
Date:   Wed Apr 20 10:06:11 2011 +0800

    Btrfs: Cache free inode numbers in memory

And bisect log is this:

git bisect start
# bad: [174ba50915b08dcfd07c8b5fb795b46a165fa09a] Btrfs: use the device_list_mutex during write_dev_supers
git bisect bad 174ba50915b08dcfd07c8b5fb795b46a165fa09a
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [aa2dfb372a2a647beedac163ce6f8b0fcbefac29] Merge branch 'allocator' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers
git bisect bad aa2dfb372a2a647beedac163ce6f8b0fcbefac29
# good: [7a36ddec1003a4e84e79f28ee714a142ed6bc529] btrfs: use printk_ratelimited instead of printk_ratelimit
git bisect good 7a36ddec1003a4e84e79f28ee714a142ed6bc529
# bad: [0965537308ac3b267ea16e731bd73870a51c53b8] Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers
git bisect bad 0965537308ac3b267ea16e731bd73870a51c53b8
# bad: [581bb050941b4f220f84d3e5ed6dace3d42dd382] Btrfs: Cache free inode numbers in memory
git bisect bad 581bb050941b4f220f84d3e5ed6dace3d42dd382
# good: [f38b6e754d8cc4605ac21d9c1094d569d88b163b] Btrfs: Use bitmap_set/clear()
git bisect good f38b6e754d8cc4605ac21d9c1094d569d88b163b
# good: [34d52cb6c50b5a43901709998f59fb1c5a43dc4a] Btrfs: Make free space cache code generic
git bisect good 34d52cb6c50b5a43901709998f59fb1c5a43dc4a

I can see two kinds of problems, with different commits, of course.
Sometimes the Ooops happens just as kernel mounts the partition,
sometimes the mount is good, but HD keeps reading for more than 30
seconds, and then it Ooops.
Also, you can read but you can't write, meanwhile.

In attachment my config.
I have photos of the Ooops, but right now I can't take 'em from the phone...

But, maybe, you already knew and solved the problem.
Anyway, if you need much more details, just tell me.

Thanks a lot for your time,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Excerpts from Andrea Gelmini's message of 2011-05-28 13:05:47 -0400:
> Hi all,
> and thanks a lot for your work.
> Well, I'm using my home with BTRFS. It's an Ext4 converted to BTRFS
> via btrfs-convert.
> Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
> If I use Linus' git tree, BTRFS ooops at mount.
> So I bisected using kernel version 2.6.39 + latest for-linus branch.

Thanks, could you please send in the photos of the oops when you get
a chance.

-chris

Hi,

On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
> Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
> If I use Linus' git tree, BTRFS ooops at mount.

can you please attach the oops traces?

> So I bisected using kernel version 2.6.39 + latest for-linus branch.
> Bisect complains about this commit:
> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
> Author: Li Zefan <lizf@cn.fujitsu.com>
> Date: Wed Apr 20 10:06:11 2011 +0800
>
> Btrfs: Cache free inode numbers in memory

this patch was part of the new ino allocator and it may depend
on subsequent patches (eg. 33345d015 "Btrfs: Always use
64bit inode number"). In this case it could be a 32/64 bit mismatch in
inode numbers and blame would point to an incomplete state wrt the
filesystem.

You've created your FS from ext4, I think that the filesystem has
64bit inode numbers, allocated to files and this got broken during the
conversion. (just a wild idea)

> I can see two kinds of problems, with different commits, of course.
> Sometimes the Ooops happens just as kernel mounts the partition,
> sometimes the mount is good, but HD keeps reading for more than 30
> seconds, and then it Ooops.

This would mean something's broken during transaction commit.

> Also, you can read but you can't write, meanwhile.
>
> In attachment my config.

No attachment, but not needed IMHO.

> I have photos of the Ooops, but right now I can't take 'em from the phone...

Would really help if you can :)

david

David Sterba wrote:
> Hi,
>
> On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
>> Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
>> vanilla 2.6.38 and vanilla 2.6.39.
>> If I use Linus' git tree, BTRFS ooops at mount.
>
> can you please attach the oops traces?
>
>> So I bisected using kernel version 2.6.39 + latest for-linus branch.
>> Bisect complains about this commit:
>> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
>> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
>> Author: Li Zefan <lizf@cn.fujitsu.com>
>> Date: Wed Apr 20 10:06:11 2011 +0800
>>
>> Btrfs: Cache free inode numbers in memory
>
> this patch was part of the new ino allocator and it may depend
> on subsequent patches (eg. 33345d015 "Btrfs: Always use
> 64bit inode number"). In this case it could be a 32/64 bit mismatch in
> inode numbers and blame would point to an incomplete state wrt the
> filesystem.
>

the bug is probably not caused by this.

> You've created your FS from ext4, I think that the filesystem has
> 64bit inode numbers, allocated to files and this got broken during the
> conversion. (just a wild idea)
>
>> I can see two kinds of problems, with different commits, of course.
>> Sometimes the Ooops happens just as kernel mounts the partition,

just mount the partition, and then no other fs operations? if so, the patch
you bisected down to actually won't take effect.

>> sometimes the mount is good, but HD keeps reading for more than 30
>> seconds, and then it Ooops.
>
> This would mean something's broken during transaction commit.
>
>> Also, you can read but you can't write, meanwhile.
>>
>> In attachment my config.
>
> No attachment, but not needed IMHO.
>
>> I have photos of the Ooops, but right now I can't take 'em from the phone...
>
> Would really help if you can :)
>

right. and thanks for the bug report!

btw, I'll be off till 6.5, so this week I probably won't be able to take
care of this..

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> a chance.

Well, I retested everything compiling with frame pointers, so:
a) partition is mounted with these flags:
   defaults,ssd,noacl,space_cache (at the beginning I also used
   compress);
b) vanilla kernel .38 and .39 are working good;
c) latest Linus tree (commit: bd1bfe40ac6bdf9593da29b822bc301b77a97d6a
   the one before 3.0-rc1,
   so in the photos you can find it as .39g+), it goes up, but after a
   while of intense i/o working thread (it's a specific
   kernel thread of btrfs, I guess btrfs-ino-cache, but I could be
   wrong) the system freezes. Well, if i/o keeps working long enough,
   I can even touch and unlink files, or read files already present,
   or do something like /usr/bin/find; these
   photos are here: http://ooops.lugbs.linux.it/linusgit
d) rebooting with .39 doesn't work. It crashes at mount time.
   The photos are here: http://ooops.lugbs.linux.it/2.6.39
e) booting with 2.6.38.7 solves the problem, giving this info:
[ 20.273822] Btrfs loaded
[ 20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
[ 20.388269] btrfs: use ssd allocation scheme
[ 20.388277] btrfs: enabling disk space caching
[ 25.025873] btrfs: unlinked 5 orphans
[ 25.025876] btrfs: truncated 3 orphans
f) by the way, bisect.jpg is the photo I took when I sent the first email.

These photos are terrible, but I guess they're good enough to read 'em.
Anyway, these are multiple shots of the same screen, of course.

Thanks a lot for your time,
Andrea

Excerpts from Andrea Gelmini's message of 2011-05-30 06:13:47 -0400:
> 2011/5/29 Chris Mason <chris.mason@oracle.com>:
> > Thanks, could you please send in the photos of the oops when you get
> > a chance.
>
> Well, I retested everything compiling with frame pointers, so:
> a) partition is mounted with these flags:
>    defaults,ssd,noacl,space_cache (at the beginning I also used
>    compress);
> b) vanilla kernel .38 and .39 are working good;
> c) latest Linus tree (commit: bd1bfe40ac6bdf9593da29b822bc301b77a97d6a
>    the one before 3.0-rc1,
>    so in the photos you can find it as .39g+), it goes up, but after a
>    while of intense i/o working thread (it's a specific
>    kernel thread of btrfs, I guess btrfs-ino-cache, but I could be
>    wrong) the system freezes. Well, if i/o keeps working long enough,
>    I can even touch and unlink files, or read files already present,
>    or do something like /usr/bin/find; these
>    photos are here: http://ooops.lugbs.linux.it/linusgit
> d) rebooting with .39 doesn't work. It crashes at mount time.
>    The photos are here: http://ooops.lugbs.linux.it/2.6.39
> e) booting with 2.6.38.7 solves the problem, giving this info:
> [ 20.273822] Btrfs loaded
> [ 20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
> [ 20.388269] btrfs: use ssd allocation scheme
> [ 20.388277] btrfs: enabling disk space caching
> [ 25.025873] btrfs: unlinked 5 orphans
> [ 25.025876] btrfs: truncated 3 orphans
> f) by the way, bisect.jpg is the photo I took when I sent the first email.
>
> These photos are terrible, but I guess they're good enough to read 'em.
> Anyway, these are multiple shots of the same screen, of course.

These are perfect, thank you. We're failing to write out the inode
cache. Since you're on a 32 bit machine, I'm guessing that we failed to
kmap something properly.

Could you please do gdb fs/btrfs/btrfs.ko, and then at the gdb prompt:

gdb> list *__btrfs_write_out_cache+0x43a

And send the output here?
This corresponds to where you were crashing in the kernel whose oops is
in your linusgit directory. If this doesn't work, you might need to
recompile with CONFIG_DEBUG_INFO=y. You won't need to trigger the crash
again, just do the gdb command on the new .ko.

If you don't have btrfs compiled as a module, use gdb vmlinux instead of
gdb fs/btrfs/btrfs.ko

-chris

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> These are perfect, thank you. We're failing to write out the inode
> cache. Since you're on a 32 bit machine, I'm guessing that we failed to
> kmap something properly.

Thanks a lot for the detailed info.
I recompiled, and got this:

gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
(gdb) list *__btrfs_write_out_cache+0x43a
0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
671             struct btrfs_free_space *e;
672
673             e = rb_entry(node, struct btrfs_free_space, offset_index);
674             entries++;
675
676             entry->offset = cpu_to_le64(e->offset);
677             entry->bytes = cpu_to_le64(e->bytes);
678             if (e->bitmap) {
679                     entry->type = BTRFS_FREE_SPACE_BITMAP;
680                     list_add_tail(&e->list, &bitmap_list);
(gdb)

Thanks a lot for your quick answer,
Andrea

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> a chance.

By the way, switching from 2.6.38.7 to 2.6.39, I get a lot of
these messages:

[ 140.297248] block group 1107296256 has an wrong amount of free space
[ 140.848435] block group 8623489024 has an wrong amount of free space
[ 140.879178] block group 17213423616 has an wrong amount of free space
[ 140.910181] block group 24729616384 has an wrong amount of free space
[ 140.937690] block group 33319550976 has an wrong amount of free space
[ 140.971150] block group 40835743744 has an wrong amount of free space
[ 141.000816] block group 49425678336 has an wrong amount of free space
[ 141.027175] block group 56941871104 has an wrong amount of free space
[ 141.057614] block group 65531805696 has an wrong amount of free space
[ 141.088269] block group 73047998464 has an wrong amount of free space
[ 141.124767] block group 81637933056 has an wrong amount of free space
[ 141.156891] block group 97744060416 has an wrong amount of free space
[ 141.190143] block group 121366380544 has an wrong amount of free space
[ 141.219235] block group 129956315136 has an wrong amount of free space

It also happens with 2.6.38.7, but a lot less.
Should I worry?

Thanks again,
Andrea

Excerpts from Andrea Gelmini's message of 2011-05-30 07:59:30 -0400:
> 2011/5/30 Chris Mason <chris.mason@oracle.com>:
> > These are perfect, thank you. We're failing to write out the inode
> > cache. Since you're on a 32 bit machine, I'm guessing that we failed to
> > kmap something properly.
>
> Thanks a lot for the detailed info.
> I recompiled, and got this:
> gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
> GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
> (gdb) list *__btrfs_write_out_cache+0x43a
> 0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
> 671             struct btrfs_free_space *e;
> 672
> 673             e = rb_entry(node, struct btrfs_free_space, offset_index);
> 674             entries++;
> 675
> 676             entry->offset = cpu_to_le64(e->offset);
> 677             entry->bytes = cpu_to_le64(e->bytes);
> 678             if (e->bitmap) {
> 679                     entry->type = BTRFS_FREE_SPACE_BITMAP;
> 680                     list_add_tail(&e->list, &bitmap_list);
> (gdb)

Ok, so I think we're blowing past the end of the page we've kmap'd. But
I don't think that can happen without something like the patch below
triggering:

Josef, what do you think?

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 70d4579..a95b72e 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -596,6 +596,11 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	 */
 	first_page_offset = (sizeof(u32) * num_pages) + sizeof(u64);
 
+	if (first_page_offset + sizeof(struct btrfs_free_space_entry) >= PAGE_CACHE_SIZE) {
+		printk(KERN_CRIT "bad first page offset %lu\n", first_page_offset);
+		BUG();
+	}
+
 	/* Get the cluster for this block_group if it exists */
 	if (block_group && !list_empty(&block_group->cluster_list))
 		cluster = list_entry(block_group->cluster_list.next,
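
To make the check in the patch above concrete, here is a minimal userspace sketch of the same arithmetic. It is not kernel code and rests on assumptions that do not come from the thread: a 4096-byte page (typical for 32-bit x86), and an entry laid out as a packed u64 offset, u64 bytes and u8 type. The names ASSUMED_PAGE_SIZE, free_space_entry_sketch and header_overflows_first_page are hypothetical and exist only for this illustration; only the first_page_offset formula is taken from the patch context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on 32-bit x86. */
#define ASSUMED_PAGE_SIZE 4096u

/*
 * Assumption: stand-in for struct btrfs_free_space_entry, modelled as a
 * packed record of offset, bytes and type. Hypothetical name.
 */
struct free_space_entry_sketch {
	uint64_t offset;
	uint64_t bytes;
	uint8_t type;
} __attribute__((__packed__));

/* The sketch relies on the packed layout being 17 bytes. */
static_assert(sizeof(struct free_space_entry_sketch) == 17,
	      "packed entry is expected to be 17 bytes");

/* Same formula as in the patch context: one u32 per page plus one u64. */
static size_t first_page_offset(size_t num_pages)
{
	return sizeof(uint32_t) * num_pages + sizeof(uint64_t);
}

/*
 * Returns 1 when the condition added by the patch would fire, that is,
 * when the header leaves no room for even one entry in the first page.
 */
static int header_overflows_first_page(size_t num_pages)
{
	return first_page_offset(num_pages) +
	       sizeof(struct free_space_entry_sketch) >= ASSUMED_PAGE_SIZE;
}
```

Under these assumptions, header_overflows_first_page(1017) is 0 (4068 + 8 + 17 = 4093 bytes) and header_overflows_first_page(1018) is 1 (4072 + 8 + 17 = 4097 bytes), so the check would only trigger once the cache file spans roughly a thousand pages or more.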

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> Ok, so I think we're blowing past the end of the page we've kmap'd. But
> I don't think that can happen without something like the patch below
> triggering:

Quick update: after rm of ~10 GB of data, I rebooted with Linus'
latest git tree, and it works (after some minutes of
btrfs-ino-cache).

Ciao,
Andrea