thr3ads.net - Btrfs devel - Problem with latest for-linus branch [May 2011]

Andrea Gelmini

2011-May-28 17:05 UTC

Problem with latest for-linus branch

Hi all,
   and thanks a lot for your work.
   Well, I'm using BTRFS for my home partition. It's an ext4 filesystem
converted to BTRFS via btrfs-convert.
   Everything works well with the stock Ubuntu 11.04 kernel (2.6.38),
vanilla 2.6.38 and vanilla 2.6.39.
   If I use Linus' git tree, BTRFS oopses at mount.
   So I bisected using kernel version 2.6.39 + the latest for-linus branch.
   Bisect points at this commit:
581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
Author: Li Zefan <lizf@cn.fujitsu.com>
Date:   Wed Apr 20 10:06:11 2011 +0800

    Btrfs: Cache free inode numbers in memory

   And bisect log is this:
git bisect start
# bad: [174ba50915b08dcfd07c8b5fb795b46a165fa09a] Btrfs: use the device_list_mutex during write_dev_supers
git bisect bad 174ba50915b08dcfd07c8b5fb795b46a165fa09a
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [aa2dfb372a2a647beedac163ce6f8b0fcbefac29] Merge branch 'allocator' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers
git bisect bad aa2dfb372a2a647beedac163ce6f8b0fcbefac29
# good: [7a36ddec1003a4e84e79f28ee714a142ed6bc529] btrfs: use printk_ratelimited instead of printk_ratelimit
git bisect good 7a36ddec1003a4e84e79f28ee714a142ed6bc529
# bad: [0965537308ac3b267ea16e731bd73870a51c53b8] Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers
git bisect bad 0965537308ac3b267ea16e731bd73870a51c53b8
# bad: [581bb050941b4f220f84d3e5ed6dace3d42dd382] Btrfs: Cache free inode numbers in memory
git bisect bad 581bb050941b4f220f84d3e5ed6dace3d42dd382
# good: [f38b6e754d8cc4605ac21d9c1094d569d88b163b] Btrfs: Use bitmap_set/clear()
git bisect good f38b6e754d8cc4605ac21d9c1094d569d88b163b
# good: [34d52cb6c50b5a43901709998f59fb1c5a43dc4a] Btrfs: Make free space cache code generic
git bisect good 34d52cb6c50b5a43901709998f59fb1c5a43dc4a
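
(Editorial note: the log above is essentially a binary search over commits. The following is a minimal sketch in C of that idea, assuming a linear history in which every commit after the first bad one is also bad. Real git bisect also handles merges, as the log shows. The is_bad predicate and the toy history are hypothetical stand-ins for "build, boot, and try to mount"; none of these names come from git.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if the commit at this index shows the bug. */
typedef int (*is_bad_fn)(size_t commit_index);

/*
 * Return the index of the first bad commit, given that `good` is known
 * good, `bad` is known bad, and good < bad. Each step halves the range,
 * just as each "git bisect good/bad" line in a log does.
 */
size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;   /* bug already present: look earlier */
        else
            good = mid;  /* still fine: look later */
    }
    return bad;
}

/* Toy history for illustration: the bug is introduced at index 37. */
int toy_is_bad(size_t commit_index)
{
    return commit_index >= 37;
}
```

Calling first_bad_commit(0, 99, toy_is_bad) narrows the range step by step until only the commit that introduced the bug remains.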

  I can see two kinds of problems, with different commits, of course.
  Sometimes the oops happens just as the kernel mounts the partition;
sometimes the mount succeeds, but the HD keeps reading for more than 30
seconds, and then it oopses.
  Also, you can read but you can't write in the meantime.

My config is attached.

I have photos of the oops, but right now I can't get them
off the phone...
But maybe you already know about the problem and have solved it.
Anyway, if you need more details, just tell me.

Thanks a lot for your time,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2011-May-28 22:14 UTC

Re: Problem with latest for-linus branch

Excerpts from Andrea Gelmini's message of 2011-05-28 13:05:47 -0400:
> Hi all,
>    and thanks a lot for your work.
>    Well, I'm using my home with BTRFS. It's a Ext4 converted to BTRFS
> via btrfs-convert.
>    Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
>    If I use Linus' git tree, BTRFS ooops at mount.
>    So I bisected using kernel version 2.6.39 + latest for-linus branch.

Thanks, could you please send in the photos of the oops when you get
a chance.

-chris

David Sterba

2011-May-28 22:40 UTC

Re: Problem with latest for-linus branch

Hi,

On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
>    Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
> vanilla 2.6.38 and vanilla 2.6.39.
>    If I use Linus' git tree, BTRFS ooops at mount.

can you please attach the oops traces?

>    So I bisected using kernel version 2.6.39 + latest for-linus branch.
>    Bisect complains about this commit:
> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
> Author: Li Zefan <lizf@cn.fujitsu.com>
> Date:   Wed Apr 20 10:06:11 2011 +0800
> 
>     Btrfs: Cache free inode numbers in memory

this patch was part of the new ino allocator and it may depend
on subsequent patches (e.g. 33345d015 "Btrfs: Always use
64bit inode number"). In this case it could be a 32/64 bit mismatch in
inode numbers, and the blame would point to an incomplete state wrt the
filesystem.
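
(Editorial note: a minimal sketch in C of what a 32/64 bit mismatch in inode numbers would look like in general. It only illustrates the hypothesis above and is not confirmed as the cause of this bug. The function names are made up for illustration and do not exist in btrfs.)

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow a 64-bit inode number to 32 bits, the width of unsigned long on
 * a 32-bit machine such as i686. The upper 32 bits are silently dropped.
 */
uint32_t narrow_ino(uint64_t ino)
{
    return (uint32_t)ino;
}

/* Returns 1 if the inode number survives a round trip through 32 bits. */
int ino_fits_32bit(uint64_t ino)
{
    return (uint64_t)narrow_ino(ino) == ino;
}
```

With this sketch, an inode number such as 0x100000005 comes back as 5 after narrowing, so two distinct inodes could end up looking identical to code that stores them in a 32-bit variable.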

You've created your FS from ext4, I think that the filesystem has
64bit inode numbers, allocated to files and this got broken during the
conversion. (just a wild idea)

>   I can see two kind of problems, with different commit, of course.
>   Sometimes the Ooops happens just as kernel mounts the partition,
> sometimes the mount is good, but HD keeps reading for more than 30
> seconds, and the it Ooops.

This would mean something's broken during transaction commit.

>   Also, you can read but you can't write, meanwhile.
> 
> In attachment my config.

No attachment, but not needed IMHO.

> I have photos of the Ooops, but right now I can't take 'em from the phone...

Would really help if you can :)


david

Li Zefan

2011-May-30 02:49 UTC

Re: Problem with latest for-linus branch

David Sterba wrote:
> Hi,
> 
> On Sat, May 28, 2011 at 07:05:47PM +0200, Andrea Gelmini wrote:
>>    Everything works good with stock Ubuntu 11.04 kernel (2.6.38),
>> vanilla 2.6.38 and vanilla 2.6.39.
>>    If I use Linus' git tree, BTRFS ooops at mount.
> 
> can you please attach the oops traces?
> 
>>    So I bisected using kernel version 2.6.39 + latest for-linus branch.
>>    Bisect complains about this commit:
>> 581bb050941b4f220f84d3e5ed6dace3d42dd382 is the first bad commit
>> commit 581bb050941b4f220f84d3e5ed6dace3d42dd382
>> Author: Li Zefan <lizf@cn.fujitsu.com>
>> Date:   Wed Apr 20 10:06:11 2011 +0800
>>
>>     Btrfs: Cache free inode numbers in memory
> 
> this patch was part of the new ino allocator and it may depend
> on subsequent patches (eg. 33345d015 "Btrfs: Always use
> 64bit inode number"). In this case it could be a 32/64 bit mismatch in
> inode numbers and blame would point to a incomplete state wrt the
> filesystem.
> 
the bug is probably not caused by this.

> You've created your FS from ext4, I think that the filesystem has
> 64bit inode numbers, allocated to files and this got broken during the
> conversion. (just a wild idea)
> 
>>   I can see two kind of problems, with different commit, of course.
>>   Sometimes the Ooops happens just as kernel mounts the partition,

just mounting the partition, with no other fs operations? if so, the
patch you bisected down to won't actually take effect.

>> sometimes the mount is good, but HD keeps reading for more than 30
>> seconds, and the it Ooops.
> 
> This would mean something's broken during transaction commit.
> 
>>   Also, you can read but you can't write, meanwhile.
>>
>> In attachment my config.
> 
> No attachment, but not needed IMHO.
> 
>> I have photos of the Ooops, but right now I can't take 'em from the phone...
> 
> Would really help if you can :)
> 
right.

and thanks for the bug report!

btw, I'll be off till 6.5, so this week I probably won't be able to take
care of this..

Andrea Gelmini

2011-May-30 10:13 UTC

Re: Problem with latest for-linus branch

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> chance.

Well, I retested everything, compiling with frame pointers, so:
a) the partition is mounted with these flags:
   defaults,ssd,noacl,space_cache (at the beginning I also used compress);
b) vanilla kernels .38 and .39 work fine;
c) the latest Linus tree (commit bd1bfe40ac6bdf9593da29b822bc301b77a97d6a,
   the one before 3.0-rc1, so in the photos you can find it as .39g+)
   comes up, but after a while of intense i/o from a kernel thread (it's a
   specific btrfs kernel thread, I guess btrfs-ino-cache, but I could be
   wrong) the system freezes. Well, if the i/o keeps working long enough,
   I can even touch and unlink files, read files already present, or run
   something like /usr/bin/find; these photos are here:
   http://ooops.lugbs.linux.it/linusgit
d) rebooting with .39 then doesn't work. It crashes at mount time.
   The photos are here: http://ooops.lugbs.linux.it/2.6.39
e) booting with 2.6.38.7 solves the problem, giving this info:
[   20.273822] Btrfs loaded
[   20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
[   20.388269] btrfs: use ssd allocation scheme
[   20.388277] btrfs: enabling disk space caching
[   25.025873] btrfs: unlinked 5 orphans
[   25.025876] btrfs: truncated 3 orphans
f) by the way, bisect.jpg is the photo I took when I sent the first email.

These photos are terrible, but I guess they're good enough to read.
Anyway, there are multiple shots of the same screen, of course.

Thanks a lot for your time,
Andrea

Chris Mason

2011-May-30 10:41 UTC

Re: Problem with latest for-linus branch

Excerpts from Andrea Gelmini's message of 2011-05-30 06:13:47 -0400:
> 2011/5/29 Chris Mason <chris.mason@oracle.com>:
> > Thanks, could you please send in the photos of the oops when you get
> > chance.
> 
> Well, I retested everything compiling with frame pointers, so:
> a) partition is mounted with this flags:
> defaults,ssd,noacl,space_cache (at the beginning I also used
> compress);
> b) vanilla kernel .38 and .39 are working good;
> c) latest Linus tree (commit: bd1bfe40ac6bdf9593da29b822bc301b77a97d6a
> the one before 3.0-rc1,
>    so in the photos you can find it as .39g+), it goes up, but after a
> while of intense i/o working thread (it's a specific
>    kernel thread of btrfs, I guess btrfs-ino-cache, but I could be
> wrong) the system freeze. Well, if i/o keep working enough time,
>    I can even touch and unlink files, or read files already present,
> or do something like /usr/bin/find; these
>    photos are here: http://ooops.lugbs.linux.it/linusgit
> d) rebooting with .39 doesn't work. It crashes at mount time.
>    The photos are here: http://ooops.lugbs.linux.it/2.6.39
> e) booting with 2.6.38.7 solves the problem, giving this info:
> [   20.273822] Btrfs loaded
> [   20.387795] device label home devid 1 transid 4595 /dev/mapper/VG-home
> [   20.388269] btrfs: use ssd allocation scheme
> [   20.388277] btrfs: enabling disk space caching
> [   25.025873] btrfs: unlinked 5 orphans
> [   25.025876] btrfs: truncated 3 orphans
> f) by the way, bisect.jpg is the photo I took when I sent first email.
> 
> These photos are terrible, but I guess they're good enough to read 'em.
> Anyway, these are multiple shoots of same screen, of course.

These are perfect, thank you.  We're failing to write out the inode
cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
kmap something properly.

Could you please do gdb fs/btrfs/btrfs.ko, and then at the gdb prompt:

gdb> list *__btrfs_write_out_cache+0x43a

And send the output here?  This corresponds to where the kernel was
crashing in the oops photos in your linusgit directory.

If this doesn't work, you might need to recompile with
CONFIG_DEBUG_INFO=y.  You won't need to trigger the crash again,
just do the gdb command on the new .ko.

If you don't have btrfs compiled as a module, use gdb vmlinux instead of
gdb fs/btrfs/btrfs.ko

-chris

Andrea Gelmini

2011-May-30 11:59 UTC

Re: Problem with latest for-linus branch

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> These are perfect, thank you.  We're failing to write out the inode
> cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
> kmap something properly.

Thanks a lot for the detailed info.
I recompiled, and got this:
gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
(gdb) list *__btrfs_write_out_cache+0x43a
0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
671				struct btrfs_free_space *e;
672	
673				e = rb_entry(node, struct btrfs_free_space, offset_index);
674				entries++;
675	
676				entry->offset = cpu_to_le64(e->offset);
677				entry->bytes = cpu_to_le64(e->bytes);
678				if (e->bitmap) {
679					entry->type = BTRFS_FREE_SPACE_BITMAP;
680					list_add_tail(&e->list, &bitmap_list);
(gdb)

Thanks a lot for your quick answer,
Andrea

Andrea Gelmini

2011-May-30 13:02 UTC

Re: Problem with latest for-linus branch

2011/5/29 Chris Mason <chris.mason@oracle.com>:
> Thanks, could you please send in the photos of the oops when you get
> chance.

By the way, switching from 2.6.38.7 to 2.6.39, I get a lot of these messages:
[  140.297248] block group 1107296256 has an wrong amount of free space
[  140.848435] block group 8623489024 has an wrong amount of free space
[  140.879178] block group 17213423616 has an wrong amount of free space
[  140.910181] block group 24729616384 has an wrong amount of free space
[  140.937690] block group 33319550976 has an wrong amount of free space
[  140.971150] block group 40835743744 has an wrong amount of free space
[  141.000816] block group 49425678336 has an wrong amount of free space
[  141.027175] block group 56941871104 has an wrong amount of free space
[  141.057614] block group 65531805696 has an wrong amount of free space
[  141.088269] block group 73047998464 has an wrong amount of free space
[  141.124767] block group 81637933056 has an wrong amount of free space
[  141.156891] block group 97744060416 has an wrong amount of free space
[  141.190143] block group 121366380544 has an wrong amount of free space
[  141.219235] block group 129956315136 has an wrong amount of free space

It also happens with 2.6.38.7, but a lot less.
Should I worry?

Thanks again,
Andrea

Chris Mason

2011-May-30 13:35 UTC

Re: Problem with latest for-linus branch

Excerpts from Andrea Gelmini's message of 2011-05-30 07:59:30 -0400:
> 2011/5/30 Chris Mason <chris.mason@oracle.com>:
> > These are perfect, thank you.  We're failing to write out the inode
> > cache.  Since you're on a 32 bit machine, I'm guessing that we failed to
> > kmap something properly.
> 
> Thanks a lot for detailed info.
> I recompiled, and get this:
> gelma@dell:~$ gdb /lib/modules/3.0.0-rc1/kernel/fs/btrfs/*
> GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /lib/modules/3.0.0-rc1/kernel/fs/btrfs/btrfs.ko...done.
> (gdb) list *__btrfs_write_out_cache+0x43a
> 0x5fada is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:676).
> 671                struct btrfs_free_space *e;
> 672
> 673                e = rb_entry(node, struct btrfs_free_space, offset_index);
> 674                entries++;
> 675
> 676                entry->offset = cpu_to_le64(e->offset);
> 677                entry->bytes = cpu_to_le64(e->bytes);
> 678                if (e->bitmap) {
> 679                    entry->type = BTRFS_FREE_SPACE_BITMAP;
> 680                    list_add_tail(&e->list, &bitmap_list);
> (gdb)

Ok, so I think we're blowing past the end of the page we've kmap'd.  But
I don't think that can happen without something like the patch below
triggering:

Josef, what do you think?

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 70d4579..a95b72e 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -596,6 +596,11 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	 */
 	first_page_offset = (sizeof(u32) * num_pages) + sizeof(u64);
 
+	if (first_page_offset + sizeof(struct btrfs_free_space_entry) >= PAGE_CACHE_SIZE) {
+		printk(KERN_CRIT "bad first page offset %lu\n", first_page_offset);
+		BUG();
+	}
+
 	/* Get the cluster for this block_group if it exists */
 	if (block_group && !list_empty(&block_group->cluster_list))
 		cluster = list_entry(block_group->cluster_list.next,
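
(Editorial note: to get a feel for when the check in the patch above would fire, here is a rough userspace sketch in C of the same arithmetic. The 4096-byte page size and the 17-byte entry size, a packed u64 offset plus u64 bytes plus u8 type, are assumptions and are not stated in the thread. The function and macro names are made up for illustration.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED  4096u  /* assumption: 4 KiB pages, as on i686 */
#define ENTRY_SIZE_ASSUMED 17u    /* assumption: packed u64 + u64 + u8 entry */

/* Mirrors the expression in the patch: one u32 per cache page plus one u64. */
size_t first_page_offset(size_t num_pages)
{
    return sizeof(uint32_t) * num_pages + sizeof(uint64_t);
}

/* Same condition as the proposed check: no room left for even one entry. */
int header_overflows_first_page(size_t num_pages)
{
    return first_page_offset(num_pages) + ENTRY_SIZE_ASSUMED >= PAGE_SIZE_ASSUMED;
}
```

Under those assumptions the check trips once the cache file spans 1018 or more pages (roughly 4 MiB), since 4 * 1018 + 8 + 17 = 4097. A cache that large leaves no space on the first page for entries, which would be consistent with writing past the end of the mapped page.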

Andrea Gelmini

2011-May-31 18:15 UTC

Re: Problem with latest for-linus branch

2011/5/30 Chris Mason <chris.mason@oracle.com>:
> Ok, so I think we're blowing past the end of the page we've kmap'd.  But
> I don't think that can happen without something like the patch below
> triggering:

Quick update: after rm of ~10 GB of data, I rebooted with Linus' latest
git tree, and it works (after some minutes of btrfs-ino-cache).

Ciao,
Andrea
