thr3ads.net - Btrfs devel - [BUG] sleeping function called from atomic context [May 2012]

If this information is useful, please help other people find it:
Share via:

Stefan Behrens

2012-May-04 10:18 UTC

[BUG] sleeping function called from atomic context

Looks like after "btrfs read error corrected" of chunk tree block
while
reading the chunk tree in open_ctree(), we stay in atomic state (in
3.4-rc5).

The steps:
# mkfs.btrfs -d raid1 -m raid1 /dev/sdv /dev/sdw /dev/sdi
# mount /dev/sdv /mnt
fill /mnt to 25% full
# umount /mnt
disconnect /dev/sdw
# mount /dev/sdv /mnt -o degraded
fill /mnt to 50% full
# umount /mnt
reconnect /dev/sdw
# mount /dev/sdv /mnt

Block 20971520 is the chunk tree node on /dev/sdw. Here''s the kernel
log, and the short btrfs-debug-tree output of the chunk tree is located
at the end of the mail:

sd 6:0:27:0: [sdw] Attached SCSI disk
device fsid 7e1ea62e-7d32-4c11-bc02-1bc197be8f54 devid 1 transid 41 /dev/sdv
btrfs: disk space caching is enabled
parent transid verify failed on 20971520 wanted 38 found 12
btrfs read error corrected: ino 1 off 20971520 (dev /dev/sdw sector 2048)
BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 5955, name: mount
INFO: lockdep is turned off.
Pid: 5955, comm: mount Tainted: G        W    3.3.0+ #57
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff81188bbd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffff81140270>] ? __lock_page+0x70/0x70
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fa8d7>] __set_extent_bit+0x357/0x590 [btrfs]
 [<ffffffffa00fab7c>] lock_extent_bits+0x6c/0xa0 [btrfs]
 [<ffffffffa00ce964>] verify_parent_transid+0x84/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b243d>] ? generic_bin_search+0xfd/0x180 [btrfs]
 [<ffffffffa00b9404>] btrfs_search_slot+0x2f4/0x8c0 [btrfs]
 [<ffffffffa0104425>] btrfs_read_chunk_tree+0xd5/0x700 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00f7533>] ? read_extent_buffer+0xb3/0x110 [btrfs]
 [<ffffffffa00b2f2a>] ? btrfs_root_node+0x8a/0xf0 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00d4ac5>] open_ctree+0x1185/0x1c00 [btrfs]
 [<ffffffffa00b215f>] btrfs_mount+0x45f/0x530 [btrfs]
 [<ffffffff8190ac69>] ? mutex_unlock+0x9/0x10
 [<ffffffff81160119>] ? pcpu_alloc+0x399/0xa00
 [<ffffffff811aa8fa>] ? alloc_vfsmnt+0x9a/0x1b0
 [<ffffffff8118fc1e>] mount_fs+0x3e/0x1a0
 [<ffffffff811aa916>] ? alloc_vfsmnt+0xb6/0x1b0
 [<ffffffff811ab6db>] vfs_kern_mount+0x5b/0xe0
 [<ffffffff811abf6d>] do_kern_mount+0x4d/0x110
 [<ffffffff813db663>] ? security_capable+0x13/0x20
 [<ffffffff811ad8fd>] do_mount+0x24d/0x7c0
 [<ffffffff8115b4e6>] ? memdup_user+0x46/0x90
 [<ffffffff8115b583>] ? strndup_user+0x53/0x70
 [<ffffffff811adefb>] sys_mount+0x8b/0xe0
 [<ffffffff81915862>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 20975616 wanted 38 found 13
BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 5955, name: mount
INFO: lockdep is turned off.
Pid: 5955, comm: mount Tainted: G        W    3.3.0+ #57
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff81081469>] ? vprintk+0x309/0x4d0
 [<ffffffff81188bbd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fb3e5>] clear_extent_bit+0x255/0x430 [btrfs]
 [<ffffffffa00fbad4>] unlock_extent_cached+0x24/0x30 [btrfs]
 [<ffffffffa00ce9d3>] verify_parent_transid+0xf3/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b243d>] ? generic_bin_search+0xfd/0x180 [btrfs]
 [<ffffffffa00b9404>] btrfs_search_slot+0x2f4/0x8c0 [btrfs]
 [<ffffffffa0104425>] btrfs_read_chunk_tree+0xd5/0x700 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00f7533>] ? read_extent_buffer+0xb3/0x110 [btrfs]
 [<ffffffffa00b2f2a>] ? btrfs_root_node+0x8a/0xf0 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00d4ac5>] open_ctree+0x1185/0x1c00 [btrfs]
 [<ffffffffa00b215f>] btrfs_mount+0x45f/0x530 [btrfs]
 [<ffffffff8190ac69>] ? mutex_unlock+0x9/0x10
 [<ffffffff81160119>] ? pcpu_alloc+0x399/0xa00
 [<ffffffff811aa8fa>] ? alloc_vfsmnt+0x9a/0x1b0
 [<ffffffff8118fc1e>] mount_fs+0x3e/0x1a0
 [<ffffffff811aa916>] ? alloc_vfsmnt+0xb6/0x1b0
 [<ffffffff811ab6db>] vfs_kern_mount+0x5b/0xe0
 [<ffffffff811abf6d>] do_kern_mount+0x4d/0x110
 [<ffffffff813db663>] ? security_capable+0x13/0x20
 [<ffffffff811ad8fd>] do_mount+0x24d/0x7c0
 [<ffffffff8115b4e6>] ? memdup_user+0x46/0x90
 [<ffffffff8115b583>] ? strndup_user+0x53/0x70
 [<ffffffff811adefb>] sys_mount+0x8b/0xe0
 [<ffffffff81915862>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 20975616 wanted 38 found 13
btrfs read error corrected: ino 1 off 20975616 (dev /dev/sdw sector 2056)
BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 5955, name: mount
INFO: lockdep is turned off.
Pid: 5955, comm: mount Tainted: G        W    3.3.0+ #57
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff81188bbd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffff81140270>] ? __lock_page+0x70/0x70
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fa8d7>] __set_extent_bit+0x357/0x590 [btrfs]
 [<ffffffffa00fab7c>] lock_extent_bits+0x6c/0xa0 [btrfs]
 [<ffffffffa00ce964>] verify_parent_transid+0x84/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b9933>] ? btrfs_search_slot+0x823/0x8c0 [btrfs]
 [<ffffffff8190dd16>] ? _raw_spin_unlock+0x26/0x40
 [<ffffffffa00bb489>] btrfs_next_leaf+0x229/0x3e0 [btrfs]
 [<ffffffffa010432e>] ? read_one_chunk+0x2ae/0x2d0 [btrfs]
 [<ffffffffa01043e3>] btrfs_read_chunk_tree+0x93/0x700 [btrfs]
 [<ffffffffa00f7533>] ? read_extent_buffer+0xb3/0x110 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00d4ac5>] open_ctree+0x1185/0x1c00 [btrfs]
 [<ffffffffa00b215f>] btrfs_mount+0x45f/0x530 [btrfs]
 [<ffffffff8190ac69>] ? mutex_unlock+0x9/0x10
 [<ffffffff81160119>] ? pcpu_alloc+0x399/0xa00
 [<ffffffff811aa8fa>] ? alloc_vfsmnt+0x9a/0x1b0
 [<ffffffff8118fc1e>] mount_fs+0x3e/0x1a0
 [<ffffffff811aa916>] ? alloc_vfsmnt+0xb6/0x1b0
 [<ffffffff811ab6db>] vfs_kern_mount+0x5b/0xe0
 [<ffffffff811abf6d>] do_kern_mount+0x4d/0x110
 [<ffffffff813db663>] ? security_capable+0x13/0x20
 [<ffffffff811ad8fd>] do_mount+0x24d/0x7c0
 [<ffffffff8115b4e6>] ? memdup_user+0x46/0x90
 [<ffffffff8115b583>] ? strndup_user+0x53/0x70
 [<ffffffff811adefb>] sys_mount+0x8b/0xe0
 [<ffffffff81915862>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 20979712 wanted 38 found 10
BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 5955, name: mount
INFO: lockdep is turned off.
Pid: 5955, comm: mount Tainted: G        W    3.3.0+ #57
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff81081469>] ? vprintk+0x309/0x4d0
 [<ffffffff81188bbd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fb3e5>] clear_extent_bit+0x255/0x430 [btrfs]
 [<ffffffffa00fbad4>] unlock_extent_cached+0x24/0x30 [btrfs]
 [<ffffffffa00ce9d3>] verify_parent_transid+0xf3/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b9933>] ? btrfs_search_slot+0x823/0x8c0 [btrfs]
 [<ffffffff8190dd16>] ? _raw_spin_unlock+0x26/0x40
 [<ffffffffa00bb489>] btrfs_next_leaf+0x229/0x3e0 [btrfs]
 [<ffffffffa010432e>] ? read_one_chunk+0x2ae/0x2d0 [btrfs]
 [<ffffffffa01043e3>] btrfs_read_chunk_tree+0x93/0x700 [btrfs]
 [<ffffffffa00f7533>] ? read_extent_buffer+0xb3/0x110 [btrfs]
 [<ffffffffa00b2ec8>] ? btrfs_root_node+0x28/0xf0 [btrfs]
 [<ffffffffa00d4ac5>] open_ctree+0x1185/0x1c00 [btrfs]
 [<ffffffffa00b215f>] btrfs_mount+0x45f/0x530 [btrfs]
 [<ffffffff8190ac69>] ? mutex_unlock+0x9/0x10
 [<ffffffff81160119>] ? pcpu_alloc+0x399/0xa00
 [<ffffffff811aa8fa>] ? alloc_vfsmnt+0x9a/0x1b0
 [<ffffffff8118fc1e>] mount_fs+0x3e/0x1a0
 [<ffffffff811aa916>] ? alloc_vfsmnt+0xb6/0x1b0
 [<ffffffff811ab6db>] vfs_kern_mount+0x5b/0xe0
 [<ffffffff811abf6d>] do_kern_mount+0x4d/0x110
 [<ffffffff813db663>] ? security_capable+0x13/0x20
 [<ffffffff811ad8fd>] do_mount+0x24d/0x7c0
 [<ffffffff8115b4e6>] ? memdup_user+0x46/0x90
 [<ffffffff8115b583>] ? strndup_user+0x53/0x70
 [<ffffffff811adefb>] sys_mount+0x8b/0xe0
 [<ffffffff81915862>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 20979712 wanted 38 found 10
btrfs read error corrected: ino 1 off 20979712 (dev /dev/sdw sector 2064)
Btrfs detected SSD devices, enabling SSD mode


chunk tree
node 20971520 level 1 items 2 free 119 generation 38 owner 3
fs uuid 7e1ea62e-7d32-4c11-bc02-1bc197be8f54
chunk uuid f59ccfe4-7257-4430-8556-f5e578464a38
        key (DEV_ITEMS DEV_ITEM 1) block 20975616 (5121) gen 38
        key (FIRST_CHUNK_TREE CHUNK_ITEM 24725422080) block 20979712
(5122) gen 38
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-04 13:25 UTC

head link

Re: [BUG] sleeping function called from atomic context

On Fri, May 04, 2012 at 12:18:51PM +0200, Stefan Behrens
wrote:> Looks like after "btrfs read error corrected" of chunk tree block
while
> reading the chunk tree in open_ctree(), we stay in atomic state (in
> 3.4-rc5).
I''m having a hard time reproducing this here.  Do you have lockdep on?
It might tell us which lock we''re leaving around.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2012-May-04 13:36 UTC

head link

Re: [BUG] sleeping function called from atomic context

On Fri, May 04, 2012 at 09:25:36AM -0400, Chris Mason
wrote:> I''m having a hard time reproducing this here.  Do you have lockdep
on?
> It might tell us which lock we''re leaving around.
He''s using SLUB and it does not like waiting allocations when
CONFIG_SLUB_DEBUG is on:

 445 #ifdef CONFIG_SLUB_DEBUG
...
 935 /*
 936  * Hooks for other subsystems that check memory allocations. In a typical
 937  * production configuration these hooks all should produce no code at all.
 938  */
 939 static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
 940 {
 941         flags &= gfp_allowed_mask;
 942         lockdep_trace_alloc(flags);
 943         might_sleep_if(flags & __GFP_WAIT);
 944
 945         return should_failslab(s->objsize, flags, s->flags);
 946 }
...
1223 #else
...
1259 static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
1260                                                         { return 0; }

where slab_pre_alloc_hook is called down the chain from kmem_cache_alloc.


david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-04 15:12 UTC

head link

Re: [BUG] sleeping function called from atomic context

On Fri, May 04, 2012 at 03:36:16PM +0200, David Sterba
wrote:> On Fri, May 04, 2012 at 09:25:36AM -0400, Chris Mason wrote:
> > I''m having a hard time reproducing this here.  Do you have
lockdep on?
> > It might tell us which lock we''re leaving around.
> 
> He''s using SLUB and it does not like waiting allocations when
> CONFIG_SLUB_DEBUG is on:
> 
>  445 #ifdef CONFIG_SLUB_DEBUG
> ...
>  935 /*
>  936  * Hooks for other subsystems that check memory allocations. In a
typical
>  937  * production configuration these hooks all should produce no code at
all.
>  938  */
>  939 static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t
flags)
>  940 {
>  941         flags &= gfp_allowed_mask;
>  942         lockdep_trace_alloc(flags);
>  943         might_sleep_if(flags & __GFP_WAIT);
>  944
>  945         return should_failslab(s->objsize, flags, s->flags);
>  946 }
> ...
> 1223 #else
> ...
> 1259 static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t
flags)
> 1260                                                         { return 0; }
> 
> where slab_pre_alloc_hook is called down the chain from kmem_cache_alloc.
Right, but this should be after the fact, we''re not supposed to be
holding any locks here.  Lockdep should tell us which lock is still
held, which will make it much easier.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stefan Behrens

2012-May-04 15:20 UTC

head link

Re: [BUG] sleeping function called from atomic context

On 5/4/2012 3:25 PM, Chris Mason wrote:> On Fri, May 04, 2012 at 12:18:51PM +0200, Stefan Behrens wrote:
>> Looks like after "btrfs read error corrected" of chunk tree
block while
>> reading the chunk tree in open_ctree(), we stay in atomic state (in
>> 3.4-rc5).
> 
> I''m having a hard time reproducing this here.  Do you have lockdep
on?
> It might tell us which lock we''re leaving around.
Next two tries with lockdep and with a reboot before the mount. For the
first one I was using dd(1) to damage (overwite) one mirror of the chunk
tree and the issue was there immediately. The second time with some
writes to the disk in degraded state and a remount with all disks
afterwards. This time umount was raising the issue.

And as Dave wrote, SLUB checks whether someone is in atomic state and
calls kmem_cache_alloc() with __GFP_WAIT. I see no reason for being in
atomic state at this point, that''s the issue.


BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 3650, name: mount
2 locks held by mount/3650:
 #0:  (&type->s_umount_key#32/1){+.+.+.}, at: [<ffffffff811909c8>]
sget+0x248/0x540
 #1:  (btrfs-fs-03){++++..}, at: [<ffffffffa010ea12>]
btrfs_clear_lock_blocking_rw+0x62/0x160 [btrfs]
Pid: 3650, comm: mount Not tainted 3.3.0+ #58
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff811895bd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fa8d7>] __set_extent_bit+0x357/0x590 [btrfs]
 [<ffffffff810d594d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffa00fab7c>] lock_extent_bits+0x6c/0xa0 [btrfs]
 [<ffffffffa00ce964>] verify_parent_transid+0x84/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b243d>] ? generic_bin_search+0xfd/0x180 [btrfs]
 [<ffffffffa00b9404>] btrfs_search_slot+0x2f4/0x8c0 [btrfs]
 [<ffffffffa00cd2ba>] btrfs_lookup_inode+0x2a/0xa0 [btrfs]
 [<ffffffffa00df81c>] btrfs_iget+0x10c/0x4c0 [btrfs]
 [<ffffffffa00b2190>] btrfs_mount+0x490/0x530 [btrfs]
 [<ffffffff8190b669>] ? mutex_unlock+0x9/0x10
 [<ffffffff81160b19>] ? pcpu_alloc+0x399/0xa00
 [<ffffffff8119061e>] mount_fs+0x3e/0x1a0
 [<ffffffff811ab316>] ? alloc_vfsmnt+0xb6/0x1b0
 [<ffffffff811ac0db>] vfs_kern_mount+0x5b/0xe0
 [<ffffffff811ac96d>] do_kern_mount+0x4d/0x110
 [<ffffffff813dc063>] ? security_capable+0x13/0x20
 [<ffffffff811ae2fd>] do_mount+0x24d/0x7c0
 [<ffffffff8115bee6>] ? memdup_user+0x46/0x90
 [<ffffffff8115bf83>] ? strndup_user+0x53/0x70
 [<ffffffff811ae8fb>] sys_mount+0x8b/0xe0
 [<ffffffff81916262>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 29798400 wanted 30 found 27
parent transid verify failed on 29798400 wanted 30 found 27
btrfs read error corrected: ino 1 off 29798400 (dev /dev/sdw sector 41816)
parent transid verify failed on 30085120 wanted 30 found 27
parent transid verify failed on 30085120 wanted 30 found 27
btrfs read error corrected: ino 1 off 30085120 (dev /dev/sdw sector 42376)
parent transid verify failed on 30867456 wanted 30 found 27
parent transid verify failed on 30867456 wanted 30 found 27
btrfs read error corrected: ino 1 off 30867456 (dev /dev/sdw sector 43904)

(gdb) info line *btrfs_clear_lock_blocking_rw+0x62
Line 95 of "/root/git/btrfs/arch/x86/include/asm/atomic.h"
   starts at address 0x64a12 <btrfs_clear_lock_blocking_rw+98>
   and ends at 0x64a19 <btrfs_clear_lock_blocking_rw+105>.
That''s inside the definition of atomic_inc(). Doesn''t make
sense.
static inline void atomic_inc(atomic_t *v)
{
        asm volatile(LOCK_PREFIX "incl %0"
                     : "+m" (v->counter));
}

(gdb) info line *sget+0x248
Line 173 of "fs/super.c" starts at address 0xffffffff811909c8
<sget+584>
   and ends at 0xffffffff811909dd <sget+605>.
That''s in the middle of alloc_super(). Looks messy. Not helpful.
                mutex_init(&s->s_vfs_rename_mutex);


Then I was retrying once more:

btrfs: disk space caching is enabled
Btrfs detected SSD devices, enabling SSD mode
BUG: sleeping function called from invalid context at mm/slub.c:937
in_atomic(): 1, irqs_disabled(): 0, pid: 3626, name: umount
6 locks held by umount/3626:
 #0:  (&type->s_umount_key#33){+++++.}, at: [<ffffffff8119113d>]
deactivate_super+0x3d/0x60
 #1:  (&fs_info->reloc_mutex){+.+...}, at: [<ffffffffa00d724a>]
btrfs_commit_transaction+0x50a/0xaa0 [btrfs]
 #2:  (&fs_info->tree_log_mutex){+.+...}, at: [<ffffffffa00d72c4>]
btrfs_commit_transaction+0x584/0xaa0 [btrfs]
 #3:  (&head_ref->mutex){+.+...}, at: [<ffffffffa011d153>]
btrfs_delayed_ref_lock+0x43/0x130 [btrfs]
 #4:  (btrfs-extent-02){++++..}, at: [<ffffffffa010ea9f>]
btrfs_clear_lock_blocking_rw+0xef/0x160 [btrfs]
 #5:  (btrfs-extent-01){++++..}, at: [<ffffffffa010ea9f>]
btrfs_clear_lock_blocking_rw+0xef/0x160 [btrfs]
Pid: 3626, comm: umount Not tainted 3.3.0+ #58
Call Trace:
 [<ffffffff810af591>] __might_sleep+0xe1/0x110
 [<ffffffff811895bd>] kmem_cache_alloc+0x4d/0x160
 [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
 [<ffffffffa00fa8d7>] __set_extent_bit+0x357/0x590 [btrfs]
 [<ffffffff810d594d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffa00fab7c>] lock_extent_bits+0x6c/0xa0 [btrfs]
 [<ffffffffa00ce964>] verify_parent_transid+0x84/0x160 [btrfs]
 [<ffffffffa00cea86>] btrfs_buffer_uptodate+0x46/0x60 [btrfs]
 [<ffffffffa00b42ad>] read_block_for_search+0x14d/0x3b0 [btrfs]
 [<ffffffffa00b243d>] ? generic_bin_search+0xfd/0x180 [btrfs]
 [<ffffffffa00b9404>] btrfs_search_slot+0x2f4/0x8c0 [btrfs]
 [<ffffffff810db9fd>] ? lock_release_non_nested+0x19d/0x390
 [<ffffffffa00c61c9>] ? run_clustered_refs+0xd9/0xa00 [btrfs]
 [<ffffffffa00ba464>] btrfs_insert_empty_items+0x84/0xe0 [btrfs]
 [<ffffffff8118964d>] ? kmem_cache_alloc+0xdd/0x160
 [<ffffffffa00c64c5>] run_clustered_refs+0x3d5/0xa00 [btrfs]
 [<ffffffffa00c6b8f>] ? btrfs_run_delayed_refs+0x9f/0x4e0 [btrfs]
 [<ffffffffa00c6b8f>] ? btrfs_run_delayed_refs+0x9f/0x4e0 [btrfs]
 [<ffffffffa00c6c6c>] btrfs_run_delayed_refs+0x17c/0x4e0 [btrfs]
 [<ffffffffa00f8a92>] ? release_extent_buffer+0x32/0xb0 [btrfs]
 [<ffffffff8190e716>] ? _raw_spin_unlock+0x26/0x40
 [<ffffffffa00f8a92>] ? release_extent_buffer+0x32/0xb0 [btrfs]
 [<ffffffffa00d6231>] commit_cowonly_roots+0xa1/0x1e0 [btrfs]
 [<ffffffffa00d731b>] btrfs_commit_transaction+0x5db/0xaa0 [btrfs]
 [<ffffffffa00d7b88>] ? start_transaction+0x88/0x330 [btrfs]
 [<ffffffff810a3ab0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa00b0661>] btrfs_sync_fs+0x61/0xe0 [btrfs]
 [<ffffffff811bb81e>] __sync_filesystem+0x5e/0x90
 [<ffffffff811bb923>] sync_filesystem+0x43/0x60
 [<ffffffff8118fd86>] generic_shutdown_super+0x36/0xf0
 [<ffffffff8118fed1>] kill_anon_super+0x11/0x20
 [<ffffffffa00b1c85>] btrfs_kill_super+0x15/0x90 [btrfs]
 [<ffffffff8119113d>] ? deactivate_super+0x3d/0x60
 [<ffffffff811903c5>] deactivate_locked_super+0x45/0x70
 [<ffffffff81191145>] deactivate_super+0x45/0x60
 [<ffffffff811ac4a2>] mntput_no_expire+0xd2/0x120
 [<ffffffff811acaa6>] sys_umount+0x76/0x3c0
 [<ffffffff81916262>] system_call_fastpath+0x16/0x1b
parent transid verify failed on 64811008 wanted 39 found 8
parent transid verify failed on 64811008 wanted 39 found 8
btrfs read error corrected: ino 1 off 64811008 (dev /dev/sdw sector 110200)
parent transid verify failed on 36839424 wanted 42 found 8
parent transid verify failed on 36839424 wanted 42 found 8
btrfs read error corrected: ino 1 off 36839424 (dev /dev/sdw sector 55568)
parent transid verify failed on 38449152 wanted 40 found 8
parent transid verify failed on 38449152 wanted 40 found 8
btrfs read error corrected: ino 1 off 38449152 (dev /dev/sdw sector 58712)
parent transid verify failed on 73908224 wanted 39 found 8
parent transid verify failed on 73908224 wanted 39 found 8
btrfs read error corrected: ino 1 off 73908224 (dev /dev/sdw sector 127968)
parent transid verify failed on 38780928 wanted 40 found 8
parent transid verify failed on 38780928 wanted 40 found 8
btrfs read error corrected: ino 1 off 38780928 (dev /dev/sdw sector 59360)
btrfs read error corrected: ino 1 off 38793216 (dev /dev/sdw sector 59384)
btrfs read error corrected: ino 1 off 69025792 (dev /dev/sdw sector 118432)
btrfs read error corrected: ino 1 off 69033984 (dev /dev/sdw sector 118448)
btrfs read error corrected: ino 1 off 38440960 (dev /dev/sdw sector 58696)
btrfs read error corrected: ino 1 off 38940672 (dev /dev/sdw sector 59672)
btrfs read error corrected: ino 1 off 42262528 (dev /dev/sdw sector 66160)
btrfs read error corrected: ino 1 off 73924608 (dev /dev/sdw sector 128000)
btrfs bad tree block start 2742074651274362994 202395648
btrfs read error corrected: ino 1 off 202395648 (dev /dev/sdw sector 378920)
btrfs read error corrected: ino 1 off 69042176 (dev /dev/sdw sector 118464)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-04 15:38 UTC

head link

Re: [BUG] sleeping function called from atomic context

On Fri, May 04, 2012 at 05:20:16PM +0200, Stefan Behrens
wrote:> On 5/4/2012 3:25 PM, Chris Mason wrote:
> > On Fri, May 04, 2012 at 12:18:51PM +0200, Stefan Behrens wrote:
> >> Looks like after "btrfs read error corrected" of chunk
tree block while
> >> reading the chunk tree in open_ctree(), we stay in atomic state
(in
> >> 3.4-rc5).
> > 
> > I''m having a hard time reproducing this here.  Do you have
lockdep on?
> > It might tell us which lock we''re leaving around.
> 
> Next two tries with lockdep and with a reboot before the mount. For the
> first one I was using dd(1) to damage (overwite) one mirror of the chunk
> tree and the issue was there immediately. The second time with some
> writes to the disk in degraded state and a remount with all disks
> afterwards. This time umount was raising the issue.
> 
> And as Dave wrote, SLUB checks whether someone is in atomic state and
> calls kmem_cache_alloc() with __GFP_WAIT. I see no reason for being in
> atomic state at this point, that''s the issue.
> 
> 
> BUG: sleeping function called from invalid context at mm/slub.c:937
> in_atomic(): 1, irqs_disabled(): 0, pid: 3650, name: mount
> 2 locks held by mount/3650:
>  #0:  (&type->s_umount_key#32/1){+.+.+.}, at:
[<ffffffff811909c8>]
> sget+0x248/0x540
>  #1:  (btrfs-fs-03){++++..}, at: [<ffffffffa010ea12>]
> btrfs_clear_lock_blocking_rw+0x62/0x160 [btrfs]
> Pid: 3650, comm: mount Not tainted 3.3.0+ #58
> Call Trace:
>  [<ffffffff810af591>] __might_sleep+0xe1/0x110
>  [<ffffffff811895bd>] kmem_cache_alloc+0x4d/0x160
>  [<ffffffffa00f86bb>] alloc_extent_state+0x2b/0xf0 [btrfs]
>  [<ffffffffa00fa8d7>] __set_extent_bit+0x357/0x590 [btrfs]
>  [<ffffffff810d594d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffffa00fab7c>] lock_extent_bits+0x6c/0xa0 [btrfs]
>  [<ffffffffa00ce964>] verify_parent_transid+0x84/0x160 [btrfs]
Oh, that makes sense.  verify_parent_transid can sleep because it
locks the extent, and we''re being called with a spinning lock.

Hmmm, let me think on this one.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-04 15:50 UTC

head link

Re: [BUG] sleeping function called from atomic context

On Fri, May 04, 2012 at 05:20:16PM +0200, Stefan Behrens
wrote:> On 5/4/2012 3:25 PM, Chris Mason wrote:
> > On Fri, May 04, 2012 at 12:18:51PM +0200, Stefan Behrens wrote:
> >> Looks like after "btrfs read error corrected" of chunk
tree block while
> >> reading the chunk tree in open_ctree(), we stay in atomic state
(in
> >> 3.4-rc5).
> > 
> > I''m having a hard time reproducing this here.  Do you have
lockdep on?
> > It might tell us which lock we''re leaving around.
> 
> Next two tries with lockdep and with a reboot before the mount. For the
> first one I was using dd(1) to damage (overwite) one mirror of the chunk
> tree and the issue was there immediately. The second time with some
> writes to the disk in degraded state and a remount with all disks
> afterwards. This time umount was raising the issue.
> 
> And as Dave wrote, SLUB checks whether someone is in atomic state and
> calls kmem_cache_alloc() with __GFP_WAIT. I see no reason for being in
> atomic state at this point, that''s the issue.
> 
> 
> BUG: sleeping function called from invalid context at mm/slub.c:937
> in_atomic(): 1, irqs_disabled(): 0, pid: 3650, name: mount
Ok, what we need to do is make a non-blocking version of
verify_parent_transid for the (likely) case where everything is ok.
Pass it a gfp_mask arg and if it is a non-waiting call it needs to
return -EAGAIN.

I''ll add it this weekend unless someone beats me to it.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - May 2012 - [BUG] sleeping function called from atomic context

[BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context

Re: [BUG] sleeping function called from atomic context