thr3ads.net - Btrfs devel - Crashes in extent_io.c after "btrfs bad mapping eb" notice [Oct 2012]

Franke

2012-Oct-30 23:57 UTC

Crashes in extent_io.c after "btrfs bad mapping eb" notice

Hello,

I have been having some crashes like this. Since I upgraded to 3.6.4 they have
become common. The crashes happen pretty randomly during normal system usage.
After the syslog messages the system stays semi usable for a minute, but when I
run any new program it hangs. I had to downgrade to 3.6.2 to get my system
usable again.

Is there any way I can help find the cause of those crashes? 

thanks,
Tobias Franke

logs here:
https://www.dropbox.com/s/1f6gh4k0zl2t5zq/btrfs_hangs_3-4-6.log

It starts like this:
Oct 30 22:42:36 localhost kernel: btrfs bad mapping eb start 982689189888 len 4096, wanted 4424 17
Oct 30 22:42:36 localhost kernel: ------------[ cut here ]------------
Oct 30 22:42:36 localhost kernel: WARNING: at fs/btrfs/extent_io.c:4661 map_private_extent_buffer+0x83/0xc9()
Oct 30 22:42:36 localhost kernel: Hardware name: A780L3L
Oct 30 22:42:36 localhost kernel: Modules linked in: hwmon_vid snd_hda_codec_hdmi nvidia(PO) snd_hda_codec_realtek snd_hda_intel snd_hda_codec i2c_piix4 k10temp
Oct 30 22:42:36 localhost kernel: Pid: 7143, comm: btrfs-endio-wri Tainted: P           O 3.6.4-gentoo #1
Oct 30 22:42:36 localhost kernel: Call Trace:
Oct 30 22:42:36 localhost kernel: [<ffffffff81030181>] ? warn_slowpath_common+0x78/0x8c
Oct 30 22:42:36 localhost kernel: [<ffffffff8127dc2e>] ? map_private_extent_buffer+0x83/0xc9
Oct 30 22:42:36 localhost kernel: [<ffffffff81249353>] ? generic_bin_search.clone.50+0xaa/0x130
Oct 30 22:42:36 localhost kernel: [<ffffffff8124c11a>] ? btrfs_search_old_slot+0x333/0x6ad
Oct 30 22:42:36 localhost kernel: [<ffffffff812a737d>] ? __resolve_indirect_refs+0x147/0x4b1
Oct 30 22:42:36 localhost kernel: [<ffffffff812a70bf>] ? __add_keyed_refs.clone.6+0x9e/0x215
Oct 30 22:42:36 localhost kernel: [<ffffffff812a6ba6>] ? __add_prelim_ref+0x3a/0xb9
Oct 30 22:42:36 localhost kernel: [<ffffffff812a7d54>] ? find_parent_nodes+0x553/0x7ca
Oct 30 22:42:36 localhost kernel: [<ffffffff812458b8>] ? leaf_space_used+0xaf/0xd7
Oct 30 22:42:36 localhost kernel: [<ffffffff812a8044>] ? btrfs_find_all_roots+0x79/0xd4
Oct 30 22:42:36 localhost kernel: [<ffffffff812aa1d0>] ? btrfs_qgroup_account_ref+0xd2/0x4a8
Oct 30 22:42:36 localhost kernel: [<ffffffff81278482>] ? alloc_extent_state+0x58/0x9d
Oct 30 22:42:36 localhost kernel: [<ffffffff810e8245>] ? kmem_cache_free+0x12/0xbd
Oct 30 22:42:36 localhost kernel: [<ffffffff81253c93>] ? btrfs_delayed_refs_qgroup_accounting+0x9e/0xca
Oct 30 22:42:36 localhost kernel: [<ffffffff81266723>] ? __btrfs_end_transaction+0x4e/0x293
Oct 30 22:42:36 localhost kernel: [<ffffffff8126bb26>] ? btrfs_finish_ordered_io+0x2e1/0x332
Oct 30 22:42:36 localhost kernel: [<ffffffff8103cfae>] ? run_timer_softirq+0x2c2/0x2c2
Oct 30 22:42:36 localhost kernel: [<ffffffff81286398>] ? worker_loop+0x174/0x4c0
Oct 30 22:42:36 localhost kernel: [<ffffffff81286224>] ? btrfs_queue_worker+0x261/0x261
Oct 30 22:42:36 localhost kernel: [<ffffffff81286224>] ? btrfs_queue_worker+0x261/0x261
Oct 30 22:42:36 localhost kernel: [<ffffffff8104b293>] ? kthread+0x81/0x89
Oct 30 22:42:36 localhost kernel: [<ffffffff815de774>] ? kernel_thread_helper+0x4/0x10
Oct 30 22:42:36 localhost kernel: [<ffffffff8104b212>] ? kthread_freezable_should_stop+0x52/0x52
Oct 30 22:42:36 localhost kernel: [<ffffffff815de770>] ? gs_change+0xb/0xb
Oct 30 22:42:36 localhost kernel: ---[ end trace cb2d15ce8c2d83ec ]---
Oct 30 22:42:36 localhost kernel: ------------[ cut here ]------------
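
For readers decoding the first line of that log: the numbers can be pulled apart mechanically. The sketch below is only an illustration in Python. It assumes (from a reading of the 3.6-era map_private_extent_buffer(), not from anything stated in this thread) that the last two numbers are an offset into the extent buffer and a minimum length, and that the warning fires when offset + length runs past the buffer length. The function and field names are made up for this example.

```python
import re

# Pattern for the message exactly as it appears in the log above.
BAD_MAPPING_RE = re.compile(
    r"btrfs bad mapping eb start (?P<eb_start>\d+) len (?P<eb_len>\d+), "
    r"wanted (?P<offset>\d+) (?P<min_len>\d+)"
)


def parse_bad_mapping(line):
    """Return the four numbers from a 'bad mapping eb' line, or None."""
    match = BAD_MAPPING_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def overrun_bytes(fields):
    """Bytes by which the requested range ends past the extent buffer.

    Assumption: 'wanted A B' means offset A and minimum length B, and the
    request is invalid when A + B is larger than the buffer length.
    """
    return max(0, fields["offset"] + fields["min_len"] - fields["eb_len"])


line = ("Oct 30 22:42:36 localhost kernel: btrfs bad mapping eb start "
        "982689189888 len 4096, wanted 4424 17")
fields = parse_bad_mapping(line)
print(fields, overrun_bytes(fields))
```

Under that assumption, the request in this report starts beyond the end of a 4096-byte buffer, which would be consistent with a search running on a buffer of the wrong size or on stale contents, but the log line alone does not prove either.
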
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo

2012-Oct-31 00:47 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On 10/31/2012 07:57 AM, Franke wrote:
> Hello,
> 
> I have been having some crashes like this. Since I upgraded to 3.6.4 they
> have become common. The crashes happen pretty randomly during normal system
> usage. After the syslog messages the system stays semi usable for a minute,
> but when I run any new program it hangs. I had to downgrade to 3.6.2 to get
> my system usable again.
> 
> Is there any way I can help find the cause of those crashes?
> 
Hi Franke,

Jan and I worked together on fixes for these bugs a few days ago, and you
can try Jan's git repo and see if it works in your situation:

          git.jan-o-sch.net for-chris

(It contains most of the patches related to backref walking.)

thanks,
liubo

Liu Bo

2012-Oct-31 00:50 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On 10/31/2012 08:47 AM, Liu Bo wrote:
> Jan and I worked together on fixes for these bugs a few days ago, and you
> can try Jan's git repo and see if it works in your situation:
> 
>           git.jan-o-sch.net for-chris
it should be
           git.jan-o-sch.net/btrfs-unstable for-chris

Chris Samuel

2012-Oct-31 03:46 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On 31/10/12 10:57, Franke wrote:
> I have been having some crashes like this. Since I upgraded to 3.6.4
> they have become common. The crashes happen pretty randomly during
> normal system usage. After the syslog messages the system stays semi
> usable for a minute, but when I run any new program it hangs. I had
> to downgrade to 3.6.2 to get my system usable again.
There were no btrfs changes in either 3.6.3 or 3.6.4, so it must be an
interaction with another change - you will probably need to bisect
between 3.6.2 and 3.6.3 to see what's causing it to happen.
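
As background on why bisecting is practical even across a few hundred commits: the procedure is a binary search between a known-good and a known-bad point, so the number of kernels to build and test grows only with the logarithm of the range. The Python sketch below illustrates the idea only. The commit names and the position of the regression are hypothetical, and in practice the `git bisect` command does this bookkeeping on the real history.

```python
def first_bad(candidates, is_bad):
    """Return (index of the first bad candidate, number of tests run).

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    that the problem stays present once it has been introduced (the usual
    precondition for bisecting).
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-boot cycle per iteration
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return hi, tests


# Hypothetical history: 200 commits c0..c199, regression introduced at c137.
commits = [f"c{i}" for i in range(200)]
culprit, tests = first_bad(commits, lambda name: int(name[1:]) >= 137)
print(commits[culprit], "found after", tests, "tests")
```

The catch for a bug like this one is the test itself: crashes that "happen pretty randomly" make it hard to declare a commit good, so each step may need a long soak before it can be trusted.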

Worth noting that Greg K-H is looking for people to help him with the
stable series...

http://www.kroah.com/log/linux/minion-wanted.html

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

Franke

2012-Oct-31 20:00 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

Hi,

since yesterday I have run a balance while asleep/at work. Now I
experimented a bit, and the situation has changed.

I am now getting hard hangs (system is gone without even writing
anything to syslog), some time (minutes to an hour) into running a
scrub.

Those hangs happen with 3.6.2, 3.6.4 and Jan's unstable version.
It hasn't hung yet without running a scrub.

I have no idea if this is part of the same problem or something else.
Do you have any idea either way?

I will probably destroy my btrfs and create a new one over the weekend,
unless you still need info for debugging.

thanks,
Tobias

Liu Bo

2012-Nov-01 02:39 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On 11/01/2012 04:00 AM, Franke wrote:
> Hi,
> 
> since yesterday I have run a balance while asleep/at work. Now I
> experimented a bit, and the situation has changed.
> 
> I am now getting hard hangs (system is gone without even writing
> anything to syslog), some time (minutes to an hour) into running a
> scrub.
> 
> Those hangs happen with 3.6.2, 3.6.4 and Jan's unstable version.
> It hasn't hung yet without running a scrub.
> 
> I have no idea if this is part of the same problem or something else.
> Do you have any idea either way?
> 
Well, thanks for testing.

We may need your sysrq-w output (maybe from the screen) to locate where the
hard hang occurs.
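
For anyone unsure how to produce that output without a keyboard shortcut: SysRq commands can also be issued by writing the command character to /proc/sysrq-trigger as root, and "w" asks the kernel to dump tasks in uninterruptible (blocked) state to the kernel log. The Python sketch below wraps that in a small helper. The helper name and its `trigger_path` parameter are made up for this example, and the demonstration deliberately writes to a temporary file rather than the real trigger.

```python
import os
import tempfile


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file."""
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


# Harmless demonstration against a temporary file instead of the real
# trigger, so running this sketch does not touch the kernel.
with tempfile.TemporaryDirectory() as tmp:
    fake_trigger = os.path.join(tmp, "sysrq-trigger")
    trigger_sysrq("w", trigger_path=fake_trigger)
    with open(fake_trigger) as f:
        written = f.read()
print(repr(written))
```

On the affected machine the call would be `trigger_sysrq("w")` with the default path, run as root, with the result read back from dmesg or syslog. If the machine is already hard hung nothing will run, so capturing a dump periodically before the hang may be the only option.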

Besides, I recommend you pick Jan's patches out, apply them on top of the
latest btrfs upstream, and run another round to see if it gets better, since
there might already be some fixes for this very hang upstream.

Right now the latest btrfs upstream's top commit is

commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
Author: Chris Mason <chris.mason@fusionio.com>
Date:   Tue Oct 9 11:17:20 2012 -0400

    btrfs: init ref_index to zero in add_inode_ref
    
    Signed-off-by: Chris Mason <chris.mason@fusionio.com>

thanks,
liubo

Jan Schmidt

2012-Nov-01 11:57 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On Thu, November 01, 2012 at 03:39 (+0100), Liu Bo wrote:
> We may need your sysrq-w output (maybe from the screen) to locate where the
> hard hang occurs.
> 
> Besides, I recommend you pick Jan's patches out, apply them on top of the
> latest btrfs upstream, and run another round to see if it gets better, since
> there might already be some fixes for this very hang upstream.
> 
> Right now the latest btrfs upstream's top commit is
> 
> commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
> Author: Chris Mason <chris.mason@fusionio.com>
> Date:   Tue Oct 9 11:17:20 2012 -0400
> 
>     btrfs: init ref_index to zero in add_inode_ref
>     
>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is an old top commit. The current cmason/master state is

commit c37b2b6269ee4637fb7cdb5da0d1e47215d57ce2
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Mon Oct 22 15:51:44 2012 -0400

and includes my recent fixes. I don't really expect them to prevent getting
stuck anywhere. sysrq+w output would be really helpful. I'm trying to
reproduce the problems in the meantime.

-Jan

Jan Schmidt

2012-Nov-01 16:01 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

On Thu, November 01, 2012 at 12:57 (+0100), Jan Schmidt wrote:
> I'm trying to reproduce the problems in the meantime.

Looks like it worked :-/ And it also looks like it can either bug or deadlock,
depending on the things going on in the kernel at the same time.

I ran a parallel fsmark on a qgroup-enabled volume while scrubbing it, and hit
a page fault after four hours of iteration:

<1>[194521.851156] BUG: unable to handle kernel paging request at ffff880137c52a08
<1>[194659.159461] IP: [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194659.231741] PGD 1e0c063 PUD be586067 PMD be745067 PTE 8000000137c52160
<4>[194659.311717] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>[194659.375976] Modules linked in: btrfs mpt2sas scsi_transport_sas raid_class
<4>[194659.460230] CPU 6 
<4>[194659.483318] Pid: 20466, comm: btrfs-scrub-3 Tainted: G        W    3.6.0+ #3 Supermicro X8SIL/X8SIL
<4>[194659.595327] RIP: 0010:[<ffffffff810e3642>]  [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194659.696829] RSP: 0018:ffff880138ab7c50  EFLAGS: 00010046
<4>[194659.761725] RAX: 0000000000000046 RBX: ffff880137c52a08 RCX: 0000000000000000
<4>[194659.848565] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880137c52a08
<4>[194659.935405] RBP: ffff880138ab7d20 R08: 0000000000000002 R09: 0000000000000001
<4>[194660.022245] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802273ba3b0
<4>[194660.108984] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
<4>[194660.195717] FS:  0000000000000000(0000) GS:ffff880237200000(0000) knlGS:0000000000000000
<4>[194660.293997] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[194660.363990] CR2: ffff880137c52a08 CR3: 0000000001e0b000 CR4: 00000000000007e0
<4>[194660.450726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[194660.537564] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[194660.624406] Process btrfs-scrub-3 (pid: 20466, threadinfo ffff880138ab6000, task ffff8802273ba3b0)
<4>[194660.733189] Stack:
<4>[194660.758357]  0000000000000286 ffff8802353a4000 ffff8802273ba3b0 ffffffff00000000
<4>[194660.848733]  ffff880138ab7c90 0000000000000286 ffff880138ab7d20 ffff8802353a4000
<4>[194660.939212]  ffff8802273baa78 ffffffff8245f100 ffff880138ab7cc0 0000000000000286
<4>[194661.029589] Call Trace:
<4>[194661.059969]  [<ffffffff8109811a>] ? del_timer_sync+0x8a/0xc0
<4>[194661.128964]  [<ffffffff81098090>] ? try_to_del_timer_sync+0x70/0x70
<4>[194661.205367]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.281688]  [<ffffffff810e4ca5>] lock_acquire+0x95/0x140
<4>[194661.347634]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.423964]  [<ffffffff819380c0>] _raw_spin_lock+0x40/0x80
<4>[194661.490953]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.567283]  [<ffffffffa00a306a>] worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.641539]  [<ffffffffa00a2d10>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
<4>[194661.725249]  [<ffffffff810ac3d6>] kthread+0xa6/0xb0
<4>[194661.784961]  [<ffffffff819409a4>] kernel_thread_helper+0x4/0x10
<4>[194661.857120]  [<ffffffff8193901d>] ? retint_restore_args+0xe/0xe
<4>[194661.929294]  [<ffffffff810ac330>] ? __init_kthread_worker+0x70/0x70
<4>[194662.005632]  [<ffffffff819409a0>] ? gs_change+0xb/0xb
<4>[194662.067405] Code: 48 89 5d d8 4c 89 7d f8 45 0f 45 e8 85 c0 48 89 fb 4c 8b 55 10 0f 84 4e 04 00 00 44 8b 3d 2b be 0c 01 45 85 ff 0f 84 56 04 00 00 <48> 81 3b e0 5a 1f 82 b8 01 00 00 00 44 0f 44 e8 83 fe 01 0f 86
<1>[194662.302652] RIP  [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194662.375973]  RSP <ffff880138ab7c50>
<4>[194662.418821] CR2: ffff880137c52a08
<4>[194662.460051] ---[ end trace 85e160ea023efd39 ]---
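
One quick check on an oops like this is to see which general-purpose registers hold the faulting address. On x86-64, CR2 contains the address that caused the page fault, and under the System V calling convention RDI carries a function's first argument on entry. The Python sketch below is only an illustrative helper (its names are not from any kernel tool). It scans a few lines copied from the register dump above and lists the registers equal to CR2.

```python
import re

# Matches "RAX: <16 hex digits>" style pairs, plus CR2. RSP and RIP are
# skipped on purpose because the dump prints them with a segment prefix.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}|CR2):\s*([0-9a-f]{16})\b")

dump = """\
RAX: 0000000000000046 RBX: ffff880137c52a08 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880137c52a08
RBP: ffff880138ab7d20 R08: 0000000000000002 R09: 0000000000000001
CR2: ffff880137c52a08 CR3: 0000000001e0b000 CR4: 00000000000007e0
"""


def registers_holding_fault_address(text):
    """Return (CR2 value, sorted names of registers with the same value)."""
    regs = dict(REG_RE.findall(text))
    fault = regs.pop("CR2", None)
    if fault is None:
        return None, []
    return fault, sorted(name for name, value in regs.items() if value == fault)


fault, holders = registers_holding_fault_address(dump)
print(fault, holders)
```

Here both RBX and RDI match CR2. One possible reading is that the lock pointer handed to __lock_acquire() itself pointed into unmapped memory. With DEBUG_PAGEALLOC in the oops line that would fit a use-after-free of the structure containing the lock, but that is an interpretation, not something the dump proves.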

debug config enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOCKDEP=y
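
To compare another kernel against that list, the options can be checked in its .config file. The Python sketch below is a minimal illustration. The sample text is invented for the example (it pretends one option is disabled), and the function name is hypothetical. Only the five option names come from the message above.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* names set to 'y' in kernel .config text."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


WANTED = [
    "CONFIG_DEBUG_PAGEALLOC",
    "CONFIG_SLUB_DEBUG",
    "CONFIG_DEBUG_FS",
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_LOCKDEP",
]

# Invented sample standing in for the contents of a real .config file.
sample = """\
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
# CONFIG_LOCKDEP is not set
"""

missing = [opt for opt in WANTED if opt not in enabled_options(sample)]
print("missing:", missing)
```

For a real check, the sample string would be replaced by the text of the .config used to build the kernel under test.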

-Jan

Tobias Franke

2012-Nov-01 18:13 UTC

Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice

Hi,

I have been trying to get some sysrq output, but I am only getting hard
hangs right now. So no reaction to sysrq-w.

Here is some sysrq-w output from around 15 minutes before the crash:
https://www.dropbox.com/s/386ifyoxh8eppia/sysrqw_2012-11-1.log
I don't know if this will be helpful.

Tobias
