thr3ads.net - Btrfs devel - 3.5-rc4: BTRFS unmountable after hard lockup [Jun 2012]

Martin Steigerwald

2012-Jun-25 18:29 UTC

3.5-rc4: BTRFS unmountable after hard lockup

Hi!

I got an X server / DRM-related crash or hard lockup. After I rebooted, I
tried to mount the BTRFS filesystem on my eSATA disk. It uses big metadata
blocks (mkfs.btrfs -l 32768 -n 32768).


I got:

[   43.764274] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
[   43.764278] ata5: irq_stat 0x00000040, connection status changed
[   43.764281] ata5: SError: { PHYRdyChg CommWake DevExch }
[   43.764287] ata5: hard resetting link
[   46.978917] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   46.989402] ata5.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[   46.989407] ata5.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[   46.990609] ata5.00: ATA-8: Hitachi HTS545050B9A300, PB4OC60G, max UDMA/133
[   46.990613] ata5.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[   46.991925] ata5.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[   46.991930] ata5.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[   46.993155] ata5.00: configured for UDMA/133
[   47.003851] ata5: EH complete
[   47.003958] scsi 4:0:0:0: Direct-Access     ATA      Hitachi HTS54505 PB4O PQ: 0 ANSI: 5
[   47.004135] sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[   47.004191] sd 4:0:0:0: [sdb] Write Protect is off
[   47.004194] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   47.004218] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   47.050154]  sdb: sdb1
[   47.050390] sd 4:0:0:0: [sdb] Attached SCSI disk
[   58.100217] CPU1: Package power limit notification (total events = 1)
[   58.100220] CPU3: Package power limit notification (total events = 1)
[   58.100221] CPU2: Package power limit notification (total events = 1)
[   58.100225] CPU0: Package power limit notification (total events = 1)
[   58.103689] CPU1: Package power limit normal
[   58.103691] CPU3: Package power limit normal
[   58.103692] CPU2: Package power limit normal
[   58.103695] CPU0: Package power limit normal
[  249.200560] device label daten devid 1 transid 2194 /dev/sdb1
[  249.201186] btrfs: use lzo compression
[  249.201192] btrfs: disk space caching is enabled
[  249.241975] btrfs: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
[  251.620610] ------------[ cut here ]------------
[  251.620693] kernel BUG at fs/btrfs/inode.c:3758!
[  251.620767] invalid opcode: 0000 [#1] PREEMPT SMP 
[  251.620842] CPU 1 
[  251.620960] 
[  251.620988] Pid: 3430, comm: mount Tainted: G           O 3.5.0-rc4-tp520 #1 LENOVO 42433WG/42433WG
[  251.621149] RIP: 0010:[<ffffffffa023a93f>]  [<ffffffffa023a93f>] btrfs_evict_inode+0xcd/0x278 [btrfs]
[  251.621289] RSP: 0018:ffff880157033a58  EFLAGS: 00010246
[  251.621370] RAX: 0000000000000000 RBX: ffff8801c1747800 RCX: 000000000000001a
[  251.621477] RDX: 000000000000001a RSI: 0000000000000002 RDI: ffff880157032000
[  251.621584] RBP: ffff8800851c5d20 R08: ffff880157033978 R09: 0000000000000002
[  251.621691] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffa027a230
[  251.621799] R13: 0000000000008000 R14: 0000000000008000 R15: ffff8801c1740400
[  251.621907] FS:  00007ffa1402e7e0(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000
[  251.622029] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  251.622116] CR2: ffffffffff600400 CR3: 000000014a97d000 CR4: 00000000000407e0
[  251.622224] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  251.622332] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  251.622440] Process mount (pid: 3430, threadinfo ffff880157032000, task ffff88015983be70)
[  251.622562] Stack:
[  251.622597]  0000000000000000 ffff8800851c5da8 ffff8800851c5d20 ffff8800851c5d20
[  251.622711]  ffff8800851c5e18 ffffffffa027a230 ffff8800851c5d20 ffff8801c1742c00
[  251.622825]  ffff8801c1740400 ffffffff8111787c ffff88020dca8cf0 ffff8801c1747800
[  251.622936] Call Trace:
[  251.622982]  [<ffffffff8111787c>] ? evict+0xa3/0x153
[  251.623077]  [<ffffffffa0260ef6>] ? fixup_inode_link_counts+0xd2/0xfb [btrfs]
[  251.623201]  [<ffffffffa022ee3c>] ? btrfs_read_fs_root_no_name+0x92/0x24e [btrfs]
[  251.623331]  [<ffffffffa0261db3>] ? btrfs_recover_log_trees+0x207/0x2dd [btrfs]
[  251.623458]  [<ffffffffa0260a3b>] ? replay_one_extent+0x439/0x439 [btrfs]
[  251.623578]  [<ffffffffa0230fac>] ? open_ctree+0x1354/0x1680 [btrfs]
[  251.627492]  [<ffffffff811b60b0>] ? ida_get_new_above+0x16c/0x17d
[  251.631356]  [<ffffffffa02153fa>] ? btrfs_mount+0x3cb/0x516 [btrfs]
[  251.635197]  [<ffffffff810ef373>] ? alloc_pages_current+0xb2/0xcd
[  251.638971]  [<ffffffff811078c9>] ? mount_fs+0x61/0x144
[  251.642736]  [<ffffffff8111a390>] ? vfs_kern_mount+0x62/0xe3
[  251.646426]  [<ffffffff8111aa2a>] ? do_kern_mount+0x49/0xdd
[  251.650039]  [<ffffffff8111c20f>] ? do_mount+0x68a/0x710
[  251.653636]  [<ffffffff8111c3b5>] ? sys_mount+0x80/0xba
[  251.657204]  [<ffffffff813d53b9>] ? system_call_fastpath+0x16/0x1b
[  251.660791] Code: 00 48 83 ca ff 31 f6 48 89 ef e8 d8 05 01 00 48 8b 83 20 01 00 00 83 b8 40 0e 00 00 00 74 0e 48 8b 45 98 a8 20 0f 85 7b 01 00 00 <0f> 0b 83 7d 48 00 74 0f 83 bb f8 00 00 00 00 0f 84 66 01 00 00
[  251.668347] RIP  [<ffffffffa023a93f>] btrfs_evict_inode+0xcd/0x278 [btrfs]
[  251.672204]  RSP <ffff880157033a58>
[  251.698474] ---[ end trace 431fcd3e91e1f4fd ]---
[  265.799887] nepomukservices[2181]: segfault at 0 ip           (null) sp 00007fff403d0ca8 error 14 in nepomukservicestub[400000+7000]


The BTRFS filesystem was not mounted. After I tried to mount it again, I got:

merkaba:~> ps aux | grep " D" | grep -v grep 
root      3446  0.0  0.0      0     0 ?        D    20:22   0:00 [btrfs-transacti]
root      4666  0.0  0.0  18640  1184 tty1     D+   20:24   0:00 mount /mnt/amazon-daten

Any hints on how to get my disk mounted?

I have a fairly recent backup, but I would prefer not to have to restore
it. It's one of my expectations of a file system: to be safe against sudden
write interruptions like a power loss or crash.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Martin Steigerwald

2012-Jun-25 18:48 UTC

Re: 3.5-rc4: BTRFS unmountable after hard lockup

On Monday, 25 June 2012, Martin Steigerwald wrote:
> Hi!
> 
> I got a X server / drm related crash or hard lockup. After I rebooted I
> tried to mount the BTRFS on my esata disk. It has big metadata
> (mkfs.btrfs -l 32768 -n 32768).
> 
> 
> I got:
> [… backtrace …]
> BTRFS was not mounted. After trying to mount again, I got:
> 
> merkaba:~> ps aux | grep " D" | grep -v grep
> root      3446  0.0  0.0      0     0 ?        D    20:22   0:00 [btrfs-transacti]
> root      4666  0.0  0.0  18640  1184 tty1     D+   20:24   0:00 mount /mnt/amazon-daten
> 
> Any hints how to get my disk mounted?
> 
> I have a fairly recent backup, but I would prefer when I do not have to
> replay it. Its one of my expectations for a file system: be safe on
> sudden write interruptions like power loss or crash.

Well, I wanted my disk back ASAP, so I just tried that btrfs-zero-log
mantra again.

It worked. Hopefully the backtrace still gives you a clue about what
happened. I thought these kinds of errors were gone by now.

(Yeah, I know it's still experimental… no indoctrination requested ;-)

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

David Sterba

2012-Jun-25 22:18 UTC

Re: 3.5-rc4: BTRFS unmountable after hard lockup

On Mon, Jun 25, 2012 at 08:29:34PM +0200, Martin Steigerwald wrote:
> I got a X server / drm related crash or hard lockup. After I rebooted I
> tried to mount the BTRFS on my esata disk. It has big metadata 
> (mkfs.btrfs -l 32768 -n 32768).
> 
> 
> I got:
> 
> [   43.764274] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
> [   43.764278] ata5: irq_stat 0x00000040, connection status changed
> [   43.764281] ata5: SError: { PHYRdyChg CommWake DevExch }
> [   43.764287] ata5: hard resetting link
> [   46.978917] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [   46.989402] ata5.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [   46.989407] ata5.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [   46.990609] ata5.00: ATA-8: Hitachi HTS545050B9A300, PB4OC60G, max UDMA/133
> [   46.990613] ata5.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> [   46.991925] ata5.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> [   46.991930] ata5.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [   46.993155] ata5.00: configured for UDMA/133
> [   47.003851] ata5: EH complete
> [   47.003958] scsi 4:0:0:0: Direct-Access     ATA      Hitachi HTS54505 PB4O PQ: 0 ANSI: 5
> [   47.004135] sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
> [   47.004191] sd 4:0:0:0: [sdb] Write Protect is off
> [   47.004194] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [   47.004218] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [   47.050154]  sdb: sdb1
> [   47.050390] sd 4:0:0:0: [sdb] Attached SCSI disk
> [   58.100217] CPU1: Package power limit notification (total events = 1)
> [   58.100220] CPU3: Package power limit notification (total events = 1)
> [   58.100221] CPU2: Package power limit notification (total events = 1)
> [   58.100225] CPU0: Package power limit notification (total events = 1)
> [   58.103689] CPU1: Package power limit normal
> [   58.103691] CPU3: Package power limit normal
> [   58.103692] CPU2: Package power limit normal
> [   58.103695] CPU0: Package power limit normal
> [  249.200560] device label daten devid 1 transid 2194 /dev/sdb1
> [  249.201186] btrfs: use lzo compression
> [  249.201192] btrfs: disk space caching is enabled
> [  249.241975] btrfs: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
> [  251.620610] ------------[ cut here ]------------
> [  251.620693] kernel BUG at fs/btrfs/inode.c:3758!
3756         if (root->fs_info->log_root_recovering) {
3757                 BUG_ON(!test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
3758                                  &BTRFS_I(inode)->runtime_flags));
3759                 goto no_delete;
3760         }

and it happened during log replay, as you already found out; it can be
fixed by running the zero-log utility. Another way is to mount read-only,
which skips log replay.

I think there could be a logic error: this probably happens only during log
replay, when the orphan bit is not in sync with the link count, although
from what I saw this should be handled in the fixup_inode_link_counts call
path. CCing Josef in case he has an idea.
> [  251.620767] invalid opcode: 0000 [#1] PREEMPT SMP 
> [  251.620842] CPU 1 
> [  251.620960] 
> [  251.620988] Pid: 3430, comm: mount Tainted: G           O 3.5.0-rc4-tp520 #1 LENOVO 42433WG/42433WG
> [  251.621149] RIP: 0010:[<ffffffffa023a93f>]  [<ffffffffa023a93f>] btrfs_evict_inode+0xcd/0x278 [btrfs]
> [  251.621289] RSP: 0018:ffff880157033a58  EFLAGS: 00010246
> [  251.621370] RAX: 0000000000000000 RBX: ffff8801c1747800 RCX: 000000000000001a
> [  251.621477] RDX: 000000000000001a RSI: 0000000000000002 RDI: ffff880157032000
> [  251.621584] RBP: ffff8800851c5d20 R08: ffff880157033978 R09: 0000000000000002
> [  251.621691] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffa027a230
> [  251.621799] R13: 0000000000008000 R14: 0000000000008000 R15: ffff8801c1740400
> [  251.621907] FS:  00007ffa1402e7e0(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000
> [  251.622029] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  251.622116] CR2: ffffffffff600400 CR3: 000000014a97d000 CR4: 00000000000407e0
> [  251.622224] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  251.622332] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  251.622440] Process mount (pid: 3430, threadinfo ffff880157032000, task ffff88015983be70)
> [  251.622562] Stack:
> [  251.622597]  0000000000000000 ffff8800851c5da8 ffff8800851c5d20 ffff8800851c5d20
> [  251.622711]  ffff8800851c5e18 ffffffffa027a230 ffff8800851c5d20 ffff8801c1742c00
> [  251.622825]  ffff8801c1740400 ffffffff8111787c ffff88020dca8cf0 ffff8801c1747800
> [  251.622936] Call Trace:
> [  251.622982]  [<ffffffff8111787c>] ? evict+0xa3/0x153
> [  251.623077]  [<ffffffffa0260ef6>] ? fixup_inode_link_counts+0xd2/0xfb [btrfs]
> [  251.623201]  [<ffffffffa022ee3c>] ? btrfs_read_fs_root_no_name+0x92/0x24e [btrfs]
> [  251.623331]  [<ffffffffa0261db3>] ? btrfs_recover_log_trees+0x207/0x2dd [btrfs]
> [  251.623458]  [<ffffffffa0260a3b>] ? replay_one_extent+0x439/0x439 [btrfs]
> [  251.623578]  [<ffffffffa0230fac>] ? open_ctree+0x1354/0x1680 [btrfs]
> [  251.627492]  [<ffffffff811b60b0>] ? ida_get_new_above+0x16c/0x17d
> [  251.631356]  [<ffffffffa02153fa>] ? btrfs_mount+0x3cb/0x516 [btrfs]
> [  251.635197]  [<ffffffff810ef373>] ? alloc_pages_current+0xb2/0xcd
> [  251.638971]  [<ffffffff811078c9>] ? mount_fs+0x61/0x144
> [  251.642736]  [<ffffffff8111a390>] ? vfs_kern_mount+0x62/0xe3
> [  251.646426]  [<ffffffff8111aa2a>] ? do_kern_mount+0x49/0xdd
> [  251.650039]  [<ffffffff8111c20f>] ? do_mount+0x68a/0x710
> [  251.653636]  [<ffffffff8111c3b5>] ? sys_mount+0x80/0xba
> [  251.657204]  [<ffffffff813d53b9>] ? system_call_fastpath+0x16/0x1b
> [  251.660791] Code: 00 48 83 ca ff 31 f6 48 89 ef e8 d8 05 01 00 48 8b 83 20 01 00 00 83 b8 40 0e 00 00 00 74 0e 48 8b 45 98 a8 20 0f 85 7b 01 00 00 <0f> 0b 83 7d 48 00 74 0f 83 bb f8 00 00 00 00 0f 84 66 01 00 00
> [  251.668347] RIP  [<ffffffffa023a93f>] btrfs_evict_inode+0xcd/0x278 [btrfs]
> [  251.672204]  RSP <ffff880157033a58>

Liu Bo

2012-Jun-26 03:47 UTC

Re: 3.5-rc4: BTRFS unmountable after hard lockup

On 06/26/2012 06:18 AM, David Sterba wrote:
> 3756         if (root->fs_info->log_root_recovering) {
> 3757                 BUG_ON(!test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> 3758                                  &BTRFS_I(inode)->runtime_flags));
> 3759                 goto no_delete;
> 3760         }
> 
> and it happened during log replay, as you found already, fixable by
> running the zero-log utility. Another way is to mount read-only, this
> skips log replay.
> 
> I think there could be a logic error, as this probably happens only
> during log replay when the orphan bit is not in sync with link count,
> but I saw that this should be handled in the fixup_inode_link_counts
> call path. CCing Josef, if he has an idea.
> 

It is a logic error, but mostly a slip of the finger by Josef, IMO... :)

I'll send a patch for it.

thanks,
liubo

Martin Steigerwald

2012-Jun-26 12:28 UTC

Re: 3.5-rc4: BTRFS unmountable after hard lockup

On Tuesday, 26 June 2012, Liu Bo wrote:
> On 06/26/2012 06:18 AM, David Sterba wrote:
> > 3756         if (root->fs_info->log_root_recovering) {
> > 3757                 BUG_ON(!test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > 3758                                  &BTRFS_I(inode)->runtime_flags));
> > 3759                 goto no_delete;
> > 3760         }
> > 
> > and it happened during log replay, as you found already, fixable by
> > running the zero-log utility. Another way is to mount read-only, this
> > skips log replay.
> > 
> > I think there could be a logic error, as this probably happens only
> > during log replay when the orphan bit is not in sync with link count,
> > but I saw that this should be handled in the fixup_inode_link_counts
> > call path. CCing Josef, if he has an idea.
> 
> It is a logic error, but mostly a finger wrong from Josef IMO... :)
> 
> I'll send a patch for it.

Thanks for looking into it.

Since my BTRFS is up and running again, however, I can't easily test a
patch. I bet I'd have to unplug the disk or crash my laptop several times
to trigger it again.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Josef Bacik

2012-Jun-26 12:49 UTC

Re: 3.5-rc4: BTRFS unmountable after hard lockup

On Mon, Jun 25, 2012 at 09:47:33PM -0600, Liu Bo wrote:
> On 06/26/2012 06:18 AM, David Sterba wrote:
> 
> > 3756         if (root->fs_info->log_root_recovering) {
> > 3757                 BUG_ON(!test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > 3758                                  &BTRFS_I(inode)->runtime_flags));
> > 3759                 goto no_delete;
> > 3760         }
> > 
> > and it happened during log replay, as you found already, fixable by
> > running the zero-log utility. Another way is to mount read-only, this
> > skips log replay.
> > 
> > I think there could be a logic error, as this probably happens only
> > during log replay when the orphan bit is not in sync with link count,
> > but I saw that this should be handled in the fixup_inode_link_counts
> > call path. CCing Josef, if he has an idea.
> > 
> 
> 
> It is a logic error, but mostly a finger wrong from Josef IMO... :)
> 
> I'll send a patch for it.

Heh, oops, sorry about that ;)

Josef
