raid10 metadata and data filesystem. dmesg log follows. The system is unable to unmount the filesystem after this occurs.

Filesystem mounted at /mnt/btrfs with -o compress,degraded
Command: btrfs device delete missing /mnt/btrfs

[ 283.398222] ------------[ cut here ]------------
[ 283.398289] kernel BUG at /home/apw/COD/linux/fs/btrfs/transaction.c:1329!
[ 283.398355] invalid opcode: 0000 [#1] SMP
[ 283.398481] CPU 3
[ 283.398520] Modules linked in: nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc psmouse lp parport joydev serio_raw btrfs zlib_deflate libcrc32c usbhid hid mpt2sas scsi_transport_sas raid_class e1000e
[ 283.399435]
[ 283.399491] Pid: 2216, comm: btrfs Not tainted 3.2.0-030200rc2-generic #201111151435 Supermicro X8SIL/X8SIL
[ 283.399678] RIP: 0010:[<ffffffffa00f7052>] [<ffffffffa00f7052>] btrfs_commit_transaction+0x8f2/0x900 [btrfs]
[ 283.399822] RSP: 0018:ffff880133573ac8 EFLAGS: 00010282
[ 283.399884] RAX: 00000000fffffffb RBX: ffff8801276540f0 RCX: ffff880133573a38
[ 283.399952] RDX: 0000000000002000 RSI: 000004a22dd55000 RDI: ffff880127654150
[ 283.400020] RBP: ffff880133573b88 R08: 0000000000002000 R09: 0000000000000000
[ 283.400087] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880127654168
[ 283.400154] R13: ffff880130b8ac00 R14: ffff8801276540f0 R15: ffff88012d4eb480
[ 283.400220] FS: 00007f0c17fc1760(0000) GS:ffff88013bcc0000(0000) knlGS:0000000000000000
[ 283.400302] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 283.400363] CR2: 00007f7e1d427870 CR3: 0000000132a4c000 CR4: 00000000000006e0
[ 283.400428] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 283.400492] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 283.400557] Process btrfs (pid: 2216, threadinfo ffff880133572000, task ffff8801306cdbc0)
[ 283.400637] Stack:
[ 283.400691] 0000000000100000 ffffea0001a22701 ffff880133573b28 ffff88012d4eb480
[ 283.400905] 0000000000000000 0000000000000132 ffff880133573b28 ffffffffa00f65be
[ 283.401122] 0000000000000001 ffff880130b8ac00 ffff880130b8ac00 0000000000000001
[ 283.401322] Call Trace:
[ 283.401380] [<ffffffffa00f65be>] ? join_transaction+0xde/0x280 [btrfs]
[ 283.401441] [<ffffffff81088a90>] ? wake_up_bit+0x40/0x40
[ 283.401521] [<ffffffffa013615e>] prepare_to_relocate+0xbe/0xd0 [btrfs]
[ 283.401604] [<ffffffffa013c17b>] relocate_block_group+0x4b/0x5f0 [btrfs]
[ 283.401671] [<ffffffff8160e7fe>] ? _raw_spin_lock+0xe/0x20
[ 283.401750] [<ffffffffa00f515b>] ? btrfs_clean_old_snapshots+0x7b/0x160 [btrfs]
[ 283.401852] [<ffffffffa013c8d3>] btrfs_relocate_block_group+0x1b3/0x2d0 [btrfs]
[ 283.401952] [<ffffffffa011ac8d>] btrfs_relocate_chunk+0x7d/0x430 [btrfs]
[ 283.402038] [<ffffffffa0125e42>] ? btrfs_tree_read_unlock_blocking+0x42/0x70 [btrfs]
[ 283.402141] [<ffffffffa011b8d3>] btrfs_shrink_device+0x223/0x440 [btrfs]
[ 283.402226] [<ffffffffa011bce2>] btrfs_rm_device+0x1f2/0x5c0 [btrfs]
[ 283.402310] [<ffffffffa0125a78>] btrfs_ioctl+0x4e8/0x690 [btrfs]
[ 283.402379] [<ffffffff811892a9>] do_vfs_ioctl+0x99/0x350
[ 283.402443] [<ffffffff81182dc5>] ? putname+0x35/0x50
[ 283.402506] [<ffffffff81189601>] sys_ioctl+0xa1/0xb0
[ 283.402572] [<ffffffff81616c02>] system_call_fastpath+0x16/0x1b
[ 283.402634] Code: 00 48 85 db 0f 84 5b fe ff ff 48 8b 03 0f 1f 40 00 48 8b 7b 08 48 83 c3 10 4c 89 ee ff d0 48 8b 03 48 85 c0 75 eb e9 3a fe ff ff <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 283.404962] RIP [<ffffffffa00f7052>] btrfs_commit_transaction+0x8f2/0x900 [btrfs]
[ 283.405099] RSP <ffff880133573ac8>
[ 283.405224] ---[ end trace d2452d35e90228f4 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
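One detail in the register dump fits the write-failure diagnosis Chris gives later in the thread. RAX reads 00000000fffffffb. Assuming (an inference, not something the thread states) that it still holds the int return value of the call whose failure tripped the BUG at transaction.c:1329, its low 32 bits read as a signed integer give -5, which is -EIO on Linux. The helper names low32_as_signed and decodes_to_eio below are hypothetical, written only to show the conversion:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Reinterpret the low 32 bits of a 64-bit register value as a signed
 * 32-bit integer. On x86-64 a function returning int leaves its result
 * in %eax; in this dump the upper half of %rax happens to be zero.
 * Plain arithmetic is used instead of a narrowing cast so the result
 * does not depend on implementation-defined conversion rules. */
static int64_t low32_as_signed(uint64_t reg)
{
	uint32_t lo = (uint32_t)(reg & 0xffffffffu);

	if (lo & 0x80000000u)
		return (int64_t)lo - 0x100000000LL; /* sign bit set: negative */
	return (int64_t)lo;
}

/* Nonzero if the register's low 32 bits decode to -EIO, the kernel's
 * usual "I/O error" return value. */
static int decodes_to_eio(uint64_t reg)
{
	return low32_as_signed(reg) == -EIO;
}
```

For the value in this dump, low32_as_signed(0x00000000fffffffbULL) yields -5, so decodes_to_eio returns nonzero.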
Which kernel is this? This looks like one I recently fixed.

-chris

On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
> raid10 metadata and data filesystem. dmesg log follows. The system
> is unable to unmount the filesystem after this occurs.
>
> Filesystem mounted at /mnt/btrfs with -o compress,degraded
> Command: btrfs device delete missing /mnt/btrfs
>
> [ 283.398222] ------------[ cut here ]------------
> [ 283.398289] kernel BUG at /home/apw/COD/linux/fs/btrfs/transaction.c:1329!
Hi Chris,
This was on 3.2-rc2 but I tried with rc4 and it segfaulted again. I think the traces were the same but I've rebooted and can't say for sure.
David

On Thu, Dec 8, 2011 at 11:45 AM, Chris Mason <chris.mason@oracle.com> wrote:
> Which kernel is this? This looks like one I recently fixed.
>
> -chris
>
> On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
>> raid10 metadata and data filesystem. dmesg log follows. The system
>> is unable to unmount the filesystem after this occurs.
>>
>> Filesystem mounted at /mnt/btrfs with -o compress,degraded
>> Command: btrfs device delete missing /mnt/btrfs
>>
>> [ 283.398222] ------------[ cut here ]------------
>> [ 283.398289] kernel BUG at /home/apw/COD/linux/fs/btrfs/transaction.c:1329!
On Thu, Dec 08, 2011 at 12:27:52PM -0800, David Marcin wrote:
> Hi Chris,
> This was on 3.2-rc2 but I tried with rc4 and it segfaulted again. I
> think the traces were the same but I've rebooted and can't say for
> sure.
> David
> On Thu, Dec 8, 2011 at 11:45 AM, Chris Mason <chris.mason@oracle.com> wrote:
> > Which kernel is this? This looks like one I recently fixed.
> >
> > -chris
> >
> > On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
> >> raid10 metadata and data filesystem. dmesg log follows. The system
> >> is unable to unmount the filesystem after this occurs.
> >>
> >> Filesystem mounted at /mnt/btrfs with -o compress,degraded
> >> Command: btrfs device delete missing /mnt/btrfs
> >>
> >> [ 283.398222] ------------[ cut here ]------------
> >> [ 283.398289] kernel BUG at /home/apw/COD/linux/fs/btrfs/transaction.c:1329!

So this crash means we failed to write all the blocks required to commit the transaction. The reason is that we're getting failed bios to the missing device, and that failure isn't properly eaten by the raid-aware endio code.

If you pull the top commit from my for-linus branch, it should all work.

I know you've got a big FS here; I haven't tested this on raid10 yet, only raid1. If you want to wait a bit for safety, I'll do a raid10 run too.

-chris
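Chris's diagnosis turns on how a write that is fanned out to several mirror copies decides whether it succeeded: with raid1 or raid10, losing one copy should be tolerated rather than reported upward, otherwise the transaction commit sees a write failure. Below is a minimal userspace sketch of that rule under a simplified model, not the btrfs code and not the fix on the for-linus branch. The struct and function names (mirrored_write, mirrored_write_init, mirror_end_io) are hypothetical, only loosely modelled on the error/max_errors counting idea in btrfs's volume code, and a bio aimed at the missing device is assumed to complete immediately with -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one logical write fanned out to several mirrors
 * (e.g. the two copies of a raid1/raid10 stripe). */
struct mirrored_write {
	int stripes_pending; /* per-mirror bios not yet completed */
	int errors;          /* per-mirror bios that completed with an error */
	int max_errors;      /* failures the profile can tolerate */
	int result;          /* status reported upward once all bios finish */
	bool done;
};

static void mirrored_write_init(struct mirrored_write *w, int copies,
				int max_errors)
{
	/* At least one copy must be able to survive. */
	assert(copies > 0 && max_errors >= 0 && max_errors < copies);
	w->stripes_pending = copies;
	w->errors = 0;
	w->max_errors = max_errors;
	w->result = 0;
	w->done = false;
}

/* Completion callback for one per-mirror bio; err is 0 or a negative
 * errno. In this model a bio to a missing device is completed with -EIO. */
static void mirror_end_io(struct mirrored_write *w, int err)
{
	assert(w->stripes_pending > 0);
	if (err)
		w->errors++;
	if (--w->stripes_pending == 0) {
		/* Fail the logical write only if more copies failed than the
		 * profile tolerates; a single dead mirror is absorbed here. */
		w->result = (w->errors > w->max_errors) ? -EIO : 0;
		w->done = true;
	}
}
```

With two copies and max_errors set to 1, a failed bio to the missing device followed by a successful write to the surviving mirror leaves result at 0; only when both copies fail does the logical write report -EIO. The thread does not say whether the actual fix is structured this way.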
>>> On Thu, Dec 08, 2011 at 11:06:47AM -0800, David Marcin wrote:
>>>> raid10 metadata and data filesystem. dmesg log follows. The system
>>>> is unable to unmount the filesystem after this occurs.
>>>>
>>>> Filesystem mounted at /mnt/btrfs with -o compress,degraded
>>>> Command: btrfs device delete missing /mnt/btrfs
>>>>
>>>> [ 283.398222] ------------[ cut here ]------------
>>>> [ 283.398289] kernel BUG at /home/apw/COD/linux/fs/btrfs/transaction.c:1329!
>
> So this crash means we failed to write all the blocks required to commit
> the transaction. The reason is that we're getting failed bios to the
> missing device, and that failure isn't properly eaten by the
> raid-aware endio code.
>
> If you pull the top commit from my for-linus branch, it should all work.
>
> I know you've got a big FS here; I haven't tested this on raid10 yet,
> only raid1. If you want to wait a bit for safety, I'll do a raid10 run
> too.
>

The fix looks good to me, and I've tested it on raid10.
On Sat, Dec 10, 2011 at 7:22 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> The fix looks good to me, and I've tested it on raid10.

Thanks Chris and Li. I can confirm that this fixed my issues on raid10 as well.

David