Iftikhar Rathore (irathore)
2013-Apr-13 21:04 UTC
btrfs crash (and softlockup btrfs-endio-wri)
I am using NFS over brtfs (vanilla 3.8.5) for heavy CoW to clone virtual
disks with sizes 20-50GB. It worked OK for a couple of days, but
yesterday it crashed. Reboot fixed the problem and I do not see any data
corruption. I have a couple of different kdumps, I will include one as
text and attach the other ones.
I am using Fedora 18 with vanilla 3.8.5. The filesystem is created over
a SAN volume around 16TB using Fibre Channel (Emulex FC HBA)
I have uploaded brtfs-debug-tree output at
https://docs.google.com/file/d/0B2kI43dX6LrwQjdKZUdqN2JDVlU/edit?usp=sharing
[root@datastore01 ~]# btrfs fi show /dev/sdc
failed to read /dev/sr0
failed to read /dev/sr1
Label: none uuid: 9a61c7a6-073b-4b53-bcb6-ea3f8c60ffb7
Total devices 1 FS bytes used 442.61GB
devid 1 size 16.37TB used 883.04GB path /dev/sdc
Btrfs Btrfs v0.19
[root@datastore01 ~]#
[root@datastore01 ~]# btrfsck /dev/sdc
checking extents
checking fs roots
checking root refs
found 475249549312 bytes used err is 0
total csum bytes: 463284900
total tree bytes: 614862848
total fs tree bytes: 21024768
btree space waste bytes: 79151456
file data blocks allocated: 11208754151424
referenced 943086518272
Btrfs Btrfs v0.19
[root@datastore01 ~]#
[root@datastore01 ~]# btrfs fi df /btrfs/
Data: total=879.01GB, used=442.04GB
System, DUP: total=8.00MB, used=100.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.00GB, used=586.00MB
Metadata: total=8.00MB, used=0.00
Apr 13 04:05:54 datastore01 kernel: [1210991.339071] BUG: soft lockup - CPU#3
stuck for 22s! [btrfs-endio-wri:30524]
Apr 13 04:05:54 datastore01 kernel: [1210991.342079] Modules linked in: nfsd
nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs
zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac
i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev
pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit
drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas
scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded:
iptable_mangle]
Apr 13 04:05:54 datastore01 kernel: [1210991.342127] CPU 3
Apr 13 04:05:54 datastore01 kernel: [1210991.342132] Pid: 30524, comm:
btrfs-endio-wri Tainted: G W 3.8.5 #1 Cisco Systems Inc
R250-2480805/R250-2480805
Apr 13 04:05:54 datastore01 kernel: [1210991.342135] RIP:
0010:[<ffffffff812ff49d>] [<ffffffff812ff49d>]
__write_lock_failed+0xd/0x20
Apr 13 04:05:54 datastore01 kernel: [1210991.342145] RSP: 0018:ffff880b5ed73d28
EFLAGS: 00000287
Apr 13 04:05:54 datastore01 kernel: [1210991.342148] RAX: 0000000000000000 RBX:
ffff882fde861980 RCX: 0000000000005862
Apr 13 04:05:54 datastore01 kernel: [1210991.342151] RDX: 0000000000005000 RSI:
00000000aed3c000 RDI: ffff8817d2f54688
Apr 13 04:05:54 datastore01 kernel: [1210991.342153] RBP: ffff880b5ed73d28 R08:
0000000000000000 R09: 0000000000000000
Apr 13 04:05:54 datastore01 kernel: [1210991.342155] R10: ffff8817d2f54650 R11:
0000000000000000 R12: ffff8817d3471120
Apr 13 04:05:54 datastore01 kernel: [1210991.342158] R13: ffff880b5ed73cc8 R14:
ffffffff81181bd4 R15: ffff880b5ed73ca8
Apr 13 04:05:54 datastore01 kernel: [1210991.342160] FS: 0000000000000000(0000)
GS:ffff88306fc20000(0000) knlGS:0000000000000000
Apr 13 04:05:54 datastore01 kernel: [1210991.342163] CS: 0010 DS: 0000 ES: 0000
CR0: 000000008005003b
Apr 13 04:05:54 datastore01 kernel: [1210991.342165] CR2: 00007f1ff03fbe28 CR3:
0000000001c0c000 CR4: 00000000000007e0
Apr 13 04:05:54 datastore01 kernel: [1210991.342168] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Apr 13 04:05:54 datastore01 kernel: [1210991.342170] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Apr 13 04:05:54 datastore01 kernel: [1210991.342173] Process btrfs-endio-wri
(pid: 30524, threadinfo ffff880b5ed72000, task ffff8817e8b6c620)
Apr 13 04:05:54 datastore01 kernel: [1210991.342175] Stack:
Apr 13 04:05:54 datastore01 kernel: [1210991.342176] ffff880b5ed73d38
ffffffff8164ad87 ffff880b5ed73d68 ffffffffa03d1142
Apr 13 04:05:54 datastore01 kernel: [1210991.342183] ffff8818854a2448
ffff8817d2f54810 ffff8812e5f39800 ffff882fde861980
Apr 13 04:05:54 datastore01 kernel: [1210991.342187] ffff880b5ed73de8
ffffffffa03c5da3 0000000000005000 0000000100000000
Apr 13 04:05:54 datastore01 kernel: [1210991.342192] Call Trace:
Apr 13 04:05:54 datastore01 kernel: [1210991.342201] [<ffffffff8164ad87>]
_raw_write_lock+0x17/0x20
Apr 13 04:05:54 datastore01 kernel: [1210991.342230] [<ffffffffa03d1142>]
unpin_extent_cache+0x32/0x130 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.342248] [<ffffffffa03c5da3>]
btrfs_finish_ordered_io+0x223/0x3f0 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.342256] [<ffffffff8106d7b0>]
? __internal_add_timer+0x130/0x130
Apr 13 04:05:54 datastore01 kernel: [1210991.342274] [<ffffffffa03c5f85>]
finish_ordered_fn+0x15/0x20 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.342294] [<ffffffffa03e6c56>]
worker_loop+0x136/0x580 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.342313] [<ffffffffa03e6b20>]
? btrfs_queue_worker+0x300/0x300 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.342321] [<ffffffff81081c30>]
kthread+0xc0/0xd0
Apr 13 04:05:54 datastore01 kernel: [1210991.342328] [<ffffffff81010000>]
? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
Apr 13 04:05:54 datastore01 kernel: [1210991.342332] [<ffffffff81081b70>]
? kthread_create_on_node+0x120/0x120
Apr 13 04:05:54 datastore01 kernel: [1210991.342338] [<ffffffff816534ac>]
ret_from_fork+0x7c/0xb0
Apr 13 04:05:54 datastore01 kernel: [1210991.342342] [<ffffffff81081b70>]
? kthread_create_on_node+0x120/0x120
Apr 13 04:05:54 datastore01 kernel: [1210991.342344] Code: 01 31 c0 66 66 90 c3
b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5
f0 81 07 00 00 10 00 f3 90 <81> 3f 00 00 10 00 75 f6 f0 81 2f 00 00 10 00
75 e6 5d c3 55 48
Apr 13 04:05:54 datastore01 kernel: [1210991.388043] BUG: soft lockup - CPU#7
stuck for 22s! [nfsd:10431]
Apr 13 04:05:54 datastore01 kernel: [1210991.390634] Modules linked in: nfsd
nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs
zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac
i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev
pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit
drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas
scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded:
iptable_mangle]
Apr 13 04:05:54 datastore01 kernel: [1210991.390664] CPU 7
Apr 13 04:05:54 datastore01 kernel: [1210991.390667] Pid: 10431, comm: nfsd
Tainted: G W 3.8.5 #1 Cisco Systems Inc R250-2480805/R250-2480805
Apr 13 04:05:54 datastore01 kernel: [1210991.390669] RIP:
0010:[<ffffffff81306c79>] [<ffffffff81306c79>]
__list_del_entry+0x29/0xd0
Apr 13 04:05:54 datastore01 kernel: [1210991.390676] RSP: 0018:ffff881ed614d928
EFLAGS: 00000202
Apr 13 04:05:54 datastore01 kernel: [1210991.390678] RAX: ffff881099389620 RBX:
ffff88137c665590 RCX: dead000000200200
Apr 13 04:05:54 datastore01 kernel: [1210991.390679] RDX: ffff881099389620 RSI:
ffff8817da593fb0 RDI: ffff881099389620
Apr 13 04:05:54 datastore01 kernel: [1210991.390681] RBP: ffff881ed614d928 R08:
ffff881099389620 R09: 00000000000007dd
Apr 13 04:05:54 datastore01 kernel: [1210991.390683] R10: 0000000000000000 R11:
00000000000007dc R12: 0000000000005861
Apr 13 04:05:54 datastore01 kernel: [1210991.390684] R13: 00000000000000a0 R14:
ffffffff8105e2ed R15: ffff881ed614d8b8
Apr 13 04:05:54 datastore01 kernel: [1210991.390687] FS: 0000000000000000(0000)
GS:ffff88306fc60000(0000) knlGS:0000000000000000
Apr 13 04:05:54 datastore01 kernel: [1210991.390688] CS: 0010 DS: 0000 ES: 0000
CR0: 000000008005003b
Apr 13 04:05:54 datastore01 kernel: [1210991.390690] CR2: 0000000001d329f0 CR3:
0000000001c0c000 CR4: 00000000000007e0
Apr 13 04:05:54 datastore01 kernel: [1210991.390692] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Apr 13 04:05:54 datastore01 kernel: [1210991.390694] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Apr 13 04:05:54 datastore01 kernel: [1210991.390696] Process nfsd (pid: 10431,
threadinfo ffff881ed614c000, task ffff882fe6d05d80)
Apr 13 04:05:54 datastore01 kernel: [1210991.390697] Stack:
Apr 13 04:05:54 datastore01 kernel: [1210991.390698] ffff881ed614da48
ffffffffa03f0765 ffff881ed614d9d8 ffffffffa039c7e3
Apr 13 04:05:54 datastore01 kernel: [1210991.390702] ffff881ed614d99c
ffffffffa0396dea 0000805000000002 ffff881ed614da00
Apr 13 04:05:54 datastore01 kernel: [1210991.390706] ffff8817d2f54810
ffff8812e5f39800 ffff8817d2f54688 000000000000584d
Apr 13 04:05:54 datastore01 kernel: [1210991.390709] Call Trace:
Apr 13 04:05:54 datastore01 kernel: [1210991.390727] [<ffffffffa03f0765>]
btrfs_log_changed_extents+0xe5/0x6d0 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390737] [<ffffffffa039c7e3>]
? btrfs_search_slot+0x593/0x7a0 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390746] [<ffffffffa0396dea>]
? btrfs_alloc_path+0x1a/0x20 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390762] [<ffffffffa03efdeb>]
? fill_inode_item.isra.22+0x1fb/0x240 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390778] [<ffffffffa03d9ed7>]
? free_extent_buffer+0x37/0x90 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390794] [<ffffffffa03f4740>]
btrfs_log_inode+0x420/0x6f0 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390800] [<ffffffff8113fe14>]
? __pagevec_release+0x24/0x40
Apr 13 04:05:54 datastore01 kernel: [1210991.390816] [<ffffffffa03f569b>]
btrfs_log_inode_parent+0x17b/0x450 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390831] [<ffffffffa03f59b7>]
btrfs_log_dentry_safe+0x47/0x70 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390846] [<ffffffffa03cc3c7>]
btrfs_sync_file+0x167/0x230 [btrfs]
Apr 13 04:05:54 datastore01 kernel: [1210991.390852] [<ffffffff811c815d>]
vfs_fsync_range+0x1d/0x30
Apr 13 04:05:54 datastore01 kernel: [1210991.390860] [<ffffffffa05550df>]
nfsd_vfs_write.isra.12+0x2cf/0x3d0 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390865] [<ffffffff8119848f>]
? dentry_open+0x5f/0xf0
Apr 13 04:05:54 datastore01 kernel: [1210991.390872] [<ffffffffa0555edc>]
? nfsd_open+0x11c/0x1b0 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390878] [<ffffffffa0557609>]
nfsd_write+0xf9/0x110 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390886] [<ffffffffa055e93c>]
nfsd3_proc_write+0xbc/0x150 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390891] [<ffffffffa0550dab>]
nfsd_dispatch+0xeb/0x230 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390907] [<ffffffffa048a7c8>]
svc_process_common+0x328/0x6d0 [sunrpc]
Apr 13 04:05:54 datastore01 kernel: [1210991.390917] [<ffffffffa048aebf>]
svc_process+0xff/0x150 [sunrpc]
Apr 13 04:05:54 datastore01 kernel: [1210991.390923] [<ffffffffa0550757>]
nfsd+0xc7/0x140 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390928] [<ffffffffa0550690>]
? nfsd_destroy+0x80/0x80 [nfsd]
Apr 13 04:05:54 datastore01 kernel: [1210991.390932] [<ffffffff81081c30>]
kthread+0xc0/0xd0
Apr 13 04:05:54 datastore01 kernel: [1210991.390936] [<ffffffff81010000>]
? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
Apr 13 04:05:54 datastore01 kernel: [1210991.390939] [<ffffffff81081b70>]
? kthread_create_on_node+0x120/0x120
Apr 13 04:05:54 datastore01 kernel: [1210991.390943] [<ffffffff816534ac>]
ret_from_fork+0x7c/0xb0
Apr 13 04:05:54 datastore01 kernel: [1210991.390946] [<ffffffff81081b70>]
? kthread_create_on_node+0x120/0x120
Apr 13 04:05:54 datastore01 kernel: [1210991.390948] Code: 00 00 55 48 8b 17 48
b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00
00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7
75 2b 48 89 42 08
Apr 13 04:06:22 datastore01 kernel: [1211019.323609] BUG: soft lockup - CPU#3
stuck for 22s! [btrfs-endio-wri:30524]
Apr 13 04:06:22 datastore01 kernel: [1211019.326614] Modules linked in: nfsd
nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs
zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac
i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev
pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit
drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas
scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded:
iptable_mangle]
Apr 13 04:06:22 datastore01 kernel: [1211019.326653] CPU 3
Seemingly Similar Threads
- Re: [btrfs-transacti] & btrfs-endio-wri] - WAS: Re: [btrfs-delalloc-]
- Top shows brtfs-cache-1 and brtfs-endio-met while the hard drives seem busy
- [PATCH RFC] Change softlockup watchdog to ignore stolen time
- [PATCH RFC] Change softlockup watchdog to ignore stolen time
- [patch 0/2] softlockup watchdog improvements
