Iftikhar Rathore (irathore)
2013-Apr-13 21:04 UTC
btrfs crash (and softlockup btrfs-endio-wri)
I am using NFS over brtfs (vanilla 3.8.5) for heavy CoW to clone virtual disks with sizes 20-50GB. It worked OK for a couple of days, but yesterday it crashed. Reboot fixed the problem and I do not see any data corruption. I have a couple of different kdumps, I will include one as text and attach the other ones. I am using Fedora 18 with vanilla 3.8.5. The filesystem is created over a SAN volume around 16TB using Fibre Channel (Emulex FC HBA) I have uploaded brtfs-debug-tree output at https://docs.google.com/file/d/0B2kI43dX6LrwQjdKZUdqN2JDVlU/edit?usp=sharing [root@datastore01 ~]# btrfs fi show /dev/sdc failed to read /dev/sr0 failed to read /dev/sr1 Label: none uuid: 9a61c7a6-073b-4b53-bcb6-ea3f8c60ffb7 Total devices 1 FS bytes used 442.61GB devid 1 size 16.37TB used 883.04GB path /dev/sdc Btrfs Btrfs v0.19 [root@datastore01 ~]# [root@datastore01 ~]# btrfsck /dev/sdc checking extents checking fs roots checking root refs found 475249549312 bytes used err is 0 total csum bytes: 463284900 total tree bytes: 614862848 total fs tree bytes: 21024768 btree space waste bytes: 79151456 file data blocks allocated: 11208754151424 referenced 943086518272 Btrfs Btrfs v0.19 [root@datastore01 ~]# [root@datastore01 ~]# btrfs fi df /btrfs/ Data: total=879.01GB, used=442.04GB System, DUP: total=8.00MB, used=100.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=2.00GB, used=586.00MB Metadata: total=8.00MB, used=0.00 Apr 13 04:05:54 datastore01 kernel: [1210991.339071] BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-endio-wri:30524] Apr 13 04:05:54 datastore01 kernel: [1210991.342079] Modules linked in: nfsd nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded: iptable_mangle] Apr 13 04:05:54 datastore01 kernel: [1210991.342127] CPU 3 Apr 13 04:05:54 datastore01 kernel: [1210991.342132] Pid: 30524, comm: btrfs-endio-wri Tainted: G W 3.8.5 #1 Cisco Systems Inc R250-2480805/R250-2480805 Apr 13 04:05:54 datastore01 kernel: [1210991.342135] RIP: 0010:[<ffffffff812ff49d>] [<ffffffff812ff49d>] __write_lock_failed+0xd/0x20 Apr 13 04:05:54 datastore01 kernel: [1210991.342145] RSP: 0018:ffff880b5ed73d28 EFLAGS: 00000287 Apr 13 04:05:54 datastore01 kernel: [1210991.342148] RAX: 0000000000000000 RBX: ffff882fde861980 RCX: 0000000000005862 Apr 13 04:05:54 datastore01 kernel: [1210991.342151] RDX: 0000000000005000 RSI: 00000000aed3c000 RDI: ffff8817d2f54688 Apr 13 04:05:54 datastore01 kernel: [1210991.342153] RBP: ffff880b5ed73d28 R08: 0000000000000000 R09: 0000000000000000 Apr 13 04:05:54 datastore01 kernel: [1210991.342155] R10: ffff8817d2f54650 R11: 0000000000000000 R12: ffff8817d3471120 Apr 13 04:05:54 datastore01 kernel: [1210991.342158] R13: ffff880b5ed73cc8 R14: ffffffff81181bd4 R15: ffff880b5ed73ca8 Apr 13 04:05:54 datastore01 kernel: [1210991.342160] FS: 0000000000000000(0000) GS:ffff88306fc20000(0000) knlGS:0000000000000000 Apr 13 04:05:54 datastore01 kernel: [1210991.342163] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 13 04:05:54 datastore01 kernel: [1210991.342165] CR2: 00007f1ff03fbe28 CR3: 0000000001c0c000 CR4: 00000000000007e0 Apr 13 04:05:54 datastore01 kernel: [1210991.342168] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 13 04:05:54 datastore01 kernel: [1210991.342170] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 13 04:05:54 datastore01 kernel: [1210991.342173] Process btrfs-endio-wri (pid: 30524, threadinfo ffff880b5ed72000, task ffff8817e8b6c620) Apr 13 04:05:54 datastore01 kernel: [1210991.342175] Stack: Apr 13 04:05:54 datastore01 kernel: [1210991.342176] ffff880b5ed73d38 ffffffff8164ad87 ffff880b5ed73d68 ffffffffa03d1142 Apr 13 04:05:54 datastore01 kernel: [1210991.342183] ffff8818854a2448 ffff8817d2f54810 ffff8812e5f39800 ffff882fde861980 Apr 13 04:05:54 datastore01 kernel: [1210991.342187] ffff880b5ed73de8 ffffffffa03c5da3 0000000000005000 0000000100000000 Apr 13 04:05:54 datastore01 kernel: [1210991.342192] Call Trace: Apr 13 04:05:54 datastore01 kernel: [1210991.342201] [<ffffffff8164ad87>] _raw_write_lock+0x17/0x20 Apr 13 04:05:54 datastore01 kernel: [1210991.342230] [<ffffffffa03d1142>] unpin_extent_cache+0x32/0x130 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.342248] [<ffffffffa03c5da3>] btrfs_finish_ordered_io+0x223/0x3f0 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.342256] [<ffffffff8106d7b0>] ? __internal_add_timer+0x130/0x130 Apr 13 04:05:54 datastore01 kernel: [1210991.342274] [<ffffffffa03c5f85>] finish_ordered_fn+0x15/0x20 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.342294] [<ffffffffa03e6c56>] worker_loop+0x136/0x580 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.342313] [<ffffffffa03e6b20>] ? btrfs_queue_worker+0x300/0x300 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.342321] [<ffffffff81081c30>] kthread+0xc0/0xd0 Apr 13 04:05:54 datastore01 kernel: [1210991.342328] [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0 Apr 13 04:05:54 datastore01 kernel: [1210991.342332] [<ffffffff81081b70>] ? kthread_create_on_node+0x120/0x120 Apr 13 04:05:54 datastore01 kernel: [1210991.342338] [<ffffffff816534ac>] ret_from_fork+0x7c/0xb0 Apr 13 04:05:54 datastore01 kernel: [1210991.342342] [<ffffffff81081b70>] ? kthread_create_on_node+0x120/0x120 Apr 13 04:05:54 datastore01 kernel: [1210991.342344] Code: 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 81 07 00 00 10 00 f3 90 <81> 3f 00 00 10 00 75 f6 f0 81 2f 00 00 10 00 75 e6 5d c3 55 48 Apr 13 04:05:54 datastore01 kernel: [1210991.388043] BUG: soft lockup - CPU#7 stuck for 22s! [nfsd:10431] Apr 13 04:05:54 datastore01 kernel: [1210991.390634] Modules linked in: nfsd nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded: iptable_mangle] Apr 13 04:05:54 datastore01 kernel: [1210991.390664] CPU 7 Apr 13 04:05:54 datastore01 kernel: [1210991.390667] Pid: 10431, comm: nfsd Tainted: G W 3.8.5 #1 Cisco Systems Inc R250-2480805/R250-2480805 Apr 13 04:05:54 datastore01 kernel: [1210991.390669] RIP: 0010:[<ffffffff81306c79>] [<ffffffff81306c79>] __list_del_entry+0x29/0xd0 Apr 13 04:05:54 datastore01 kernel: [1210991.390676] RSP: 0018:ffff881ed614d928 EFLAGS: 00000202 Apr 13 04:05:54 datastore01 kernel: [1210991.390678] RAX: ffff881099389620 RBX: ffff88137c665590 RCX: dead000000200200 Apr 13 04:05:54 datastore01 kernel: [1210991.390679] RDX: ffff881099389620 RSI: ffff8817da593fb0 RDI: ffff881099389620 Apr 13 04:05:54 datastore01 kernel: [1210991.390681] RBP: ffff881ed614d928 R08: ffff881099389620 R09: 00000000000007dd Apr 13 04:05:54 datastore01 kernel: [1210991.390683] R10: 0000000000000000 R11: 00000000000007dc R12: 0000000000005861 Apr 13 04:05:54 datastore01 kernel: [1210991.390684] R13: 00000000000000a0 R14: ffffffff8105e2ed R15: ffff881ed614d8b8 Apr 13 04:05:54 datastore01 kernel: [1210991.390687] FS: 0000000000000000(0000) GS:ffff88306fc60000(0000) knlGS:0000000000000000 Apr 13 04:05:54 datastore01 kernel: [1210991.390688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 13 04:05:54 datastore01 kernel: [1210991.390690] CR2: 0000000001d329f0 CR3: 0000000001c0c000 CR4: 00000000000007e0 Apr 13 04:05:54 datastore01 kernel: [1210991.390692] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 13 04:05:54 datastore01 kernel: [1210991.390694] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 13 04:05:54 datastore01 kernel: [1210991.390696] Process nfsd (pid: 10431, threadinfo ffff881ed614c000, task ffff882fe6d05d80) Apr 13 04:05:54 datastore01 kernel: [1210991.390697] Stack: Apr 13 04:05:54 datastore01 kernel: [1210991.390698] ffff881ed614da48 ffffffffa03f0765 ffff881ed614d9d8 ffffffffa039c7e3 Apr 13 04:05:54 datastore01 kernel: [1210991.390702] ffff881ed614d99c ffffffffa0396dea 0000805000000002 ffff881ed614da00 Apr 13 04:05:54 datastore01 kernel: [1210991.390706] ffff8817d2f54810 ffff8812e5f39800 ffff8817d2f54688 000000000000584d Apr 13 04:05:54 datastore01 kernel: [1210991.390709] Call Trace: Apr 13 04:05:54 datastore01 kernel: [1210991.390727] [<ffffffffa03f0765>] btrfs_log_changed_extents+0xe5/0x6d0 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390737] [<ffffffffa039c7e3>] ? btrfs_search_slot+0x593/0x7a0 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390746] [<ffffffffa0396dea>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390762] [<ffffffffa03efdeb>] ? fill_inode_item.isra.22+0x1fb/0x240 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390778] [<ffffffffa03d9ed7>] ? free_extent_buffer+0x37/0x90 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390794] [<ffffffffa03f4740>] btrfs_log_inode+0x420/0x6f0 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390800] [<ffffffff8113fe14>] ? __pagevec_release+0x24/0x40 Apr 13 04:05:54 datastore01 kernel: [1210991.390816] [<ffffffffa03f569b>] btrfs_log_inode_parent+0x17b/0x450 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390831] [<ffffffffa03f59b7>] btrfs_log_dentry_safe+0x47/0x70 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390846] [<ffffffffa03cc3c7>] btrfs_sync_file+0x167/0x230 [btrfs] Apr 13 04:05:54 datastore01 kernel: [1210991.390852] [<ffffffff811c815d>] vfs_fsync_range+0x1d/0x30 Apr 13 04:05:54 datastore01 kernel: [1210991.390860] [<ffffffffa05550df>] nfsd_vfs_write.isra.12+0x2cf/0x3d0 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390865] [<ffffffff8119848f>] ? dentry_open+0x5f/0xf0 Apr 13 04:05:54 datastore01 kernel: [1210991.390872] [<ffffffffa0555edc>] ? nfsd_open+0x11c/0x1b0 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390878] [<ffffffffa0557609>] nfsd_write+0xf9/0x110 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390886] [<ffffffffa055e93c>] nfsd3_proc_write+0xbc/0x150 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390891] [<ffffffffa0550dab>] nfsd_dispatch+0xeb/0x230 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390907] [<ffffffffa048a7c8>] svc_process_common+0x328/0x6d0 [sunrpc] Apr 13 04:05:54 datastore01 kernel: [1210991.390917] [<ffffffffa048aebf>] svc_process+0xff/0x150 [sunrpc] Apr 13 04:05:54 datastore01 kernel: [1210991.390923] [<ffffffffa0550757>] nfsd+0xc7/0x140 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390928] [<ffffffffa0550690>] ? nfsd_destroy+0x80/0x80 [nfsd] Apr 13 04:05:54 datastore01 kernel: [1210991.390932] [<ffffffff81081c30>] kthread+0xc0/0xd0 Apr 13 04:05:54 datastore01 kernel: [1210991.390936] [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0 Apr 13 04:05:54 datastore01 kernel: [1210991.390939] [<ffffffff81081b70>] ? kthread_create_on_node+0x120/0x120 Apr 13 04:05:54 datastore01 kernel: [1210991.390943] [<ffffffff816534ac>] ret_from_fork+0x7c/0xb0 Apr 13 04:05:54 datastore01 kernel: [1210991.390946] [<ffffffff81081b70>] ? kthread_create_on_node+0x120/0x120 Apr 13 04:05:54 datastore01 kernel: [1210991.390948] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 Apr 13 04:06:22 datastore01 kernel: [1211019.323609] BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-endio-wri:30524] Apr 13 04:06:22 datastore01 kernel: [1211019.326614] Modules linked in: nfsd nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc dns_resolver fscache btrfs zlib_deflate libcrc32c lpfc coretemp kvm_intel kvm iTCO_wdt enic i7core_edac i2c_i801 iTCO_vendor_support edac_core bnx2 lpc_ich mfd_core ioatdma joydev pcspkr acpi_power_meter dca crc32c_intel microcode wmi i2c_algo_bit drm_kms_helper ttm drm fnic usb_storage libfcoe libfc mptsas scsi_transport_sas scsi_transport_fc mptscsih i2c_core mptbase scsi_tgt [last unloaded: iptable_mangle] Apr 13 04:06:22 datastore01 kernel: [1211019.326653] CPU 3
Maybe Matching Threads
- Re: [btrfs-transacti] & btrfs-endio-wri] - WAS: Re: [btrfs-delalloc-]
- Top shows brtfs-cache-1 and brtfs-endio-met while the hard drives seem busy
- [PATCH RFC] Change softlockup watchdog to ignore stolen time
- [PATCH RFC] Change softlockup watchdog to ignore stolen time
- [patch 0/2] softlockup watchdog improvements