Hi,

I have an F20 system with BTRFS on a 4-disk RAID1 profile.

One of the disks failed the other day, and while I was replacing it today I think a scheduled snapshot was attempted. The following appeared in the logs and any btrfs commands locked up. I don't know whether the snapshot was related or not, but the timing is suspicious.

Nov 07 23:06:38 server.purley.hogarthuk.local kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: IP: [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: PGD 2055bf067 PUD 2055be067 PMD 0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Oops: 0000 [#1] SMP
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CPU: 1 PID: 1104 Comm: btrfs Not tainted 3.16.6-203.fc20.x86_64 #1
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: task: ffff880212929da0 ti: ffff8802057a8000 task.ti: ffff8802057a8000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RIP: 0010:[<ffffffffa0579bd1>] [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RSP: 0018:ffff8802057abc80 EFLAGS: 00010286
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 5647b8799aa1b898
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RDX: ffff8802139bb410 RSI: ffff8802139bd200 RDI: ffff88020f10c580
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RBP: ffff8802057abc88 R08: ffff8802139bb410 R09: 00000000000004c1
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: R10: 0000000000000000 R11: ffff8802057ab99e R12: ffff8800d3062dc8
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: R13: ffff8800d2dbe800 R14: ffff8802139bd200 R15: ffff8802095d3000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: FS: 00007f8d272e2880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CR2: 0000000000000088 CR3: 00000002055c0000 CR4: 00000000000007e0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Stack:
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff8800d3062000 ffff8802057abd08 ffffffffa05d1475 ffff8800d3062100
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff8800d3062e38 00000a38cca60000 00ff8800d3062660 0000000000000000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff880212929da0 0000000000000000 ffff8800d3062000 00000000e6a67ce4
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Call Trace:
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa05d1475>] btrfs_dev_replace_finishing+0x325/0x5c0 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa05d1a92>] btrfs_dev_replace_start+0x382/0x450 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa059aca1>] btrfs_ioctl+0x1d71/0x2ad0 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff811ad459>] ? handle_mm_fault+0x7d9/0x1070
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81059d6c>] ? __do_page_fault+0x21c/0x540
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81206c20>] do_vfs_ioctl+0x2e0/0x4a0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81206e61>] SyS_ioctl+0x81/0xa0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff8170f5a9>] system_call_fastpath+0x16/0x1b
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RIP [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: RSP <ffff8802057abc80>
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: CR2: 0000000000000088
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: ---[ end trace 84e1717f2e9518e5 ]---

I powered the system off and back on. btrfs fi sh appears to show all four disks and doesn't report a missing device.

Trying to mount the volume (with or without the recovery option) results in:

[ 160.449018] BTRFS info (device sdb1): disk space caching is enabled
[ 160.449963] BTRFS: failed to read the system array on sdb1
[ 160.465334] BTRFS: open_ctree failed

A btrfs check showed:

warning, device 3 is missing
warning devid 3 not found already
checking free space cache
Error reading 7661077725184, -1
failed to load free space cache for block group 7606532177920
Error reading 7889999429632, -1
failed to load free space cache for block group 7607605919744
Error reading 9100062818304, -1
failed to load free space cache for block group 7626933272576
... (lots of these) ...
checking csums
There are no extents for csum range 8242406531072-8242407579648
Csum exists for 8242406531072-8242407579648 but there is no extent record
There are no extents for csum range 8242408103936-8242413346816
Csum exists for 8242408103936-8242413346816 but there is no extent record
There are no extents for csum range 8242413871104-8242416492544
Csum exists for 8242413871104-8242416492544 but there is no extent record
There are no extents for csum range 8242417016832-8242424881152
Csum exists for 8242417016832-8242424881152 but there is no extent record
There are no extents for csum range 8242425929728-8242433794048
Csum exists for 8242425929728-8242433794048 but there is no extent record
There are no extents for csum range 8242434318336-8242439036928
... (lots of these) ...
found 336915598309 bytes used err is 3401
total csum bytes: 3550868116
total tree bytes: 4758843392
total fs tree bytes: 605634560
total extent tree bytes: 249384960
btree space waste bytes: 357846512
file data blocks allocated: 149934014087168
 referenced 4822013419520
Btrfs v3.16.2

If I try to mount the volume with degraded,recovery it mounts and logs the following:

Nov 08 02:45:03 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): allowing degraded mounts
Nov 08 02:45:03 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): enabling auto recovery
Nov 08 02:45:03 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): disk space caching is enabled
Nov 08 02:45:03 server.purley.hogarthuk.local kernel: BTRFS: bdev (null) errs: wr 7191592, rd 6079105, flush 0, corrupt 0, gen 0
Nov 08 02:45:22 server.purley.hogarthuk.local kernel: BTRFS: continuing dev_replace from <missing disk> (devid 3) to /dev/sda1 @89%
Nov 08 02:45:22 server.purley.hogarthuk.local kernel: SELinux: initialized (dev sdb1, type btrfs), uses xattr
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7743971131392) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7746118615040) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7751487324160) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7754708549632) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7757929775104) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7760077258752) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7761151000576) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7766519709696) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7804100673536) is invalid. skip it
Nov 08 02:45:28 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7830944219136) is invalid. skip it
Nov 08 02:45:29 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7853492797440) is invalid. skip it
Nov 08 02:45:30 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7893221244928) is invalid. skip it
Nov 08 02:45:30 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7894294986752) is invalid. skip it
Nov 08 02:45:30 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7920064790528) is invalid. skip it
Nov 08 02:45:30 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (7924359757824) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8097232191488) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8098305933312) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8100453416960) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8103674642432) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8116559544320) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8119780769792) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8177762828288) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8178836570112) is invalid. skip it
Nov 08 02:45:31 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8194942697472) is invalid. skip it
Nov 08 02:45:32 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8240039854080) is invalid. skip it
Nov 08 02:45:32 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8264735916032) is invalid. skip it
Nov 08 02:45:33 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8431165898752) is invalid. skip it
Nov 08 02:45:33 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8453714477056) is invalid. skip it
Nov 08 02:45:34 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (8498811633664) is invalid. skip it
Nov 08 02:45:36 server.purley.hogarthuk.local kernel: BTRFS info (device sdb1): The free space cache file (9018502676480) is invalid. skip it
Nov 08 02:45:52 server.purley.hogarthuk.local kernel: BTRFS: dev_replace from <missing disk> (devid 3) to /dev/sda1) finished

After that there is the following stack trace:

Nov 07 23:06:38 server.purley.hogarthuk.local kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: IP: [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: PGD 2055bf067 PUD 2055be067 PMD 0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Oops: 0000 [#1] SMP
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CPU: 1 PID: 1104 Comm: btrfs Not tainted 3.16.6-203.fc20.x86_64 #1
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: task: ffff880212929da0 ti: ffff8802057a8000 task.ti: ffff8802057a8000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RIP: 0010:[<ffffffffa0579bd1>] [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RSP: 0018:ffff8802057abc80 EFLAGS: 00010286
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 5647b8799aa1b898
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RDX: ffff8802139bb410 RSI: ffff8802139bd200 RDI: ffff88020f10c580
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RBP: ffff8802057abc88 R08: ffff8802139bb410 R09: 00000000000004c1
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: R10: 0000000000000000 R11: ffff8802057ab99e R12: ffff8800d3062dc8
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: R13: ffff8800d2dbe800 R14: ffff8802139bd200 R15: ffff8802095d3000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: FS: 00007f8d272e2880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: CR2: 0000000000000088 CR3: 00000002055c0000 CR4: 00000000000007e0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Stack:
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff8800d3062000 ffff8802057abd08 ffffffffa05d1475 ffff8800d3062100
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff8800d3062e38 00000a38cca60000 00ff8800d3062660 0000000000000000
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: ffff880212929da0 0000000000000000 ffff8800d3062000 00000000e6a67ce4
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Call Trace:
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa05d1475>] btrfs_dev_replace_finishing+0x325/0x5c0 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa05d1a92>] btrfs_dev_replace_start+0x382/0x450 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffffa059aca1>] btrfs_ioctl+0x1d71/0x2ad0 [btrfs]
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff811ad459>] ? handle_mm_fault+0x7d9/0x1070
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81059d6c>] ? __do_page_fault+0x21c/0x540
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81206c20>] do_vfs_ioctl+0x2e0/0x4a0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff81206e61>] SyS_ioctl+0x81/0xa0
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: [<ffffffff8170f5a9>] system_call_fastpath+0x16/0x1b
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85
Nov 07 23:06:38 server.purley.hogarthuk.local kernel: RIP [<ffffffffa0579bd1>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: RSP <ffff8802057abc80>
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: CR2: 0000000000000088
Nov 07 23:06:39 server.purley.hogarthuk.local kernel: ---[ end trace 84e1717f2e9518e5 ]---

mount, df, etc. all hang after this, with top showing 100% wait on one of the CPUs.

The Fedora 20 kernel is 3.16.6-203.fc20.x86_64 and btrfs-progs is btrfs-progs-3.16.2-1.fc20.x86_64.

Could you please provide some guidance on how to recover from this situation?

Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html