Marc MERLIN
2013-Feb-26 06:51 UTC
kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
TL;DR; WARNING: at fs/btrfs/tree-log.c:1984 walk_down_log_tree+0x51/0x307() WARNING: at fs/btrfs/tree-log.c:1988 walk_down_log_tree+0x6c/0x307() kernel BUG at fs/btrfs/volumes.c:3753! It''s way time for btrfs to stop crashing your system with no recovery flag that works to clear the log if the log can''t be replayed. Hell, on non development systems, it should just auto discard the log if it can''t be replayed without user input. Details: It''s been almost a year that I''m doing my best to test btrfs and report bugs, but how quickly it crashes on mount if anything is off, is a huge usability problem. I just again, lost use of my machine today after an unrelated problem caused a crash/reboot, and incomplete btrfs writes to my device. That happens, it''s life. But after that, I get to roll a dice of whether btrfs will recover, or just crash on mount. It''s slightly more liveable if it''s a scratch filesystem on a developer box, you just don''t mount it. It''s really really sucky if it''s your root filesystem and you need to boot from a rescue partition/media to recover each time. Then, I spent 3 hours reproducing the crash again, with netconsole working so that I can get a useful bugreport, which I send here. I also get an btrfs-image when this happens, but so far I haven''t had any interest from anyone for doing so. So, again, I have an image if someone wants it 787M fs_image I''m really losing faith in btrfs here. I see 2 problems. 1) btrfs doesn''t provide encryption. ecryptfs is so slow it''s unusable (also it doesn''t encrypt long file names anyway), so that leaves us with dmcrypt. Therefore I have to run btrfs on top of dmcrypt. Either btrfs is buggy over dmcrypt in that its writes aren''t atomic, and it very often gets into a state of an unreplayable log. I have no idea if that''s fixable, or not. Or, btrfs truly has a problem on unclean shutdown during writes even without dmcrypt, and more specifically pulling a sata device off a running bus. I''m not sure if developers are testing for pulling a drive from the sata bus while writing to it, and trying this at least 10 times for each kernel version. 2) btrfs is a development filesystem, I understand that, but if it wants more acceptance, it just can''t keep crashing each time it can''t replay a log. At this point BTRFS really must have a ''please clear my log and continue mounting'' if the log is corrupted in some way. Corruption happens through multiple means, clearly getting an inconsistent log on unclean shutdown is not only possible, it''s very frequent. btrfs can''t just panic and refuse to mount the filesystem without any override option. Will you please consider allowing sending the BUG statements that can happen during mount to a function that just cleans the replay log and tries again? If ext4 has a log it can''t use, it doesn''t crash my system, it simply tells me to run fsck and possibly mounts read-only. Below is the crash info, although I suppose it''s not that different from my previous reports. Again, I''m happy to give the fs_image to someone if you want it. device label btrfs_pool1 devid 1 transid 89465 /dev/mapper/root btrfs: disk space caching is enabled Btrfs detected SSD devices, enabling SSD mode btrfs bad tree block start 11014467582957347470 344223137792 btrfs bad tree block start 7450179628261340409 344223137792 ------------[ cut here ]------------ WARNING: at fs/btrfs/tree-log.c:1984 walk_down_log_tree+0x51/0x307() Hardware name: 2429A78 Modules linked in: iwldvm mac80211 iwlwifi cfg80211 e1000e tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative ppdev rfcomm bnep autofs4 pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse configs parport_pc lp parport input_polldev loop firewire_sbp2 firewire_core crc_itu_t ecryptfs uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media hid_generic btusb usbhid hid coretemp bluetooth kvm_intel kvm snd_hda_codec_realtek ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq thinkpad_acpi sdhci_pci i915 iTCO_wdt iTCO_vendor_support drm_kms_helper drm tpm_tis acpi _cpufreq tpm mperf psmouse evdev tpm_bios sdhci serio_raw i2c_algo_bit pcspkr mmc_core xhci_hcd ehci_hcd ac nvram battery wmi lpc_ich usbcore video processor button usb_common mei microcode snd_seq_device snd_timer i2c_i801 snd soundcore i2c_core rfkill raid456 multipath dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor raid6_pq async_memcpy async_tx xor blowfish_x86_64 blowfish_common ecb thermal thermal_sys crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd [last unloaded: cfg80211] Pid: 11214, comm: mount Tainted: G O 3.7.8-amd64-preempt-20130222 #1 Call Trace: [<ffffffff81040ed4>] warn_slowpath_common+0x7e/0x96 [<ffffffff81040f01>] warn_slowpath_null+0x15/0x17 [<ffffffff8120dff1>] walk_down_log_tree+0x51/0x307 [<ffffffff8120e321>] walk_log_tree+0x7a/0x1bc [<ffffffff81210a3d>] btrfs_recover_log_trees+0x9f/0x2ff [<ffffffff8120eafa>] ? replay_one_buffer+0x26d/0x26d [<ffffffff811e2c7c>] open_ctree+0x1443/0x1823 [<ffffffff81299bde>] ? string.isra.3+0x3d/0xa4 [<ffffffff811c4f97>] btrfs_mount+0x36d/0x4cd [<ffffffff810ea49e>] ? pcpu_next_pop+0x38/0x45 [<ffffffff810eb51b>] ? pcpu_alloc+0x7ee/0x82a [<ffffffff8112c9fe>] ? alloc_vfsmnt+0xa6/0x192 [<ffffffff81118d2c>] mount_fs+0x64/0x150 [<ffffffff810eb562>] ? __alloc_percpu+0xb/0xd [<ffffffff8112cdb7>] vfs_kern_mount+0x64/0xde [<ffffffff8112d19a>] do_kern_mount+0x48/0xda [<ffffffff8112ec68>] do_mount+0x6b1/0x714 [<ffffffff810e74de>] ? memdup_user+0x37/0x5f [<ffffffff810e753d>] ? strndup_user+0x37/0x4c [<ffffffff8112ed4e>] sys_mount+0x83/0xbd [<ffffffff814c6bed>] system_call_fastpath+0x1a/0x1f ---[ end trace 29a8d40b4129ace6 ]--- ------------[ cut here ]------------ WARNING: at fs/btrfs/tree-log.c:1988 walk_down_log_tree+0x6c/0x307() Hardware name: 2429A78 Modules linked in: iwldvm mac80211 iwlwifi cfg80211 e1000e tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative ppdev rfcomm bnep autofs4 pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse configs parport_pc lp parport input_polldev loop firewire_sbp2 firewire_core crc_itu_t ecryptfs uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media hid_generic btusb usbhid hid coretemp bluetooth kvm_intel kvm snd_hda_codec_realtek ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq thinkpad_acpi sdhci_pci i915 iTCO_wdt iTCO_vendor_support drm_kms_helper drm tpm_tis acpi _cpufreq tpm mperf psmouse evdev tpm_bios sdhci serio_raw i2c_algo_bit pcspkr mmc_core xhci_hcd ehci_hcd ac nvram battery wmi lpc_ich usbcore video processor button usb_common mei microcode snd_seq_device snd_timer i2c_i801 snd soundcore i2c_core rfkill raid456 multipath dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor raid6_pq async_memcpy async_tx xor blowfish_x86_64 blowfish_common ecb thermal thermal_sys crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd [last unloaded: cfg80211] Pid: 11214, comm: mount Tainted: G W O 3.7.8-amd64-preempt-20130222 #1 Call Trace: [<ffffffff81040ed4>] warn_slowpath_common+0x7e/0x96 [<ffffffff81040f01>] warn_slowpath_null+0x15/0x17 [<ffffffff8120e00c>] walk_down_log_tree+0x6c/0x307 [<ffffffff8120e321>] walk_log_tree+0x7a/0x1bc [<ffffffff81210a3d>] btrfs_recover_log_trees+0x9f/0x2ff [<ffffffff8120eafa>] ? replay_one_buffer+0x26d/0x26d [<ffffffff811e2c7c>] open_ctree+0x1443/0x1823 [<ffffffff81299bde>] ? string.isra.3+0x3d/0xa4 [<ffffffff811c4f97>] btrfs_mount+0x36d/0x4cd [<ffffffff810ea49e>] ? pcpu_next_pop+0x38/0x45 [<ffffffff810eb51b>] ? pcpu_alloc+0x7ee/0x82a [<ffffffff8112c9fe>] ? alloc_vfsmnt+0xa6/0x192 [<ffffffff81118d2c>] mount_fs+0x64/0x150 [<ffffffff810eb562>] ? __alloc_percpu+0xb/0xd [<ffffffff8112cdb7>] vfs_kern_mount+0x64/0xde [<ffffffff8112d19a>] do_kern_mount+0x48/0xda [<ffffffff8112ec68>] do_mount+0x6b1/0x714 [<ffffffff810e74de>] ? memdup_user+0x37/0x5f [<ffffffff810e753d>] ? strndup_user+0x37/0x4c [<ffffffff8112ed4e>] sys_mount+0x83/0xbd [<ffffffff814c6bed>] system_call_fastpath+0x1a/0x1f ---[ end trace 29a8d40b4129ace7 ]--- parent transid verify failed on 11731997996535126505 wanted 2669605077570711694 found 0 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:3753! invalid opcode: 0000 [#1] PREEMPT SMP Modules linked in: iwldvm mac80211 iwlwifi cfg80211 e1000e tun cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative ppdev rfcomm bnep autofs4 pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse configs parport_pc lp parport input_polldev loop firewire_sbp2 firewire_core crc_itu_t ecryptfs uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media hid_generic btusb usbhid hid coretemp bluetooth kvm_intel kvm snd_hda_codec_realtek ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq thinkpad_acpi sdhci_pci i915 iTCO_wdt iTCO_vendor_support drm_kms_helper drm tpm_tis acpi _cpufreq tpm mperf psmouse evdev tpm_bios sdhci serio_raw i2c_algo_bit pcspkr mmc_core xhci_hcd ehci_hcd ac nvram battery wmi lpc_ich usbcore video processor button usb_common mei microcode snd_seq_device snd_timer i2c_i801 snd soundcore i2c_core rfkill raid456 multipath dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor raid6_pq async_memcpy async_tx xor blowfish_x86_64 blowfish_common ecb thermal thermal_sys crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd [last unloaded: cfg80211] CPU 0 Pid: 11214, comm: mount Tainted: G W O 3.7.8-amd64-preempt-20130222 #1 LENOVO 2429A78/2429A78 RIP: 0010:[<ffffffff81202160>] [<ffffffff81202160>] btrfs_num_copies+0x42/0x8b RSP: 0018:ffff88013329b948 EFLAGS: 00010246 RAX: 0000000000000000 RBX: a2d06e04e11219e9 RCX: 0000000000000001 RDX: ffffffffffffffff RSI: a2d06e04e11219e9 RDI: ffff88013329a000 RBP: ffff88013329b978 R08: 0000000000000000 R09: 00000000ffe7d802 R10: 00000000ffe7d802 R11: 00000000000000c0 R12: ffff88010267e138 R13: 0000000000000000 R14: 00000000fffffffb R15: 0000000000000000 FS: 00007f3b6578d7e0(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000604130 CR3: 00000001419c5000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mount (pid: 11214, threadinfo ffff88013329a000, task ffff880132d38180) Stack: ffff88013329b968 0000000000001000 ffff8801df36a800 ffff8801ef7e25c0 ffff8801df36a800 0000000000000000 ffff88013329b9d8 ffffffff811df295 ffff88013043cc58 250c57ac832e708e 0000000000001000 ffff88013043cc50 Call Trace: [<ffffffff811df295>] btree_read_extent_buffer_pages.constprop.116+0xa8/0x105 [<ffffffff811e0fbf>] btrfs_read_buffer+0x2a/0x2c [<ffffffff8120e1d0>] walk_down_log_tree+0x230/0x307 [<ffffffff8120e321>] walk_log_tree+0x7a/0x1bc [<ffffffff81210a3d>] btrfs_recover_log_trees+0x9f/0x2ff [<ffffffff8120eafa>] ? replay_one_buffer+0x26d/0x26d [<ffffffff811e2c7c>] open_ctree+0x1443/0x1823 [<ffffffff81299bde>] ? string.isra.3+0x3d/0xa4 [<ffffffff811c4f97>] btrfs_mount+0x36d/0x4cd [<ffffffff810ea49e>] ? pcpu_next_pop+0x38/0x45 [<ffffffff810eb51b>] ? pcpu_alloc+0x7ee/0x82a [<ffffffff8112c9fe>] ? alloc_vfsmnt+0xa6/0x192 [<ffffffff81118d2c>] mount_fs+0x64/0x150 [<ffffffff810eb562>] ? __alloc_percpu+0xb/0xd [<ffffffff8112cdb7>] vfs_kern_mount+0x64/0xde [<ffffffff8112d19a>] do_kern_mount+0x48/0xda [<ffffffff8112ec68>] do_mount+0x6b1/0x714 [<ffffffff810e74de>] ? memdup_user+0x37/0x5f [<ffffffff810e753d>] ? strndup_user+0x37/0x4c [<ffffffff8112ed4e>] sys_mount+0x83/0xbd [<ffffffff814c6bed>] system_call_fastpath+0x1a/0x1f Code: 83 ec 18 48 89 55 d8 e8 cf fc 2b 00 48 8b 55 d8 4c 89 ef 48 89 de e8 22 26 ff ff 4c 89 e7 49 89 c5 e8 7d ff 2b 00 4d 85 ed 75 02 <0f> 0b 49 8b 45 18 48 39 d8 77 09 49 03 45 20 48 39 d8 73 02 0f RIP [<ffffffff81202160>] btrfs_num_copies+0x42/0x8b RSP <ffff88013329b948> ---[ end trace 29a8d40b4129ace8 ]--- Kernel panic - not syncing: Fatal exception -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik
2013-Feb-26 14:23 UTC
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
On Mon, Feb 25, 2013 at 11:51:02PM -0700, Marc MERLIN wrote:> TL;DR; > WARNING: at fs/btrfs/tree-log.c:1984 walk_down_log_tree+0x51/0x307() > WARNING: at fs/btrfs/tree-log.c:1988 walk_down_log_tree+0x6c/0x307() > kernel BUG at fs/btrfs/volumes.c:3753! > > It''s way time for btrfs to stop crashing your system with no recovery flag > that works to clear the log if the log can''t be replayed. Hell, on non > development systems, it should just auto discard the log if it can''t be > replayed without user input. > > > Details: > It''s been almost a year that I''m doing my best to test btrfs and report > bugs, but how quickly it crashes on mount if anything is off, is a huge > usability problem. > > I just again, lost use of my machine today after an unrelated problem caused > a crash/reboot, and incomplete btrfs writes to my device. > That happens, it''s life. > > But after that, I get to roll a dice of whether btrfs will recover, or just > crash on mount. > It''s slightly more liveable if it''s a scratch filesystem on a developer box, > you just don''t mount it. > It''s really really sucky if it''s your root filesystem and you need to boot > from a rescue partition/media to recover each time. > > Then, I spent 3 hours reproducing the crash again, with netconsole working > so that I can get a useful bugreport, which I send here.So how did you reproduce it? I''ll take a fs_image, but being able to reproduce the problem is more valuable. Thanks, -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2013-Feb-26 16:20 UTC
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
On Tue, Feb 26, 2013 at 09:23:00AM -0500, Josef Bacik wrote:> So how did you reproduce it? I''ll take a fs_image, but being able to reproduce > the problem is more valuable. Thanks,Here''s the image: http://marc.merlins.org/tmp/fs_image I just wrote details in the other message I just Cced you on. Please let me know if I''m missing more. Ah yes, I do use discard passthrough in dm: dmsetup table /dev/mapper/cryptroot --showkeys | grep discard 0 926304944 crypt aes-xts-plain fdf87a0d79b33008185258e61f890a5a92b39490e0ff62d3690e8cc4591a6e8a 0 8:4 8192 1 allow_discards as well as in btrfs mount of course: LABEL=btrfs_pool1 / btrfs subvol=root,defaults,compress=lzo,discard,nossd,space_cache,noatime But the thing that worries me is: I understand how it would be beneficial for you to to reproduce my crashes, but from what I heard, the workload you code and optimize for is pretty different from mine :) I would think that if you put btrfs on top of dmcrypt (that part may or may not be part of the equation) and pull the sata drive from the bus during writes, you are bound to reproduce this within 10 times (it happens maybe one time out of 3 for me). That said, forgive me if I''m going to say something stupid, I''m not a filesystem developer :) If somehow, you can pin down the cause of what I''m seeing, that would be great (assuming it''s a bug between dmcrypt and btrfs), but would you agree that in the end as a filesystem developer, you can''t always control random ordering of writes or incomplete writes to a drive before it gets shut down? (drives do their own reordering separately from linux anyway, right?) I''m also not talking about actual hardware caused corruption and bad RAM chips. Am I wrong when saying that ending up with replay journals that have unexpected data and that can''t be replayed is just inevitable and something any journalling filesystem must deal with? If I''m not wrong, it looks like btrfs is very good at detecting unexpected conditions so that it doesn''t replay bad logs and cause corruption as a result, but instead of discarding the log, it just crashes the kernel with BUG(). My previous message explains how vexing and time consuming it is to recover when it''s your root filesystem on a laptop :) Given that, would it indeed be feasible to do a sweep of the log replay code and have it discard bad logs (unless it''s in development mode for folks like you) and continue the mount with just a few transactions lost but a consistent filesystem, instead of crashing the kernel and leaving the user with an unbootable system? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Zach Brown
2013-Feb-26 18:24 UTC
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
> Am I wrong when saying that ending up with replay journals that have > unexpected data and that can''t be replayed is just inevitable and something > any journalling filesystem must deal with?If by journal you mean the btrfs log then yes, strictly speaking, you''re wrong. btrfs does deal with the kind of incomplete and reordered writes that you''re talking about and it should not result in corruption of what it calls the log. But it''s a reasonable thing to be confused by. I''m guessing that you''re being tripped up by what ext3 means by a journal and by what btrfs means by a log. The journal in ext3 can be partially written during a crash. The journal replay on mount notices this because the commit block isn''t present and just throws it away. No worries. The equivalent consistent update mechanism in btrfs is cow tree updates. The superblock that references new tree blocks written to free space is itself only written once all those blocks are stable on disk. If the tree block writes are interrupted then the superblock isn''t updated and btrfs won''t see the partially written blocks. No worries. The btrfs "log" is itself just a logical btree *inside these consistent tree updates* that records logical operations that will need to be replayed. For the log to be corrupted, if the btrfs code is perfect, the storage had to have lied to btrfs and told it that tree update blocks were stable which caused the superblock write that referenced them prematurely. The equivalent problem in the ext3 journal would be a transaction that has blocks missing but which has a valid commit block. ext3 couldn''t just throw this transaction away because after the commit block write it could have been in the process of replaying the transaction blocks at their final location on disk. And it''s now missing some of those blocks to replay. This kind of corruption Shouldn''t Happen and the fs can''t just silently ignore it. I absolutely agree that the error messages should be greatly improved in this case, yes, and that it shouldn''t BUG_ON (it should *never* BUG_ON). But btrfs is right to refuse to silently revert previously stable changes by just ignoring the corrupt log. - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2013-Feb-26 19:09 UTC
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
On Tue, Feb 26, 2013 at 10:24:42AM -0800, Zach Brown wrote:> > Am I wrong when saying that ending up with replay journals that have > > unexpected data and that can''t be replayed is just inevitable and something > > any journalling filesystem must deal with? > > If by journal you mean the btrfs log then yes, strictly speaking, you''reI did and used the wrong term, sorry.> wrong. btrfs does deal with the kind of incomplete and reordered writes > that you''re talking about and it should not result in corruption of what > it calls the log.Got it. So in theory what I''m seeing really shouldn''t happen, unless there is a corruption bug, or hardware fault.> But it''s a reasonable thing to be confused by. I''m guessing that you''re > being tripped up by what ext3 means by a journal and by what btrfs means > by a log.yes :)> The journal in ext3 can be partially written during a crash. The > journal replay on mount notices this because the commit block isn''t > present and just throws it away. No worries.That''s indeed what I meant.> The equivalent consistent update mechanism in btrfs is cow tree updates. > The superblock that references new tree blocks written to free space is > itself only written once all those blocks are stable on disk. If the > tree block writes are interrupted then the superblock isn''t updated and > btrfs won''t see the partially written blocks. No worries.Ok. So basically what I seem to be hitting, are seemingly complete tree blocks in the log, that aren''t complete or consistent afterall, triggering BUG() in the end.> replayed. For the log to be corrupted, if the btrfs code is perfect, > the storage had to have lied to btrfs and told it that tree update > blocks were stable which caused the superblock write that referenced > them prematurely.That makes sense. Is dmcrypt with discard passthrough potentially able to tell btrfs that writes did make it to disk when in fact someonly some of them have?> The equivalent problem in the ext3 journal would be a transaction that > has blocks missing but which has a valid commit block. ext3 couldn''t > just throw this transaction away because after the commit block write it > could have been in the process of replaying the transaction blocks at > their final location on disk. And it''s now missing some of those blocks > to replay. This kind of corruption Shouldn''t Happen and the fs can''t > just silently ignore it.I understand that, and I''m not advocating silent ignores either.> I absolutely agree that the error messages should be greatly improved in > this case, yes, and that it shouldn''t BUG_ON (it should *never* BUG_ON). > > But btrfs is right to refuse to silently revert previously stable > changes by just ignoring the corrupt log.That''s the part I''m a bit confused about. It could complain, it could require a ''recovery'' option on the kernel command line (which does exist but doesn''t work for this purpose), but effectively it forces me to indeed ignore seemingly stable changes which are indeed unusable, and the code knows they are, except that I have to go back to boot media and go through hoops to get a netconsole dump/crash before that, and then use btrfs-zero-log. Once btrfs knows that the last log entry(ies) are corrupted in some way, it could dump a bunch of diag data in the kernel, then delete the unusable log and continue with the mount, which is effectively what I do myself right now. Unless I''m mistaken, this really only causes me to lose some amount of data that was written just before the crash/reboot. This is usually ok for most users and typically expected anyway (although not as desirable in the case of a database server of course). I understand that this might not want to be a default, but it should be something that could be done from the kernel command line, and allow the mount to continue and the boot to succeed (including the kernel debug log to make it to local disk, where it can then be Emailed since serial console or netconsole is not something you can do easily) Does that sound reasonable? Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Zach Brown
2013-Feb-26 19:28 UTC
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
> > The equivalent consistent update mechanism in btrfs is cow tree updates. > > The superblock that references new tree blocks written to free space is > > itself only written once all those blocks are stable on disk. If the > > tree block writes are interrupted then the superblock isn''t updated and > > btrfs won''t see the partially written blocks. No worries. > > Ok. So basically what I seem to be hitting, are seemingly complete tree > blocks in the log, that aren''t complete or consistent afterall, triggering > BUG() in the end.That''s my understanding, yes. I think Josef is digging in.> I understand that this might not want to be a default, but it should be > something that could be done from the kernel command line, and allow the > mount to continue and the boot to succeed (including the kernel debug log to > make it to local disk, where it can then be Emailed since serial console or > netconsole is not something you can do easily) > > Does that sound reasonable?To me? Absolutely. Having to recover with btrfs-zero-log is annoying. I''d be surprised if anyone didn''t agree that the current process should be improved. - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html