Leho Kraav
2012-Apr-09 13:24 UTC
btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
Hi all

$ uname -a
Gentoo Linux s9 3.3.1-pf #2 SMP PREEMPT Mon Apr 9 00:35:28 EEST 2012 i686 Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz GenuineIntel GNU/Linux

For the past year or so I was running on 4 partitions:

/dev/sda1 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda2 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda3 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
/dev/sda4 -> dm-crypt -> btrfs raid 0 HOME 10.0GB

Both filesystems were mounted with "noatime,nodiratime,ssd,discard,compress=lzo".

I set that multi-partition monster up back in the 2.6.36ish days, when dm-crypt either was not capable of utilizing multiple cores on a single partition or I possibly didn't know that it already could. At one point it definitely couldn't.

So over time HOME started filling up, and at the point of last night's baby eating "df -hT" showed 1.7G free. Yes, I know free space is complicated in btrfs. Space had not been an issue, so I didn't think to regularly use any better tools to check, such as "btrfs fi show" I guess.

I upgraded my 3.2.2-pf to 3.3.1-pf* and proceeded to launch my regular apps: Firefox, TB, office, etc. Except they all hung. Checking my /var/log/messages window revealed what was happening:

* pf-sources => http://pf.natalenko.name/

...
Apr 8 02:45:52 s9 sudo: leho : TTY=pts/0 ; PWD=/home/leho ; USER=root ; COMMAND=/bin/tail -f /home/leho/.tail/awesome-leho /home/leho/.tail/messages /home/leho/.tail/openvpn.log
Apr 8 02:45:52 s9 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691792] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691795] dm-3: rw=129, want=27556216, limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691799] attempt to access beyond end of device
...
Apr 8 02:46:11 s9 kernel: [ 189.691869] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691874] dm-3: rw=129, want=69498616, limit=20967424
...
Apr 8 02:46:11 s9 kernel: [ 189.692233] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.692237] dm-3: rw=129, want=228879736, limit=20967424

(thousands of lines of this; as we can see, "want" gets bigger all the time)

And it was all downhill from there. The result is a majorly corrupted filesystem that seems to be beyond repair. After a hard reboot, csum errors started showing up in various spots, and any modification to the filesystem, even deleting files, would start another flood of "attempt to access beyond end of device", totally messing up syslog-ng. With the blazing speeds of an SSD that probably isn't a surprise.

So searching around, I found out about the ENOSPC thing, which is possibly still an issue in 3.3. Is there any useful info I could provide for this? I now have some bigger partitions and probably won't run out of space again for a while.

I also discovered the btrfs "restore" binary, although possibly it was too late, since I had already hard rebooted a few times and done some more damage to HOME. This thing returned a whole bunch of "ret is -3" messages and 0 byte files. Occasionally files were good as well, but the majority of the files seem to be corrupt. When running out of space happens, is this a reasonable result to expect?

"btrfs scrub" reported an uncorrectable error count in the millions, with at least thousands of csum mismatch errors visible in dmesg. "btrfs balance" would bomb the machine with the same "access beyond end of device".

I made images of the two btrfs partitions on sda3 and sda4 for future diagnosis. I do think they are pretty corrupt though. Or could there be some magic poke or offset that would make more stuff magically "restore"-able :>

So in conclusion:

* is filesystem-wide corruption like this made more likely by running on top of dm-crypt or btrfs multi device? dm-crypt is definitely staying for me, but I did consolidate partitions now to just 2.
* what exactly should happen when an out of space scenario like the above happens?
* I guess I should keep an eye on "btrfs fi show" on the regular?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
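The want/limit pairs in the kernel messages quoted in this post are easier to judge once they are turned into sizes. What follows is only a minimal Python sketch and not something from the original post: it assumes both values are counted in 512-byte sectors, and the names parse_overruns and sectors_to_gib are made up for illustration.

```python
import re

# Assumption: "want" and "limit" are counted in 512-byte sectors.
SECTOR_BYTES = 512

# Matches lines such as "dm-3: rw=129, want=23361976, limit=20967424".
LINE_RE = re.compile(
    r"(?P<dev>[\w-]+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_overruns(log_text):
    """Return (device, want, limit) tuples for every want/limit line found."""
    return [
        (m["dev"], int(m["want"]), int(m["limit"]))
        for m in LINE_RE.finditer(log_text)
    ]


def sectors_to_gib(sectors):
    """Convert a sector count to GiB under the 512-byte assumption."""
    return sectors * SECTOR_BYTES / 2**30


# Two lines copied from the log in the post.
sample = """\
Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.692237] dm-3: rw=129, want=228879736, limit=20967424
"""

for dev, want, limit in parse_overruns(sample):
    print(
        f"{dev}: device ends at {sectors_to_gib(limit):.2f} GiB, "
        f"request reached {sectors_to_gib(want):.2f} GiB"
    )
```

Under that assumption the limit works out to roughly the 10.0GB partition size listed in the post, while the requested offsets lie far past it.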
Daniel J Blueman
2012-Apr-09 14:35 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
Leho Kraav <leho <at> kraav.com> writes:
[]
> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424

I recently bumped into this too [1]. Liu Bo posted a patch for it [2], which tests out fine here. The workaround is to not mount with 'discard' until e.g. ~3.4-rc3 or later.

Thanks,
Daniel

[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
--
Daniel J Blueman
Leho Kraav
2012-Apr-09 14:44 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On 09.04.2012 17:35, Daniel J Blueman wrote:
> Leho Kraav <leho <at> kraav.com> writes:
> []
>> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
>> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
>
> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
> which tests out fine here. The workaround is to not mount with
> 'discard' until eg ~3.4-rc3 or later.
>
> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649

Oh wow, thanks. This sounds exactly like what happened. I got the livelock post off my search results, but the patch post doesn't seem to have any of the keywords I was looking for, since I had no idea it could be related to discards.

So can this become a problem earlier too, not only when the space used is approaching limits? If not, I think I should be good until 3.4:

$ sudo btrfs fi show
Label: 'S9-HOME' uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
	Total devices 1 FS bytes used 12.93GB
	devid 1 size 60.00GB used 20.04GB path /dev/dm-0

Label: 'S9-ROOT' uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
	Total devices 1 FS bytes used 8.75GB
	devid 1 size 30.00GB used 18.29GB path /dev/dm-1

I think I'd like to keep using "discard" for SSD still, unless a smart person says it's not particularly useful anyway.

So while I'm on 3.3, is the patch from gmane:16649 good enough to eliminate immediate dangers?

And is the previous filesystem still hosed for good then? Or mounting the images with -discard might help?
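For anyone wanting to watch "btrfs fi show" regularly, as suggested earlier in the thread, here is a rough Python sketch of pulling the per-device figures out of output shaped like the listing in this message. It is an illustration only: it assumes sizes are always printed with a GB suffix as they are here, treats the per-device "used" figure as space already allocated to chunks, and the function and regex names are invented.

```python
import re

# Assumption: every devid line uses the GB suffix, as in the quoted output.
DEVID_RE = re.compile(
    r"devid\s+(?P<devid>\d+)\s+size\s+(?P<size>[\d.]+)GB\s+"
    r"used\s+(?P<used>[\d.]+)GB\s+path\s+(?P<path>\S+)"
)


def device_allocation(fi_show_text):
    """Return (path, size_gb, allocated_gb, unallocated_gb) per devid line."""
    rows = []
    for m in DEVID_RE.finditer(fi_show_text):
        size = float(m["size"])
        used = float(m["used"])
        rows.append((m["path"], size, used, round(size - used, 2)))
    return rows


# Output copied from the message; only the whitespace is simplified.
sample = """\
Label: 'S9-HOME' uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
 Total devices 1 FS bytes used 12.93GB
 devid 1 size 60.00GB used 20.04GB path /dev/dm-0
Label: 'S9-ROOT' uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
 Total devices 1 FS bytes used 8.75GB
 devid 1 size 30.00GB used 18.29GB path /dev/dm-1
"""

for path, size, used, unallocated in device_allocation(sample):
    print(f"{path}: {used:.2f} of {size:.2f} GB allocated, "
          f"{unallocated:.2f} GB unallocated")
```

A cron job could feed the real command output into device_allocation and warn when the unallocated figure gets small; the threshold is a matter of taste and is not something the thread specifies.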
Daniel J Blueman
2012-Apr-09 14:54 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
> On 09.04.2012 17:35, Daniel J Blueman wrote:
>>
>> Leho Kraav <leho <at> kraav.com> writes:
>> []
>>>
>>> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
>>> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
>>
>> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
>> which tests out fine here. The workaround is to not mount with
>> 'discard' until eg ~3.4-rc3 or later.
>>
>> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
>> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
>
> Oh wow, thanks. This sounds exactly like what happened. I got the livelock
> post off my search results, but the patch post doesn't seem to have any of
> the keywords I was looking for, since I had no idea it could be related to
> discards.
>
> So can this become a problem earlier too, not only when the space used is
> approaching limits? If not, I think I should be good until 3.4:

Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.

> $ sudo btrfs fi show
> Label: 'S9-HOME' uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
> 	Total devices 1 FS bytes used 12.93GB
> 	devid 1 size 60.00GB used 20.04GB path /dev/dm-0
>
> Label: 'S9-ROOT' uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
> 	Total devices 1 FS bytes used 8.75GB
> 	devid 1 size 30.00GB used 18.29GB path /dev/dm-1
>
> I think I'd like to keep using "discard" for SSD still, unless a smart
> person says it's not particularly useful anyway.

If your SSD has background garbage collection and there are disk idle periods, the synchronous discards will have little benefit.

> So while I'm on 3.3, is the patch from gmane:16649 good enough to eliminate
> immediate dangers?

Yes.

> And is the previous filesystem still hosed for good then? Or mounting the
> images with -discard might help?

It seems like the kernel caught and prevented the discard after the end of the partition, so the data should be fine; scrubbing will tell you.

Daniel
--
Daniel J Blueman
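The workaround discussed in this message comes down to making sure no btrfs filesystem is mounted with the 'discard' option on an affected kernel. A small Python sketch for spotting such mounts follows. It is not from the thread: it assumes text in the /proc/mounts layout (device, mount point, type, comma-separated options), the device paths in the sample are hypothetical, and the function name is made up.

```python
def btrfs_mounts_with_discard(mounts_text):
    """Return mount points of btrfs filesystems whose options include 'discard'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts entry
        _device, mount_point, fs_type, options = fields[:4]
        # Exact match on the bare option name used in this thread.
        if fs_type == "btrfs" and "discard" in options.split(","):
            hits.append(mount_point)
    return hits


# Hypothetical entries; the option string is the one quoted in the first post.
sample = """\
/dev/mapper/home /home btrfs rw,noatime,nodiratime,ssd,discard,compress=lzo 0 0
/dev/mapper/root / btrfs rw,noatime,ssd,compress=lzo 0 0
/dev/sdb1 /boot ext4 rw,discard 0 0
"""

print(btrfs_mounts_with_discard(sample))
```

On a live system the same function could be given the contents of /proc/mounts instead of the sample string.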
Martin Steigerwald
2012-Apr-09 19:07 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On Monday, 9 April 2012, Daniel J Blueman wrote:
> On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
> > On 09.04.2012 17:35, Daniel J Blueman wrote:
> >> Leho Kraav <leho <at> kraav.com> writes:
> >> []
> >>
> >>> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
> >>> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
> >>
> >> I recently bumped into this too [1]. Liu Bo posted a patch for it
> >> [2], which tests out fine here. The workaround is to not mount with
> >> 'discard' until eg ~3.4-rc3 or later.
> >>
> >> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> >> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
> >
> > Oh wow, thanks. This sounds exactly like what happened. I got the
> > livelock post off my search results, but the patch post doesn't seem
> > to have any of the keywords I was looking for, since I had no idea
> > it could be related to discards.
> >
> > So can this become a problem earlier too, not only when the space
> > used is approaching limits? If not, I think I should be good until 3.4:
>
> Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.

Is offline discard via fstrim also affected?

I used fstrim a few times for my / BTRFS with the 3.3.0-trunk Debian kernel (should be 3.3.0) and

martin@merkaba:~> zgrep "beyond" /var/log/syslog*
martin@merkaba:~#1>

Seems I am safe. But I think I won't use fstrim anymore on any BTRFS partition for now, until I have some confirmation that it is safe.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
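The zgrep check in this message can also be scripted when the counts are wanted per file. Below is a hedged Python sketch of the same idea. It searches for the full kernel message rather than just "beyond", treats any file ending in .gz as gzip-compressed, and uses throwaway demo files so that it runs anywhere; the helper names and the demo file contents are invented for illustration.

```python
import gzip
import tempfile
from pathlib import Path

NEEDLE = "beyond end of device"


def count_matches(path, needle=NEEDLE):
    """Count lines containing needle in a plain or gzip-compressed log file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return sum(1 for line in f if needle in line)


def scan_logs(directory, pattern="syslog*"):
    """Map each matching file name in directory to its number of hits."""
    return {p.name: count_matches(p) for p in sorted(Path(directory).glob(pattern))}


# Demo on temporary files rather than the real /var/log.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "syslog").write_text(
        "kernel: attempt to access beyond end of device\n"
    )
    with gzip.open(Path(tmp) / "syslog.1.gz", "wt") as f:
        f.write("kernel: nothing unusual\n")
    demo_counts = scan_logs(tmp)

print(demo_counts)
```

Pointing scan_logs at /var/log would mirror the command in the message, subject to read permissions on the log files.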
Leho Kraav
2012-Apr-09 20:58 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On 09.04.2012 17:54, Daniel J Blueman wrote:
> On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
>
>> And is the previous filesystem still hosed for good then? Or mounting the
>> images with -discard might help?
>
> It seems like the kernel caught and prevented the discard after the
> end of the partition, so the data should be fine; scrubbing will tell
> you.

Without the patch at least, it's BUG time. This is what happens when mounting the image.

...
[171555.937706] device label HOME devid 1 transid 370409 /dev/loop3
[171555.956786] device label HOME devid 2 transid 370409 /dev/loop4
[171647.077501] device label HOME devid 2 transid 370409 /dev/loop4
[171647.196262] btrfs: continuing balance
[171650.826278] btrfs: relocating block group 18278776832 flags 9
[171651.218444] btrfs csum failed ino 257 off 262144 csum 3439556781 private 289331560
[171651.226455] btrfs csum failed ino 257 off 196608 csum 3957169907 private 1046207033
[171651.227070] btrfs csum failed ino 257 off 196608 csum 3957169907 private 1046207033
[171652.484666] ------------[ cut here ]------------
[171652.484669] kernel BUG at fs/btrfs/volumes.c:2487!
[171652.484671] invalid opcode: 0000 [#1] PREEMPT SMP
[171652.484673] Modules linked in: btrfs zlib_deflate lrw gf128mul vboxnetadp(O) vboxnetflt(O) vboxdrv(O) coretemp it87 hwmon_vid hwmon nfs autofs4 nfsd lockd nfs_acl auth_rpcgss sunrpc iptable_mangle ipt_ULOG xt_recent xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables squashfs imon rfcomm bnep ext4 jbd2 snd_dummy loop fuse crc32c_intel nvidia(PO) snd_hda_codec_realtek dvb_usb_dib0700 dib7000p dib0070 dvb_usb dvb_core snd_hda_intel snd_hda_codec snd_pcm rc_core btusb bluetooth snd_timer r8168(O) processor skge snd rtc_cmos sg snd_page_alloc button dibx000_common i2c_i801 hid_logitech_dj hid_logitech usbhid sr_mod cdrom firewire_ohci firewire_core crc_itu_t pata_jmicron uhci_hcd [last unloaded: imon]
[171652.484703]
[171652.484705] Pid: 21206, comm: btrfs-balance Tainted: P O 3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2
[171652.484708] EIP: 0060:[<fa252ec9>] EFLAGS: 00010282 CPU: 1
[171652.484718] EIP is at btrfs_balance+0xe79/0xed0 [btrfs]
[171652.484719] EAX: fffffffb EBX: d0e58e00 ECX: 80240022 EDX: 80240023
[171652.484721] ESI: 7fd00000 EDI: 00000002 EBP: cc046068 ESP: dc5a3ef0
[171652.484722] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[171652.484724] Process btrfs-balance (pid: 21206, ti=dc5a2000 task=c40ee750 task.ti=dc5a2000)
[171652.484725] Stack:
[171652.484726] 00000096 c14d806c c1048ebb 00000046 00000046 7fe00000 00000002 df15d800
[171652.484729] 00000000 c8efc000 c1580e3d 00000030 00000000 00000246 00000000 ec85a800
[171652.484733] 00029e7f 0002fea6 dc5a3f52 00000010 00000000 00000003 00000246 00000000
[171652.484736] Call Trace:
[171652.484740] [<c1048ebb>] ? up+0xb/0x40
[171652.484743] [<c104dbfe>] ? try_to_wake_up+0x6e/0x100
[171652.484745] [<c104dc98>] ? default_wake_function+0x8/0x10
[171652.484752] [<fa252f7f>] ? balance_kthread+0x5f/0xa0 [btrfs]
[171652.484759] [<fa252f20>] ? btrfs_balance+0xed0/0xed0 [btrfs]
[171652.484761] [<c104381e>] ? kthread+0x6e/0x80
[171652.484763] [<c10437b0>] ? kthread_freezable_should_stop+0x50/0x50
[171652.484771] [<c13c0fb6>] ? kernel_thread_helper+0x6/0xd
[171652.484772] Code: 00 00 83 ea 02 83 c7 02 e9 ee fe ff ff c6 07 00 66 ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 8b 74 24 7c c7 04 24 5c 47 28 fa 89 74
[171652.484793] EIP: [<fa252ec9>] btrfs_balance+0xe79/0xed0 [btrfs] SS:ESP 0068:dc5a3ef0
[171652.484802] ---[ end trace 15f25988d7f952de ]---
...
Leho Kraav
2012-Apr-09 21:32 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On 09.04.2012 23:58, Leho Kraav wrote:
> On 09.04.2012 17:54, Daniel J Blueman wrote:
>> On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
>>
>>> And is the previous filesystem still hosed for good then? Or mounting
>>> the images with -discard might help?
>>
>> It seems like the kernel caught and prevented the discard after the
>> end of the partition, so the data should be fine; scrubbing will tell
>> you.
>
> Without the patch at least, it's BUG time. This is what happens when
> mounting the image.
>

It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance HOME" gives us:

Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here ]------------
Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at fs/btrfs/volumes.c:2733!
Apr 10 00:24:18 server kernel: [ 363.839220] invalid opcode: 0000 [#1] PREEMPT SMP
Apr 10 00:24:18 server kernel: [ 363.839258] Modules linked in: btrfs zlib_deflate rfcomm bnep ext4 jbd2 snd_dummy loop fuse crc32c_intel nvidia(PO) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm dvb_usb_dib0700 dvb_usb dib0070 dib7000p dibx000_common imon dvb_core hid_logitech_dj btusb bluetooth hid_logitech rc_core skge snd_page_alloc snd_timer processor snd r8168(O) button i2c_i801 rtc_cmos usbhid sr_mod cdrom firewire_ohci firewire_core crc_itu_t uhci_hcd pata_jmicron
Apr 10 00:24:18 server kernel: [ 363.839609]
Apr 10 00:24:18 server kernel: [ 363.839619] Pid: 4682, comm: btrfs Tainted: P O 3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2
Apr 10 00:24:18 server kernel: [ 363.839677] EIP: 0060:[<f4f4deff>] EFLAGS: 00210246 CPU: 1
Apr 10 00:24:18 server kernel: [ 363.839709] EIP is at btrfs_balance+0xe7f/0xed0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.839732] EAX: ffffff00 EBX: ffffffef ECX: 00000003 EDX: 00000303
Apr 10 00:24:18 server kernel: [ 363.839758] ESI: eb868e00 EDI: 00000000 EBP: 00000000 ESP: e8ebbdd8
Apr 10 00:24:18 server kernel: [ 363.839785] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Apr 10 00:24:18 server kernel: [ 363.839809] Process btrfs (pid: 4682, ti=e8eba000 task=eb2c8ab0 task.ti=e8eba000)
Apr 10 00:24:18 server kernel: [ 363.839839] Stack:
Apr 10 00:24:18 server kernel: [ 363.839850] 00000040 00000001 00000000 00000000 00000000 e8ebbe30 e8ece000 ec713bb4
Apr 10 00:24:18 server kernel: [ 363.839914] 00000097 eb945000 00000097 0000000e 00000000 00000002 00000000 f2f2d3b0
Apr 10 00:24:18 server kernel: [ 363.839987] ec0cdd34 f3153b00 e9240600 ec0cde00 c1094152 c10cbb6b eac98b00 00000001
Apr 10 00:24:18 server kernel: [ 363.840090] Call Trace:
Apr 10 00:24:18 server kernel: [ 363.840109] [<c1094152>] ? filemap_fault+0x82/0x420
Apr 10 00:24:18 server kernel: [ 363.840132] [<c10cbb6b>] ? __mem_cgroup_try_charge+0x28b/0x4c0
Apr 10 00:24:18 server kernel: [ 363.840160] [<c10aa619>] ? __do_fault+0x3c9/0x510
Apr 10 00:24:18 server kernel: [ 363.840183] [<c10c2865>] ? kmem_cache_alloc+0x75/0x90
Apr 10 00:24:18 server kernel: [ 363.840212] [<f4f53f19>] ? btrfs_ioctl_balance.isra.52+0x379/0x390 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840246] [<f4f56230>] ? update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840280] [<f4f568a1>] ? btrfs_ioctl+0x671/0x1200 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840306] [<c10ae1b4>] ? handle_mm_fault+0x124/0x260
Apr 10 00:24:18 server kernel: [ 363.840334] [<f4f56230>] ? update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840363] [<c10e01ea>] ? do_vfs_ioctl+0x7a/0x580
Apr 10 00:24:18 server kernel: [ 363.840386] [<c1020a10>] ? vmalloc_sync_all+0x10/0x10
Apr 10 00:24:18 server kernel: [ 363.840409] [<c1020b95>] ? do_page_fault+0x185/0x3d0
Apr 10 00:24:18 server kernel: [ 363.840432] [<c10cf15f>] ? do_sys_open+0x15f/0x1b0
Apr 10 00:24:18 server kernel: [ 363.840453] [<c10df1e2>] ? do_fcntl+0x232/0x470
Apr 10 00:24:18 server kernel: [ 363.840475] [<c10e071e>] ? sys_ioctl+0x2e/0x60
Apr 10 00:24:18 server kernel: [ 363.840497] [<c13c0a4c>] ? sysenter_do_call+0x12/0x22
Apr 10 00:24:18 server kernel: [ 363.840519] Code: c7 02 e9 ee fe ff ff c6 07 00 66 ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 8b 74 24 7c c7 04 24 9c f7 f7 f4 89 74 24 04 e8 dc ce 46
Apr 10 00:24:18 server kernel: [ 363.840933] EIP: [<f4f4deff>] btrfs_balance+0xe7f/0xed0 [btrfs] SS:ESP 0068:e8ebbdd8
Apr 10 00:24:18 server kernel: [ 363.841023] ---[ end trace 8be1f61ebfe6132a ]---
David Sterba
2012-Apr-09 23:19 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On Tue, Apr 10, 2012 at 12:32:00AM +0300, Leho Kraav wrote:
> It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
> HOME" gives us:
>
> Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
> Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here ]------------
> Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at fs/btrfs/volumes.c:2733!

that's

2732         if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
2733                 BUG_ON(ret == -EEXIST);
                     ^^^^
2734                 set_balance_control(bctl);
2735         } else {
2736                 BUG_ON(ret != -EEXIST);
2737                 spin_lock(&fs_info->balance_lock);
2738                 update_balance_args(bctl);
2739                 spin_unlock(&fs_info->balance_lock);
2740         }

IIRC somebody reported similar problem recently. It basically means there's an inconsistent balance state. Adding Ilya to CC.

david
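The two BUG_ON checks quoted in this message boil down to one invariant: a fresh balance must not find existing balance state (ret is -EEXIST), and a resumed balance must find it. The Python sketch below only models that condition for illustration and is not kernel code. The bit value chosen for BALANCE_RESUME is a placeholder, and "ret" stands for the result of whatever call precedes the quoted lines, which the thread does not show.

```python
import errno

# Placeholder bit for illustration; not taken from the kernel headers.
BALANCE_RESUME = 1 << 0


def balance_state_is_consistent(flags, ret):
    """Model of the quoted checks: True means neither BUG_ON would fire.

    Fresh balance (no RESUME flag): ret must not be -EEXIST.
    Resumed balance (RESUME flag):  ret must be -EEXIST.
    """
    resuming = bool(flags & BALANCE_RESUME)
    already_exists = ret == -errno.EEXIST
    return resuming == already_exists


# The case reported in the thread: a fresh "btrfs fi balance" on a
# filesystem that still carries balance state from an earlier attempt.
print(balance_state_is_consistent(0, -errno.EEXIST))

# The two combinations the quoted code expects.
print(balance_state_is_consistent(0, 0))
print(balance_state_is_consistent(BALANCE_RESUME, -errno.EEXIST))
```

Seen this way, line 2733 firing means a new balance was started while state from an earlier one was still present, which matches the "inconsistent balance state" description in the message.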
Ilya Dryomov
2012-Apr-10 09:07 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On Tue, Apr 10, 2012 at 01:19:54AM +0200, David Sterba wrote:
> On Tue, Apr 10, 2012 at 12:32:00AM +0300, Leho Kraav wrote:
> > It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
> > HOME" gives us:
> >
> > Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
> > Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here ]------------
> > Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at fs/btrfs/volumes.c:2733!
>
> that's
>
> 2732         if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
> 2733                 BUG_ON(ret == -EEXIST);
>                      ^^^^
> 2734                 set_balance_control(bctl);
> 2735         } else {
> 2736                 BUG_ON(ret != -EEXIST);
> 2737                 spin_lock(&fs_info->balance_lock);
> 2738                 update_balance_args(bctl);
> 2739                 spin_unlock(&fs_info->balance_lock);
> 2740         }
>
> IIRC somebody reported similar problem recently. It basically means
> there's an inconsistent balance state. Adding Ilya to CC.

Leho, so you just mount with discard patch and run 'btrfs fi balance <mnt>', correct ?

The problem is that you have balance state on disk (from trying to run balance earlier w/o discard patch) but we are failing to pick it up on mount.

Could you please post the entire dmesg and the output of 'btrfs-debug-tree -d <dev>' somewhere ?

Could you also apply the debug patch below, mount your fs and send me dmesg output (no need to run balance, just mount) ?

Thanks,

Ilya

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..86fa082 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1867,6 +1867,7 @@ int open_ctree(struct super_block *sb,
 	csum_root = fs_info->csum_root = btrfs_alloc_root(fs_info);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
 	dev_root = fs_info->dev_root = btrfs_alloc_root(fs_info);
+printk("open_ctree\n");
 
 	if (!tree_root || !extent_root || !csum_root ||
 	    !chunk_root || !dev_root) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a872b48..2e39348 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2834,6 +2834,7 @@ static int balance_kthread(void *data)
 
 	mutex_lock(&fs_info->balance_mutex);
 	set_balance_control(bctl);
+printk("balance_kthread: flags %llu\n", (unsigned long long)bctl->flags);
 
 	if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
 		printk(KERN_INFO "btrfs: force skipping balance\n");
@@ -2858,6 +2859,7 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
 	struct btrfs_key key;
 	int ret;
 
+printk("recover_balance\n");
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -2872,7 +2874,11 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
 	key.type = BTRFS_BALANCE_ITEM_KEY;
 	key.offset = 0;
 
+printk("key.obj %llu\n", (unsigned long long)key.objectid);
+printk("key.type %d\n", key.type);
+printk("key.off %llu\n", (unsigned long long)key.offset);
 	ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+printk("search ret %d\n", ret);
 	if (ret < 0)
 		goto out_bctl;
 	if (ret > 0) { /* ret = -ENOENT; */
Leho Kraav
2012-Apr-10 15:31 UTC
Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
On 10.04.2012 12:07, Ilya Dryomov wrote:
> On Tue, Apr 10, 2012 at 01:19:54AM +0200, David Sterba wrote:
>>
>> IIRC somebody reported similar problem recently. It basically means
>> there's an inconsistent balance state. Adding Ilya to CC.
>
> Leho, so you just mount with discard patch and run 'btrfs fi balance
> <mnt>', correct ?
>
> The problem is that you have balance state on disk (from trying to run
> balance earlier w/o discard patch) but we are failing to pick it up on
> mount.
>
> Could you please post the entire dmesg and the output of
> 'btrfs-debug-tree -d <dev>' somewhere ?
>
> Could you also apply the debug patch below, mount your fs and send me
> dmesg output (no need to run balance, just mount) ?
>

Your understanding of the situation, based on the above, is correct.

Yes, I can do all of that, but it's going to take until next week, since I'm travelling and I can't risk BUG-ing my server remotely.