Hi,

btrfs balance results in:

http://pastebin.com/v5j0809M

My system: fully up-to-date Fedora 14, with a rawhide kernel installed so that btrfs balance does something useful with my free space:

kernel-2.6.37-2.fc15.x86_64
btrfs-progs-0.19-12.fc14.x86_64

The filesystem had 0 bytes free when it should have had 45G, so on darkling's advice I ran btrfs balance on the fs while doing heavy I/O (re-running 5 backup jobs that had failed due to ENOSPC).
Up until the crash, btrfs balance did reclaim a couple of gigs of free space, so that part of the plan worked just fine.

Thanks,

Erik.
Hi,

Please find attached the error log, for future reference.

Forgot to mention: I could still use the system after this error, so it was not a completely fatal error in that regard. All active processes (mostly rsync) were hanging in state D though, so I couldn't kill them anymore. Also, the FS could not be unmounted, so I still had to reboot.

Thanks,

Erik.

On 01/17/2011 03:14 PM, Erik Logtenberg wrote:
> [...]
Hi,

Additionally, I cannot mount the filesystem anymore. mount gives no error messages but hangs in state D. dmesg shows:

[ 422.323116] btrfs: use compression

Which is a good thing, but it doesn't do anything otherwise.

Thanks,

Erik.

On 01/17/2011 03:31 PM, Erik Logtenberg wrote:
> [...]
Hi,

Please disregard that last message; the filesystem did mount after a period of hanging in state D. Apparently something called an "orphan" was unlinked:

[ 422.323116] btrfs: use compression
[ 761.778675] btrfs: unlinked 1 orphans
[ 761.841581] SELinux: initialized (dev dm-5, type btrfs), uses xattr

Thanks,

Erik.

On 01/17/2011 03:37 PM, Erik Logtenberg wrote:
> [...]
On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> btrfs balance results in:
>
> http://pastebin.com/v5j0809M
>
> kernel-2.6.37-2.fc15.x86_64
> btrfs-progs-0.19-12.fc14.x86_64
> [...]

Please try 2.6.36 kernel.
On 01/18/2011 01:54 AM, Yan, Zheng wrote:
> On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>> [...]
>
> Please try 2.6.36 kernel.

Thanks for your (short) advice. Could you please elaborate? I was in fact using a 2.6.35.10-74.fc14.x86_64 kernel before, but darkling advised me to switch to a newer kernel to reclaim free space by balancing -- the idea was that newer kernels have a better balancing implementation, more effective at reclaiming free space.

Now your advice is to take a small step back again, from 2.6.37 to 2.6.36 (which is still newer than the 2.6.35 I was using before). Is that because you think that 2.6.37 may have introduced the bug that I ran into? Do you think that 2.6.36 is still recent enough to have the effective balancing, so that I will in fact be able to reclaim some free space? Or is it just a shot in the dark with no reasoning whatsoever ;)

Please don't feel offended, but from your 4-word sentence I really can't tell.

Thanks,

Erik.
Hello Erik, you wrote on 18.01.11:

[...]
> Thanks for your (short) advice. Could you please elaborate? I was in
> fact using a 2.6.35.10-74.fc14.x86_64 kernel before,

I had to change from 2.6.35.8 to 2.6.37-rc4 (and now 2.6.37) to get reliable operation.

Best regards!
Helmut
On Tue, Jan 18, 2011 at 9:22 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> [...]
>
> Please don't feel offended, but from your 4-word sentence I really can't tell.

Just try narrowing down the bug, because I have never seen a bug like this before.
On 01/18/2011 03:13 PM, Yan, Zheng wrote:
> [...]
>
> Just try narrowing down the bug, because I have never seen a bug like this before.

Okay, I can try that. Please note though that I cannot reliably reproduce the bug. At this moment I am in the middle of my second try at balancing the FS (still on 2.6.37), this time without 8 rsyncs banging on the FS. So far, everything is completely stable. I could downgrade to 2.6.36 after this balance and then re-try balancing, but if this second go doesn't crash like the first try, then a successful rebalance on 2.6.36 won't tell us much.

Please note that it could be a combination of bugs. I first ran into an out-of-space issue in the middle of a backup (at that time on 2.6.35), and also noticed some minor file corruption as a result. Then I switched over to 2.6.37 to fix the out-of-space issue (as there should have been 45G free) using a balance. During that balance operation I then ran into the bug that I reported in my previous email. So it could be the 2.6.37 kernel hitting minor FS corruption caused by out-of-space issues with the 2.6.35 kernel. I have no idea how I could reproduce this at all.

Thanks,

Erik.
Hi,

I hit the same bug again, I think:

[291835.724344] ------------[ cut here ]------------
[291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
[291835.724401] invalid opcode: 0000 [#1] SMP
[291835.724424] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[291835.724461] CPU 0
[291835.724472] Modules linked in: uvcvideo snd_usb_audio snd_usbmidi_lib videodev v4l1_compat snd_rawmidi v4l2_compat_ioctl32 btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm snd_timer i2c_i801 snd shpchp iTCO_wdt iTCO_vendor_support soundcore dell_wmi sparse_keymap snd_page_alloc serio_raw joydev wmi dcdbas microcode usb_storage uas raid1 pata_acpi ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[291835.725002]
[291835.725013] Pid: 27386, comm: btrfs Tainted: G I 2.6.37-2.fc15.x86_64 #1
[291835.725062] RIP: 0010:[<ffffffffa0565237>] [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[291835.725126] RSP: 0018:ffff8800373bf9c8 EFLAGS: 00010246
[291835.725152] RAX: ffff8801367d5100 RBX: ffff88020b110880 RCX: 0000000000000040
[291835.725186] RDX: 0000000000000030 RSI: 0000006dd08d3000 RDI: ffff880100069820
[291835.725219] RBP: ffff8800373bfaf8 R08: 0000000000008050 R09: ffff8800373bf980
[291835.725253] R10: ffff8800373bf918 R11: ffff88020b110880 R12: ffff8801367d5100
[291835.725254] R13: ffff88012c0a24c0 R14: ffff88021e2013f0 R15: ffff88021e201cf0
[291835.725254] FS: 00007fcb1a6cc760(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
[291835.725254] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[291835.725254] CR2: 0000000002feeeb8 CR3: 00000001c2943000 CR4: 00000000000426e0
[291835.725254] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[291835.725254] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[291835.725254] Process btrfs (pid: 27386, threadinfo ffff8800373be000, task ffff88022452ae40)
[291835.725254] Stack:
[291835.725254] ffffea0004b5a470 ffffea0000000000 ffff8800373bf9f8 ffff8800373bfaa8
[291835.725254] 0000000000000000 ffff88005faafbb0 ffff880100069808 ffff880100069d78
[291835.725254] ffff88012c0a2aa0 ffff880100069820 ffff88020b1108c0 ffff880100069d80
[291835.725254] Call Trace:
[291835.725254] [<ffffffffa0565c91>] relocate_tree_blocks+0x160/0x478 [btrfs]
[291835.725254] [<ffffffffa056463d>] ? add_tree_block+0x11e/0x13e [btrfs]
[291835.725254] [<ffffffffa0566b45>] relocate_block_group+0x1e3/0x490 [btrfs]
[291835.725254] [<ffffffff8103edb9>] ? should_resched+0xe/0x2e
[291835.725254] [<ffffffffa0566f39>] btrfs_relocate_block_group+0x147/0x28a [btrfs]
[291835.725254] [<ffffffffa054e52a>] btrfs_relocate_chunk.clone.40+0x61/0x4ab [btrfs]
[291835.725254] [<ffffffffa05152d4>] ? btrfs_item_key+0x1e/0x20 [btrfs]
[291835.725254] [<ffffffffa05152f0>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
[291835.725254] [<ffffffffa054c2a8>] ? read_extent_buffer+0xc3/0xe3 [btrfs]
[291835.725254] [<ffffffffa05154e6>] ? btrfs_header_nritems.clone.12+0x17/0x1c [btrfs]
[291835.725254] [<ffffffffa054cff6>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
[291835.725254] [<ffffffffa055045e>] btrfs_balance+0x1a3/0x1f0 [btrfs]
[291835.725254] [<ffffffff8112bce5>] ? do_filp_open+0x226/0x5c8
[291835.725254] [<ffffffffa0556773>] btrfs_ioctl+0x641/0x846 [btrfs]
[291835.725254] [<ffffffff811f3ed1>] ? file_has_perm+0xa5/0xc7
[291835.725254] [<ffffffff8112e091>] do_vfs_ioctl+0x4b1/0x4f2
[291835.725254] [<ffffffff8112e128>] sys_ioctl+0x56/0x7a
[291835.725254] [<ffffffff8100acc2>] system_call_fastpath+0x16/0x1b
[291835.725254] Code: 48 8b 45 89 49 8d 7d 10 48 8d 75 b0 49 89 44 24 18 8a 43 70 ff c0 41 88 44 24 70 e8 f7 c3 ff ff eb 17 f6 40 71 10 49 89 c4 75 02 <0f> 0b 49 8d 45 10 49 89 45 10 49 89 45 18 48 8b b5 20 ff ff ff
[291835.725254] RIP [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[291835.725254] RSP <ffff8800373bf9c8>
[291835.738971] ---[ end trace a7919e7f17c0a727 ]---

It is really difficult to reproduce this bug. This time, I was balancing a 300GB volume, which was almost finished by the time it crashed. It had been running for 2 days straight, and survived a complete backup run, with 5 simultaneous rsyncs running on it. Last night when the rsyncs kicked in, it crashed within half an hour though.

I will now try downgrading to 2.6.36 as per Zheng Yan's suggestion.

Thanks,

Erik.

On 17-1-2011 15:31, Erik Logtenberg wrote:
> [...]
please try the patch attached below. Thanks.
---
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b37d723..49d6b13 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1158,6 +1158,7 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	new_node->bytenr = dest->node->start;
 	new_node->level = node->level;
 	new_node->lowest = node->lowest;
+	new_node->checked = 1;
 	new_node->root = dest;
 
 	if (!node->lowest) {
---
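As context, a minimal standalone sketch of what this one-liner appears to address, assuming the BUG at fs/btrfs/relocation.c:836 is a sanity check on the backref node's "checked" flag; the struct and functions below are illustrative stand-ins, not the 2.6.37 kernel source. clone_backref_node() copies a node whose backrefs were already verified, so the copy should be marked as checked too, otherwise a later pass over the backref cache can trip that check.

#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's backref cache node; only the
 * fields touched by the patch are shown. */
struct backref_node {
	uint64_t bytenr;
	int level;
	int lowest;
	int checked;	/* nonzero once the node's backrefs have been verified */
};

/* Mimics the copy done by clone_backref_node(): the source node was
 * already verified, so the clone is marked checked as well -- the line
 * the patch adds. */
static void clone_node(struct backref_node *dst,
		       const struct backref_node *src, uint64_t dest_start)
{
	dst->bytenr = dest_start;
	dst->level = src->level;
	dst->lowest = src->lowest;
	dst->checked = 1;	/* without this, the assertion below fires */
}

/* Stand-in for the kind of check behind "kernel BUG at relocation.c:836!". */
static void visit_node(const struct backref_node *node)
{
	assert(node->checked);
}

int main(void)
{
	struct backref_node orig = { 1234, 1, 0, 1 };	/* already verified */
	struct backref_node clone;

	clone_node(&clone, &orig, 5678);
	visit_node(&clone);	/* passes with the fix; aborts without it */
	return 0;
}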
On Fri, Jan 21, 2011 at 4:50 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> Hi,
>
> I hit the same bug again, I think:
>
> [291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
> [...]
>
> I will now try downgrading to 2.6.36 as per Zheng Yan's suggestion.
Hi,

It took me a couple of days, because I needed to patch my kernel first and then issue a rebalance, which ran for more than two days. Nevertheless, the rebalance succeeded without any "kernel BUG" messages, so apparently your patch works!

I noticed that at first, the messages were like this:

[79329.526490] btrfs: found 1939 extents
[79375.950834] btrfs: found 1939 extents
[79376.083599] btrfs: relocating block group 352220872704 flags 1
[80052.940435] btrfs: found 3786 extents
[80108.439657] btrfs: found 3786 extents
[80112.325548] btrfs: relocating block group 351147130880 flags 1

Just like I saw during previous balance runs. Then all of a sudden the messages changed to:

[104178.827594] btrfs allocation failed flags 1, wanted 2013265920
[104178.827599] space_info has 4271198208 free, is not full
[104178.827602] space_info total=214748364800, used=210440957952, pinned=0, reserved=36208640, may_use=3168993280, readonly=0
[104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144 used 0 pinned 0 reserved
[104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
[104178.827612] entry offset 1855827968, bytes 20480, bitmap no
[104178.827614] entry offset 1855852544, bytes 20480, bitmap no
[104178.827617] block group has cluster?: no
[104178.827618] 0 blocks of free space at or bigger than bytes is
[104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024 used 0 pinned 0 reserved
[104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
[104178.827626] block group has cluster?: no
[104178.827628] 0 blocks of free space at or bigger than bytes is
[104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120 used 0 pinned 0 reserved
[104178.827634] block group has cluster?: no

And so on.

Does this indicate an error of any sort, or is this expected behaviour?

Kind regards,

Erik.

On 01/21/2011 10:19 AM, Yan, Zheng wrote:
> please try the patch attached below. Thanks.
>
> ---
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index b37d723..49d6b13 100644
> [...]
>
> On Fri, Jan 21, 2011 at 4:50 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>> [...]
On Wed, Jan 26, 2011 at 10:04:02AM +0100, Erik Logtenberg wrote:
> [...]
>
> Does this indicate an error of any sort, or is this expected behaviour?

As far as I know, it means that you've run out of space, and not every block group has been rewritten by the balance process.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- In one respect at least, the Martians are a happy people: ---
                         they have no lawyers.
Hello Hugo, you wrote on 26.01.11:

[...]
> As far as I know, it means that you've run out of space, and not
> every block group has been rewritten by the balance process.

Yesterday I reported a similar problem on this mailing list, in the thread "version".

Running kernel 2.6.37 didn't show this error, but running kernel 2.6.38-rc2 ended with errors.

Best regards!
Helmut
>> Does this indicate an error of any sort, or is this expected behaviour?
>
> As far as I know, it means that you've run out of space, and not
> every block group has been rewritten by the balance process.
>
> Hugo.

It is a 300GB volume with 79GB free, so hardly out of space. Moreover, I started the balance operation with the sole purpose of reclaiming some free space. The volume had about 40GB less free space when the balance started, which was used by / reserved for metadata.

Kind regards,

Erik.
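One hedged way to reconcile these two views, using only the numbers from the dump quoted earlier and assuming the space_info figures cover just the chunks already allocated for data (so unallocated space on the volume does not show up in them):

  free inside data chunks = total - used - reserved
                          = 214748364800 - 210440957952 - 36208640
                          = 4271198208 bytes (~4.0 GiB)
  already earmarked       = may_use = 3168993280 bytes (~3.0 GiB)
  left for new requests   = 4271198208 - 3168993280 = 1102204928 bytes (~1.0 GiB)
  requested allocation    = 2013265920 bytes (~1.9 GiB)

Under that assumption the ~1.9 GiB request cannot be satisfied from the existing data chunks, which would match Hugo's reading even though the volume as a whole still reports 79GB free.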
> Yesterday I reported a similar problem on this mailing list, in the
> thread "version".
>
> Running kernel 2.6.37 didn't show this error, but running kernel
> 2.6.38-rc2 ended with errors.

Ah, indeed, just like you I use 2.6.38-rc2. Or to be more precise: 2.6.38-0.rc2.git0.1.fc14.x86_64, which is the latest rawhide kernel, with one additional patch, namely the one-liner from Zheng Yan.

Kind regards,

Erik.
On Wed, 26 Jan 2011 08:40:00 PM Helmut Hullen wrote:
> Yesterday I reported a similar problem on this mailing list, in the
> thread "version".

I think that might have been a slightly different issue, but I'd guess there would be no harm in trying Yan Zheng's patch!

cheers,
Chris

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP