Roman Mamedov
2013-Apr-02 08:04 UTC
Still getting a lot of -28 (ENOSPC?) errors during balance
Hello,

With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
(the problem was occurring also without this patch, but seemed to be even worse).

At the start of balance:

Data: total=31.85GB, used=9.96GB
System: total=4.00MB, used=16.00KB
Metadata: total=1.01GB, used=696.17MB

"btrfs balance start -musage=5 -dusage=5" has been going on for about 50 minutes.

Current situation:

Balance on '/mnt/r1/' is running
1 out of about 2 chunks balanced (20 considered), 50% left

Data: total=30.85GB, used=10.04GB
System: total=4.00MB, used=16.00KB
Metadata: total=1.01GB, used=851.69MB

And a constant stream of these in dmesg:

[ 7614.756339] btrfs: block rsv returned -28
[ 7614.756341] ------------[ cut here ]------------
[ 7614.756370] WARNING: at fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0x351/0x360 [btrfs]()
[ 7614.756372] Hardware name: GA-880GM-UD2H
[ 7614.756374] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc bridge 8021q garp stp llc aoe it87 hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcsp snd_pcm kvm_amd kvm snd_page_alloc snd_timer snd soundcore sp5100_tco serio_raw joydev k10temp edac_core i2c_piix4 edac_mce_amd mac_hid shpchp wmi btrfs libcrc32c zlib_deflate raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nbd sata_nv hid_generic usbhid usb_storage hid microcode r8169 sata_mv
[ 7614.756411] Pid: 4708, comm: btrfs Tainted: G W 3.7.10-rm3+ #38
[ 7614.756413] Call Trace:
[ 7614.756421] [<ffffffff8105516f>] warn_slowpath_common+0x7f/0xc0
[ 7614.756425] [<ffffffff810551ca>] warn_slowpath_null+0x1a/0x20
[ 7614.756440] [<ffffffffa00f71c1>] btrfs_alloc_free_block+0x351/0x360 [btrfs]
[ 7614.756453] [<ffffffffa00e3d03>] ? __btrfs_cow_block+0x323/0x4f0 [btrfs]
[ 7614.756472] [<ffffffffa01263ab>] ? read_extent_buffer+0xbb/0x110 [btrfs]
[ 7614.756485] [<ffffffffa00e3b06>] __btrfs_cow_block+0x126/0x4f0 [btrfs]
[ 7614.756498] [<ffffffffa00e4047>] btrfs_cow_block+0xf7/0x200 [btrfs]
[ 7614.756515] [<ffffffffa014abb7>] do_relocation+0x467/0x530 [btrfs]
[ 7614.756528] [<ffffffffa00ecdfa>] ? block_rsv_add_bytes+0x5a/0x80 [btrfs]
[ 7614.756545] [<ffffffffa014e8dd>] relocate_tree_blocks+0x61d/0x650 [btrfs]
[ 7614.756562] [<ffffffffa014f8d6>] relocate_block_group+0x446/0x6a0 [btrfs]
[ 7614.756579] [<ffffffffa014fccd>] btrfs_relocate_block_group+0x19d/0x2d0 [btrfs]
[ 7614.756596] [<ffffffffa0128685>] btrfs_relocate_chunk.isra.53+0x75/0x770 [btrfs]
[ 7614.756613] [<ffffffffa0138668>] ? btrfs_set_lock_blocking_rw+0xa8/0xf0 [btrfs]
[ 7614.756630] [<ffffffffa011ffa1>] ? release_extent_buffer.isra.26+0x81/0xf0 [btrfs]
[ 7614.756647] [<ffffffffa0125227>] ? free_extent_buffer+0x37/0x90 [btrfs]
[ 7614.756663] [<ffffffffa012cdc7>] btrfs_balance+0x877/0xe00 [btrfs]
[ 7614.756680] [<ffffffffa0132f9c>] btrfs_ioctl_balance+0x10c/0x430 [btrfs]
[ 7614.756684] [<ffffffff8167e38e>] ? _raw_spin_lock+0xe/0x20
[ 7614.756702] [<ffffffffa013739b>] btrfs_ioctl+0xf2b/0x1970 [btrfs]
[ 7614.756707] [<ffffffff8114f7f9>] ? handle_mm_fault+0x249/0x310
[ 7614.756711] [<ffffffff816822d4>] ? __do_page_fault+0x254/0x4e0
[ 7614.756714] [<ffffffff811527d0>] ? __vma_link_rb+0x30/0x40
[ 7614.756718] [<ffffffff8118f9b0>] do_vfs_ioctl+0x90/0x570
[ 7614.756723] [<ffffffff8108793a>] ? finish_task_switch+0x4a/0xe0
[ 7614.756726] [<ffffffff8118ff21>] sys_ioctl+0x91/0xb0
[ 7614.756730] [<ffffffff81686b1d>] system_call_fastpath+0x1a/0x1f
[ 7614.756732] ---[ end trace 9fdf5720be6ec4cb ]---
[ 7614.756894] btrfs: block rsv returned -28
[ 7614.756895] ------------[ cut here ]------------
[ 7614.756910] WARNING: at fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0x351/0x360 [btrfs]()
[ 7614.756911] Hardware name: GA-880GM-UD2H
[ 7614.756912] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc bridge 8021q garp stp llc aoe it87 hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcsp snd_pcm kvm_amd kvm snd_page_alloc snd_timer snd soundcore sp5100_tco serio_raw joydev k10temp edac_core i2c_piix4 edac_mce_amd mac_hid shpchp wmi btrfs libcrc32c zlib_deflate raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nbd sata_nv hid_generic usbhid usb_storage hid microcode r8169 sata_mv
[ 7614.756945] Pid: 4708, comm: btrfs Tainted: G W 3.7.10-rm3+ #38
[ 7614.756946] Call Trace:
[ 7614.756950] [<ffffffff8105516f>] warn_slowpath_common+0x7f/0xc0
[ 7614.756953] [<ffffffff810551ca>] warn_slowpath_null+0x1a/0x20
[ 7614.756967] [<ffffffffa00f71c1>] btrfs_alloc_free_block+0x351/0x360 [btrfs]
[ 7614.756983] [<ffffffffa0147ad7>] ? add_delayed_tree_ref+0xd7/0x190 [btrfs]
[ 7614.757000] [<ffffffffa014769e>] ? add_delayed_ref_head.isra.7+0xce/0x160 [btrfs]
[ 7614.757017] [<ffffffffa01482a5>] ? btrfs_add_delayed_tree_ref+0x115/0x1a0 [btrfs]
[ 7614.757030] [<ffffffffa00e3b06>] __btrfs_cow_block+0x126/0x4f0 [btrfs]
[ 7614.757034] [<ffffffff8112feba>] ? set_page_dirty+0x5a/0x70
[ 7614.757047] [<ffffffffa00e4047>] btrfs_cow_block+0xf7/0x200 [btrfs]
[ 7614.757060] [<ffffffffa00e8367>] btrfs_search_slot+0x3d7/0x8d0 [btrfs]
[ 7614.757075] [<ffffffffa0107f64>] ? btrfs_record_root_in_trans+0x64/0x80 [btrfs]
[ 7614.757092] [<ffffffffa014ab2e>] do_relocation+0x3de/0x530 [btrfs]
[ 7614.757105] [<ffffffffa00ecdfa>] ? block_rsv_add_bytes+0x5a/0x80 [btrfs]
[ 7614.757122] [<ffffffffa014e8dd>] relocate_tree_blocks+0x61d/0x650 [btrfs]
[ 7614.757139] [<ffffffffa014f8d6>] relocate_block_group+0x446/0x6a0 [btrfs]
[ 7614.757155] [<ffffffffa014fccd>] btrfs_relocate_block_group+0x19d/0x2d0 [btrfs]
[ 7614.757172] [<ffffffffa0128685>] btrfs_relocate_chunk.isra.53+0x75/0x770 [btrfs]
[ 7614.757189] [<ffffffffa0138668>] ? btrfs_set_lock_blocking_rw+0xa8/0xf0 [btrfs]
[ 7614.757206] [<ffffffffa011ffa1>] ? release_extent_buffer.isra.26+0x81/0xf0 [btrfs]
[ 7614.757222] [<ffffffffa0125227>] ? free_extent_buffer+0x37/0x90 [btrfs]
[ 7614.757239] [<ffffffffa012cdc7>] btrfs_balance+0x877/0xe00 [btrfs]
[ 7614.757256] [<ffffffffa0132f9c>] btrfs_ioctl_balance+0x10c/0x430 [btrfs]
[ 7614.757260] [<ffffffff8167e38e>] ? _raw_spin_lock+0xe/0x20
[ 7614.757276] [<ffffffffa013739b>] btrfs_ioctl+0xf2b/0x1970 [btrfs]
[ 7614.757280] [<ffffffff8114f7f9>] ? handle_mm_fault+0x249/0x310
[ 7614.757283] [<ffffffff816822d4>] ? __do_page_fault+0x254/0x4e0
[ 7614.757287] [<ffffffff811527d0>] ? __vma_link_rb+0x30/0x40
[ 7614.757290] [<ffffffff8118f9b0>] do_vfs_ioctl+0x90/0x570
[ 7614.757294] [<ffffffff8108793a>] ? finish_task_switch+0x4a/0xe0
[ 7614.757297] [<ffffffff8118ff21>] sys_ioctl+0x91/0xb0
[ 7614.757300] [<ffffffff81686b1d>] system_call_fastpath+0x1a/0x1f
[ 7614.757302] ---[ end trace 9fdf5720be6ec4cc ]---

-- 
With respect,
Roman
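For reference, the progress and space figures quoted above are the output of "btrfs balance status" and "btrfs filesystem df". A minimal way to sample both while a balance runs, using the mount point from this report; the 60-second interval is an arbitrary choice:

    # Re-run both status commands every 60 seconds (interval is arbitrary).
    watch -n 60 'btrfs balance status /mnt/r1; btrfs filesystem df /mnt/r1'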
Roman Mamedov
2013-Apr-02 09:37 UTC
Re: Still getting a lot of -28 (ENOSPC?) errors during balance
On Tue, 2 Apr 2013 14:04:52 +0600
Roman Mamedov <rm@romanrm.ru> wrote:

> With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> (the problem was occurring also without this patch, but seemed to be even worse).
> 
> At the start of balance:
> 
> Data: total=31.85GB, used=9.96GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=696.17MB
> 
> "btrfs balance start -musage=5 -dusage=5" has been going on for about 50 minutes.
> 
> Current situation:
> 
> Balance on '/mnt/r1/' is running
> 1 out of about 2 chunks balanced (20 considered), 50% left
> 
> Data: total=30.85GB, used=10.04GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=851.69MB

About 2 hours 10 minutes into the balance, it was still going, with:

Data: total=30.85GB, used=10.06GB
System: total=4.00MB, used=16.00KB
Metadata: total=1.01GB, used=909.16MB

The stream of -28 errors continues non-stop in dmesg.

At ~2hr20min, it looks like it decided to allocate some more space for metadata:

Data: total=30.85GB, used=10.01GB
System: total=4.00MB, used=16.00KB
Metadata: total=2.01GB, used=748.56MB

And shortly after (~2hr25min) it was done. After the balance:

Data: total=29.85GB, used=10.01GB
System: total=4.00MB, used=16.00KB
Metadata: total=2.01GB, used=748.27MB

-- 
With respect,
Roman
Josef Bacik
2013-Apr-02 13:46 UTC
Re: Still getting a lot of -28 (ENOSPC?) errors during balance
On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:

> Hello,
> 
> With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> (the problem was occurring also without this patch, but seemed to be even worse).
> 
> At the start of balance:
> 
> Data: total=31.85GB, used=9.96GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=696.17MB
> 
> "btrfs balance start -musage=5 -dusage=5" has been going on for about 50 minutes.
> 
> Current situation:
> 
> Balance on '/mnt/r1/' is running
> 1 out of about 2 chunks balanced (20 considered), 50% left
> 
> Data: total=30.85GB, used=10.04GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=851.69MB
> 
> And a constant stream of these in dmesg:
> 

Can you try this out and see if it helps? Thanks,

Josef

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0d89ff0..9830e86 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 	list_for_each_entry(edge, &node->upper, list[LOWER]) {
 		cond_resched();
 
+		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
+					     rc->extent_root->leafsize,
+					     BTRFS_RESERVE_FLUSH_ALL);
+		if (ret) {
+			err = ret;
+			break;
+		}
 		upper = edge->node[UPPER];
 		root = select_reloc_root(trans, rc, upper, edges, &nr);
 		BUG_ON(!root);
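For anyone who wants to test, the hunk applies against a 3.7.x tree with the usual flow. A sketch, assuming the sources the running kernel was built from are at hand; the patch file name below is illustrative:

    cd linux-3.7.10                              # kernel source tree
    patch -p1 < btrfs-relocation-refill.diff     # the hunk above saved to a file (name is illustrative)
    make && sudo make modules_install install    # rebuild and install, then reboot into the patched kernel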
Roman Mamedov
2013-Apr-02 16:55 UTC
Re: Still getting a lot of -28 (ENOSPC?) errors during balance
On Tue, 2 Apr 2013 09:46:26 -0400
Josef Bacik <jbacik@fusionio.com> wrote:

> On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
> > Hello,
> > 
> > With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> > (the problem was occurring also without this patch, but seemed to be even worse).
> > 
> > At the start of balance:
> > 
> > Data: total=31.85GB, used=9.96GB
> > System: total=4.00MB, used=16.00KB
> > Metadata: total=1.01GB, used=696.17MB
> > 
> > "btrfs balance start -musage=5 -dusage=5" has been going on for about 50 minutes.
> > 
> > Current situation:
> > 
> > Balance on '/mnt/r1/' is running
> > 1 out of about 2 chunks balanced (20 considered), 50% left
> > 
> > Data: total=30.85GB, used=10.04GB
> > System: total=4.00MB, used=16.00KB
> > Metadata: total=1.01GB, used=851.69MB
> > 
> > And a constant stream of these in dmesg:
> > 
> 
> Can you try this out and see if it helps? Thanks,

Hello,

Well, that balance has now completed, and unfortunately I don't have a complete
image of the filesystem from before, to apply the patch and check whether the
same operation goes better this time.

I'll keep it in mind and will try to test it out if I run into a similar
situation again on some filesystem.

Generally, what seems to make me run into various problems with balance is the
following usage scenario: on an active filesystem (used as /home and the root
FS), a snapshot is made every 30 minutes with a unique (timestamped) name, and
once a day snapshots from more than two days ago are purged. And it goes on
like this for months.

Another variant of this is a backup partition, where snapshots are made every
six hours and all snapshots are kept for 1-3 months before getting purged.

I guess this kind of usage causes a lot of internal fragmentation or something,
which makes it difficult for a balance to find enough free space to work with.

> Josef
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0d89ff0..9830e86 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
>  	list_for_each_entry(edge, &node->upper, list[LOWER]) {
>  		cond_resched();
>  
> +		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
> +					     rc->extent_root->leafsize,
> +					     BTRFS_RESERVE_FLUSH_ALL);
> +		if (ret) {
> +			err = ret;
> +			break;
> +		}
>  		upper = edge->node[UPPER];
>  		root = select_reloc_root(trans, rc, upper, edges, &nr);
>  		BUG_ON(!root);

-- 
With respect,
Roman
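The snapshot regime Roman describes above boils down to something like the following cron-driven sketch. The subvolume paths, snapshot naming, and the use of mtime to judge age are illustrative assumptions, not details from the actual setup:

    #!/bin/sh
    SRC=/home                 # subvolume being snapshotted (illustrative)
    DST=/home/.snapshots      # directory holding the timestamped snapshots (illustrative)

    # From cron every 30 minutes: take a read-only snapshot with a timestamped name.
    btrfs subvolume snapshot -r "$SRC" "$DST/$(date +%Y%m%d-%H%M)"

    # Once a day: purge snapshots older than two days. Parsing the timestamped
    # name would be more robust than mtime; this keeps the sketch short.
    find "$DST" -mindepth 1 -maxdepth 1 -mtime +2 |
    while read -r snap; do
        btrfs subvolume delete "$snap"
    done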
Josef Bacik
2013-Apr-02 17:00 UTC
Re: Still getting a lot of -28 (ENOSPC?) errors during balance
On Tue, Apr 02, 2013 at 10:55:04AM -0600, Roman Mamedov wrote:

> On Tue, 2 Apr 2013 09:46:26 -0400
> Josef Bacik <jbacik@fusionio.com> wrote:
> 
> > On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
> > > Hello,
> > > 
> > > With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> > > (the problem was occurring also without this patch, but seemed to be even worse).
> > > 
> > > At the start of balance:
> > > 
> > > Data: total=31.85GB, used=9.96GB
> > > System: total=4.00MB, used=16.00KB
> > > Metadata: total=1.01GB, used=696.17MB
> > > 
> > > "btrfs balance start -musage=5 -dusage=5" has been going on for about 50 minutes.
> > > 
> > > Current situation:
> > > 
> > > Balance on '/mnt/r1/' is running
> > > 1 out of about 2 chunks balanced (20 considered), 50% left
> > > 
> > > Data: total=30.85GB, used=10.04GB
> > > System: total=4.00MB, used=16.00KB
> > > Metadata: total=1.01GB, used=851.69MB
> > > 
> > > And a constant stream of these in dmesg:
> > > 
> > 
> > Can you try this out and see if it helps? Thanks,
> 
> Hello,
> 
> Well, that balance has now completed, and unfortunately I don't have a complete
> image of the filesystem from before, to apply the patch and check whether the
> same operation goes better this time.
> 
> I'll keep it in mind and will try to test it out if I run into a similar
> situation again on some filesystem.
> 
> Generally, what seems to make me run into various problems with balance is the
> following usage scenario: on an active filesystem (used as /home and the root
> FS), a snapshot is made every 30 minutes with a unique (timestamped) name, and
> once a day snapshots from more than two days ago are purged. And it goes on
> like this for months.
> 
> Another variant of this is a backup partition, where snapshots are made every
> six hours and all snapshots are kept for 1-3 months before getting purged.
> 
> I guess this kind of usage causes a lot of internal fragmentation or something,
> which makes it difficult for a balance to find enough free space to work with.
> 

Well, one thing to keep in mind is that these warnings are truly just warnings:
the allocator still finds space and uses it. It's just that our internal space
reservation calculations are coming up short, so it's letting us know we need to
adjust our math. So really we need to sit down and adjust how balance does its
reservations so we can stop these warnings from happening at all, but it doesn't
affect the actual balance other than making it super noisy and slow. Thanks,

Josef
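Given that, a quick way to confirm a balance is merely noisy rather than failing is to count the warnings instead of reading them. A trivial sketch; the message text is taken from the traces earlier in the thread:

    dmesg | grep -c 'block rsv returned -28'                     # reservation misses so far
    dmesg | grep -c 'WARNING: at fs/btrfs/extent-tree.c:6297'    # matching WARN splats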