I've been working on running down intermittent ENOSPC issues.

I can only seem to replicate ENOSPC errors when running zlib compression. However, I have been seeing similar ENOSPC errors to a lesser extent when playing with the LZ4HC patches.

I apologize for not following up on this sooner, but I had drifted away from using zlib, and didn't notice there was still an issue.

My test case involves un-archiving linux git sources to a freshly formatted btrfs partition, mounted with compress-force=zlib. I am using a 16 GB partition on a 250 GB Western Digital SATA Hard Disk. My current kernel is x86_64 linux-3.5.0 merged with Chris' for-linus branch (for 3.6_rc). This includes Josef's "Btrfs: flush delayed inodes if we're short on space" patch.

I haven't isolated a root cause, but here's the feedback I have so far.

(1) My test case won't generate ENOSPC issues with lzo compression or no compression.

(2) I've inserted some trace_printk debugging statements to trace back the call stack, and the ENOSPC errors only seem to occur on a new transaction: vfs_create -> btrfs_create -> btrfs_start_transaction -> start_transaction -> btrfs_block_rsv_add -> reserve_metadata_bytes.

(3) The ENOSPC condition will usually clear in a few seconds, allowing writes to proceed.

(4) I've added a loop to the reserve_metadata_bytes() function to loop back with 'flush_state = FLUSH_DELALLOC (1)' for 1024 retries. This reduces and/or eliminates the ENOSPC errors, as if we're waiting on something else that is trying to complete.

(5) I've been heavily debugging the reserve_metadata_bytes() function, and I'm seeing problems with the way space_info->bytes_may_use is handled. The space_info->bytes_may_use value is important in determining if we're in an over-commit state. But the space_info->bytes_may_use value is often increased arbitrarily without any mechanism for correcting the value. Subsequently, space_info->bytes_may_use quickly increases in size to the point where we are always in fallback allocation as if we're overcommitted. In my trials, it was hard to capture a point where space_info->bytes_may_use wasn't larger than the available size.

(6) Even though reserve_metadata_bytes() is almost always in fallback overcommitted mode, it is still working pretty well, and I've developed the perception that the problem is something that needs to finish elsewhere.

Sorry for not having a patch to fix the issue. I'll try to keep banging on it as time allows.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/01/2012 03:37 AM, Mitch Harder wrote:
> I've been working on running down intermittent ENOSPC issues.
>
> [...]
>
> (5) I've been heavily debugging the reserve_metadata_bytes()
> function, and I'm seeing problems with the way
> space_info->bytes_may_use is handled. The space_info->bytes_may_use
> value is important in determining if we're in an over-commit state.
> But space_info's bytes_may_use value is often increased arbitrarily
> without any mechanism for correcting the value. Subsequently,
> space_info->bytes_may_use quickly increases in size to the point where
> we are always in fallback allocation as if we're overcommitted. In my
> trials, it was hard to capture a point where space_info->bytes_may_use
> wasn't larger than the available size.
>

Interesting results.

IIRC, space_info->bytes_may_use seems not to be arbitrarily increased:

Block_rsv wants NUM bytes
        -> space_info's bytes_may_use += NUM
Block_rsv uses SOME bytes and releases itself
        -> space_info's bytes_may_use -= (NUM - SOME)

So IMO it is 'over-reserve' that causes ENOSPC. Maybe we can try to find why more bytes need to be reserved with compress=zlib/compress=LZ4HC.

thanks,
liubo

> (6) Even though reserve_metadata_bytes() is almost always in fallback
> overcommitted mode, it is still working pretty well, and I've
> developed the perception that the problem is something that needs to
> finish elsewhere.
>
> Sorry for not having a patch to fix the issue. I'll try to keep
> banging on it as time allows.
On Tue, Jul 31, 2012 at 2:37 PM, Mitch Harder <mitch.harder@sabayonlinux.org> wrote:
> I've been working on running down intermittent ENOSPC issues.
>
> I can only seem to replicate ENOSPC errors when running zlib
> compression. However, I have been seeing similar ENOSPC errors to a
> lesser extent when playing with the LZ4HC patches.
>

I've been spending most of my efforts on the specific areas of code that are generating the ENOSPC error. But I've been developing the perception that the real problem is elsewhere.

I probably should have looked at this a while ago, but if I generate an Alt-SysRq-W delayed tasks traceback during the intermittent periods when ENOSPC errors are occurring, I'm seeing delays in other areas.

It may be that the ENOSPC errors are occurring due to a page lock that is not clearing in another thread.

[12339.617366] SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount show-blocked-tasks(W) dump-ftrace-buffer(Z)
[12339.650620] SysRq : Show Blocked State
[12339.650624]   task                        PC stack   pid father
[12339.650678] flush-btrfs-6   D ffffffff810c03bb     0  7162      2 0x00000000
[12339.650681]  ffff880126a83990 0000000000000046 ffff880126a82000 ffff8801266fad40
[12339.650684]  0000000000012280 ffff880126a83fd8 0000000000012280 0000000000004000
[12339.650687]  ffff880126a83fd8 0000000000012280 ffff880129af16a0 ffff8801266fad40
[12339.650690] Call Trace:
[12339.650698]  [<ffffffff8106c6d0>] ? ktime_get_ts+0xae/0xbb
[12339.650701]  [<ffffffff8106c6d0>] ? ktime_get_ts+0xae/0xbb
[12339.650705]  [<ffffffff810c03bb>] ? __lock_page+0x6d/0x6d
[12339.650708]  [<ffffffff8162da84>] schedule+0x64/0x66
[12339.650710]  [<ffffffff8162db12>] io_schedule+0x8c/0xcf
[12339.650713]  [<ffffffff810c03c9>] sleep_on_page+0xe/0x12
[12339.650715]  [<ffffffff8162c159>] __wait_on_bit_lock+0x46/0x8f
[12339.650717]  [<ffffffff810c0117>] ? find_get_pages_tag+0xf8/0x134
[12339.650720]  [<ffffffff810c03b4>] __lock_page+0x66/0x6d
[12339.650723]  [<ffffffff8104b7ff>] ? autoremove_wake_function+0x39/0x39
[12339.650753]  [<ffffffffa0065f28>] extent_write_cache_pages.clone.16.clone.29+0x143/0x30c [btrfs]
[12339.650770]  [<ffffffffa0066303>] extent_writepages+0x48/0x5d [btrfs]
[12339.650784]  [<ffffffffa0053019>] ? uncompress_inline.clone.33+0x15f/0x15f [btrfs]
[12339.650788]  [<ffffffff8105c8f4>] ? update_curr+0x81/0x123
[12339.650802]  [<ffffffffa00528ac>] btrfs_writepages+0x27/0x29 [btrfs]
[12339.650805]  [<ffffffff810c9975>] do_writepages+0x20/0x29
[12339.650808]  [<ffffffff8112ec67>] __writeback_single_inode.clone.22+0x48/0x11c
[12339.650811]  [<ffffffff8112f1cf>] writeback_sb_inodes+0x1f0/0x332
[12339.650813]  [<ffffffff810c870e>] ? global_dirtyable_memory+0x1a/0x3b
[12339.650816]  [<ffffffff8112f389>] __writeback_inodes_wb+0x78/0xb9
[12339.650818]  [<ffffffff8112f510>] wb_writeback+0x146/0x23e
[12339.650820]  [<ffffffff810c891b>] ? global_dirty_limits+0x2f/0x10f
[12339.650822]  [<ffffffff8112fdef>] wb_do_writeback+0x195/0x1b0
[12339.650825]  [<ffffffff8112fe98>] bdi_writeback_thread+0x8e/0x1f1
[12339.650827]  [<ffffffff8112fe0a>] ? wb_do_writeback+0x1b0/0x1b0
[12339.650829]  [<ffffffff8112fe0a>] ? wb_do_writeback+0x1b0/0x1b0
[12339.650832]  [<ffffffff8104b2ef>] kthread+0x89/0x91
[12339.650835]  [<ffffffff816303f4>] kernel_thread_helper+0x4/0x10
[12339.650837]  [<ffffffff8104b266>] ? kthread_freezable_should_stop+0x57/0x57
[12339.650839]  [<ffffffff816303f0>] ? gs_change+0xb/0xb
[12339.650842] tar             D ffff88012683f8b8     0  7173   7152 0x00000000
[12339.650845]  ffff880126c0f9e8 0000000000000086 ffff880126c0e000 ffff8801267496a0
[12339.650848]  0000000000012280 ffff880126c0ffd8 0000000000012280 0000000000004000
[12339.650851]  ffff880126c0ffd8 0000000000012280 ffff880129af16a0 ffff8801267496a0
[12339.650854] Call Trace:
[12339.650866]  [<ffffffffa0037b35>] ? block_rsv_release_bytes+0xc7/0x127 [btrfs]
[12339.650869]  [<ffffffff8103c073>] ? lock_timer_base.clone.26+0x2b/0x50
[12339.650871]  [<ffffffff8162da84>] schedule+0x64/0x66
[12339.650873]  [<ffffffff8162c075>] schedule_timeout+0x22c/0x26a
[12339.650876]  [<ffffffff8103c038>] ? run_timer_softirq+0x2d4/0x2d4
[12339.650878]  [<ffffffff8162c0f1>] schedule_timeout_killable+0x1e/0x20
[12339.650890]  [<ffffffffa003dd0c>] reserve_metadata_bytes.clone.57+0x4ba/0x5e7 [btrfs]
[12339.650906]  [<ffffffffa0066b52>] ? free_extent_buffer+0x68/0x6c [btrfs]
[12339.650918]  [<ffffffffa003e1a9>] btrfs_block_rsv_add+0x2b/0x4d [btrfs]
[12339.650932]  [<ffffffffa004ff40>] start_transaction+0x131/0x310 [btrfs]
[12339.650946]  [<ffffffffa0050386>] btrfs_start_transaction+0x13/0x15 [btrfs]
[12339.650961]  [<ffffffffa005b10a>] btrfs_create+0x3a/0x1e0 [btrfs]
[12339.650964]  [<ffffffff81120861>] ? d_splice_alias+0xcc/0xd8
[12339.650966]  [<ffffffff811173aa>] vfs_create+0x9c/0xf5
[12339.650968]  [<ffffffff81119786>] do_last+0x2b9/0x807
[12339.650971]  [<ffffffff81119dd3>] path_openat+0xcc/0x37f
[12339.650973]  [<ffffffff8105c495>] ? pick_next_task_fair+0xfa/0x1a3
[12339.650975]  [<ffffffff8111a188>] do_filp_open+0x3d/0x89
[12339.650978]  [<ffffffff810ff2b0>] ? kmem_cache_alloc+0x31/0x104
[12339.650981]  [<ffffffff811244d8>] ? alloc_fd+0x74/0x103
[12339.650984]  [<ffffffff8110c9c1>] do_sys_open+0x10f/0x1a1
[12339.650986]  [<ffffffff8110ca8a>] sys_openat+0x14/0x16
[12339.650988]  [<ffffffff8162f196>] system_call_fastpath+0x1a/0x1f
[12339.650991] Sched Debug Version: v0.10, 3.5.2-git-local+ #1
[12339.650993] ktime                                   : 12339650.991349
[12339.650994] sched_clk                               : 12307274.537183
[12339.650996] cpu_clk                                 : 12339650.990603
[12339.650997] jiffies                                 : 4307006946
[12339.650998] sched_clock_stable                      : 0
[12339.650999]
[12339.651000] sysctl_sched
[12339.651001]   .sysctl_sched_latency                    : 12.000000
[12339.651002]   .sysctl_sched_min_granularity            : 1.500000
[12339.651004]   .sysctl_sched_wakeup_granularity         : 2.000000
[12339.651005]   .sysctl_sched_child_runs_first           : 0
[12339.651006]   .sysctl_sched_features                   : 24119
[12339.651007]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[12339.651009]
[12339.651009] cpu#0, 2666.695 MHz
[12339.651010]   .nr_running                    : 1
[12339.651012]   .load                          : 1024
[12339.651013]   .nr_switches                   : 1867973
[12339.651014]   .nr_load_updates               : 921873
[12339.651015]   .nr_uninterruptible            : -38
[12339.651016]   .next_balance                  : 4307.007084
[12339.651018]   .curr->pid                     : 7127
[12339.651019]   .clock                         : 12339651.001145
[12339.651020]   .cpu_load[0]                   : 1024
[12339.651021]   .cpu_load[1]                   : 1024
[12339.651022]   .cpu_load[2]                   : 1024
[12339.651023]   .cpu_load[3]                   : 1024
[12339.651024]   .cpu_load[4]                   : 1024
[12339.651025]   .yld_count                     : 1
[12339.651027]   .sched_count                   : 1882299
[12339.651028]   .sched_goidle                  : 912528
[12339.651029]   .avg_idle                      : 377192
[12339.651030]   .ttwu_count                    : 926414
[12339.651031]   .ttwu_local                    : 883111
[12339.651033]
[12339.651033] cfs_rq[0]:/autogroup-1
[12339.651035]   .exec_clock                    : 15.992542
[12339.651036]   .MIN_vruntime                  : 0.000001
[12339.651037]   .min_vruntime                  : 34.248046
[12339.651038]   .max_vruntime                  : 0.000001
[12339.651040]   .spread                        : 0.000000
[12339.651041]   .spread0                       : -40669.589708
[12339.651042]   .nr_spread_over                : 0
[12339.651043]   .nr_running                    : 0
[12339.651044]   .load                          : 0
[12339.651045]   .load_avg                      : 2559.999744
[12339.651047]   .load_period                   : 8.499867
[12339.651048]   .load_contrib                  : 301
[12339.651049]   .load_tg                       : 301
[12339.651050]   .se->exec_start                : 12334559.061047
[12339.651052]   .se->vruntime                  : 40611.002746
[12339.651053]   .se->sum_exec_runtime          : 16.033168
[12339.651054]   .se->statistics.wait_start     : 0.000000
[12339.651055]   .se->statistics.sleep_start    : 0.000000
[12339.651057]   .se->statistics.block_start    : 0.000000
[12339.651058]   .se->statistics.sleep_max      : 0.000000
[12339.651059]   .se->statistics.block_max      : 0.000000
[12339.651060]   .se->statistics.exec_max       : 0.267801
[12339.651061]   .se->statistics.slice_max      : 0.017406
[12339.651063]   .se->statistics.wait_max       : 2.667927
[12339.651064]   .se->statistics.wait_sum       : 7.034526
[12339.651065]   .se->statistics.wait_count     : 1273
[12339.651066]   .se->load.weight               : 2
[12339.651068]
[12339.651068] cfs_rq[0]:/
[12339.651069]   .exec_clock                    : 29641.344277
[12339.651071]   .MIN_vruntime                  : 0.000001
[12339.651072]   .min_vruntime                  : 40703.837754
[12339.651073]   .max_vruntime                  : 0.000001
[12339.651074]   .spread                        : 0.000000
[12339.651075]   .spread0                       : 0.000000
[12339.651076]   .nr_spread_over                : 1163
[12339.651077]   .nr_running                    : 1
[12339.651078]   .load                          : 1024
[12339.651080]   .load_avg                      : 0.000000
[12339.651081]   .load_period                   : 0.000000
[12339.651082]   .load_contrib                  : 0
[12339.651083]   .load_tg                       : 0
[12339.651084]
[12339.651084] rt_rq[0]:
[12339.651085]   .rt_nr_running                 : 0
[12339.651087]   .rt_throttled                  : 0
[12339.651088]   .rt_time                       : 0.000000
[12339.651089]   .rt_runtime                    : 950.000000
[12339.651090]
[12339.651090] runnable tasks:
[12339.651090]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[12339.651090] ----------------------------------------------------------------------------------------------------------