Hello,

On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
have a backup process that fires up several rsync processes. These
mirror several dozen servers to individual sub-volumes. Every day I
snapshot each sub-volume and rsync over it.

The problem I'm seeing is that my rsync processes are failing randomly
with "No space left on device". This is a 6-terabyte volume with plenty
of free space.

Mount options:
/dev/sdb on /backups type btrfs (rw,max_inline=0,compress)

[root@rsync1 ~]# btrfs filesystem df /backups/
Data: total=1.88TB, used=1.88TB
Metadata: total=43.38GB, used=32.06GB
System: total=12.00MB, used=260.00KB

[root@rsync1 ~]# df /dev/sdb
Filesystem    1K-blocks        Used   Available Use% Mounted on
/dev/sdb     5781249024  2087273084  3693975940  37% /backups

They don't all fail at once. Normally I have 4-5 running at a time and
1 or 2 will drop out with a no-space error. The rest continue on. I've
noticed it will generally occur on ones that are in the middle of
transferring a very large file. If I lighten the load to one rsync at
a time it appears to happen less frequently.

Any known issues I should be aware of?

Thanks,

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
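For illustration, a per-server "snapshot, then rsync over the sub-volume"
cycle of the kind described above might look roughly like the sketch
below; the paths, host list, and rsync options are hypothetical, not
taken from the actual setup:

#!/bin/bash
# Hypothetical sketch of a daily snapshot-then-rsync backup cycle.
# BACKUP_ROOT, the server list, and all paths are placeholders.
BACKUP_ROOT=/backups
SERVERS="web01 web02 db01"
STAMP=$(date +%Y-%m-%d)

for host in $SERVERS; do
    subvol="$BACKUP_ROOT/$host"
    # Snapshot the current state of the sub-volume before refreshing it.
    btrfs subvolume snapshot "$subvol" "$BACKUP_ROOT/snapshots/$host-$STAMP"
    # Mirror the server into the writable sub-volume; unchanged files keep
    # sharing extents with the snapshot, so only changed data costs new space.
    rsync -aHq --numeric-ids --delete "root@$host:/" "$subvol/"
done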
On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
> have a backup process that fires up several rsync processes. These
> mirror several dozen servers to individual sub-volumes. Every day I
> snapshot each sub-volume and rsync over it.
>
> The problem I'm seeing is that my rsync processes are failing randomly
> with "No space left on device". This is a 6-terabyte volume with plenty
> of free space.
[...]
> Any known issues I should be aware of?

Thank you for reporting this. I will dig in.

Yan, Zheng
Hello,

I installed the git repo kernel and added some debug to the ENOSPC
returns. Unfortunately it's still failing. If it helps any, it's bombing
out in btrfs_check_data_free_space() in extent-tree.c, returning on the
ENOSPC at line 2959.

Unfortunately that is the extent of my ability to debug a filesystem. :P

Thanks,

On Tue, Jul 27, 2010 at 9:19 AM, Yan, Zheng <yanzheng@21cn.com> wrote:
> On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
>> On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
>> have a backup process that fires up several rsync processes.
[...]
>> Any known issues I should be aware of?
>
> Thank you for reporting this. I will dig in.
>
> Yan, Zheng

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately it's still failing. If it helps any, it's bombing
> out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P

This is really helpful, thank you very much.

Yan, Zheng
Hello,

Perhaps it would be a good idea to add a tracepoint before each ENOSPC
return? It shouldn't matter too much, since a reasonable assumption is
that filesystems aren't running out of space most of the time. And you
can use 'perf' for more insight in these cases without recompiling the
kernel.

Regards,

justin....

On 27/07/10 22:30, Dave Cundiff wrote:
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately it's still failing. If it helps any, it's bombing
> out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P
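Along those lines, and until such a tracepoint exists, a dynamic probe
can already catch the failing return without recompiling. This is only
a rough sketch, assuming kprobes are enabled and the symbol is visible
to perf; the event name perf assigns can differ, so check with
"perf probe --list" before recording:

# Add a return probe on the data-space check and capture its return value.
perf probe --add 'btrfs_check_data_free_space%return $retval'
perf probe --list

# Record system-wide while the rsync load runs, then look for -28 (ENOSPC)
# return values among the captured events.
perf record -e probe:btrfs_check_data_free_space__return -aR sleep 60
perf script | grep -i retval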
On Wed, Jul 28, 2010 at 08:31:10AM +0800, Yan, Zheng wrote:
> On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> > I installed the git repo kernel and added some debug to the ENOSPC
> > returns. Unfortunately it's still failing. If it helps any, it's bombing
> > out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> > ENOSPC at line 2959.
>
> This is really helpful, thank you very much.

We're seeing this too, since upgrading from 2.6.33.2 + merged old git
btrfs unstable HEAD to plain 2.6.35.

[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `WEEKLY.html': No space left on device
rm: cannot remove `YEARLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# l
total 25
drwxr-xr-x 1 1037300 1037300   108 2010-08-03 18:19 ./
drwxr-xr-x 1 1037300 1037300    28 2009-11-07 05:57 ../
-rw-r--r-- 1 1037300 1037300  5720 2010-04-23 13:39 WEEKLY_bar3d.png
-rw-r--r-- 1 1037300 1037300 11882 2010-04-23 13:39 WEEKLY.html
-rw-r--r-- 1 1037300 1037300  2998 2010-04-23 13:39 YEARLY_bar3d.png
-rw-r--r-- 1 1037300 1037300  3016 2010-04-23 13:39 YEARLY.html
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
[sroot@backup01:.../.rmagic]#

[sroot@backup01:/root]# btrfs filesystem df /backup/bu001/vol04
Data: total=2.55TB, used=2.20TB
Metadata: total=230.13GB, used=183.83GB
System: total=12.00MB, used=548.00KB
[sroot@backup01:/root]# df -P /backup/bu001/vol04
Filesystem               1024-blocks       Used Available Capacity Mounted on
/dev/mapper/bu001-vol04   3221225472 2742785400 478440072      86% /backup/bu001/vol04
[sroot@backup01:/root]# l /dev/mapper/bu001-vol04
brw-rw---- 1 root disk 252, 10 2010-08-03 16:02 /dev/mapper/bu001-vol04
[sroot@backup01:/root]# btrfs filesystem show /dev/dm-10
failed to read /dev/sr0
Label: none  uuid: 0c55f5f4-b618-4ec2-9dbc-e3e70a901e1a
        Total devices 1 FS bytes used 2.37TB
        devid    1 size 3.00TB used 3.00TB path /dev/dm-10

Btrfs Btrfs v0.19

We're also seeing things like this in dmesg, which appear to be coming
from btrfs-cleaner cleaning some old snapshot:

Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d5ea1>] btrfs_should_end_transaction+0x61/0x90
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c842d>] btrfs_drop_snapshot+0x21d/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45d ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497289216 freed 0 78598144
Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812f2eb0>] ? map_extent_buffer+0xb0/0xc0
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d578e>] btrfs_end_transaction_throttle+0xe/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c84f1>] btrfs_drop_snapshot+0x2e1/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45e ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497281024 freed 8192 78598144
Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:44:44 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:44:44 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:44:44 backup01 kernel: Pid: 7526, comm: btrfs-transacti Tainted: G        W   2.6.35-hw #1
Aug  3 18:44:44 backup01 kernel: Call Trace:
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d579b>] btrfs_end_transaction+0xb/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d5d63>] btrfs_commit_transaction+0x5c3/0x6a0
Aug  3 18:44:44 backup01 kernel: [<ffffffff810683e0>] ? autoremove_wake_function+0x0/0x40
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0d90>] transaction_kthread+0x250/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0b40>] ? transaction_kthread+0x0/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:44:44 backup01 kernel: ---[ end trace cffc4418e2c1f45f ]---
Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0

I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:

#if 1 /* I hope we never need this code again, just in case */

ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
been hit yet. Perhaps it has to fight with the btrfs-cleaner, or
something. Will post a follow-up later.

Simon-
On Wed, Aug 4, 2010 at 8:24 AM, Simon Kirby <sim@hostway.ca> wrote:
> We're seeing this too, since upgrading from 2.6.33.2 + merged old git
> btrfs unstable HEAD to plain 2.6.35.
>
> [sroot@backup01:.../.rmagic]# rm *
> rm: cannot remove `WEEKLY_bar3d.png': No space left on device
> rm: cannot remove `WEEKLY.html': No space left on device
> rm: cannot remove `YEARLY_bar3d.png': No space left on device
> rm: cannot remove `YEARLY.html': No space left on device
[...]
> We're also seeing things like this in dmesg, which appear to be coming
> from btrfs-cleaner cleaning some old snapshot:
>
> Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
[...]
> Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0

These warnings are because btrfs in 2.6.35 reserves more metadata space
for internal use than older kernels did. Your FS is too full, so btrfs
can't reserve enough metadata space.

> I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:
>
> #if 1 /* I hope we never need this code again, just in case */
>
> ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
> been hit yet. Perhaps it has to fight with the btrfs-cleaner, or
> something. Will post a follow-up later.

Yes, it has to fight with the btrfs-cleaner.

Yan, Zheng
On Wed, Aug 04, 2010 at 07:21:00PM +0800, Yan, Zheng wrote:
> > We're seeing this too, since upgrading from 2.6.33.2 + merged old git
> > btrfs unstable HEAD to plain 2.6.35.
> >
> > [sroot@backup01:.../.rmagic]# rm *
> > rm: cannot remove `WEEKLY_bar3d.png': No space left on device
> > rm: cannot remove `WEEKLY.html': No space left on device
> > rm: cannot remove `YEARLY_bar3d.png': No space left on device
> > rm: cannot remove `YEARLY.html': No space left on device
> ...
> > Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
> > Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
> ...
>
> These warnings are because btrfs in 2.6.35 reserves more metadata space
> for internal use than older kernels did. Your FS is too full, so btrfs
> can't reserve enough metadata space.

Hello! Is it possible that 2.6.33.2 btrfs has mucked up the on-disk
stuff in a way that causes 2.6.35 to be unhappy? The file system in
question was reported to be 85% full, according to "df".

In the meantime, we've been having some other problems on 2.6.35; for
example, rsync has been trying to append a block to a file for the past
5 days. The file system is reported as 45% full:

[sroot@backup01:/root]# df -Pt btrfs /backup/bu000/vol05/
Filesystem               1024-blocks       Used  Available Capacity Mounted on
/dev/mapper/bu000-vol05   3221225472 1429529324 1791696148      45% /backup/bu000/vol05
[sroot@backup01:/root]# btrfs files df /backup/bu000/vol05
Data: total=1.57TB, used=1.31TB
Metadata: total=15.51GB, used=10.48GB
System: total=12.00MB, used=192.00KB

At some point today, the kernel also spat this out:

BUG: soft lockup - CPU#3 stuck for 61s! [rsync:21903]
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
CPU 3
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Pid: 21903, comm: rsync Tainted: G        W   2.6.35-hw #2 0NK937/PowerEdge 1950
RIP: 0010:[<ffffffff81101a2d>]  [<ffffffff81101a2d>] iput+0x5d/0x70
RSP: 0018:ffff8802c14abb48  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8802c14abb58 RCX: 0000000000000003
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88007c075980
RBP: ffffffff8100a84e R08: 0000000000000001 R09: 8000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffffffffffffff66
R13: ffffffff81af04e0 R14: 0000000000000000 R15: 7fffffffffffffff
FS:  00007fd13bbb06e0(0000) GS:ffff880001cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002f5a108 CR3: 00000001eb94a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rsync (pid: 21903, threadinfo ffff8802c14aa000, task ffff880080b04b00)
Stack:
 ffff88007c075888 ffff88007c0757b0 ffff8802c14abb98 ffffffff812d7439
<0> ffffffff81664cde 0000000000000001 0000000004d80000 0000000000004000
<0> ffff88042a708178 ffff88042a708000 ffff8802c14abc08 ffffffff812c599c
Call Trace:
 [<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
 [<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
 [<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
 [<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
 [<ffffffff8110180e>] ? file_update_time+0x11e/0x180
 [<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
 [<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
 [<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
 [<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
 [<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
 [<ffffffff810ecef0>] ? sys_write+0x50/0x90
 [<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b
Code: 00 01 00 00 48 c7 c2 a0 2c 10 81 48 8b 40 30 48 85 c0 74 12 48 8b 50 20 48 c7 c0 a0 2c 10 81 48 85 d2 48 0
Call Trace:
 [<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
 [<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
 [<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
 [<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
 [<ffffffff8110180e>] ? file_update_time+0x11e/0x180
 [<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
 [<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
 [<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
 [<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
 [<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
 [<ffffffff810ecef0>] ? sys_write+0x50/0x90
 [<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b

[sroot@backup01:/root]# ls -l /proc/21903/fd/1
lrwx------ 1 root root 64 2010-08-09 18:21 /proc/21903/fd/1 -> /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54/.../customer file.mov.aYX4Js
[sroot@backup01:/root]# ls -lL /proc/21903/fd/1
-rw------- 1 root root 977797120 2010-08-04 20:39 /proc/21903/fd/1
[sroot@backup01:/root]# ps auxw|grep rsync
root     21903 73.2  0.0  12912   192 ?  R  Aug04  5177:08 rsync -aHq --numeric-ids --exclude-from=/etc/backups/backup.exclude --delete --delete-excluded /storage/vg005/web11/64/54/ /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54

In other words, the rsync has made no progress for 5 days (or at least
the mtime hasn't changed since then). "perf top" still shows output like
this, showing that btrfs is calling btrfs_find_space_cluster all of the
time:

   samples  pcnt function                        DSO
   _______ _____ _______________________________ _________________
   2127.00 11.9% find_next_bit                   [kernel]
   1914.00 10.7% find_next_zero_bit              [kernel]
   1580.00  8.8% schedule                        [kernel]
   1340.00  7.5% btrfs_find_space_cluster        [kernel]
   1238.00  6.9% _raw_spin_lock_irqsave          [kernel]
   1017.00  5.7% _raw_spin_lock                  [kernel]
    662.00  3.7% sched_clock_local               [kernel]
    615.00  3.4% native_read_tsc                 [kernel]
    590.00  3.3% _raw_spin_lock_irq              [kernel]
    468.00  2.6% _raw_spin_unlock_irqrestore     [kernel]
    405.00  2.3% schedule_timeout                [kernel]
    338.00  1.9% native_sched_clock              [kernel]
    329.00  1.8% update_curr                     [kernel]
    323.00  1.8% lock_timer_base                 [kernel]
    297.00  1.7% btrfs_start_one_delalloc_inode  [kernel]
    285.00  1.6% pick_next_task_fair             [kernel]
    267.00  1.5% try_to_del_timer_sync           [kernel]
    248.00  1.4% sched_clock_cpu                 [kernel]
    245.00  1.4% deflate_fast                    [kernel]

So, is it possible that the older btrfs code left things in a way that
wouldn't have happened if we had started with 2.6.35 to begin with? In
the case of the 45% full file system, it seems it should be able to
allocate more space without issue.

Simon-
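One way to narrow down where a stuck writer like this is spinning,
offered only as a sketch of the usual perf workflow rather than anything
specific to this machine, is to record call graphs and see who keeps
calling into btrfs_find_space_cluster:

# Sample all CPUs with call graphs for 30 seconds while the rsync is stuck.
perf record -a -g sleep 30
perf report --sort comm,dso,symbol

# Dumping all task stacks via SysRq also shows what the looping task is
# doing at that moment (output lands in dmesg and can be large).
echo t > /proc/sysrq-trigger
dmesg | tail -n 200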