Hello,

On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
have a backup process that fires up several rsync processes. These
mirror several dozen servers to individual sub-volumes. Every day I
snapshot each sub-volume and rsync over it.

The problem I'm seeing is that my rsync processes are failing randomly
with "No space left on device". This is a 6-terabyte volume with plenty
of free space.

Mount options:
/dev/sdb on /backups type btrfs (rw,max_inline=0,compress)

[root@rsync1 ~]# btrfs filesystem df /backups/
Data: total=1.88TB, used=1.88TB
Metadata: total=43.38GB, used=32.06GB
System: total=12.00MB, used=260.00KB

[root@rsync1 ~]# df /dev/sdb
Filesystem    1K-blocks        Used   Available Use% Mounted on
/dev/sdb     5781249024  2087273084  3693975940  37% /backups

They don't all fail at once. Normally I have 4-5 running at a time and
1 or 2 will drop out with a no-space error. The rest continue on. I've
noticed it will generally occur on ones that are in the middle of
transferring a very large file. If I lighten the load to one rsync at
a time it appears to happen less frequently.

Any known issues I should be aware of?

Thanks,

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
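For illustration, a per-server "snapshot, then rsync over the sub-volume"
cycle of the kind described above might look roughly like the sketch
below; the paths, host list, and rsync options are hypothetical, not
taken from the actual setup:

#!/bin/bash
# Hypothetical sketch of a daily snapshot-then-rsync backup cycle.
# BACKUP_ROOT, the server list, and all paths are placeholders.
BACKUP_ROOT=/backups
SERVERS="web01 web02 db01"
STAMP=$(date +%Y-%m-%d)

for host in $SERVERS; do
    subvol="$BACKUP_ROOT/$host"
    # Snapshot the current state of the sub-volume before refreshing it.
    btrfs subvolume snapshot "$subvol" "$BACKUP_ROOT/snapshots/$host-$STAMP"
    # Mirror the server into the writable sub-volume; unchanged files keep
    # sharing extents with the snapshot, so only changed data costs new space.
    rsync -aHq --numeric-ids --delete "root@$host:/" "$subvol/"
done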
On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
> have a backup process that fires up several rsync processes. These
> mirror several dozen servers to individual sub-volumes. Every day I
> snapshot each sub-volume and rsync over it.
>
> The problem I'm seeing is that my rsync processes are failing randomly
> with "No space left on device". This is a 6-terabyte volume with plenty
> of free space.
[...]
> Any known issues I should be aware of?

Thank you for reporting this. I will dig in.

Yan, Zheng
Hello,

I installed the git repo kernel and added some debug to the ENOSPC
returns. Unfortunately it's still failing. If it helps any, it's bombing
out in btrfs_check_data_free_space() in extent-tree.c, returning on the
ENOSPC at line 2959.

Unfortunately that is the extent of my ability to debug a filesystem. :P

Thanks,

On Tue, Jul 27, 2010 at 9:19 AM, Yan, Zheng <yanzheng@21cn.com> wrote:
> On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
>> On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
>> have a backup process that fires up several rsync processes.
[...]
>> Any known issues I should be aware of?
>
> Thank you for reporting this. I will dig in.
>
> Yan, Zheng

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately it's still failing. If it helps any, it's bombing
> out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P

This is really helpful, thank you very much.

Yan, Zheng
Hello,

Perhaps it would be a good idea to add a tracepoint before each ENOSPC
return? It shouldn't matter too much, since a reasonable assumption is
that filesystems aren't running out of space most of the time. And you
can use 'perf' for more insight in these cases without recompiling the
kernel.

Regards,

justin....

On 27/07/10 22:30, Dave Cundiff wrote:
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately it's still failing. If it helps any, it's bombing
> out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P
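Along those lines, and until such a tracepoint exists, a dynamic probe
can already catch the failing return without recompiling. This is only
a rough sketch, assuming kprobes are enabled and the symbol is visible
to perf; the event name perf assigns can differ, so check with
"perf probe --list" before recording:

# Add a return probe on the data-space check and capture its return value.
perf probe --add 'btrfs_check_data_free_space%return $retval'
perf probe --list

# Record system-wide while the rsync load runs, then look for -28 (ENOSPC)
# return values among the captured events.
perf record -e probe:btrfs_check_data_free_space__return -aR sleep 60
perf script | grep -i retval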
On Wed, Jul 28, 2010 at 08:31:10AM +0800, Yan, Zheng wrote:
> On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> > I installed the git repo kernel and added some debug to the ENOSPC
> > returns. Unfortunately it's still failing. If it helps any, it's bombing
> > out in btrfs_check_data_free_space() in extent-tree.c, returning on the
> > ENOSPC at line 2959.
>
> This is really helpful, thank you very much.

We're seeing this too, since upgrading from 2.6.33.2 + merged old git
btrfs unstable HEAD to plain 2.6.35.

[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `WEEKLY.html': No space left on device
rm: cannot remove `YEARLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# l
total 25
drwxr-xr-x 1 1037300 1037300   108 2010-08-03 18:19 ./
drwxr-xr-x 1 1037300 1037300    28 2009-11-07 05:57 ../
-rw-r--r-- 1 1037300 1037300  5720 2010-04-23 13:39 WEEKLY_bar3d.png
-rw-r--r-- 1 1037300 1037300 11882 2010-04-23 13:39 WEEKLY.html
-rw-r--r-- 1 1037300 1037300  2998 2010-04-23 13:39 YEARLY_bar3d.png
-rw-r--r-- 1 1037300 1037300  3016 2010-04-23 13:39 YEARLY.html
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
[sroot@backup01:.../.rmagic]#

[sroot@backup01:/root]# btrfs filesystem df /backup/bu001/vol04
Data: total=2.55TB, used=2.20TB
Metadata: total=230.13GB, used=183.83GB
System: total=12.00MB, used=548.00KB
[sroot@backup01:/root]# df -P /backup/bu001/vol04
Filesystem               1024-blocks       Used Available Capacity Mounted on
/dev/mapper/bu001-vol04   3221225472 2742785400 478440072      86% /backup/bu001/vol04
[sroot@backup01:/root]# l /dev/mapper/bu001-vol04
brw-rw---- 1 root disk 252, 10 2010-08-03 16:02 /dev/mapper/bu001-vol04
[sroot@backup01:/root]# btrfs filesystem show /dev/dm-10
failed to read /dev/sr0
Label: none  uuid: 0c55f5f4-b618-4ec2-9dbc-e3e70a901e1a
        Total devices 1 FS bytes used 2.37TB
        devid    1 size 3.00TB used 3.00TB path /dev/dm-10

Btrfs Btrfs v0.19

We're also seeing things like this in dmesg, which appear to be coming
from btrfs-cleaner cleaning some old snapshot:

Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d5ea1>] btrfs_should_end_transaction+0x61/0x90
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c842d>] btrfs_drop_snapshot+0x21d/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45d ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497289216 freed 0 78598144
Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812f2eb0>] ? map_extent_buffer+0xb0/0xc0
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d578e>] btrfs_end_transaction_throttle+0xe/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c84f1>] btrfs_drop_snapshot+0x2e1/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45e ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497281024 freed 8192 78598144
Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:44:44 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:44:44 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:44:44 backup01 kernel: Pid: 7526, comm: btrfs-transacti Tainted: G        W   2.6.35-hw #1
Aug  3 18:44:44 backup01 kernel: Call Trace:
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d579b>] btrfs_end_transaction+0xb/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d5d63>] btrfs_commit_transaction+0x5c3/0x6a0
Aug  3 18:44:44 backup01 kernel: [<ffffffff810683e0>] ? autoremove_wake_function+0x0/0x40
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0d90>] transaction_kthread+0x250/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0b40>] ? transaction_kthread+0x0/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:44:44 backup01 kernel: ---[ end trace cffc4418e2c1f45f ]---
Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0

I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:

#if 1 /* I hope we never need this code again, just in case */

ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
been hit yet. Perhaps it has to fight with the btrfs-cleaner, or
something. Will post a follow-up later.

Simon-
On Wed, Aug 4, 2010 at 8:24 AM, Simon Kirby <sim@hostway.ca> wrote:
> We're seeing this too, since upgrading from 2.6.33.2 + merged old git
> btrfs unstable HEAD to plain 2.6.35.
>
> [sroot@backup01:.../.rmagic]# rm *
> rm: cannot remove `WEEKLY_bar3d.png': No space left on device
> rm: cannot remove `WEEKLY.html': No space left on device
> rm: cannot remove `YEARLY_bar3d.png': No space left on device
> rm: cannot remove `YEARLY.html': No space left on device
[...]
> We're also seeing things like this in dmesg, which appear to be coming
> from btrfs-cleaner cleaning some old snapshot:
>
> Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
[...]
> Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0

These warnings are because btrfs in 2.6.35 reserves more metadata space
for internal use than older kernels did. Your FS is too full, so btrfs
can't reserve enough metadata space.

> I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:
>
> #if 1 /* I hope we never need this code again, just in case */
>
> ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
> been hit yet. Perhaps it has to fight with the btrfs-cleaner, or
> something. Will post a follow-up later.

Yes, it has to fight with the btrfs-cleaner.

Yan, Zheng
On Wed, Aug 04, 2010 at 07:21:00PM +0800, Yan, Zheng wrote:
> > We're seeing this too, since upgrading from 2.6.33.2 + merged old git
> > btrfs unstable HEAD to plain 2.6.35.
> >
> > [sroot@backup01:.../.rmagic]# rm *
> > rm: cannot remove `WEEKLY_bar3d.png': No space left on device
> > rm: cannot remove `WEEKLY.html': No space left on device
> > rm: cannot remove `YEARLY_bar3d.png': No space left on device
> > rm: cannot remove `YEARLY.html': No space left on device
> ...
> > Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
> > Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
> ...
>
> These warnings are because btrfs in 2.6.35 reserves more metadata space
> for internal use than older kernels did. Your FS is too full, so btrfs
> can't reserve enough metadata space.

Hello! Is it possible that 2.6.33.2 btrfs has mucked up the on-disk
stuff in a way that causes 2.6.35 to be unhappy? The file system in
question was reported to be 85% full, according to "df".

In the meantime, we've been having some other problems on 2.6.35; for
example, rsync has been trying to append a block to a file for the past
5 days. The file system is reported as 45% full:

[sroot@backup01:/root]# df -Pt btrfs /backup/bu000/vol05/
Filesystem               1024-blocks       Used  Available Capacity Mounted on
/dev/mapper/bu000-vol05   3221225472 1429529324 1791696148      45% /backup/bu000/vol05
[sroot@backup01:/root]# btrfs files df /backup/bu000/vol05
Data: total=1.57TB, used=1.31TB
Metadata: total=15.51GB, used=10.48GB
System: total=12.00MB, used=192.00KB

At some point today, the kernel also spat this out:

BUG: soft lockup - CPU#3 stuck for 61s! [rsync:21903]
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
CPU 3
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Pid: 21903, comm: rsync Tainted: G        W   2.6.35-hw #2 0NK937/PowerEdge 1950
RIP: 0010:[<ffffffff81101a2d>]  [<ffffffff81101a2d>] iput+0x5d/0x70
RSP: 0018:ffff8802c14abb48  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8802c14abb58 RCX: 0000000000000003
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88007c075980
RBP: ffffffff8100a84e R08: 0000000000000001 R09: 8000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffffffffffffff66
R13: ffffffff81af04e0 R14: 0000000000000000 R15: 7fffffffffffffff
FS:  00007fd13bbb06e0(0000) GS:ffff880001cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002f5a108 CR3: 00000001eb94a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rsync (pid: 21903, threadinfo ffff8802c14aa000, task ffff880080b04b00)
Stack:
 ffff88007c075888 ffff88007c0757b0 ffff8802c14abb98 ffffffff812d7439
<0> ffffffff81664cde 0000000000000001 0000000004d80000 0000000000004000
<0> ffff88042a708178 ffff88042a708000 ffff8802c14abc08 ffffffff812c599c
Call Trace:
 [<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
 [<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
 [<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
 [<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
 [<ffffffff8110180e>] ? file_update_time+0x11e/0x180
 [<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
 [<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
 [<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
 [<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
 [<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
 [<ffffffff810ecef0>] ? sys_write+0x50/0x90
 [<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b
Code: 00 01 00 00 48 c7 c2 a0 2c 10 81 48 8b 40 30 48 85 c0 74 12 48 8b 50 20 48 c7 c0 a0 2c 10 81 48 85 d2 48 0
Call Trace:
 [<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
 [<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
 [<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
 [<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
 [<ffffffff8110180e>] ? file_update_time+0x11e/0x180
 [<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
 [<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
 [<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
 [<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
 [<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
 [<ffffffff810ecef0>] ? sys_write+0x50/0x90
 [<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b

[sroot@backup01:/root]# ls -l /proc/21903/fd/1
lrwx------ 1 root root 64 2010-08-09 18:21 /proc/21903/fd/1 -> /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54/.../customer file.mov.aYX4Js
[sroot@backup01:/root]# ls -lL /proc/21903/fd/1
-rw------- 1 root root 977797120 2010-08-04 20:39 /proc/21903/fd/1
[sroot@backup01:/root]# ps auxw|grep rsync
root     21903 73.2  0.0  12912   192 ?  R  Aug04  5177:08 rsync -aHq --numeric-ids --exclude-from=/etc/backups/backup.exclude --delete --delete-excluded /storage/vg005/web11/64/54/ /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54

In other words, the rsync has made no progress for 5 days (or at least
the mtime hasn't changed since then). "perf top" still shows output like
this, showing that btrfs is calling btrfs_find_space_cluster all of the
time:

   samples  pcnt function                        DSO
   _______ _____ _______________________________ _________________
   2127.00 11.9% find_next_bit                   [kernel]
   1914.00 10.7% find_next_zero_bit              [kernel]
   1580.00  8.8% schedule                        [kernel]
   1340.00  7.5% btrfs_find_space_cluster        [kernel]
   1238.00  6.9% _raw_spin_lock_irqsave          [kernel]
   1017.00  5.7% _raw_spin_lock                  [kernel]
    662.00  3.7% sched_clock_local               [kernel]
    615.00  3.4% native_read_tsc                 [kernel]
    590.00  3.3% _raw_spin_lock_irq              [kernel]
    468.00  2.6% _raw_spin_unlock_irqrestore     [kernel]
    405.00  2.3% schedule_timeout                [kernel]
    338.00  1.9% native_sched_clock              [kernel]
    329.00  1.8% update_curr                     [kernel]
    323.00  1.8% lock_timer_base                 [kernel]
    297.00  1.7% btrfs_start_one_delalloc_inode  [kernel]
    285.00  1.6% pick_next_task_fair             [kernel]
    267.00  1.5% try_to_del_timer_sync           [kernel]
    248.00  1.4% sched_clock_cpu                 [kernel]
    245.00  1.4% deflate_fast                    [kernel]

So, is it possible that the older btrfs code left things in a way that
wouldn't have happened if we had started with 2.6.35 to begin with? In
the case of the 45% full file system, it seems it should be able to
allocate more space without issue.

Simon-
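One way to narrow down where a stuck writer like this is spinning,
offered only as a sketch of the usual perf workflow rather than anything
specific to this machine, is to record call graphs and see who keeps
calling into btrfs_find_space_cluster:

# Sample all CPUs with call graphs for 30 seconds while the rsync is stuck.
perf record -a -g sleep 30
perf report --sort comm,dso,symbol

# Dumping all task stacks via SysRq also shows what the looping task is
# doing at that moment (output lands in dmesg and can be large).
echo t > /proc/sysrq-trigger
dmesg | tail -n 200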