Hello everyone,

Yan Zheng has been doing some major surgery to the back references and
extent allocation code, tackling bottlenecks in the code that tracks
extents. It scales better with many snapshots and performs better in
the common case of no snapshots at all.

THE NEW CODE IS A FORWARD ROLLING DISK FORMAT CHANGE. This means it is
compatible with the current btrfs disk format, but once you mount a
filesystem with the new code, it WILL NO LONGER BE MOUNTABLE FROM OLD
KERNELS. Old kernels spit out an error message when you try them on new
format filesystems.

This is a large change, and I'm hoping to have it stable in time for the
2.6.31 merge window. I've been testing it for about a week now, and
haven't been able to cause major problems yet. But, testing the
compatibility with old format filesystems is the hard part, and
everyone that pulls the new code should back up their data first.

I've set up git branches called newformat where you can pull the new code.

For the kernel (based on 2.6.30-rc7):

    git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat

For the progs:

    git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat

The main benefit of the new code is that backrefs on the extent
allocation tree use a fuzzier format. It basically means that we search
for the key in the extent allocation tree instead of providing an exact
backref to the parent block.

This means we can predict how many blocks will be changed when changing
the extent allocation tree, and it makes enospc much less complex. It
is also significantly faster.

For regular subvolume trees, a similar change is made as long as there
are no snapshots against a given block. This is the common case, and it
makes COW less expensive overall.

Yan Zheng also worked out a way to free blocks during the transaction
without needing to do an explicit snapshot deletion on the old root when
the transaction was done. This gets rid of some complex caching code,
and fixes worst-case problems where btrfs could take a very very long
time to unmount.

btrfs-vol -b is faster with the new code as well; he added caching of
high levels in the tree to speed things up.

(Many kudos to Yan Zheng for all of this work!)

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 01, 2009 at 05:04:47PM -0400, Chris Mason wrote:
> THE NEW CODE IS A FORWARD ROLLING DISK FORMAT CHANGE. This means it is
> compatible with the current btrfs disk format, but once you mount a
> filesystem with the new code, it WILL NO LONGER BE MOUNTABLE FROM OLD
> KERNELS. Old kernels spit out an error message when you try them on new
> format filesystems.

Just a quick note that I'm having some issues with the backward
compatibility code on 32 bit kernels. It can still read all the old
items but it is having problems with creating new backrefs.

32bit is working fine on an entirely new format FS, and my 64 bit box
can read and write the old format FS just fine. I'm hoping to track
this one down today, but it would be a good idea to wait if you want to
try the new code on old filesystems on 32 bit machines.

If you do hit crashes, please don't immediately reformat your FS if you
can avoid it. We should be able to fix most problems people hit.

-chris
On Tue, Jun 02, 2009 at 09:28:30AM -0400, Chris Mason wrote:
> Just a quick note that I'm having some issues with the backward
> compatibility code on 32 bit kernels. It can still read all the old
> items but it is having problems with creating new backrefs.
>
> 32bit is working fine on an entirely new format FS, and my 64 bit box
> can read and write the old format FS just fine. I'm hoping to track
> this one down today, but it would be a good idea to wait if you want to
> try the new code on old filesystems on 32 bit machines.
>
> If you do hit crashes, please don't immediately reformat your FS if you
> can avoid it. We should be able to fix most problems people hit.

Looks like Yan Zheng tracked this down yesterday, and Jens Axboe bravely
tested out 32bit old format compat again with his laptop. At this point
I think the new format code is looking pretty stable and it is generally
ready for more testing.

I've rebased the newformat kernel tree to fold in the corruption fixes.
This way if anyone does a git bisect they won't end up on a commit that
can corrupt their FS by accident.

If you've already pulled the newformat tree, the new commits will
conflict with the old. So, something like this will fix things if you
have already pulled the newformat branch:

    git reset --hard v2.6.30-rc7
    git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat

If you've made your own commits or pulls other than just btrfs,
different steps will be required.

The btrfs-progs unstable tree was also rebased. Use

    git reset --hard ed20f5fc905145a0673097b539442d2a59491e77

on the progs tree if you've already pulled down the newformat branch.

Happy testing everyone

-chris
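The reset-and-pull steps in the message above discard whatever the local branch pointed at. As a minimal sketch, assuming you want a way back to the old tip, the helper below prints the same two commands preceded by one extra step that saves the current HEAD under a branch name. The function name `plan_resync` and the branch name `newformat-before-rebase` are hypothetical and do not come from the thread. It only prints commands; nothing is executed.

```shell
#!/bin/sh
# Sketch only: print the commands for review instead of running them.
# plan_resync and the backup branch name are made up for illustration.
plan_resync() {
    base="$1"   # tag the rebased branch sits on, e.g. v2.6.30-rc7
    url="$2"    # repository to pull the newformat branch from
    backup="newformat-before-rebase"

    # Extra step, not in the original message: keep the current HEAD
    # reachable under a branch name before the hard reset moves the branch.
    printf 'git branch %s\n' "$backup"
    # The two steps given in the message.
    printf 'git reset --hard %s\n' "$base"
    printf 'git pull %s newformat\n' "$url"
}

plan_resync v2.6.30-rc7 \
    git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git
```

Note that the backup branch only preserves committed work. `git reset --hard` still throws away uncommitted changes in the working tree, so those need to be committed or stashed first.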
Chris Mason wrote:
> For the kernel (based on 2.6.30-rc7):
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat

So I started the performance runs on this. The base tests completed fine
on the raid system and I will post results as soon as I can finish
postprocessing, but when I tried to run nodatacow on that machine it
crashed pretty early. Here is the console log:

btrfs2 kernel: [82057.882255] ------------[ cut here ]------------
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] invalid opcode: 0000 [#1] SMP
btrfs2 kernel: [82057.882535] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
btrfs2 kernel: [82057.882535] Stack:
btrfs2 kernel: [82057.882535] ffff88011786d800 ffff8801259f6ea0 000000b21f256030 00000000000000e9
btrfs2 kernel: [82057.882535] 000000352231b250 ffff880089abbf40 ffff88013d0e2440 0000000000000001
btrfs2 kernel: [82057.882535] Call Trace:
btrfs2 kernel: [82057.882535] [<ffffffffa0445198>] run_one_delayed_ref+0x382/0x42f [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa0464bd1>] ? map_extent_buffer+0xab/0xbe [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa0445f75>] run_clustered_refs+0x237/0x2b4 [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa0478f85>] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa044609e>] btrfs_run_delayed_refs+0xac/0x195 [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa044e86e>] __btrfs_end_transaction+0x59/0xfe [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa044e92e>] btrfs_end_transaction+0xb/0xd [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa045418b>] btrfs_finish_ordered_io+0x224/0x24d [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa04541c4>] btrfs_writepage_end_io_hook+0x10/0x12 [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa0467599>] end_bio_extent_writepage+0xa3/0x18f [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffff8024276e>] ? del_timer_sync+0x14/0x20
btrfs2 kernel: [82057.882535] [<ffffffff802cbbee>] bio_endio+0x26/0x28
btrfs2 kernel: [82057.882535] [<ffffffffa044b5d6>] end_workqueue_fn+0x111/0x11e [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa046eff5>] worker_loop+0x67/0x1ee [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffffa046ef8e>] ? worker_loop+0x0/0x1ee [btrfs]
btrfs2 kernel: [82057.882535] [<ffffffff8024c324>] kthread+0x56/0x86
btrfs2 kernel: [82057.882535] [<ffffffff8020c9fa>] child_rip+0xa/0x20
btrfs2 kernel: [82057.882535] [<ffffffff8024c2ce>] ? kthread+0x0/0x86
btrfs2 kernel: [82057.882535] [<ffffffff8020c9f0>] ? child_rip+0x0/0x20
btrfs2 kernel: [82057.882535] Code: 08 4c 8d 45 d4 41 8d 44 24 18 48 8b 73 20 48 8b 4d 18 41 b9 01 00 00 00 48 8b 7d b8 4c 89 ea 89 45 d4 e8 df e3 ff ff 85 c0 74 04 <0f> 0b eb fe 49 63 75 40 4d 8b 65 00 49 83 cf 01 4c 89 e7 48 6b

I also ran this on the single disk system and it did not make it through
the base tests. The errors are different.

[101511.664497] Pid: 28597, comm: btrfs-transacti Tainted: G D 2.6.30-rc7-autokern1 #1 IBM x3950-[88726RU]-
[101511.675497] RIP: 0010:[<ffffffff804cd70d>] [<ffffffff804cd70d>] _spin_lock+0x14/0x1a
[101511.684494] RSP: 0018:ffff8801309bbb40 EFLAGS: 00000297
[101511.689494] RAX: 0000000000001514 RBX: ffff8801309bbb40 RCX: ffff8801309bbb40
[101511.697493] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800b7427d70
[101511.705491] RBP: ffffffff8020c50e R08: 0000000000000001 R09: ffff8801309bba68
[101511.713490] R10: ffff88012231b910 R11: ffff8800478ad5b0 R12: 0000001a00000032
[101511.721488] R13: ffffffffa04370b1 R14: ffff8801309bbb60 R15: 00000000000003bf
[101511.729486] FS: 0000000000000000(0000) GS:ffff88002bac0000(0000) knlGS:0000000000000000
[101511.738483] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[101511.744482] CR2: 00007fbcd3ff1b80 CR3: 0000000000201000 CR4: 00000000000006e0
[101511.752480] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101511.760479] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101511.768478] Call Trace:
[101511.771478] [<ffffffffa0471187>] ? btrfs_try_spin_lock+0x1c/0x61 [btrfs]
[101511.778476] [<ffffffffa043ea17>] ? btrfs_search_slot+0x619/0x73e [btrfs]
[101511.786474] [<ffffffffa043f11d>] ? btrfs_insert_empty_items+0x5e/0xa9 [btrfs]
[101511.803472] [<ffffffffa0440ce0>] ? alloc_reserved_file_extent+0x89/0x1c3 [btrfs]
[101511.811470] [<ffffffffa04401d8>] ? update_reserved_extents+0x98/0xab [btrfs]
[101511.819468] [<ffffffffa0445198>] ? run_one_delayed_ref+0x382/0x42f [btrfs]
[101511.827467] [<ffffffff802a5387>] ? cache_flusharray+0xa2/0xae
[101511.833466] [<ffffffffa0445f75>] ? run_clustered_refs+0x237/0x2b4 [btrfs]
[101511.840463] [<ffffffffa0478f85>] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
[101511.848462] [<ffffffff804cbdad>] ? thread_return+0x3e/0x91
[101511.854461] [<ffffffffa044609e>] ? btrfs_run_delayed_refs+0xac/0x195 [btrfs]
[101511.862459] [<ffffffffa044f59f>] ? btrfs_commit_transaction+0x7b/0x69c [btrfs]
[101511.870458] [<ffffffff8024c460>] ? autoremove_wake_function+0x0/0x38
[101511.877458] [<ffffffffa044ee87>] ? start_transaction+0x103/0x10f [btrfs]
[101511.885456] [<ffffffffa044c2c6>] ? transaction_kthread+0x17f/0x20a [btrfs]
[101511.892453] [<ffffffffa044c147>] ? transaction_kthread+0x0/0x20a [btrfs]
[101511.900453] [<ffffffffa044c147>] ? transaction_kthread+0x0/0x20a [btrfs]
[101511.907452] [<ffffffff8024c324>] ? kthread+0x56/0x86
[101511.912450] [<ffffffff8020c9fa>] ? child_rip+0xa/0x20
[101511.918449] [<ffffffff8024c2ce>] ? kthread+0x0/0x86
[101511.923449] [<ffffffff8020c9f0>] ? child_rip+0x0/0
[101536.249729] Pid: 28594, comm: btrfs-endio-wri Tainted: G D 2.6.30-rc7-autokern1 #1 IBM x3950-[88726RU]-
[101536.249729] RIP: 0010:[<ffffffff804cd70d>] [<ffffffff804cd70d>] _spin_lock+0x14/0x1a
[101536.249729] RSP: 0018:ffff88011a80da80 EFLAGS: 00000297
[101536.249729] RAX: 000000000000c6c2 RBX: ffff88011a80da80 RCX: 0000000000000000
[101536.249729] RDX: 0000000000000000 RSI: ffff88013d080000 RDI: ffff8800478ad6b0
[101536.249729] RBP: ffffffff8020c50e R08: 000000000000004c R09: 0000000000000001
[101536.249729] R10: 0000000000000008 R11: 0000000000086000 R12: ffff88011a80da40
[101536.249729] R13: ffff8800aa254800 R14: 0000000b470c7fff R15: ffff88011f256030
[101536.249729] FS: 0000000000000000(0000) GS:ffff88002ba30000(0000) knlGS:0000000000000000
[101536.249729] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[101536.249729] CR2: 000000000065b078 CR3: 0000000000201000 CR4: 00000000000006e0
[101536.249729] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101536.249729] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101536.249729] Call Trace:
[101536.249729] [<ffffffffa04710cf>] ? btrfs_tree_lock+0x54/0x9e [btrfs]
[101536.249729] [<ffffffffa0471022>] ? btrfs_wake_function+0x0/0x10 [btrfs]
[101536.249729] [<ffffffffa0438104>] ? btrfs_lock_root_node+0x1d/0x4b [btrfs]
[101536.249729] [<ffffffffa043e4c5>] ? btrfs_search_slot+0xc7/0x73e [btrfs]
[101536.249729] [<ffffffffa043f11d>] ? btrfs_insert_empty_items+0x5e/0xa9 [btrfs]
[101536.249729] [<ffffffffa0444f7a>] ? run_one_delayed_ref+0x164/0x42f [btrfs]
[101536.249729] [<ffffffffa0445f75>] ? run_clustered_refs+0x237/0x2b4 [btrfs]
[101536.249729] [<ffffffffa0478f85>] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
[101536.249729] [<ffffffffa044609e>] ? btrfs_run_delayed_refs+0xac/0x195 [btrfs]
[101536.249729] [<ffffffffa044e86e>] ? __btrfs_end_transaction+0x59/0xfe [btrfs]
[101536.249729] [<ffffffffa044e92e>] ? btrfs_end_transaction+0xb/0xd [btrfs]
[101536.249729] [<ffffffffa045418b>] ? btrfs_finish_ordered_io+0x224/0x24d [btrfs]
[101536.249729] [<ffffffffa04541c4>] ? btrfs_writepage_end_io_hook+0x10/0x12 [btrfs]
[101536.249729] [<ffffffffa0467599>] ? end_bio_extent_writepage+0xa3/0x18f [btrfs]
[101536.249729] [<ffffffff8024276e>] ? del_timer_sync+0x14/0x20
[101536.249729] [<ffffffff802cbbee>] ? bio_endio+0x26/0x28
[101536.249729] [<ffffffffa044b5d6>] ? end_workqueue_fn+0x111/0x11e [btrfs]
[101536.249729] [<ffffffffa046eff5>] ? worker_loop+0x67/0x1ee [btrfs]

> For the progs:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat

I should mention that I missed the part about the new user tools, so
while these were newly formatted filesystems, they were created with the
old tools. These are both running 64bit. I plan to install the new
tools and re-run.

Steve
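When comparing oops reports like the two in the message above, it can help to reduce each trace to its list of symbol names. The following is a minimal sketch, assuming one frame per line in the `[<address>] symbol+0xOFF/0xLEN` form seen in these logs, optionally with `? ` before the symbol. The function name `extract_frames` and the file names in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print the symbol name from every
# line that contains "[<address>] symbol+0xOFF/0xLEN" (with or without "? ").
# Lines without a complete offset/length pair, such as a truncated frame,
# are skipped. extract_frames is a made-up name for illustration.
extract_frames() {
    sed -n 's/^.*\[<[0-9a-f]\{1,\}>\] \(? \)\{0,1\}\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]\{1,\}\/0x[0-9a-f]\{1,\}.*$/\2/p'
}

# Possible use, with placeholder file names for two saved console logs:
#   extract_frames < raid.log   | sort -u > raid.syms
#   extract_frames < single.log | sort -u > single.syms
#   comm -12 raid.syms single.syms    # symbols that appear in both traces
```

The pattern also matches `RIP:` lines, since they use the same `[<address>] symbol+0x.../0x...` form, so the faulting function shows up in the output alongside the call-trace frames.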
On Thu, Jun 04, 2009 at 02:02:20PM -0500, Steven Pratt wrote:
> So I started the performance runs on this. The base tests completed fine
> on the raid system and I will post results as soon as I can finish
> postprocessing, but when I tried to run nodatacow on that machine it
> crashed pretty early. Here is the console log:

Thanks Steve. Just to clarify, which commit was the head of your git
tree when you ran these tests?

-chris
On Thu, Jun 04, 2009 at 02:02:20PM -0500, Steven Pratt wrote:
> So I started the performance runs on this. The base tests completed fine
> on the raid system and I will post results as soon as I can finish
> postprocessing, but when I tried to run nodatacow on that machine it
> crashed pretty early. Here is the console log:

Hi Steve,

Thanks again for hammering on these. Yan Zheng and I have both been
trying to reproduce problems with nodatacow and with the database random
write run.

But, so far we haven't been able to trigger any crashes. Do you see
anything in your config or setup that is unusual?

-chris
Chris Mason wrote:
> Hi Steve,
>
> Thanks again for hammering on these. Yan Zheng and I have both been
> trying to reproduce problems with nodatacow and with the database random
> write run.

So now that the raid machine is actually up, I discovered it got further
than I thought on nodatacow. It did all the read tests, but appeared to
die on the 16 thread random write (not O_DIRECT). There were no messages
logged to /var/log/messages at all. The last I saw was:

Jun 4 03:14:24 btrfs1 kernel: [65856.065491] btrfs: setting nodatacow
Jun 4 15:24:45 btrfs1 syslogd 1.4.1: restart.

Just dead until we rebooted the machine later that day.

> But, so far we haven't been able to trigger any crashes. Do you see
> anything in your config or setup that is unusual?

No, other than using the old mkfs with the new format. I've kicked off
new runs to see if I hit the same issues.

Steve
Steven Pratt wrote:
> So now that the raid machine is actually up, I discovered it got
> further than I thought on nodatacow. It did all the read tests, but
> appeared to die on the 16 thread random write (not O_DIRECT). There
> were no messages logged to /var/log/messages at all. The last I saw was:
>
> Jun 4 03:14:24 btrfs1 kernel: [65856.065491] btrfs: setting nodatacow
> Jun 4 15:24:45 btrfs1 syslogd 1.4.1: restart.
>
> Just dead until we rebooted the machine later that day.

So the raid system completed the re-run of the nodatacow runs without
error. So still no idea what happened on this box the first time
around. As for the single disk system, it died during the random write
test again, but it now looks like we might have a real HW failure. This
time we see SCSI error messages. I have replaced the test disks and
will try one more time.

The net is, I would hold off digging too much into this as even I don't
have any repeatable errors.

Steve
On Fri, Jun 05, 2009 at 04:27:55PM -0500, Steven Pratt wrote:
> So the raid system completed the re-run of the nodatacow runs without
> error. So still no idea what happened on this box the first time
> around. As for the single disk system, it died during the random write
> test again, but it now looks like we might have a real HW failure. This
> time we see SCSI error messages. I have replaced the test disks and
> will try one more time.
>
> The net is, I would hold off digging too much into this as even I don't
> have any repeatable errors.

Thanks for rerunning all of this, appreciate the update.

-chris
Chris Mason wrote:
> On Fri, Jun 05, 2009 at 04:27:55PM -0500, Steven Pratt wrote:
>> Steven Pratt wrote:
>>> Chris Mason wrote:
>>>> On Thu, Jun 04, 2009 at 02:02:20PM -0500, Steven Pratt wrote:
>>>>> Chris Mason wrote:
>>>>>> Hello everyone,
>>>>>>
>>>>>> Yan Zheng has been doing some major surgery to the back references and
>>>>>> extent allocation code, tackling bottlenecks in the code that tracks
>>>>>> extents. It scales better with many snapshots and performs better in
>>>>>> the common case of no snapshots at all.
>>>>>>
>>>>>> THE NEW CODE IS A FORWARD ROLLING DISK FORMAT CHANGE. This means it is
>>>>>> compatible with the current btrfs disk format, but once you mount a
>>>>>> filesystem with the new code, it WILL NO LONGER BE MOUNTABLE FROM OLD
>>>>>> KERNELS. Old kernels spit out an error message when you try them on new
>>>>>> format filesystems.
>>>>>>
>>>>>> This is a large change, and I'm hoping to have it stable in time for the
>>>>>> 2.6.31 merge window. I've been testing it for about a week now, and
>>>>>> haven't been able to cause major problems yet. But, testing the
>>>>>> compatibility with old format filesystems is the hard part, and
>>>>>> everyone that pulls the new code should backup their data first.
>>>>>>
>>>>>> I've setup git branches called newformat where you can pull the new code.
>>>>>>
>>>>>> For the kernel (based on 2.6.30-rc7):
>>>>>>
>>>>>> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat
>>>>>>
>>>>> So I started the performance runs on this. The base tests completed
>>>>> fine on the raid system and I will post results as soon as I can
>>>>> finish postprocessing, but when I tried to do nodatacow that
>>>>> machine it crashed pretty early. Here is console log:
>>>>>
>>>> Hi Steve,
>>>>
>>>> Thanks again for hammering on these. Yan Zheng and I have both been
>>>> trying to reproduce problems with nodatacow and with the database random
>>>> write run.
>>>>
>>> So now that the raid machine is actually up, I discovered it got
>>> further than I thought on nodatacow. It did all the read tests, but
>>> appeared to died on 16 thread random write(not odirect). There were no
>>> messages logged to var/log/messages at all. Last I saw was :
>>>
>>> Jun 4 03:14:24 btrfs1 kernel: [65856.065491] btrfs: setting nodatacow
>>> Jun 4 15:24:45 btrfs1 syslogd 1.4.1: restart.
>>>
>>> Just dead until we rebooted machine later that day.
>>>
>> So the raid system complete the re-run of the nodatacow runs without
>> error. So still no idea what happened on this box the first time
>> around. As for the single disk system, it died during the random write
>> test again, but it now looks like we might have a real HW failure. This
>> time we see SCSI error messages. I have replaced the test disks and
>> will try one more time.
>>
>> The net is, I would hold off digging too much into this as even I don't
>> have any repeatable errors.
>>
>
> Thanks for rerunning all of this, appreciate the update.
>

No problem. Raid results are uploading to
http://btrfs.boxacle.net/repository/raid/history/History.html now.
There were massive improvements in the random write workloads,
especially with cow enabled!! MailServer had moderate perf gains, but
dramatic decrease in CPU utilization, so this is very good as well.

The only regression I see is on large file creates, CPU is up 200% or
more while performance is fairly flat. btrfs_tree_lock now dominates
the profile.

I am still having issues on the single disk system, which I am still not
sure if it is btrfs or HW, but I am off on a family vacation tomorrow so
it will have to wait for a week or so.

Steve

> -chris
Roy Sigurd Karlsbakk
2009-Jun-07 11:50 UTC
Re: New experimental btrfs branch ready for testing
On 1 June 2009, at 23:04, Chris Mason wrote:

> I've setup git branches called newformat where you can pull the new
> code.
>
> For the kernel (based on 2.6.30-rc7):
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat

# git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat
fatal: Not a git repository

> For the progs:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat

# git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat
fatal: Not a git repository

Has this code been removed, or is it me doing something funny?

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases adequate and
relevant synonyms exist in Norwegian.
On Sun, Jun 07, 2009 at 01:50:27PM +0200, Roy Sigurd Karlsbakk wrote:
> On 1 June 2009, at 23:04, Chris Mason wrote:
>
>> I've setup git branches called newformat where you can pull the new
>> code.
>>
>> For the kernel (based on 2.6.30-rc7):
>>
>> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat
>
> # git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat
> fatal: Not a git repository
>
>> For the progs:
>>
>> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat
>
> # git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat
> fatal: Not a git repository
>
> Have this code been removed, or is it me doing something funny?

You're doing something funny. I'm guessing you don't already have a
copy of the btrfs repositories, so you should be using clone instead of
pull. If you do have them, cd into them before running the pull.
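The clone-versus-pull distinction above can be sketched as a small shell helper. This is only an illustration: fetch_branch and the target directory name are hypothetical, and only the repository URL and the newformat branch name come from the announcement. It also assumes a git recent enough to support clone --branch.

```shell
#!/bin/sh
# Sketch only: fetch_branch is a hypothetical helper, not part of git or btrfs.
# Usage: fetch_branch URL BRANCH DIR
fetch_branch() {
    url=$1
    branch=$2
    dir=$3
    if [ -d "$dir/.git" ]; then
        # Existing checkout: git pull has to run from inside the work tree,
        # otherwise it fails with "Not a git repository".
        (cd "$dir" && git pull "$url" "$branch")
    else
        # No checkout yet: clone creates the repository and checks out BRANCH.
        git clone --branch "$branch" "$url" "$dir"
    fi
}

# Example with the kernel tree named in the announcement (needs network access):
# fetch_branch git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat btrfs-unstable
```

Note that in the pull case the fetched branch is merged into whatever branch is currently checked out in the existing tree, so it is worth checking that first.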
2009/6/2 Chris Mason <chris.mason@oracle.com>
>
> Hello everyone,
>
> Yan Zheng has been doing some major surgery to the back references and
> extent allocation code, tackling bottlenecks in the code that tracks
> extents. It scales better with many snapshots and performs better in
> the common case of no snapshots at all.
>
> THE NEW CODE IS A FORWARD ROLLING DISK FORMAT CHANGE. This means it is
> compatible with the current btrfs disk format, but once you mount a
> filesystem with the new code, it WILL NO LONGER BE MOUNTABLE FROM OLD
> KERNELS. Old kernels spit out an error message when you try them on new
> format filesystems.
>

Hello, everyone

I have a minor disk format change for the new format. The change makes
snapshot dropping more efficient, and it only affects filesystems that
have been balanced. If you are testing the new format, please don't use
btrfs-vol -b or btrfs-vol -r. If you have already used btrfs-vol -b or
btrfs-vol -r, please back up your data.

Regards
Yan Zheng
On Sat, Jun 06, 2009 at 11:38:45AM -0500, Steven Pratt wrote:
>
> No problem. Raid results are uploading to
> http://btrfs.boxacle.net/repository/raid/history/History.html now.
> There were massive improvements in the random write workloads,
> especially with cow enabled!! MailServer had moderate perf gains, but
> dramatic decrease in CPU utilization, so this is very good as well.
>
> The only regression I see is on large file creates, CPU is up 200% or
> more while performance is fairly flat. btrfs_tree_lock now dominates
> the profile.

I'm not able to reproduce the btrfs_tree_lock usage that you're seeing.
Could you please use the callgraph option to oprofile?

-chris
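For readers unfamiliar with the callgraph request above, here is a rough sketch of how call-graph samples might be collected with the legacy oprofile tools (opcontrol and opreport). Everything specific is an assumption rather than something from the thread: the profile_callgraph and run helpers are hypothetical, the depth is an arbitrary example, and the script assumes it runs as root with the oprofile daemon stopped and already configured for the running kernel.

```shell
#!/bin/sh
# Sketch only: profile_callgraph and run are hypothetical helpers.
# Assumes the legacy oprofile tools (opcontrol, opreport), root privileges,
# and an oprofile daemon that is stopped and already set up for this kernel.

run() {
    # With DRY_RUN=1 just print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: profile_callgraph DEPTH COMMAND [ARGS...]
profile_callgraph() {
    depth=$1
    shift
    run opcontrol --reset                 # discard samples from earlier runs
    run opcontrol --callgraph="$depth"    # record call chains up to DEPTH frames
    run opcontrol --start                 # begin sampling
    run "$@"                              # the workload being profiled
    run opcontrol --dump                  # flush collected samples to disk
    run opcontrol --stop                  # stop sampling
    run opreport --callgraph              # report with caller/callee context
}
```

Setting DRY_RUN=1 prints the command sequence without touching the profiler, which is a cheap way to review the steps before running them for real.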
Chris Mason wrote:
> On Sat, Jun 06, 2009 at 11:38:45AM -0500, Steven Pratt wrote:
>
>> No problem. Raid results are uploading to
>> http://btrfs.boxacle.net/repository/raid/history/History.html now.
>> There were massive improvements in the random write workloads,
>> especially with cow enabled!! MailServer had moderate perf gains, but
>> dramatic decrease in CPU utilization, so this is very good as well.
>>
>> The only regression I see is on large file creates, CPU is up 200% or
>> more while performance is fairly flat. btrfs_tree_lock now dominates
>> the profile.
>>
>
> I'm not able to reproduce the btrfs_tree_lock usage that you're seeing.
> Could you please use the callgraph option to oprofile?
>

Ok, back from vacation and have re-engaged my brain :-)

Was thinking I would have to re-run this for you, but we already have
callgraph data for all the runs. For the 128 thread create workload it
is here:

http://btrfs.boxacle.net/repository/raid//2-6-30-rc7-newformat/btrfs-6-2-newformat/btrfs1.ffsb.large_file_creates__threads_0128.09-06-04_01.23.30/analysis/oprofile.breakout.001/oprofile-callgraph

Steve

> -chris