Hi everyone,

I've uploaded an experimental release of the raid5/6 support to git, in branches named raid56-experimental. This is based on David Woodhouse's initial implementation (thanks Dave!).

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git raid56-experimental
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git raid56-experimental

These are working well for me, but I'm sure I've missed at least one or two problems. Most importantly, the kernel side of things can have inconsistent parity if you crash or lose power. I'm adding new code to fix that right now; it's the big missing piece. But I wanted to give everyone the chance to test what I have while I'm finishing off the last few details.

Also missing:

* Support for scrub repairing bad blocks. This is not difficult, we just need a way for scrub to lock stripes and rewrite the whole stripe with proper parity.

* Support for discard. The discard code needs to discard entire stripes.

* Progs support for parity rebuild. Missing drives upset the progs today, but the kernel does rebuild parity properly.

* Planned support for N-way mirroring (triple mirror raid1) isn't included yet.

With all those warnings out of the way, how does it work? The original plan was to do read/modify/write cycles at high levels in the filesystem, so that we always sent full stripe writes down to the raid56 layers. But this had a few problems, especially when you start thinking about converting from one stripe size to another, and it doesn't fit the delayed allocation model, where we pick physical extents for a given operation as late as we possibly can.

Instead I'm doing read/modify/write when we map bios down to the individual drives. This allows blocks from multiple files to share a stripe, and it allows us to have metadata blocks smaller than a full stripe. That's important if you don't want to spin every disk for each metadata read.
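The read/modify/write decision above boils down to simple alignment math. Here's a small userspace sketch (my own illustration, not the actual btrfs code; the struct and function names are made up) of when a write can skip the read step:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not the btrfs implementation): decide at
 * bio-mapping time whether a write covers whole stripes or needs a
 * read/modify/write cycle. */
struct geom {
    uint64_t chunk;    /* bytes per disk per stripe, e.g. 64K */
    int nr_data;       /* data disks (total minus parity) */
};

static uint64_t full_stripe_bytes(const struct geom *g)
{
    return g->chunk * g->nr_data;
}

/* A write avoids RMW only if it starts on a full-stripe boundary and
 * its length is a whole number of full stripes; anything else must
 * read the untouched parts of the stripe to recompute parity. */
static int needs_rmw(const struct geom *g, uint64_t off, uint64_t len)
{
    uint64_t fs = full_stripe_bytes(g);
    return (off % fs) != 0 || (len % fs) != 0;
}
```

With 64K chunks and seven data disks (raid5 over 8 devices), a full stripe is 448K, which is presumably why the dd example later in the mail uses bs=1344K: exactly 3 full stripes per IO.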
This does sound quite a lot like MD raid, and that's because it is. By doing the raid inside of Btrfs, we're able to use different raid levels for metadata vs data, and we're able to force parity rebuilds when crcs don't match. Management operations such as restriping and adding/removing drives are also able to hook into the filesystem transactions. Longer term we'll be able to skip reads on blocks that aren't allocated, and make other connections between raid56 and the FS metadata.

I've spent a long time running different performance numbers, but there are many benchmarks left to run. The matrix of different configurations is fairly large, with btrfs-raid56 vs MD-raid56 vs Btrfs-on-MD-raid56, and then comparing all the basic workloads. Before I dive into numbers, I want to describe a few moving pieces.

Stripe cache -- This avoids read/modify/write cycles with an LRU of recently written stripes. Picture a database that does adjacent synchronous 4K writes (say a log record and a commit block). We want to make sure we don't repeat the read/modify/write for the commit block after writing the log block. In btrfs the stripe cache changes because we're doing COW: hopefully we are able to collect writes from multiple processes into a full stripe and do fewer read/modify/write cycles. But we still need the cache. The cache in btrfs defaults to 1024 stripes and can't (yet) be tuned. In MD it can be tuned up to 32768 stripes.

In the btrfs code, the stripe cache is the director in a state machine that pulls stripes from initial submission to completion. It coordinates merging stripes, parity rebuild, and handing off the stripe lock to the next bio.

Plugging -- The on-stack plugging code has a slick way for anyone in the IO stack to participate in plugging. Btrfs is using this to collect partial stripe writes in hopes of merging them into full stripes. When the kernel unplugs, we sort, merge and fire off the IOs. MD has a plugging callback as well.
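The LRU behaviour in the database example can be sketched in a few lines. This is a toy model (not the kernel's stripe cache, which is hash-based and also drives locking and the state machine described above), just to show why the second adjacent 4K write avoids the read half of read/modify/write:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stripe cache: an LRU keyed by full-stripe number.  A hit means
 * the stripe's old contents are already in memory, so a sub-stripe
 * write can skip the read step of read/modify/write. */
#define CACHE_SLOTS 4

struct stripe_cache {
    uint64_t stripe[CACHE_SLOTS];   /* most-recently-used first */
    int used;
};

/* Returns 1 on a hit (no read needed), 0 on a miss (RMW must read).
 * Either way the stripe becomes most-recently-used. */
static int stripe_cache_touch(struct stripe_cache *c, uint64_t nr)
{
    int i, hit = 0;

    for (i = 0; i < c->used; i++) {
        if (c->stripe[i] == nr) {
            hit = 1;
            break;
        }
    }
    if (!hit && c->used < CACHE_SLOTS)
        i = c->used++;              /* fill an empty slot */
    else if (!hit)
        i = CACHE_SLOTS - 1;        /* evict the least-recently-used */

    /* slide slots 0..i-1 down one and put nr at the front */
    memmove(&c->stripe[1], &c->stripe[0], i * sizeof(c->stripe[0]));
    c->stripe[0] = nr;
    return hit;
}
```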
Parity calculations -- For full stripes, Btrfs does P/Q calculations at IO submission time without handing off to helper threads, using the synchronous xor/memcpy/raid6 lib APIs. For sub-stripe writes, Btrfs kicks the work off to its own helper threads and uses the same synchronous APIs. I'm definitely open to trying out the ioat code, but so far I don't see the P/Q math as a real bottleneck.

Everyone who made it this far gets to see benchmarks! I've run these on two different systems:

1) A large HP DL380 with two sockets and 4TB of flash. The flash is spread over 4 drives, and in a raid0 run it can do 5GB/s streaming writes. This machine has the IOAT async raid engine.

2) A smaller single socket box with 4 spindles and 2 fusionio drives. No raid offload here. This box can do 2.5GB/s streaming writes.

These are all on 3.7.0, with MD created with -c 64 and --assume-clean. I upped the MD stripe cache to 32768, but didn't include Shaohua's patches to parallelize the MD parity calculations. I'll do those runs after I have the next round of btrfs changes done.

Let's start with an easy benchmark: machine #2 flash broken up into 8 logical volumes and then raid5 created on top (64K stripe size). Single dd doing streaming full stripe writes:

dd if=/dev/zero of=/mnt/oo bs=1344K oflag=direct count=4096

Btrfs -- 604MB/s
MD -- 162MB/s

My guess is the performance difference here is coming from latencies related to handing off parity to helpers. Btrfs is doing everything inline and MD is handing off. fs/direct-io.c is sending down partial stripes (one IO per 64 pages), but our plugging callbacks let us collect them. Neither MD nor Btrfs is doing any reads here.

Now for something a little bigger: machine #1 with all 4 drives configured in raid6. This one is using fio to do a streaming aio/dio write of large full stripes. The numbers below are from blktrace. Since we're doing raid6 over 4 drives, half our IO was for parity, so the actual tput seen by fio is 1/2 of this.
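For reference, the P/Q math itself is small. The sketch below is my own byte-at-a-time userspace version of the standard RAID6 syndrome algebra (P is plain XOR; Q uses the GF(2^8) generator 2 over the polynomial 0x11d); the kernel's raid6 library computes the same thing with heavily optimized unrolled/SIMD loops:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator 2 in GF(2^8) with reducing poly 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Generate P and Q for one stripe:
 *   P = D0 ^ D1 ^ ... ^ Dn-1
 *   Q = D0 ^ 2*D1 ^ 2^2*D2 ^ ... ^ 2^(n-1)*Dn-1   (GF(2^8) math)
 * computed with Horner's rule from the highest data chunk down. */
static void raid6_gen_pq(int ndata, size_t len,
                         uint8_t **data, uint8_t *p, uint8_t *q)
{
    size_t i;
    int d;

    for (i = 0; i < len; i++) {
        uint8_t pv = data[ndata - 1][i];
        uint8_t qv = pv;

        for (d = ndata - 2; d >= 0; d--) {
            pv ^= data[d][i];
            qv = gf_mul2(qv) ^ data[d][i];  /* q = d ^ 2*q */
        }
        p[i] = pv;
        q[i] = qv;
    }
}
```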
The MD runs are going directly to MD, no filesystem involved.

MD -- 800MB/s, very little system time
http://masoncoding.com/mason/benchmark/btrfs-raid6/md-raid6-full-stripe-tput.png
http://masoncoding.com/mason/benchmark/btrfs-raid6/md-raid6-full-stripe-sys.png

Btrfs -- 3.8GB/s, one CPU mostly pegged
http://masoncoding.com/mason/benchmark/btrfs-raid6/btrfs-full-stripe-tput.png
http://masoncoding.com/mason/benchmark/btrfs-raid6/btrfs-full-stripe-sys.png

That one CPU is handling interrupts for the flash. I spent some time trying to figure out why MD was doing reads in this run, but I wasn't able to nail it down. Long story short, I spent a long time tuning for streaming writes on flash. MD isn't CPU bound in these runs; latencytop shows it waiting for room in its stripe cache.

Ok, but what about read/modify/write? Machine #2 with fio doing 32K writes onto raid5:

Btrfs -- 380MB/s seen by fio
MD -- 174MB/s seen by fio
http://masoncoding.com/mason/benchmark/btrfs-raid6/btrfs-32K-write-raid5-full.png
http://masoncoding.com/mason/benchmark/btrfs-raid6/md-raid5-32K.png

For the Btrfs run, I filled the disk with 8 files and then deleted one of them. The end result made it impossible for btrfs to ever allocate a full stripe, even when it was doing COW. So every 32K write triggered a read/modify/write cycle. MD was doing rmw on every IO as well. It's interesting that MD is doing a 1:1 read/write ratio while btrfs is doing more reads than writes; some of that is metadata required for the IO.

How does Btrfs do at 32K sub-stripe writes when the FS is empty?

http://masoncoding.com/mason/benchmark/btrfs-raid6/btrfs-32K-write-raid5-empty.png

COW lets us collect 32K writes from multiple procs into a full stripe, so we can avoid the rmw cycle some of the time. It's faster, but only lasts while the space is free. Metadata intensive workloads hit the read/modify/write code much harder, and are even more latency sensitive than O_DIRECT.
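The 1:1 read/write ratio MD shows matches the textbook small-write accounting: read the old data and old parity, write the new data and new parity. A rough cost model (my own illustration at chunk granularity, not code from either implementation) looks like this:

```c
#include <assert.h>

/* Rough model of IO counts for a raid5 write touching `k` of the
 * `nr_data` data chunks in one stripe, taking the cheaper of the two
 * classic parity-update strategies:
 *   subtract:    read the k old data chunks + old parity  (k + 1 reads)
 *   reconstruct: read the nr_data - k untouched chunks    (nr_data - k reads)
 * Writes are always the k data chunks plus the parity chunk. */
struct io_counts { int reads; int writes; };

static struct io_counts raid5_rmw_cost(int nr_data, int k)
{
    struct io_counts c;
    int subtract = k + 1;
    int reconstruct = nr_data - k;

    c.reads = (k == nr_data) ? 0 :                 /* full stripe: no reads */
              (subtract < reconstruct ? subtract : reconstruct);
    c.writes = k + 1;
    return c;
}
```

For a 32K write into one 64K chunk of a 7-data-disk stripe, the model gives 2 reads and 2 writes, i.e. the 1:1 ratio seen in the MD trace; btrfs exceeding that is consistent with the extra metadata IO mentioned above.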
To test this, I used fs_mark, both on spindles and on flash. The interesting thing is that on flash, MD was within 15% of the Btrfs number. The fs_mark run was actually CPU bound creating new files in Btrfs, so once we used flash the storage wasn't the bottleneck any more.

Spindles looked a little different. For these runs I tested btrfs on top of MD vs btrfs raid5.

http://masoncoding.com/mason/benchmark/btrfs-raid5/btrfs-fsmark-md-raid5-spindle.png
http://masoncoding.com/mason/benchmark/btrfs-raid5/btrfs-fsmark-raid5-spindle.png

Creating 12 million files on Btrfs raid5 took 226 seconds, vs 485 seconds on MD. In general MD is doing more reads for the same workload. I don't have a great explanation for this yet, but the Btrfs stripe cache may have a bigger window for merging concurrent IOs into the same stripe.

Ok, that's enough for now. Happy testing everyone.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Chris,

I've been keen for raid5/6 in btrfs since I first heard of it. I cannot give you any feedback yet, but I'd like to take the opportunity to thank you - and all contributors (thinking of David for the raid code) - for your work.

Regards,
Hendrik
@@ -1389,6 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	}
 	btrfs_dev_replace_unlock(&root->fs_info->dev_replace);
 
+	if ((all_avail & (BTRFS_BLOCK_GROUP_RAID5 |
+			  BTRFS_BLOCK_GROUP_RAID6) && num_devices <= 3)) {
+		printk(KERN_ERR "btrfs: unable to go below three devices "
+		       "on raid5 or raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
 		printk(KERN_ERR "btrfs: unable to go below four devices "
 		       "on raid10\n");
@@ -1403,6 +1414,21 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 		goto out;
 	}
 
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
+	    root->fs_info->fs_devices->rw_devices <= 2) {
+		printk(KERN_ERR "btrfs: unable to go below two "
+		       "devices on raid5\n");
+		ret = -EINVAL;
+		goto out;
+	}
+	if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
+	    root->fs_info->fs_devices->rw_devices <= 3) {
+		printk(KERN_ERR "btrfs: unable to go below three "
+		       "devices on raid6\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if (strcmp(device_path, "missing") == 0) {
 		struct list_head *devices;
 		struct btrfs_device *tmp;

This seems inconsistent?

	-hpa
On Mon, Feb 04, 2013 at 02:42:24PM -0700, H. Peter Anvin wrote:
> [quoted patch snipped]
>
> This seems inconsistent?

Whoops, missed that one.  Thanks!

-chris
Also, a 2-member raid5 or a 3-member raid6 is effectively a raid1 and can be treated as such.

Chris Mason <chris.mason@fusionio.com> wrote:
>On Mon, Feb 04, 2013 at 02:42:24PM -0700, H. Peter Anvin wrote:
>> [quoted patch snipped]
>>
>> This seems inconsistent?
>
>Whoops, missed that one.  Thanks!
>
>-chris

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
I felt like having a small play with this stuff, as I've been wanting it for so long :)

But apparently I've made some incredibly newb error. I used the following two lines to check out the code:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git raid56-experimental
git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git raid56-experimental-progs

Then I did not very much to compile both of them (installed lots and lots of packages that various places told me would be needed so they'd both compile), finishing up with a "sudo make install" for both the kernel and the tools. Rebooting, miraculously it came up with the new kernel, and uname -a assures me that I have a new kernel running:

btrfs@ubuntu:/kernel/raid56-experimental$ uname -a
Linux ubuntu 3.6.0+ #1 SMP Tue Feb 5 12:26:03 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

3.6.0 sounds rather low, but it is newer than Ubuntu 12.10's 3.5, so I believe I am running the kernel I just compiled.

Where things fail is that I can't figure out how to make a raid5 btrfs. I'm certain I'm using the mkfs.btrfs that I just compiled (by explicitly calling it in the make folder), but it won't recognise what I assume the parameter to be:

btrfs@ubuntu:/kernel/raid56-experimental-progs$ ./mkfs.btrfs -m raid5 -d raid5 /dev/sd[bcdef]
Unknown profile raid5

Which flavour of newb am I today?

PS: I use newb in a very friendly way, I feel no shame over that term :)

On Tue, Feb 5, 2013 at 1:26 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Also, a 2-member raid5 or 3-member raid6 are a raid1 and can be treated as such.
>
> Chris Mason <chris.mason@fusionio.com> wrote:
>
>>On Mon, Feb 04, 2013 at 02:42:24PM -0700, H.
Peter Anvin wrote:
>>> [quoted patch snipped]
>>>
>>> This seems inconsistent?
>>
>>Whoops, missed that one.  Thanks!
>>
>>-chris
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gareth@cerberos.id.au - www.rockpaperdynamite.wordpress.com
"Dear God, I would like to file a bug report"
The last argument should be the directory you want to clone into. Use '-b <branch>' to specify the branch you want to clone. I'm pretty sure you've compiled just the master branch of both linux-btrfs and btrfs-progs.

On Mon, Feb 4, 2013 at 8:59 PM, Gareth Pye <gareth@cerberos.id.au> wrote:
> I felt like having a small play with this stuff, as I've been wanting
> it for so long :)
>
> But apparently I've made some incredibly newb error.
>
> [rest of quote snipped]
Thank you, that makes a lot of sense :) It's been a good day, I've learnt something :)

On Tue, Feb 5, 2013 at 4:29 PM, Chester <somethingsome2000@gmail.com> wrote:
> The last argument should be the directory you want to clone into. Use
> '-b <branch>' to specify the branch you want to clone. I'm pretty sure
> you've compiled just the master branch of both linux-btrfs and
> btrfs-progs.
>
> [rest of quote snipped]

--
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gareth@cerberos.id.au - www.rockpaperdynamite.wordpress.com
"Dear God, I would like to file a bug report"
Hi,

I believe XOR_BLOCKS must be selected, otherwise the build fails with:

ERROR: "xor_blocks" [fs/btrfs/btrfs.ko] undefined!

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 4f5dc93..5f583c8 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -7,6 +7,7 @@ config BTRFS_FS
 	select LZO_COMPRESS
 	select LZO_DECOMPRESS
 	select RAID6_PQ
+	select XOR_BLOCKS
 
 	help
 	  Btrfs is a new filesystem with extents, writable snapshotting,

-- 
Tomasz   .. oo o.   oo o. .o   .o o. o. oo o.   ..
Torcz    .. .o .o   .o .o oo   oo .o .. .. oo   oo
o.o.o.   .o .. o.   o. o. o.   o. o. oo .. ..   o.
On Tue, Feb 05, 2013 at 07:22:36AM -0700, Tomasz Torcz wrote:
> I believe XOR_BLOCKS must be selected, otherwise the build fails with:
> ERROR: "xor_blocks" [fs/btrfs/btrfs.ko] undefined!
>
> [patch snipped]

Thanks, I've put this in.

-chris
On Sun, Feb 10, 2013 at 03:35:05PM -0700, Gordon Manning wrote:
> Hi,
> Is the BTRFS raid code susceptible to RAID-5 write holes? I think with
> the original plan, the problem was avoided by always giving full stripe
> writes to the raid layers. Does the current plan deal with the hole in a
> different manner?

The current code in my git tree does not deal with the raid-5 write hole. That's the part I'm finishing off now.

-chris
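For anyone new to the term: the write hole is what happens when a crash lands between the data write and the parity write of a read/modify/write cycle, leaving the stripe internally inconsistent. A toy demonstration in plain XOR (my own illustration, not btrfs code):

```c
#include <assert.h>
#include <stdint.h>

/* Toy raid5 stripe: two data chunks and a parity chunk, parity = d0 ^ d1.
 * If we update d0 on disk but crash before updating the parity, then
 * later lose d1's disk, rebuilding d1 from d0 and the stale parity
 * silently returns the wrong bytes -- that's the write hole. */

/* Reconstruct the missing data chunk from the surviving one and parity. */
static uint8_t raid5_rebuild(uint8_t surviving, uint8_t parity)
{
    return surviving ^ parity;
}
```

Journaling the partial-stripe update (or only ever writing full stripes) closes the hole, which is presumably the kind of fix being finished off here.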
Hey Chris,

On 02/02/2013 05:02 PM, Chris Mason wrote:
> Btrfs -- 604MB/s
> MD -- 162MB/s
>
> MD -- 800MB/s very little system time
> Btrfs -- 3.8GB/s one CPU mostly pegged
>
> Btrfs -- 380MB/s seen by fio
> MD -- 174MB/s seen by fio
>
> Creating 12 million files on Btrfs raid5 took 226 seconds, vs 485
> seconds on MD.

Do I read these numbers incorrectly, or does even this first iteration of btrfs' raid5/6 code run circles around MD?

Thanks for all the work!
Kaspar
On Tue, Feb 12, 2013 at 08:16:49AM -0700, Kaspar Schleiser wrote:
> Do I read these numbers incorrectly, or does even this first iteration
> of btrfs' raid5/6 code run circles around MD?

Yes and no. Most of the differences were on flash, and really it just looks like MD needs tuning for IO latency and concurrency. There are some recent MD patches for this that add more threads for parity calculations, and these solve some throughput problems. But one thing that we've proven with btrfs is that helper threads mean more IO latencies, so the MD code probably needs some short cuts to do the parity inline as well.

-chris