List,

I have done some benchmarking of different file systems on a HW
RAID (Areca RAID6 with 7 disks). My test focuses on the behaviour
of the system under competing read and write workloads.

My benchmark runs 3 reader and 3 writer processes in parallel;
each reader gets its own artificial 5 GB home-directory file tree,
while the writers create similar trees.

Running this on a single disk, I get quite acceptable results.
When running on top of an Areca HW RAID6 (lvm partitioned), both
read and write performance drop by at least two orders of
magnitude.

I am not on this list, so please cc me on any replies.

My test software is here:

  http://oss.oetiker.ch/optools/wiki/FsOpBench

And these are the results:

2.6.31 - btrfs - cfq - single sata disk
######################################################################
1 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 56400   min 0.001 ms   max   96.106 ms   mean 0.053 ms   stdev  0.973
B lstat file   cnt 52652   min 0.006 ms   max   34.721 ms   mean 0.057 ms   stdev  0.680
C open file    cnt 41411   min 0.014 ms   max    0.277 ms   mean 0.017 ms   stdev  0.003
D rd 1st byte  cnt 41412   min 0.019 ms   max   51.501 ms   mean 0.327 ms   stdev  1.774
E read rate    164.741 MB/s (data)   44.940 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 21322   min 0.001 ms   max   72.704 ms   mean 0.073 ms   stdev  1.544
B lstat file   cnt 19881   min 0.006 ms   max  103.878 ms   mean 0.145 ms   stdev  2.055
C open file    cnt 15558   min 0.014 ms   max    0.109 ms   mean 0.018 ms   stdev  0.003
D rd 1st byte  cnt 15558   min 0.020 ms   max 2528.137 ms   mean 1.312 ms   stdev 21.358
E read rate    106.851 MB/s (data)   15.778 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open   cnt 15428   min 0.057 ms   max  898.478 ms   mean 0.390 ms   stdev 13.349
G wr 1st byte  cnt 15428   min 0.006 ms   max   15.889 ms   mean 0.009 ms   stdev  0.147
H write close  cnt 15428   min 0.016 ms   max  533.088 ms   mean 0.222 ms   stdev  9.099
I mkdir        cnt  1350   min 0.031 ms   max   77.956 ms   mean 0.127 ms   stdev  2.218
J write rate   30.738 MB/s (data)   23.177 MB/s (open + 1st byte + data)
A read dir     cnt  3382   min 0.001 ms   max 1586.615 ms   mean 0.831 ms   stdev 29.901
B lstat file   cnt  3158   min 0.007 ms   max  427.770 ms   mean 0.390 ms   stdev  9.328
C open file    cnt  2489   min 0.014 ms   max    2.644 ms   mean 0.020 ms   stdev  0.071
D rd 1st byte  cnt  2489   min 0.021 ms   max 2033.881 ms   mean 8.327 ms   stdev 68.468
E read rate    11.927 MB/s (data)   2.169 MB/s (readdir + open + 1st byte + data)

2.6.31 - btrfs - cfq - areca raid6 (7 disks) lvm partitioned
######################################################################
1 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 78845   min 0.001 ms   max   29.713 ms   mean 0.027 ms   stdev  0.421
B lstat file   cnt 73600   min 0.006 ms   max   21.639 ms   mean 0.038 ms   stdev  0.273
C open file    cnt 57862   min 0.013 ms   max    0.100 ms   mean 0.017 ms   stdev  0.003
D rd 1st byte  cnt 57861   min 0.014 ms   max   70.214 ms   mean 0.209 ms   stdev  0.919
E read rate    185.464 MB/s (data)   63.842 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 41222   min 0.001 ms   max  169.195 ms   mean 0.056 ms   stdev  1.113
B lstat file   cnt 38447   min 0.006 ms   max   79.977 ms   mean 0.064 ms   stdev  0.746
C open file    cnt 30122   min 0.013 ms   max    0.042 ms   mean 0.018 ms   stdev  0.003
D rd 1st byte  cnt 30122   min 0.014 ms   max  597.264 ms   mean 0.535 ms   stdev  6.646
E read rate    124.144 MB/s (data)   31.197 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open   cnt  107   min 0.063 ms   max   70.593 ms   mean  0.760 ms   stdev   6.784
G wr 1st byte  cnt  107   min 0.006 ms   max    0.014 ms   mean  0.007 ms   stdev   0.002
H write close  cnt  107   min 0.017 ms   max 1784.192 ms   mean 20.830 ms   stdev 176.474
I mkdir        cnt    9   min 0.049 ms   max    9.184 ms   mean  1.079 ms   stdev   2.865
J write rate   0.200 MB/s (data)   0.199 MB/s (open + 1st byte + data)
A read dir     cnt 1215   min 0.001 ms   max 2661.328 ms   mean  4.008 ms   stdev  81.513
B lstat file   cnt 1144   min 0.007 ms   max  377.476 ms   mean  1.827 ms   stdev  18.844
C open file    cnt  928   min 0.014 ms   max    1.596 ms   mean  0.021 ms   stdev   0.056
D rd 1st byte  cnt  928   min 0.015 ms   max 1936.262 ms   mean 25.187 ms   stdev 123.755
E read rate    9.199 MB/s (data)   0.792 MB/s (readdir + open + 1st byte + data)

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
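A rough shell approximation of the workload described above, for
anyone who wants to reproduce the effect without FsOpBench itself
(the paths are hypothetical, and this sketch does not collect the
per-call latency statistics the tool reports):

   # 3 readers: each walks and reads a pre-built ~5 GB tree
   for i in 1 2 3; do
       find /mnt/test/reader-$i -type f -exec cat {} + > /dev/null &
   done
   # 3 writers: each creates a similar tree by copying a template
   for i in 1 2 3; do
       cp -a /tmp/hometree-template /mnt/test/writer-$i &
   done
   wait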
* Tobias Oetiker:

> Running this on a single disk, I get quite acceptable results.
> When running on top of an Areca HW RAID6 (lvm partitioned), both
> read and write performance drop by at least two orders of
> magnitude.

Does the HW RAID use write caching (preferably battery-backed)?

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
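For the individual disks, the volatile write cache can be inspected
from Linux; for the Areca volume itself, the controller's own
management interface (BIOS or web GUI) is authoritative. A quick
sketch, with hypothetical device names (and note the sysfs knob only
exists on newer kernels):

   # ATA view of the drive's volatile write cache
   hdparm -W /dev/sda
   # SCSI mode-page view of the write-cache-enable (WCE) bit
   sdparm --get WCE /dev/sda
   # newer kernels also expose the effective cache type here
   cat /sys/block/sda/queue/write_cache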
Hi Florian,

Today Florian Weimer wrote:

> * Tobias Oetiker:
>
> > Running this on a single disk, I get quite acceptable results.
> > When running on top of an Areca HW RAID6 (lvm partitioned), both
> > read and write performance drop by at least two orders of
> > magnitude.
>
> Does the HW RAID use write caching (preferably battery-backed)?

yes it does ... is there some magic switch to be set for btrfs to
act accordingly?

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
> * Tobias Oetiker:
>
>> Running this on a single disk, I get quite acceptable results.
>> When running on top of an Areca HW RAID6 (lvm partitioned), both
>> read and write performance drop by at least two orders of
>> magnitude.
>
> Does the HW RAID use write caching (preferably battery-backed)?

I believe Areca controllers have an option for writeback or
writethrough caching, so it's worth checking this, and that you're
running the current firmware, in case of errata. Ironically,
disabling writeback will give the OS tighter control of request
latency, but throughput may drop a lot. I still can't help thinking
that this is down to the behaviour of the controller, given that the
1-disk case works well.

One way to test this would be to configure the array as 6 or 7
individual devices and let BTRFS/DM manage the array, then see if
performance under write load is better, with and without writeback
caching...

Daniel
--
Daniel J Blueman
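If the controller can export its disks individually (JBOD /
pass-through mode), the multi-device experiment Daniel suggests might
look roughly like this; device names are assumptions, and note that
btrfs at this point offers raid0/raid1/raid10 profiles rather than
raid6:

   # let btrfs manage the 7 exported disks directly, e.g. mirrored
   # metadata and striped+mirrored data
   mkfs.btrfs -m raid1 -d raid10 /dev/sd[b-h]
   # a device scan may be needed so the kernel sees all members,
   # then any member device can be mounted
   mount /dev/sdb /mnt/test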
Hi Daniel,

Today Daniel J Blueman wrote:

> On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
> > * Tobias Oetiker:
> >
> >> Running this on a single disk, I get quite acceptable results.
> >> When running on top of an Areca HW RAID6 (lvm partitioned), both
> >> read and write performance drop by at least two orders of
> >> magnitude.
> >
> > Does the HW RAID use write caching (preferably battery-backed)?
>
> I believe Areca controllers have an option for writeback or
> writethrough caching, so it's worth checking this, and that you're
> running the current firmware, in case of errata. Ironically,
> disabling writeback will give the OS tighter control of request
> latency, but throughput may drop a lot. I still can't help thinking
> that this is down to the behaviour of the controller, given that the
> 1-disk case works well.

it certainly is down to a behaviour of the controller, or the
results would be the same as with a single sata disk :-) It would
be interesting to see what results others get on HW RAID
controllers ...

> One way to test this would be to configure the array as 6 or 7
> individual devices and let BTRFS/DM manage the array, then see if
> performance under write load is better, with and without writeback
> caching...

I can imagine that this would help, but since btrfs aims to be
multi-purpose it does not really help all that much, as it
fundamentally alters the 'conditions': at the moment the RAID
contains different filesystems and is partitioned using lvm ...

cheers
tobi

the results for ext3 fs look like this ...

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
On 09/28/2009 05:39 AM, Tobias Oetiker wrote:
> Hi Daniel,
>
> Today Daniel J Blueman wrote:
>
>> On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
>>> * Tobias Oetiker:
>>>
>>>> Running this on a single disk, I get quite acceptable results.
>>>> When running on top of an Areca HW RAID6 (lvm partitioned), both
>>>> read and write performance drop by at least two orders of
>>>> magnitude.
>>>
>>> Does the HW RAID use write caching (preferably battery-backed)?
>>
>> I believe Areca controllers have an option for writeback or
>> writethrough caching, so it's worth checking this, and that you're
>> running the current firmware, in case of errata. Ironically,
>> disabling writeback will give the OS tighter control of request
>> latency, but throughput may drop a lot. I still can't help thinking
>> that this is down to the behaviour of the controller, given that the
>> 1-disk case works well.
>
> it certainly is down to a behaviour of the controller, or the
> results would be the same as with a single sata disk :-) It would
> be interesting to see what results others get on HW RAID
> controllers ...
>
>> One way to test this would be to configure the array as 6 or 7
>> individual devices and let BTRFS/DM manage the array, then see if
>> performance under write load is better, with and without writeback
>> caching...
>
> I can imagine that this would help, but since btrfs aims to be
> multi-purpose it does not really help all that much, as it
> fundamentally alters the 'conditions': at the moment the RAID
> contains different filesystems and is partitioned using lvm ...

I would be more suspicious of the barriers/flushes being issued. If
your write cache is non-volatile, we really do not want to send them
down to this type of device. Flushing this type of cache could
certainly be very, very expensive and slow.

Try "mount -o nobarrier" and see if your performance (write cache
still enabled on the controller) is back to expected levels,

Ric
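Concretely, that would be something like the following (device and
mount point are hypothetical). Note that nobarrier is only safe here
because the controller cache is battery-backed; with a volatile
cache, it risks filesystem corruption on power loss:

   mount -t btrfs -o nobarrier /dev/vg0/test /mnt/test
   # or, for an already-mounted filesystem:
   mount -o remount,nobarrier /mnt/test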
Hi Ric,

Today Ric Wheeler wrote:

> I would be more suspicious of the barriers/flushes being issued. If
> your write cache is non-volatile, we really do not want to send them
> down to this type of device. Flushing this type of cache could
> certainly be very, very expensive and slow.
>
> Try "mount -o nobarrier" and see if your performance (write cache
> still enabled on the controller) is back to expected levels,

wow, indeed ... with no special mount options I get the following
from my RAID6 with its non-volatile cache:

######################################################################
1 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 78845   min 0.001 ms   max   29.713 ms   mean 0.027 ms   stdev  0.421
B lstat file   cnt 73600   min 0.006 ms   max   21.639 ms   mean 0.038 ms   stdev  0.273
C open file    cnt 57862   min 0.013 ms   max    0.100 ms   mean 0.017 ms   stdev  0.003
D rd 1st byte  cnt 57861   min 0.014 ms   max   70.214 ms   mean 0.209 ms   stdev  0.919
E read rate    185.464 MB/s (data)   63.842 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 41222   min 0.001 ms   max  169.195 ms   mean 0.056 ms   stdev  1.113
B lstat file   cnt 38447   min 0.006 ms   max   79.977 ms   mean 0.064 ms   stdev  0.746
C open file    cnt 30122   min 0.013 ms   max    0.042 ms   mean 0.018 ms   stdev  0.003
D rd 1st byte  cnt 30122   min 0.014 ms   max  597.264 ms   mean 0.535 ms   stdev  6.646
E read rate    124.144 MB/s (data)   31.197 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open   cnt  107   min 0.063 ms   max   70.593 ms   mean  0.760 ms   stdev   6.784
G wr 1st byte  cnt  107   min 0.006 ms   max    0.014 ms   mean  0.007 ms   stdev   0.002
H write close  cnt  107   min 0.017 ms   max 1784.192 ms   mean 20.830 ms   stdev 176.474
I mkdir        cnt    9   min 0.049 ms   max    9.184 ms   mean  1.079 ms   stdev   2.865
J write rate   0.200 MB/s (data)   0.199 MB/s (open + 1st byte + data)
A read dir     cnt 1215   min 0.001 ms   max 2661.328 ms   mean  4.008 ms   stdev  81.513
B lstat file   cnt 1144   min 0.007 ms   max  377.476 ms   mean  1.827 ms   stdev  18.844
C open file    cnt  928   min 0.014 ms   max    1.596 ms   mean  0.021 ms   stdev   0.056
D rd 1st byte  cnt  928   min 0.015 ms   max 1936.262 ms   mean 25.187 ms   stdev 123.755
E read rate    9.199 MB/s (data)   0.792 MB/s (readdir + open + 1st byte + data)

mounting with -o nobarrier I get ...
######################################################################
1 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 78876   min 0.001 ms   max   19.803 ms   mean 0.013 ms   stdev  0.228
B lstat file   cnt 73624   min 0.006 ms   max   18.032 ms   mean 0.034 ms   stdev  0.210
C open file    cnt 57868   min 0.014 ms   max    0.041 ms   mean 0.017 ms   stdev  0.003
D rd 1st byte  cnt 57869   min 0.019 ms   max  417.725 ms   mean 0.225 ms   stdev  2.459
E read rate    177.779 MB/s (data)   63.375 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir     cnt 38209   min 0.001 ms   max   26.745 ms   mean 0.025 ms   stdev  0.472
B lstat file   cnt 35624   min 0.006 ms   max   26.019 ms   mean 0.048 ms   stdev  0.410
C open file    cnt 27874   min 0.014 ms   max    1.257 ms   mean 0.017 ms   stdev  0.008
D rd 1st byte  cnt 27874   min 0.020 ms   max 3197.520 ms   mean 0.626 ms   stdev 20.279
E read rate    98.242 MB/s (data)   27.763 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open   cnt  5957   min 0.061 ms   max  591.787 ms   mean 0.457 ms   stdev  9.956
G wr 1st byte  cnt  5956   min 0.006 ms   max    0.136 ms   mean 0.007 ms   stdev  0.002
H write close  cnt  5957   min 0.017 ms   max 1340.145 ms   mean 0.818 ms   stdev 22.442
I mkdir        cnt   574   min 0.034 ms   max   11.094 ms   mean 0.083 ms   stdev  0.543
J write rate   9.766 MB/s (data)   8.705 MB/s (open + 1st byte + data)
A read dir     cnt 15183   min 0.001 ms   max  439.260 ms   mean 0.130 ms   stdev  4.150
B lstat file   cnt 14199   min 0.006 ms   max  200.212 ms   mean 0.152 ms   stdev  3.420
C open file    cnt 11250   min 0.014 ms   max    6.641 ms   mean 0.019 ms   stdev  0.084
D rd 1st byte  cnt 11250   min 0.021 ms   max 1649.488 ms   mean 1.715 ms   stdev 19.472
E read rate    52.022 MB/s (data)   10.858 MB/s (readdir + open + 1st byte + data)

an amazing effect ... unfortunately, the system also crashes quite
often when running this test, so I guess we have to wait a bit
longer before this is ready for prime time ...

a further observation: both the RAID and the single-SATA case show
max latency values way over 1 second, which is a bit much ... not
that the other filesystems were much better, but then again it would
be cool if btrfs could best the others ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
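One way to tell whether such multi-second outliers originate below
the filesystem is to watch block-layer latencies while the benchmark
runs, e.g. with sysstat's iostat (the device name is an example):

   # extended per-device stats, once per second; 'await' is the mean
   # time in ms a request spends queued plus being serviced
   iostat -x 1 /dev/sda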