Well, I just got in a system I am intending to be a BIG fileserver. Background: I work for a SAN startup, and we're expecting to collect 30-60 terabytes of Fibre Channel traces in our first year. The purpose of this system is to be a large repository for those traces, with statistical analysis run against them. Looking at that storage figure, I decided this would be a perfect application for ZFS.

I purchased a Super Micro chassis that's 4U and has 24 slots for SATA drives. I've put in one quad-core 2.66 GHz processor and 8 GB of ECC RAM, along with two Areca 1231ML controllers ( http://www.areca.com.tw/products/pcie341.htm ), which come with Solaris drivers. I've half-populated the chassis with 12 1 TB drives to begin with, and I'm running some experiments. I loaded OpenSolaris 2008.05 on the system.

I configured an 11-drive RAID6 set plus 1 hot spare on the Areca controller, put ZFS on that RAID volume, and ran bonnie++ against it (16 GB size); it achieved 150 MB/s write and 200 MB/s read. I then blew that away, configured the Areca to present JBOD, and configured ZFS with an 11-disk RAIDZ2 plus a hot spare. Running bonnie++ against that, it achieved 40 MB/s read and 40 MB/s write. I wasn't expecting RAIDZ to outrun the controller-based RAID, but I wasn't expecting 1/3rd to 1/4 the performance either. I've looked at the ZFS tuning info on the Solaris site, and mostly what it says is "tuning is evil", with a few notes on database tuning.

Anyone got suggestions on whether there's something I might poke at to get this puppy up closer to 100 MB/s? Otherwise, I may dump the JBOD and go back to the controller-based RAID.

Cheers
Ross
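For reference, the two layouts being compared were roughly the following; the device names and the bonnie++ invocation are illustrative, not the exact commands used on this box:

  # Hardware RAID6: the Areca exports one large volume and ZFS sits on top of it
  zpool create tank c2t0d0

  # JBOD: the Areca exports each disk and ZFS provides the redundancy itself
  zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
      c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 spare c2t11d0

  # bonnie++ sized at twice RAM so the ARC can't hide the disks
  bonnie++ -d /tank -s 16g -u nobody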
On Fri, 26 Sep 2008, Ross Becker wrote:
> I configured up an 11 drive RAID6 set + 1 hot spare on the Areca
> controller, put a ZFS on that raid volume, and ran bonnie++ against
> it (16g size), and achieved 150 mb/s write & 200 mb/s read. I then
> blew that away, configured the Areca to present JBOD, and configured
> ZFS with RAIDZ2 11 disks, and a hot spare. Running bonnie++ against
> that, it achieved 40 mb/sec read and 40 mb/sec write. I wasn't
> expecting RAIDZ to outrun the controller-based RAID, but I wasn't
> expecting 1/3rd to 1/4 the performance.

Terrible! Have you tested the I/O performance of each drive to make sure that they are all performing OK?

If the individual drives are found to be performing OK with your JBOD setup, then I would suspect a device driver, card slot, or card firmware performance problem. If RAID6 is done by the RAID card, then backplane I/O to the card is not very high. If raidz2 is used, then the I/O to the card is much higher. With a properly behaving device driver and card, it is quite likely that ZFS raidz2 will outperform the on-card RAID6.

You might try disabling the card's NVRAM to see if that makes a difference.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
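A quick way to sanity-check the individual drives is a raw sequential read from each device in turn; the device path below is only an example, and the results just need to be in the same ballpark across all twelve disks:

  # ~1 GB sequential read straight off the raw device, bypassing any filesystem
  dd if=/dev/rdsk/c2t0d0p0 of=/dev/null bs=1024k count=1024

  # per-device service times and throughput while the disks are busy
  iostat -xn 5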
Okay, after doing some testing, it appears that the issue is on the ZFS side. I fiddled around for a while with options on the Areca card, and never got any better performance results than my first test. So my best out of the raidz2 is 42 MB/s write and 43 MB/s read. I also tried turning off CRCs (not how I'd run production, but for testing), and got no performance gain.

After fiddling with options, I destroyed my zfs & zpool, and tried some single-drive bits. I simply used newfs to create filesystems on single drives, mounted them, and ran some single-drive bonnie++ tests. On a single drive, I got 50 MB/s write & 70 MB/s read. I also ran two benchmarks on two drives simultaneously, and on each of those tests the result dropped by about 2 MB/s, so I got a combined 96 MB/s write & 136 MB/s read with two separate UFS filesystems on two separate disks.

So.... next steps?

--ross
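The single-drive UFS runs were along these lines (a sketch only; the slice name and mount point are placeholders):

  newfs /dev/rdsk/c2t0d0s0
  mkdir -p /mnt/d0
  mount /dev/dsk/c2t0d0s0 /mnt/d0
  bonnie++ -d /mnt/d0 -s 16g -u nobody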
On Fri, Sep 26, 2008 at 5:46 PM, Ross Becker <ross at gridironsystems.com> wrote:
> Okay, after doing some testing, it appears that the issue is on the ZFS
> side. I fiddled around a while with options on the areca card, and never
> got any better performance results than my first test. So, my best out of
> the raidz2 is 42 mb/s write and 43 mb/s read. I also tried turning off
> crc's (not how I'd run production, but for testing), and got no performance
> gain.
>
> After fiddling with options, I destroyed my zfs & zpool, and tried some
> single-drive bits. I simply used newfs to create filesystems on single
> drives, mounted them, and ran some single-drive bonnie++ tests. On a single
> drive, I got 50 mb/sec write & 70 mb/sec read. I also tested two
> benchmarks on two drives simultaneously, and on each of the tests, the
> result dropped by about 2mb/sec, so I got a combined 96 mb/sec write & 136
> mb/sec read with two separate UFS filesystems on two separate disks.
>
> So.... next steps?
>
> --ross

Did you try disabling the card cache as others advised?

--Tim
That was part of my testing of the RAID controller settings; turning off the controller cache dropped me to 20 MB/s read & write under raidz2/ZFS.

--Ross
James C. McPherson
2008-Sep-27 00:00 UTC
[zfs-discuss] ZFS poor performance on Areca 1231ML
Ross Becker wrote:
> Well, I just got in a system I am intending to be a BIG fileserver;
> background- I work for a SAN startup, and we're expecting in our first
> year to collect 30-60 terabytes of Fibre Channel traces. The purpose of
> this is to be a large repository for those traces w/ statistical analysis
> run against them. Looking at that storage figure, I decided this would
> be a perfect application for ZFS. I purchased a Super Micro chassis
> that's 4u and has 24 slots for SATA drives. I've put one quad-core 2.66
> ghz processor in & 8gig of ECC ram. I put in two Areca 1231ML (
> http://www.areca.com.tw/products/pcie341.htm ) controllers which come
> with Solaris drivers. I've half-populated the chassis with 12 1Tb
> drives to begin with, and I'm running some experiments. I loaded
> OpenSolaris 05-2008 on the system.
>
> I configured up an 11 drive RAID6 set + 1 hot spare on the Areca
> controller, put a ZFS on that raid volume, and ran bonnie++ against it
> (16g size), and achieved 150 mb/s write & 200 mb/s read. I then blew
> that away, configured the Areca to present JBOD, and configured ZFS with
> RAIDZ2 11 disks, and a hot spare. Running bonnie++ against that, it
> achieved 40 mb/sec read and 40 mb/sec write. I wasn't expecting RAIDZ to
> outrun the controller-based RAID, but I wasn't expecting 1/3rd to 1/4 the
> performance. I've looked at the ZFS tuning info on the solaris site, and
> mostly what they said is "tuning is evil", with a few things for database
> tuning.
>
> Anyone got suggestions on whether there's something I might poke at to at
> least get this puppy up closer to 100 mb/sec? Otherwise, I may dump the
> JBOD and go back to the controller-based RAID.

While running pre-integration testing of arcmsr(7d), I noticed that random IO was pretty terrible. My results matched what I saw in the benchmark PDFs from http://www.areca.com.tw/support/main.htm (bottom of page), but I'd still like to improve the results.

Were you doing more random or more sequential IO?

The source is here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/io/scsi/adapters/arcmsr

... and I'm keen to talk with you in detail about the issues you're seeing with arcmsr too.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
Ross Becker wrote:
> Okay, after doing some testing, it appears that the issue is on the ZFS
> side. I fiddled around a while with options on the areca card, and never
> got any better performance results than my first test. So, my best out of
> the raidz2 is 42 mb/s write and 43 mb/s read. I also tried turning off
> crc's (not how I'd run production, but for testing), and got no
> performance gain.
>
> After fiddling with options, I destroyed my zfs & zpool, and tried some
> single-drive bits. I simply used newfs to create filesystems on single
> drives, mounted them, and ran some single-drive bonnie++ tests. On a
> single drive, I got 50 mb/sec write & 70 mb/sec read. I also tested two
> benchmarks on two drives simultaneously, and on each of the tests, the
> result dropped by about 2mb/sec, so I got a combined 96 mb/sec write &
> 136 mb/sec read with two separate UFS filesystems on two separate disks.
>
> So.... next steps?
>
> --ross

Raidz(2) vdevs can sustain only the max IOPS of a single drive in the vdev. I'm curious what zpool iostat would say while bonnie++ is running its "writing intelligently" test. The throughput sounds very low to me, but the clue here is that the single-drive speed is in line with the raidz2 vdev, so if a single drive is being limited by IOPS, not by raw throughput, then this IO result makes sense.

For fun, you should make a pool out of two raidz vdevs to see if you get twice the throughput, more or less. I'll bet the answer is yes.

Jon

--
Jonathan Loran  -  IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
AST:7731^29u18e3
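Both suggestions are easy to try; a sketch, with pool and device names as placeholders:

  # watch per-vdev and per-disk activity in 5-second samples while bonnie++ runs
  zpool iostat -v tank 5

  # rebuild the same disks as two raidz vdevs instead of one wide one
  zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
                    raidz c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0

Since ZFS stripes writes across top-level vdevs, two vdevs roughly double the pool's IOPS ceiling relative to one.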
>>>>> "jl" == Jonathan Loran <jloran at ssl.berkeley.edu> writes:jl> the single drive speed is in line with the raidz2 vdev, reviewing the OP UFS single drive: 50MB/s write 70MB/s read ZFS 1-drive: 42MB/s write 43MB/s read raidz2 11-drive: 40MB/s write 40MB/s read so, read speed (on this test) is almost double for UFS. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080928/b67c827c/attachment.bin>
I have to come back and face the shame; this was a total newbie mistake on my part.

I followed the ZFS shortcuts for noobs guide off BigAdmin: http://wikis.sun.com/display/BigAdmin/ZFS+Shortcuts+for+Noobs

What that had me doing was creating a UFS filesystem on top of a ZFS volume, so I was layering UFS over ZFS instead of using ZFS end to end.

I just re-did this against end-to-end ZFS, and the results are pretty freaking impressive; ZFS is handily outrunning the hardware RAID. Bonnie++ is achieving 257 MB/s write and 312 MB/s read.

My apologies for wasting folks' time; this is my first experience with a Solaris of recent vintage.

--Ross
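For anyone else who lands here, the difference between the two setups is roughly the following; the wiki page's exact commands aren't reproduced here, so the names and sizes below are illustrative:

  # What the "noobs" recipe amounted to: a zvol with UFS layered on top of it
  zfs create -V 500g tank/vol0
  newfs /dev/zvol/rdsk/tank/vol0
  mkdir -p /export/traces
  mount /dev/zvol/dsk/tank/vol0 /export/traces

  # End-to-end ZFS: just create a filesystem in the pool and use it directly
  zfs create tank/traces
  bonnie++ -d /tank/traces -s 16g -u nobody

The extra layer means every UFS I/O is translated through an emulated block device before ZFS sees it, which is a well-known way to leave a lot of performance on the table.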
Cindy.Swearingen at Sun.COM
2008-Sep-29 19:30 UTC
[zfs-discuss] ZFS poor performance on Areca 1231ML
Ross,

No need to apologize... Many of us work hard to make sure good ZFS information is available, so a big thanks for bringing this wiki page to our attention. Playing with UFS on ZFS is one thing, but even inexperienced admins need to know this kind of configuration will provide poor performance.

I updated the BigAdmin wiki page to include a note about sub-optimal performance and added an item about this to the ZFS Best Practices site, under General Storage Pool Performance Considerations, here:

http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guide

Thanks,

Cindy

Ross Becker wrote:
> I have to come back and face the shame; this was a total newbie mistake by myself.
>
> I followed the ZFS shortcuts for noobs guide off bigadmin; http://wikis.sun.com/display/BigAdmin/ZFS+Shortcuts+for+Noobs
>
> What that had me doing was creating a UFS filesystem on top of a ZFS volume, so I was layering UFS over ZFS instead of using ZFS end to end.
>
> I just re-did this against end-to-end ZFS, and the results are pretty freaking impressive; ZFS is handily outrunning the hardware RAID. Bonnie++ is achieving 257 mb/sec write, and 312 mb/sec read.
>
> My apologies for wasting folks' time; this is my first experience with a solaris of recent vintage.
>
> --Ross
On Mon, Sep 29, 2008 at 12:57 PM, Ross Becker <ross at gridironsystems.com> wrote:
> I have to come back and face the shame; this was a total newbie mistake by myself.
>
> I followed the ZFS shortcuts for noobs guide off bigadmin; http://wikis.sun.com/display/BigAdmin/ZFS+Shortcuts+for+Noobs
>
> What that had me doing was creating a UFS filesystem on top of a ZFS volume, so I was layering UFS over ZFS instead of using ZFS end to end.
>
> I just re-did this against end-to-end ZFS, and the results are pretty freaking impressive; ZFS is handily outrunning the hardware RAID. Bonnie++ is achieving 257 mb/sec write, and 312 mb/sec read.
>
> My apologies for wasting folks' time; this is my first experience with a solaris of recent vintage.

No apology necessary and I'm glad you figured it out - I was just reading this thread and thinking "I'm missing something here - this can't be right".

If you have the budget to run a few more "experiments", try this SuperMicro card:

http://www.springsource.com/repository/app/faq

that others have had success with.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
> No apology necessary and I'm glad you figured it out - I was just
> reading this thread and thinking "I'm missing something here - this
> can't be right".
>
> If you have the budget to run a few more "experiments", try this
> SuperMicro card:
> http://www.springsource.com/repository/app/faq
> that others have had success with.
>
> Regards,
>
> --
> Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
>            Voice: 972.379.2133  Timezone: US CDT
> OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/

Wrong link?

--Tim
On Tue, Sep 30, 2008 at 3:51 PM, Tim <tim at tcsac.net> wrote:
>> If you have the budget to run a few more "experiments", try this
>> SuperMicro card:
>> http://www.springsource.com/repository/app/faq
>> that others have had success with.
>
> Wrong link?

Sorry! :(

http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
At this point, ZFS is performing admirably with the Areca card. Also, that card is only 8-port, and the Areca controllers I have are 12-port. My chassis has 24 SATA bays, so being able to cover all the drives with 2 controllers is preferable.

Also, the driver for the Areca controllers is being integrated into OpenSolaris as we discuss, so the next spin of OpenSolaris won't even require me to add the driver for it.

--Ross
On Tue, Sep 30, 2008 at 5:04 PM, Ross Becker <ross at gridironsystems.com> wrote:
> At this point, ZFS is performing admirably with the Areca card. Also, that
> card is only 8-port, and the Areca controllers I have are 12-port. My
> chassis has 24 SATA bays, so being able to cover all the drives with 2
> controllers is preferable.
>
> Also, the driver for the Areca controllers is being integrated into
> OpenSolaris as we discuss, so the next spin of OpenSolaris won't even
> require me to add the driver for it.
>
> --Ross

All very valid points... if you don't mind spending 8x as much for the cards :)

--Tim
Dmitry Razguliaev
2008-Dec-20 21:34 UTC
[zfs-discuss] ZFS poor performance on Areca 1231ML
Hi, I faced a similar problem to Ross, but still have not found a solution. I have a raidz out of 9 SATA disks connected to the internal and 2 external SATA controllers. Bonnie++ gives me the following results:

nexenta,8G,104393,43,159637,30,57855,13,77677,38,56296,7,281.8,1,16,26450,99,+++++,+++,29909,93,24232,99,+++++,+++,13912,99

while running on a single disk it gives me the following:

nexenta,8G,54382,23,49141,8,25955,5,58696,27,60815,5,270.8,1,16,19793,76,+++++,+++,32637,99,22958,99,+++++,+++,10490,99

The performance difference between those two seems to be too small. I checked zpool iostat -v during bonnie++'s intelligent writing, and every time it looks more or less like this:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
iTank       7.20G  2.60T     12     13  1.52M  1.58M
  raidz1    7.20G  2.60T     12     13  1.52M  1.58M
    c8d0        -      -      1      1   172K   203K
    c7d1        -      -      1      1   170K   203K
    c6t0d0      -      -      1      1   172K   203K
    c8d1        -      -      1      1   173K   203K
    c9d0        -      -      1      1   174K   203K
    c10d0       -      -      1      1   174K   203K
    c6t1d0      -      -      1      1   175K   203K
    c5t0d0s0    -      -      1      1   176K   203K
    c5t1d0s0    -      -      1      1   176K   203K

As far as I understand it, each vdev executes only 1 I/O at a time. However, on a single-device pool zpool iostat -v gives me the following:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       5.47G   181G      3      3   441K   434K
  c7d0s0    5.47G   181G      3      3   441K   434K
----------  -----  -----  -----  -----  -----  -----

In this case the device performs 3 I/Os at a time, which gives it much higher bandwidth per device.

Is there any way to increase the I/O counts for my iTank zpool? I'm running OS-11.2008 on an MSI P45 Diamond with 4 GB of memory.

Best Regards, Dmitry
On 20 Dec 08, at 22:34, Dmitry Razguliaev wrote:
> Hi, I faced a similar problem to Ross, but still have not found a
> solution. I have a raidz out of 9 SATA disks connected to the internal
> and 2 external SATA controllers. [...]
> I checked zpool iostat -v during bonnie++'s intelligent writing, and
> every time it looks more or less like this:
> [zpool iostat output quoted above]

Did you run: zpool iostat -v 1 ?

> Is there any way to increase the I/O counts for my iTank zpool?
> I'm running OS-11.2008 on an MSI P45 Diamond with 4 GB of memory.
>
> Best Regards, Dmitry
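The distinction matters: without an interval argument, zpool iostat prints long-run averages rather than current activity, which can make a busy pool look nearly idle. The per-interval sampling form shows what is actually in flight while the benchmark runs:

  # one-second samples, broken down per vdev and per disk (pool name as used above)
  zpool iostat -v iTank 1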
A question: why do you want to use HW RAID together with ZFS? I thought ZFS performs better when it is in total control. Would the results have been better with no HW RAID controller and only ZFS?
Dmitry Razguliaev
2009-Jan-10 11:49 UTC
[zfs-discuss] ZFS poor performance on Areca 1231ML
At the time of writing that post, no, I didn't run zpool iostat -v 1. However, I ran it after that. The operations numbers changed from 1 for every device in the raidz to somewhere between 20 and 400 for the raidz volume, and from 3 to somewhere between 200 and 450 for the single-device ZFS volume, but the final result remained the same: the single-disk ZFS volume is only about half the speed of the 9-disk raidz ZFS volume, which seems very strange. My expectation was a 6-7x difference in performance.

Best Regards, Dmitry
On Sat, 10 Jan 2009, Dmitry Razguliaev wrote:
> At the time of writing that post, no, I didn't run zpool iostat -v 1.
> However, I ran it after that. The operations numbers changed from 1
> for every device in the raidz to somewhere between 20 and 400 for the
> raidz volume, and from 3 to somewhere between 200 and 450 for the
> single-device ZFS volume, but the final result remained the same: the
> single-disk ZFS volume is only about half the speed of the 9-disk
> raidz ZFS volume, which seems very strange. My expectation was a 6-7x
> difference in performance.

Your expectations were wrong. Raidz and raidz2 will improve bulk sequential read/write with large files, but they do nothing useful for random-access or multi-user performance. There is also the issue that a single slow disk in a raidz or raidz2 vdev can drag down the performance of the whole vdev.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Many thanks for your time, for you have made me stronger [if misleading, plz excuse my french...] ;-)

z

----- Original Message -----
From: "Bob Friesenhahn" <bfriesen at simple.dallas.tx.us>
To: "Dmitry Razguliaev" <rdmitry0911 at yandex.ru>
Cc: <zfs-discuss at opensolaris.org>
Sent: Saturday, January 10, 2009 10:28 AM
Subject: Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

> On Sat, 10 Jan 2009, Dmitry Razguliaev wrote:
>
>> At the time of writing that post, no, I didn't run zpool iostat -v 1.
>> However, I ran it after that. The operations numbers changed from 1
>> for every device in the raidz to somewhere between 20 and 400 for the
>> raidz volume, and from 3 to somewhere between 200 and 450 for the
>> single-device ZFS volume, but the final result remained the same: the
>> single-disk ZFS volume is only about half the speed of the 9-disk
>> raidz ZFS volume, which seems very strange. My expectation was a 6-7x
>> difference in performance.
>
> Your expectations were wrong. Raidz and raidz2 will improve bulk
> sequential read/write with large files, but they do nothing useful for
> random-access or multi-user performance. There is also the issue that
> a single slow disk in a raidz or raidz2 vdev can drag down the
> performance of the whole vdev.
>
> Bob