Nick
2008-Feb-09 23:38 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
I have been tasked with putting together a storage solution for use in a
virtualization setup, serving NFS, CIFS, and iSCSI over GigE. I've
inherited a few components to work with:

  x86 dual core server, 512MB LSI-8888ELP RAID card
  12 x 300GB 15Krpm SAS disks & array
  2GB Flash to IDE "disk"/adaptor

The system will be serving virtual hard disks to a range of VMware systems
connected by GigE, running enterprise workloads that are impossible to
predict at this point.

Using the RAID card's capability for RAID6 sounds attractive?
Using the Flash RAM for the ZIL?
Using zfs for general storage management?

Has anyone built a similar system, and what is the true path to success?
What are the pitfalls? What should I have on my reading list for starters?

All advice most gratefully received.

MM.
Richard Elling
2008-Feb-10 04:02 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
Nick wrote:
> I have been tasked with putting together a storage solution for use in a
> virtualization setup, serving NFS, CIFS, and iSCSI over GigE. I've
> inherited a few components to work with:
>
>   x86 dual core server, 512MB LSI-8888ELP RAID card
>   12 x 300GB 15Krpm SAS disks & array
>   2GB Flash to IDE "disk"/adaptor
>
> The system will be serving virtual hard disks to a range of VMware
> systems connected by GigE, running enterprise workloads that are
> impossible to predict at this point.
>
> Using the RAID card's capability for RAID6 sounds attractive?

Assuming the card works well with Solaris, this sounds like a reasonable
solution.

> Using the Flash RAM for the ZIL?

I'm not sure why you would want to do this. Just carve off a LUN or slice
on the RAID card and use its NVRAM cache. A consumer-class flash "disk"
will be slower.

> Using zfs for general storage management?

Cool.

> Has anyone built a similar system, and what is the true path to success?

Success is at the summit, but there are several paths up the mountain.

> What are the pitfalls?
> What should I have on my reading list for starters?

Start with the ZFS system admin guide on opensolaris.org. We try to keep
the solarisinternals.com wikis up to date also.
 -- richard
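For reference, a minimal sketch of what Richard describes, expressed as
zpool commands on a build that supports separate log devices. The device
names c2t0d0 and c2t1d0 are placeholders for whatever LUNs the LSI card
might export, not devices from this system:

    # main pool on the big RAID LUN exported by the card
    zpool create tank c2t0d0

    # small NVRAM-backed LUN carved off the same card, added as a
    # dedicated intent-log (slog) device
    zpool add tank log c2t1d0

    # confirm the log vdev shows up
    zpool status tank

Whether this wins over leaving the ZIL inside the pool depends on the card
honouring cache flushes and on how much synchronous (NFS/iSCSI) traffic
the VMs actually generate.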
Jonathan Loran
2008-Feb-10 07:06 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
Richard Elling wrote:
> Nick wrote:
>> Using the RAID card's capability for RAID6 sounds attractive?
>
> Assuming the card works well with Solaris, this sounds like a
> reasonable solution.

Careful here. If your workload is unpredictable, RAID 6 (and RAID 5, for
that matter) will break down under highly randomized write loads. There's
a lot of trickery done with hardware RAID cards that can do some
read-ahead caching magic, improving the read / parity-calc / write cycle,
but you can't beat the laws of physics. If you *know* you'll be streaming
more than writing small random runs of blocks, RAID 6 hardware can work.
But with transaction-like loads, performance will suck.

Jon
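To put a rough number on that small-write penalty (back-of-the-envelope
only, assuming the NVRAM cache is already saturated and roughly 175 random
IOPS per 15Krpm SAS drive):

    Per small random write (classic read-modify-write):
      RAID 10 : write data to both mirror sides         = 2 disk I/Os
      RAID 5  : read data + parity, write data + parity = 4 disk I/Os
      RAID 6  : read data + P + Q, write data + P + Q   = 6 disk I/Os

    12 drives x ~175 IOPS ~= 2100 raw IOPS, so very roughly:
      RAID 10 : ~1050 small-write IOPS
      RAID 5  :  ~525 small-write IOPS
      RAID 6  :  ~350 small-write IOPS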
The workload will be generated by a gaggle of virtual machines, most of
which I expect to be running the normal mix of Microsoft/Office type
applications (of which I expect MS-Exchange to be the most demanding).

I gather the LSI card allows you to dedicate the entire NVRAM cache to
writes should you so wish. I had kind of given up on expecting read-ahead
assistance from the hardware... I'm hoping that the large-ish write cache
will simply release ZFS to go away and continue its work as soon as
possible.

Thoughts much appreciated.

Nick
Johan Hartzenberg
2008-Feb-10 11:00 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
On Feb 10, 2008 9:06 AM, Jonathan Loran <jloran at ssl.berkeley.edu> wrote:
> Richard Elling wrote:
>> Nick wrote:
>>> Using the RAID card's capability for RAID6 sounds attractive?
>>
>> Assuming the card works well with Solaris, this sounds like a
>> reasonable solution.
>
> Careful here. If your workload is unpredictable, RAID 6 (and RAID 5, for
> that matter) will break down under highly randomized write loads.
> There's a lot of trickery done with hardware RAID cards that can do some
> read-ahead caching magic, improving the read / parity-calc / write
> cycle, but you can't beat the laws of physics. If you *know* you'll be
> streaming more than writing small random runs of blocks, RAID 6 hardware
> can work. But with transaction-like loads, performance will suck.
>
> Jon

I would like to echo Jon's sentiments and add the following: if you are
going to have a mix of workload types, or if your IO pattern is unknown,
then I would suggest that you configure the array as a JBOD and use raidz.
RAID 5 or RAID 6 works best for predictable IOs with well-controlled IO
unit sizes.

How you lay it out depends on whether you need (or want) hot spares. What
are your objectives here? Maximum throughput, lowest latencies, maximum
space, best redundancy, serviceability/portability, or ....?

Cheers,
  _J
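As a concrete sketch of the JBOD-plus-raidz2 layout Johan describes, here
with two hot spares; the cXtYdZ names are placeholders and will depend on
how the card presents the disks:

    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
               c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0 \
        spare  c0t10d0 c0t11d0

That layout survives any two simultaneous disk failures, can resilver onto
a spare automatically when a drive faults, and leaves roughly 8 x 300GB
(about 2.4TB) of usable space.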
[snip: quoted copy of Johan Hartzenberg's message above]

The top priority would be to provide some redundancy; the ability to cope
with up to 2 disk failures out of the 12-disk array is very attractive.

Next up, I would say performance is important. I will have no control over
how many virtual (or physical) machines access their storage through the
device, although I would characterise any single VM as undemanding.
Certainly nothing transactional, and database access will be light. I am
expecting MS-Exchange and general CIFS shares to the desktop to be the
most greedy consumers. I am hoping that having 12 spindles and 15Krpm SAS
drives will be a good base to build upon, performance-wise.

As this system will be handed over to non-Solaris, mostly windowsy types
to use, I will be investing a lot of time in trying to make the system as
"set and forget" as possible, and am prepared to accept some compromises
in order to achieve that :-)

Nick
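Given those priorities (two-disk failure tolerance first, performance
second), another hypothetical layout worth weighing against the single
wide raidz2 sketched earlier is two 6-disk raidz2 vdevs; device names are
again placeholders:

    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0  c0t5d0 \
        raidz2 c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0

It gives up capacity (about 2.4TB usable instead of roughly 3TB from a
12-wide raidz2), but ZFS stripes across two top-level vdevs, which roughly
doubles the pool's random-IO capability, and each vdev can still lose any
two of its disks.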
Kyle McDonald
2008-Feb-10 13:48 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
Richard Elling wrote:
> Nick wrote:
>> I have been tasked with putting together a storage solution for use in
>> a virtualization setup, serving NFS, CIFS, and iSCSI over GigE. I've
>> inherited a few components to work with:
>>
>>   x86 dual core server, 512MB LSI-8888ELP RAID card
>>   12 x 300GB 15Krpm SAS disks & array
>>   2GB Flash to IDE "disk"/adaptor
>>
>> The system will be serving virtual hard disks to a range of VMware
>> systems connected by GigE, running enterprise workloads that are
>> impossible to predict at this point.
>>
>> Using the RAID card's capability for RAID6 sounds attractive?
>
> Assuming the card works well with Solaris, this sounds like a
> reasonable solution.

Another solution might be to create several (12?) single-disk RAID0 LUNs
and let ZFS do your redundancy across them. The HW RAID card will still
give each RAID0 LUN the advantages of the NVRAM cache, but with ZFS
(RAIDZ2?) doing the redundancy, ZFS will be able to recover from more
situations (at least as I understand it).

>> Using the Flash RAM for the ZIL?
>
> I'm not sure why you would want to do this. Just carve off a LUN or
> slice on the RAID card and use its NVRAM cache. A consumer-class flash
> "disk" will be slower.

This is an interesting observation.

Will a separate LUN or slice on the RAID card perform better than not
separating out the ZIL at all?

I'm trying to imagine how this works. How does the behavior of the
internal ZIL differ from the behavior of an external ZIL? Given that
they'd be sharing the same drives in this case, how will it help
performance?

I'm thinking of a comparison of an internal ZIL on a RAIDZ(2?) of 12
single-drive RAID0 LUNs, vs. either A) 12 RAID0 LUNs made from 95%+ of
the disks plus a RAID (5, 6, 10?, Z?, Z2?, something else?) LUN made from
the remaining space on the 12 disks, or B) 11 single-drive RAID0 LUNs
plus a single-drive RAID0 LUN for the ZIL.

I can see where B might be an improvement, but there is no redundancy for
the ZIL, and unless it's a smaller disk, it probably wastes space.

A offers redundancy in the ZIL, and many spindles to use, but I'd imagine
the heads would be thrashing between the ZIL portion of the disk and the
ZFS portion? Wouldn't that hurt performance?

 -Kyle

>> Using zfs for general storage management?
>
> cool.
>
>> Has anyone built a similar system, and what is the true path to
>> success?
>
> Success is at the summit, but there are several paths up the mountain.
>
>> What are the pitfalls?
>> What should I have on my reading list for starters?
>
> Start with the ZFS system admin guide on opensolaris.org.
> We try to keep the solarisinternals.com wikis up to date also.
>  -- richard
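For what it's worth, Kyle's option B would look something like this:
eleven single-disk RAID0 LUNs in a raidz2 vdev plus the twelfth LUN as a
dedicated, unmirrored log device. Device names are placeholders:

    zpool create tank \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
               c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 \
        log    c1t11d0

A slice of that twelfth LUN would probably do as well as the whole disk,
since the intent log only needs to hold a few seconds' worth of
synchronous writes.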
> Careful here. If your workload is unpredictable, RAID 6 (and RAID 5, for
> that matter) will break down under highly randomized write loads.

Oh? What precisely do you mean by "break down"? RAID 5's write performance
is well understood, and it's used successfully in many installations for
random write loads. Clearly if you need the very highest performance from
a given amount of hardware, RAID 1 will perform better for random writes,
but RAID 5 can be quite good. (RAID 6 is slightly worse, since a random
write requires access to 3 disks instead of 2.)

There are certainly bad implementations out there, but in general RAID 5
is a reasonable choice for many random-access workloads.

(For those who haven't been paying attention, note that RAIDZ and RAIDZ2
are closer to RAID 3 in implementation and performance than to RAID 5;
neither is a good choice for random-write workloads.)
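The practical upshot of that RAID 3-like, full-stripe behaviour for this
particular 12-disk box, as a rough rule of thumb (again assuming roughly
175 random IOPS per 15Krpm drive): a raidz vdev delivers roughly the
random-read IOPS of a single member disk, because each block is spread
across the whole stripe, so

    1 x 12-disk raidz2 vdev :             ~175  random-read IOPS
    2 x  6-disk raidz2 vdevs:             ~350  random-read IOPS
    6 x  2-way mirrors      : ~12 x 175 = ~2100 random-read IOPS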
Richard Elling
2008-Feb-11 00:35 UTC
[zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?
Kyle McDonald wrote:
> Richard Elling wrote:
>> Nick wrote:
>>> I have been tasked with putting together a storage solution for use in
>>> a virtualization setup, serving NFS, CIFS, and iSCSI over GigE. I've
>>> inherited a few components to work with:
>>>
>>>   x86 dual core server, 512MB LSI-8888ELP RAID card
>>>   12 x 300GB 15Krpm SAS disks & array
>>>   2GB Flash to IDE "disk"/adaptor
>>>
>>> The system will be serving virtual hard disks to a range of VMware
>>> systems connected by GigE, running enterprise workloads that are
>>> impossible to predict at this point.
>>>
>>> Using the RAID card's capability for RAID6 sounds attractive?
>>
>> Assuming the card works well with Solaris, this sounds like a
>> reasonable solution.
>
> Another solution might be to create several (12?) single-disk RAID0 LUNs
> and let ZFS do your redundancy across them. The HW RAID card will still
> give each RAID0 LUN the advantages of the NVRAM cache, but with ZFS
> (RAIDZ2?) doing the redundancy, ZFS will be able to recover from more
> situations (at least as I understand it).
>
>>> Using the Flash RAM for the ZIL?
>>
>> I'm not sure why you would want to do this. Just carve off a LUN or
>> slice on the RAID card and use its NVRAM cache. A consumer-class flash
>> "disk" will be slower.
>
> This is an interesting observation.
>
> Will a separate LUN or slice on the RAID card perform better than not
> separating out the ZIL at all?

Yes. You want to avoid mixing the ZIL iops with the regular data iops,
with contention at the LUN. This is no different than separating redo
logs for databases.

> I'm trying to imagine how this works. How does the behavior of the
> internal ZIL differ from the behavior of an external ZIL? Given that
> they'd be sharing the same drives in this case, how will it help
> performance?

The ZIL log can be considered a write-only workload. The only time you
read the ZIL is on an unscheduled reboot.

> I'm thinking of a comparison of an internal ZIL on a RAIDZ(2?) of 12
> single-drive RAID0 LUNs, vs. either A) 12 RAID0 LUNs made from 95%+ of
> the disks plus a RAID (5, 6, 10?, Z?, Z2?, something else?) LUN made
> from the remaining space on the 12 disks, or B) 11 single-drive RAID0
> LUNs plus a single-drive RAID0 LUN for the ZIL.
>
> I can see where B might be an improvement, but there is no redundancy
> for the ZIL, and unless it's a smaller disk, it probably wastes space.
>
> A offers redundancy in the ZIL, and many spindles to use, but I'd
> imagine the heads would be thrashing between the ZIL portion of the
> disk and the ZFS portion? Wouldn't that hurt performance?

The ZIL log should be a mostly sequential write workload which will
likely be coalesced at least once along the way. It is also latency
sensitive, which is why the NVRAM cache is a good thing. Beyond those
simple observations, it is not clear which of the multitude of possible
configurations will be best. Let us know what you find.
 -- richard
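One low-effort way to compare the candidate configurations when running
that experiment is to watch per-vdev traffic under a representative load,
for example:

    # per-vdev I/O statistics, refreshed every 5 seconds; a dedicated
    # log device gets its own line, which makes ZIL traffic easy to
    # separate from the data vdevs
    zpool iostat -v tank 5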
Anton B. Rang wrote:
>> Careful here. If your workload is unpredictable, RAID 6 (and RAID 5,
>> for that matter) will break down under highly randomized write loads.
>
> Oh? What precisely do you mean by "break down"? RAID 5's write
> performance is well understood, and it's used successfully in many
> installations for random write loads. Clearly if you need the very
> highest performance from a given amount of hardware, RAID 1 will
> perform better for random writes, but RAID 5 can be quite good. (RAID 6
> is slightly worse, since a random write requires access to 3 disks
> instead of 2.)
>
> There are certainly bad implementations out there, but in general RAID 5
> is a reasonable choice for many random-access workloads.
>
> (For those who haven't been paying attention, note that RAIDZ and RAIDZ2
> are closer to RAID 3 in implementation and performance than to RAID 5;
> neither is a good choice for random-write workloads.)

In my testing, if you have a lot of IO queues spread widely across your
array, you do better with RAID 1 or 10. RAIDZ and RAIDZ2 are much worse,
yes. If you add large transfers on top of this, which happen in
multi-purpose pools, small reads can get starved out. The throughput
curve (IO rate vs. queues*size) with RAID 5/6 flattens out a lot faster
than with RAID 10.

The scoop is this: on multipurpose pools, ZFS often takes the place of
many individual file systems. Those had the advantage of separation of
IO, and some tuning was also available to each file system. My
experience, or should I say theory, is that RAID 5/6 hardware-accelerated
arrays work pretty well for more predictable IO patterns. Sometimes even
great. I use RAID 5/6 a lot for these.

Don't get me wrong, I love ZFS, I ain't going back. Don't start flaming
me, I just think we have to be aware of the limitations and engineer our
storage carefully. I made the mistake recently of putting too much faith
in hardware RAID 6, and as our user load grew, the performance went
through the floor faster than I thought it would.

My 2 cents.

Jon