Jochen M. Kaiser
2006-Dec-08 14:19 UTC
[zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
Dear all,

we're currently looking to restructure our hardware environment for our data warehousing product/suite/solution/whatever.

We're currently running the database side on various SF V440s attached via dual FC to our SAN backend (EMC DMX3) with UFS. The storage system is (as is usual in a SAN) shared between many systems. Performance is mediocre in terms of raw throughput at 70-150MB/sec (lengthy, sequential reads due to full table scan operations on the db side) and excellent in terms of I/O and service times (averaging 1.7ms according to sar).

From our application's perspective, sequential read is the most important factor. The read-to-write ratio is almost 20:1.

We now want to consolidate our database servers (Oracle, btw.) to a pair of x4600 systems running Solaris 10 (which we've already tested in a benchmark setup). The whole system was still I/O-bound, even though the backend (3510, 12x146GB, QFS, RAID10) delivered a sustained data rate of 250-300MB/sec.

I'd like to target a sequential read performance of 500++MB/sec while reading from the db on multiple tablespaces. We're experiencing massive data volume growth of about 100% per year and are therefore looking for an expandable, yet "cheap" solution. We'd like to use a DAS solution, because we had negative experiences with SAN in the past in terms of tuning and throughput.

Being a friend of simplicity, I was thinking about using a pair (or more) of 3320 SCSI JBODs with multiple RAIDZ and/or RAID10 ZFS disk pools on which we'd place the database. If we need more space we'll simply connect yet another JBOD. I'd calculate 1-2 PCIe U320 controllers (w/o RAID) per JBOD, starting with a minimum of 4 controllers per server.

Regarding ZFS, I'd be very interested to know whether someone else is running a similar setup and can provide me with some hints or point me at some caveats.

I'd also be very interested in the CPU usage of such a setup for the ZFS raidz pools. After searching this forum I found the rule of thumb that 200MB/sec of throughput roughly consumes one 2GHz Opteron CPU, but I'm hoping that someone can provide me with some in-depth data. (Frankly, I can hardly imagine that this holds true for reads.)

I'd also be interested in your opinion on my targeted setup, so if you have any comments - go ahead.

Any help is appreciated,

Jochen

P.S. Fallback scenarios would be Oracle with ASM or a (zfs/ufs) SAN setup.
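A minimal sketch of the striped-mirror variant of this plan, with hypothetical device names (c2/c3 standing in for two U320 HBAs, one JBOD behind each):

    # one mirror pair per JBOD slot, each half on a different controller
    zpool create dbpool \
        mirror c2t0d0 c3t0d0 \
        mirror c2t1d0 c3t1d0 \
        mirror c2t2d0 c3t2d0
    # when another JBOD pair is attached later, the stripe grows online:
    zpool add dbpool mirror c4t0d0 c5t0d0

zpool add extends the stripe without downtime, which is what makes the "just connect another JBOD" growth model work.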
Anton B. Rang
2006-Dec-08 15:18 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
If your database performance is dominated by sequential reads, ZFS may not be the best solution from a performance perspective. Because ZFS uses a write-anywhere layout, any database table which is being updated will quickly become scattered on the disk, so that sequential read patterns become random reads.

Of course, ZFS has other benefits, such as ease of use and protection from many sources of data corruption; if you want to use ZFS in this application, though, I'd expect that you will need substantially more raw I/O bandwidth than UFS or QFS (which update in place) would require.

(If you have predictable access patterns to the tables, a QFS setup which ties certain tables to particular LUNs using stripe groups might work well, as you can guarantee that accesses to one table will not interfere with accesses to another.)

As always, your application is the real test. ;-)
Jim Mauro
2006-Dec-09 16:39 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
But can't this behavior be "tuned" (so to speak... I hate that word but I can't think of something better) by increasing the recordsize?

For DSS applications, video streaming, etc. - apps that read very large files - I seem to remember (in some ZFS work many, many months ago) getting very good (excellent?) sequential read performance by tweaking the recordsize to 1MB. (I may be remembering this wrong... will recheck this.)

Thanks,
/jim

Anton B. Rang wrote:
> If your database performance is dominated by sequential reads, ZFS may
> not be the best solution from a performance perspective. Because ZFS
> uses a write-anywhere layout, any database table which is being updated
> will quickly become scattered on the disk, so that sequential read
> patterns become random reads.
> [...]
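Setting and checking the property is a one-liner; a sketch, with tank/dss as a hypothetical dataset name. (Note that the recordsize property accepts powers of two only up to 128k, so the 1MB figure above would be over the ceiling.)

    # recordsize only affects blocks written after the change,
    # so set it before loading the tables
    zfs set recordsize=128k tank/dss
    zfs get recordsize tank/dss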
James Dickens
2006-Dec-09 17:17 UTC
[zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
On 12/8/06, Jochen M. Kaiser <jochen.m.kaiser at t-online.de> wrote:
> Dear all,
>
> we're currently looking to restructure our hardware environment for
> our data warehousing product/suite/solution/whatever.
> [...]
> We now want to consolidate our database servers (Oracle, btw.) to a pair
> of x4600 systems running Solaris 10 (which we've already tested in a
> benchmark setup). The whole system was still I/O-bound, even though the
> backend (3510, 12x146GB, QFS, RAID10) delivered a sustained data rate of
> 250-300MB/sec.

Just a thought: have you thought about giving Thumper (x4500) a trial for this workload? Oracle would seem to be I/O-limited in the end, so 4 cores may be enough to keep Oracle happy when linked with up to 2GB/s of disk I/O speed.

James Dickens
uadmin.blogspot.com
Richard Elling
2006-Dec-09 17:34 UTC
[zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
Jochen M. Kaiser wrote:
> Dear all,
>
> we're currently looking to restructure our hardware environment for
> our data warehousing product/suite/solution/whatever.

cool.

> We're currently running the database side on various SF V440s attached
> via dual FC to our SAN backend (EMC DMX3) with UFS. [...]
> From our application's perspective, sequential read is the most
> important factor. The read-to-write ratio is almost 20:1.

Do you do updates? Or is this a (more typical) append-mostly data warehouse? Anton mentions why this is important when trying to predict the appropriateness of ZFS.

> I'd like to target a sequential read performance of 500++MB/sec while
> reading from the db on multiple tablespaces. We're experiencing massive
> data volume growth of about 100% per year and are therefore looking for
> an expandable, yet "cheap" solution. We'd like to use a DAS solution,
> because we had negative experiences with SAN in the past in terms of
> tuning and throughput.

Getting such high, sustained data rates with typical SAN-style systems becomes quite expensive. The media can deliver 50-150 MBytes/s sequentially, but you run into bottlenecks in all of the busses, controllers, loops, memory, and copies of the data between the media and Oracle.

> Being a friend of simplicity, I was thinking about using a pair (or
> more) of 3320 SCSI JBODs with multiple RAIDZ and/or RAID10 ZFS disk
> pools on which we'd place the database. If we need more space we'll
> simply connect yet another JBOD. I'd calculate 1-2 PCIe U320 controllers
> (w/o RAID) per JBOD, starting with a minimum of 4 controllers per server.

Yes, this is a natural progression. KISS.

> I'd also be very interested in the CPU usage of such a setup for the ZFS
> raidz pools. After searching this forum I found the rule of thumb that
> 200MB/sec of throughput roughly consumes one 2GHz Opteron CPU, but I'm
> hoping that someone can provide me with some in-depth data. (Frankly, I
> can hardly imagine that this holds true for reads.)

Personally, I'd avoid raid-z for this; I'm a mirroring fan. We do have a model for small, random reads, but don't have a good model for large, sequential reads. You might have to do some tests to see where the performance envelope is for your hardware.

> P.S. Fallback scenarios would be Oracle with ASM or a (zfs/ufs) SAN setup.

The same bottlenecks still remain, though.
 -- richard
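One crude way to probe that envelope for sequential reads, as a sketch (the path is hypothetical, and the test file must be far larger than RAM so the ARC cannot hide the disks):

    # write a large file, then time reading it back sequentially
    dd if=/dev/zero of=/dbpool/bigfile bs=1024k count=32768   # 32GB test file
    ptime dd if=/dbpool/bigfile of=/dev/null bs=1024k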
Luke Lonergan
2006-Dec-10 05:28 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Anton,

On 12/8/06 7:18 AM, "Anton B. Rang" <Anton.Rang at Sun.COM> wrote:
> If your database performance is dominated by sequential reads, ZFS may
> not be the best solution from a performance perspective. Because ZFS
> uses a write-anywhere layout, any database table which is being updated
> will quickly become scattered on the disk, so that sequential read
> patterns become random reads.

This is not the case with ZFS; it writes in large sequential chunks like other filesystems. For the Sun data warehouse appliance, we use clusters of X4500 and each one sustains 1.7GB/s of sequential scan rate through the database. A cluster of ten sustains 17GB/s through the database.

I would be surprised if the Sun data warehouse powered by Greenplum and using ZFS isn't over 100 times faster than the planned Oracle system mentioned in this thread, and likely a lot cheaper.

- Luke
Jochen M. Kaiser
2006-Dec-10 09:51 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (no more lengthy intro)
James,

> Just a thought: have you thought about giving Thumper (x4500) a trial
> for this workload? Oracle would seem to be I/O-limited in the end, so
> 4 cores may be enough to keep Oracle happy when linked with up to
> 2GB/s of disk I/O speed.

Actually yes, but I have doubts about the scalability of the CPU power. I'd imagine that a RAID-Z setup will increase the CPU usage of ZFS, so mirroring will be the way to go.

I've also browsed some info on Greenplum and other appliance vendors. However, none are listed as strategic products for our company (forcing a lengthy assessment process), support/consulting in Germany is usually non-existent, and a port of our current setup would be difficult at best.

I've asked Robert Milkowski (milek.blogspot.com) if he can provide me with some CPU figures from his throughput benchmarks.

Jochen
Jochen M. Kaiser
2006-Dec-10 10:07 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Richard,

> Do you do updates? Or is this a (more typical) append-mostly data
> warehouse? Anton mentions why this is important when trying to predict
> the appropriateness of ZFS.

We do updates: for (almost) every record we insert, we update (expire) an old record. So in terms of read and write ratio I'd say that we have 40 (read/select) : 1 (write/insert) : 1 (write/update).

>> I'd like to target a sequential read performance of 500++MB/sec while
>> reading from the db on multiple tablespaces.
> Getting such high, sustained data rates with typical SAN-style systems
> becomes quite expensive. The media can deliver 50-150 MBytes/s
> sequentially, but you run into bottlenecks in all of the busses,
> controllers, loops, memory, and copies of the data between the media
> and Oracle.

Yes, and apart from this we have a very adverse effect on the rest of the applications placed on the SAN, and vice versa. A recovery of a large SAP instance extended our loading time by almost 25%. IMHO warehousing and (non-dedicated) SAN environments don't work well together. Apart from this, SAN setups are usually much more difficult to set up, tune and maintain. Therefore I favor a DAS-based approach.

> Personally, I'd avoid raid-z for this; I'm a mirroring fan. We do have
> a model for small, random reads, but don't have a good model for large,
> sequential reads. You might have to do some tests to see where the
> performance envelope is for your hardware.

Does this statement hold true for both raid-z and striped mirrors, or only for the first?

Jochen
Wee Yeh Tan
2006-Dec-10 14:45 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
On 12/10/06, Luke Lonergan <llonergan at greenplum.com> wrote:
> This is not the case with ZFS; it writes in large sequential chunks like
> other filesystems.

The problem here is that ZFS is COW, so while the first create is sequential, subsequent updates of random blocks will completely mess up the placement of the data file. I imagine that turning up the recordsize should help here, but I do not have the benchmarks to show that.

I am interested in what Greenplum does differently from the default ZFS settings, especially for recordsize. I also wonder if the said performance difference can be maintained if we replicate the setup but run Oracle instead of pgsql.

--
Just me,
Wire ...
Richard Elling
2006-Dec-10 18:08 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Jochen M. Kaiser wrote:
> Richard,
>
>> Do you do updates? Or is this a (more typical) append-mostly data
>> warehouse? Anton mentions why this is important when trying to predict
>> the appropriateness of ZFS.
>
> We do updates: for (almost) every record we insert, we update (expire)
> an old record. So in terms of read and write ratio I'd say that we have
> 40 (read/select) : 1 (write/insert) : 1 (write/update).

In ZFS the inserts and updates will be physically close. I'd be very interested in seeing how ZFS reacts over time on such workloads.

>>> I'd like to target a sequential read performance of 500++MB/sec while
>>> reading from the db on multiple tablespaces.
>> Getting such high, sustained data rates with typical SAN-style systems
>> becomes quite expensive. [...]
>
> Yes, and apart from this we have a very adverse effect on the rest of
> the applications placed on the SAN, and vice versa. A recovery of a
> large SAP instance extended our loading time by almost 25%. IMHO
> warehousing and (non-dedicated) SAN environments don't work well
> together. Apart from this, SAN setups are usually much more difficult
> to set up, tune and maintain. Therefore I favor a DAS-based approach.

Agree.

>> Personally, I'd avoid raid-z for this; I'm a mirroring fan. [...]
>
> Does this statement hold true for both raid-z and striped mirrors, or
> only for the first?

Correct me if I misinterpret your question. The small random read performance model considers a raidz set as having the equivalent performance of one disk. The mirror model offers the performance of N disks, where N is the number of disks in the mirror set. Striping raidz sets or mirrors multiplies by the number of sets.

A sequential workload is more tricky because you begin to see caching effects on reads. This isn't a problem on writes because you will very quickly blow through any caches in the path and achieve media speed. But it isn't clear to me that a sequential write performance model is useful. Thoughts?
 -- richard
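Plugging numbers into that model for the same twelve disks, a back-of-the-envelope sketch (the 150 IOPS per disk is an assumed figure for illustration, not a measured one):

    # small random reads: raidz set ~ one disk, mirror set ~ N disks,
    # striping multiplies by the number of sets
    DISK_IOPS=150
    echo "6 striped 2-way mirrors: $((6 * 2 * DISK_IOPS)) IOPS"   # ~1800
    echo "2 striped 6-disk raidz:  $((2 * 1 * DISK_IOPS)) IOPS"   # ~300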
Luke Lonergan
2006-Dec-10 18:33 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Wee,

On 12/10/06 6:45 AM, "Wee Yeh Tan" <weeyeh at gmail.com> wrote:
> I am interested in what Greenplum does differently from the default ZFS
> settings, especially for recordsize. I also wonder if the said
> performance difference can be maintained if we replicate the setup but
> run Oracle instead of pgsql.

The performance comes from the parallel version of pgsql, which uses all CPUs and I/O channels together (and not from special settings of ZFS). What sets this apart from Oracle is that it's an automatic parallelism that leverages the internal storage across multiple machines, no SAN involved. Fault tolerance is also automatic, and up to half of the machines can fail without downtime. Oracle's fault and parallelism model relies on SAN, which is both expensive and does not scale bandwidth with CPUs. With the Sun warehouse, scaling of CPU and storage bandwidth are coupled - just add another node with internal ZFS storage and the response time, data loading, updates, etc. scale.

This also works for other platforms like the X4600 with direct-attach storage, which may work better for some people, particularly with the upcoming dense storage arrays. In the case of the X4600, you'll be able to get 32 CPUs (with quad core) and 256GB of RAM per host, and all of it working on each query, update, etc. without modification to the DBMS app. And if the I/O bandwidth isn't sufficient within one X4600, more X4600s can be added. Unlike the Oracle case, all X4600s are used together for each query. For instance, with 4 X4600s with 256GB of RAM each, you have 1TB of RAM that will be used for I/O cache within the DBMS. Most modern data warehouses are larger than 10TB, so you can't fit all of the working set in RAM, but the more RAM working for you on complex queries or for multiple users, the better.

- Luke
Wee Yeh Tan
2006-Dec-11 07:10 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Luke,

On 12/11/06, Luke Lonergan <llonergan at greenplum.com> wrote:
> The performance comes from the parallel version of pgsql, which uses all
> CPUs and I/O channels together (and not from special settings of ZFS).
> What sets this apart from Oracle is that it's an automatic parallelism
> that leverages the internal storage across multiple machines, no SAN
> involved. Fault tolerance is also automatic, and up to half of the
> machines can fail without downtime. [...]

Cool, that's great stuff. I certainly wasn't aware of such advances to pgsql. I'll get my hands on Greenplum and give it a spin :).

--
Just me,
Wire ...
Roch - PAE
2006-Dec-11 11:03 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
Anton B. Rang writes:
> If your database performance is dominated by sequential reads, ZFS may
> not be the best solution from a performance perspective. Because ZFS
> uses a write-anywhere layout, any database table which is being updated
> will quickly become scattered on the disk, so that sequential read
> patterns become random reads.

While for OLTP our best practice is to set the ZFS recordsize to match the DB blocksize, for DSS we would advise running without such tuning. True, the sequential reads become random reads, but of 128K records, and that should still let you draw close to 20-25MB/s per [modern] disk. So to reach your goal of 500MB/s++ you would need 20++ disks.

-r
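The OLTP half of that practice as commands, a sketch with hypothetical dataset names (8k assumes Oracle's default db_block_size of 8192):

    zfs set recordsize=8k tank/oracle-oltp   # match the DB blocksize for OLTP
    zfs inherit recordsize tank/oracle-dss   # DSS: leave the 128k default in place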
Robert Milkowski
2006-Dec-12 14:36 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (no more lengthy intro)
Hello Jochen,

Sunday, December 10, 2006, 10:51:57 AM, you wrote:

JMK> Actually yes, but I have doubts about the scalability of the CPU
JMK> power. I'd imagine that a RAID-Z setup will increase the CPU usage
JMK> of ZFS, so mirroring will be the way to go.
JMK> [...]
JMK> I've asked Robert Milkowski (milek.blogspot.com) if he can provide
JMK> me with some CPU figures from his throughput benchmarks.

It's not that bad with CPU usage. For example, with RAID-Z2 while doing a scrub I get something like 800MB/s read from the disks (550-600MB/s from the zpool iostat perspective) and all four cores are mostly consumed - I get something like 10% idle on each CPU.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
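For anyone wanting to reproduce that measurement, the pieces are roughly (pool name hypothetical):

    zpool scrub tank       # start the scrub
    zpool iostat tank 5    # pool-level bandwidth at 5-second intervals
    mpstat 5               # per-CPU idle alongside it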
Al Hopper
2006-Dec-12
[zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
On Fri, 8 Dec 2006, Jochen M. Kaiser wrote:
> [...]
> Being a friend of simplicity, I was thinking about using a pair (or
> more) of 3320 SCSI JBODs with multiple RAIDZ and/or RAID10 ZFS disk
> pools on which we'd place the database. If we need more space we'll
> simply connect yet another JBOD. I'd calculate 1-2 PCIe U320 controllers
> (w/o RAID) per JBOD, starting with a minimum of 4 controllers per server.

Have you not heard that SCSI is dead? :)

But seriously, the big issue with SCSI is that the SCSI commands are sent over the SCSI bus at the original (legacy) rate of 5 Mbits/Sec in 8-bit mode. And since it takes an average of 5 SCSI commands to do something useful, you can't send enough commands over the bus to busy out a modern SCSI drive - even a single drive on a single SCSI bus. Also, it takes a lot of time to send those commands, so you have latency. And everyone understands how latency affects throughput on a LAN (or WAN)... same issue with SCSI. This is the main reason why SCSI is EOL and could not be extended without breaking the existing standards.

While I understand you don't want to build a SAN, an alternative would be a Fibre Channel (FC) box that presents SATA drives. This would be a DAS solution with one or two connections to (Qlogic) FC controllers in the host - IOW not a SAN, and there is no FC switch required. Many such boxes are designed to provide expansion to an FC-based hardware RAID box - for example, the DS4000 EXP100 Storage Expansion Unit from IBM. In your application you'd need to find something that supports FC rates of 4Gb/Sec, if possible.

Another possibility, which is on my todo list to check out, is:

http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

Now if I could find a Marvell-based equivalent to the
http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm
with external SATA ports, life would be great. Another card with external SATA ports that works with Solaris (via the si3124 driver) is
http://www.newegg.com/product/product.asp?item=N82E16816124003
which only has a 32-bit PCI connection. :(

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
On Dec 12, 2006, at 10:02, Al Hopper wrote:
> Another possibility, which is on my todo list to check out, is:
>
> http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

I would not go with this device. I picked up one along with 12 500GB SATA drives with the hope of making a dumping ground on the network for my servers to rsync to. Now, I might have it all kinds of not configured or tuned correctly in terms of Solaris & ZFS (and if I do, I can't figure it out), but performance is terrible compared to my existing dumping ground based on a cheap-o RAID-5 card & FreeBSD.
Anton B. Rang
2006-Dec-12 17:58 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
> But seriously, the big issue with SCSI is that the SCSI commands are
> sent over the SCSI bus at the original (legacy) rate of 5 Mbits/Sec in
> 8-bit mode.

Actually, this isn't true on the newest (Ultra320) SCSI systems, though I don't know if the 3320 supports packetized SCSI. It's definitely an issue for older SCSI buses if the reads and writes are small - less than a megabyte, say. (For data warehousing applications you should see larger reads, as long as your data is laid out contiguously on disk.) There's rather a nice chart at http://www.hitachigst.com/hdd/library/whitepap/tech/hdwpacket.htm showing how the overhead grows with the speed of the bus.

> And since it takes an average of 5 SCSI commands to do something useful

Urm? What's wrong with just READ(10) or WRITE(10)?

> Also, it takes a lot of time to send those commands - so you have latency.

Not much compared to the rotational latency if you're actually reading from media, though. (Measured latency for a read operation with disconnect/reconnect on a parallel SCSI bus is around 22 µs. [That's microseconds in case your mail program/browser doesn't get it right.])

> This is the main reason why SCSI is EOL

I presume you mean parallel SCSI? I'd argue that the larger reason was the cost and cooling requirements of parallel cabling; SAS seems to be alive, at least, if not taking off quickly.

FC, SAS, and SATA all have lower overhead since they're point-to-point and don't need to arbitrate (or drive multiple receivers). How noticeable this is depends on your application. For large sequential I/O, the data transfer time dominates the overhead; for random I/O, the seek time and rotational latency dominate the overhead. Only in the cases where you're doing fairly small sequential I/Os, you have a very fast caching controller, or you have so many spindles on one connection that you have enough I/O operations in flight to keep the bus busy, will this matter much.

For this application, with a mix of random & sequential I/O, FC disks, or other disks with very low seek+rotation times, might perform quite a lot better than inexpensive disks with longer seek+rotation times. I'd be concerned that the updates would dominate performance, unless they're happening at a rate of fewer than about 50/second/spindle.

Anton
> http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

Yea, SiI3726 multipliers are cool:

http://cooldrives.com/cosapomubrso.html
http://cooldrives.com/mac-port-multiplier-sata-case.html

but finding PCI-X slots for Ying Tian's si3124 or marvell88sx cards is getting tricky - even harder at 133MHz. The 1x PCIe two-SATA si3132 card should come up

http://elektronkind.org/category/geekery/solaris/

but has issues:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6404812
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6492430
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6492427
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2133861

What would be nice is support for Marvell's 88SX7042 4x PCIe four-SATA card:

http://www.amug.org/amug-web/html/amug/reviews/articles/sonnet/e4p/

An easier bet is AMD's 4x4 platform

http://www.tomshardware.com/2006/11/30/brute_force_quad_cores/page6.html

with its watered-down Professional 3600 chipset

http://www.nvidia.com/page/pg_20060814366736.html

that would likely "just work" with 12 SATA ports.

Man, if someone would sell me a diskless Thumper... it's an impressive grouping of PCI-X slots.

Rob
Jochen M. Kaiser
2006-Dec-13 21:35 UTC
[zfs-discuss] Re: Re: ZFS Usage in Warehousing (no more lengthy intro)
Robert,

> It's not that bad with CPU usage. For example, with RAID-Z2 while doing
> a scrub I get something like 800MB/s read from the disks (550-600MB/s
> from the zpool iostat perspective) and all four cores are mostly
> consumed - I get something like 10% idle on each CPU.

But in the end this would leave us with too few CPU cycles for the database backend. Some of the queries are rather complex and fairly CPU-intensive (completely saturating a V440 box CPU-wise). I'd say going with an x4600 plus attached storage will probably suit us more, especially when the number of users on the warehouse increases.

Nevertheless, many thanks for your input - I really enjoy your blog.

Jochen
Jochen M. Kaiser
2006-Dec-13 21:47 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro, now slightly OT)
Al,

>> Being a friend of simplicity, I was thinking about using a pair (or
>> more) of 3320 SCSI JBODs with multiple RAIDZ and/or RAID10 ZFS disk
>> pools on which we'd place the database.
>
> Have you not heard that SCSI is dead? :)

<scsi == slow&dead, well more or less, that is>

> While I understand you don't want to build a SAN, an alternative would
> be a Fibre Channel (FC) box that presents SATA drives. This would be a
> DAS solution with one or two connections to (Qlogic) FC controllers in
> the host - IOW not a SAN, and there is no FC switch required. Many such
> boxes are designed to provide expansion to an FC-based hardware RAID
> box - for example, the DS4000 EXP100 Storage Expansion Unit from IBM.
> In your application you'd need to find something that supports FC rates
> of 4Gb/Sec, if possible.

Well, the secondary approach would be the usage of Infortrend boxes of the latest generation. The guys at the European support were very helpful and supplied me with a wealth of information, including how they set up the boxes for the throughput test on their webpage. (All of those are 4Gb FC-based.)

The fallback of this fallback would be the usage of SAS JBODs (J300s) from Promise, or HP's (yes, I know, shame on me) 2.5" 10-HDD SAS JBODs, if they'll be equipped with a secondary controller module next year(?). I didn't find any decent SAS controllers though; Qlogic has some, but the PCIe model with two external ports isn't supported on Solaris. The single-port model would work, though...

> Another possibility, which is on my todo list to check out, is:
> http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220
> [...]

I'll check those out - thanks for the wealth of information :-)

Jochen
Richard Elling
2006-Dec-13 23:39 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro, now slightly OT)
Jochen M. Kaiser wrote:
> I didn't find any decent SAS controllers though; Qlogic has some, but
> the PCIe model with two external ports isn't supported on Solaris. The
> single-port model would work, though...

We (Sun) sell LSI 1064-based SAS/SATA controllers. There should be several sources of these besides Sun, too.
 -- richard
Robert Milkowski
2006-Dec-14 11:06 UTC
[zfs-discuss] Re: Re: ZFS Usage in Warehousing (no more lengthy intro)
Hello Jochen,

Wednesday, December 13, 2006, 10:35:22 PM, you wrote:

JMK> But in the end this would leave us with too few CPU cycles for the
JMK> database backend. Some of the queries are rather complex and fairly
JMK> CPU-intensive (completely saturating a V440 box CPU-wise).
JMK> I'd say going with an x4600 plus attached storage will probably suit
JMK> us more, especially when the number of users on the warehouse
JMK> increases.

Well, this is with RAID-Z2, and it's expected to consume more CPU. If you go with RAID-10 the CPU usage should be much lower - however, I don't remember the exact numbers (I was interested in the x4500 mostly for file serving).

JMK> Nevertheless, many thanks for your input - I really enjoy your blog.

Nice to hear you like it.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
Brian Hechinger
2006-Dec-14 13:58 UTC
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro, now slightly OT)
On Wed, Dec 13, 2006 at 03:39:58PM -0800, Richard Elling wrote:
> Jochen M. Kaiser wrote:
>> I didn't find any decent SAS controllers though; Qlogic has some, but
>> the PCIe model with two external ports isn't supported on Solaris. The
>> single-port model would work, though...
>
> We (Sun) sell LSI 1064-based SAS/SATA controllers. There should be
> several sources of these besides Sun, too.

The LSI SAS3442X is reported to work under SPARC. I haven't purchased one to try yet, but that will hopefully happen sometime soon. I'll report here how it works.

-brian