Hi list, experimental question... Imagine a pool made of SSD disks: is there any benefit to adding an SSD cache to it? What real impact would it have? Thanks. -- Francois
Depends.

a) Pool design: 5 x SSD as raidz = 4 SSDs' worth of space, with the read I/O performance of one drive. Adding 5 cheap 40 GB L2ARC devices (which are pooled) increases read performance for a working set of 200 GB. If you have a pool of mirrors, adding L2ARC does not make sense.

b) SSD type: If your devices are MLC, adding a ZIL makes sense. Watch out for drive qualification! (It must honor the cache flush command.)

Robert
-- This message posted from opensolaris.org
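For reference, a minimal sketch of the kind of layout Lutz describes. The pool and device names are made up for illustration and are not from the thread:

    # hypothetical 5-disk raidz pool of SSDs
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

    # add five cheap 40 GB SSDs as L2ARC cache devices; reads are spread
    # across all cache devices, giving roughly 200 GB of read cache
    zpool add tank cache c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

    # the cache devices appear in their own "cache" section
    zpool status tank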
On Jan 9, 2010, at 1:32 AM, Lutz Schumann wrote:

> Depends.
>
> a) Pool design: 5 x SSD as raidz = 4 SSDs' worth of space, with the read I/O performance of one drive. Adding 5 cheap 40 GB L2ARC devices (which are pooled) increases read performance for a working set of 200 GB.

An interesting thing happens when an app suddenly has 50-100x more IOPS: the bottleneck tends to move back to the CPU. This is a good thing, because the application running on a CPU is where the most value is gained. Be aware of this, because it is not uncommon for people to upgrade the storage and not see a significant improvement once the application becomes CPU-bound.

> If you have a pool of mirrors, adding L2ARC does not make sense.

I think this is a good rule of thumb.
-- richard

> b) SSD type: If your devices are MLC, adding a ZIL makes sense. Watch out for drive qualification! (It must honor the cache flush command.)
>
> Robert
I'm also considering adding a cheap SSD as a cache drive. The only problem is that SSDs lose performance over time, because when something is deleted it is not actually erased. So the next time something is written to the same blocks, the drive must first erase them, then write. To fix this, SSDs support a new command called TRIM, which cleans up the blocks after something is deleted. Does anyone know if OpenSolaris supports TRIM?
-- This message posted from opensolaris.org
On Fri, Jun 4, 2010 at 11:28 AM, zfsnoob4 <zfsnoobman at hotmail.co.uk> wrote:

> Does anyone know if opensolaris supports Trim?

It does not. However, it doesn't really matter for a cache device. The cache device is written to rather slowly and only needs low-latency access on reads. Most current-generation SSDs such as the Intel X25-M, Indilinx Barefoot, etc. also support garbage collection, which reduces the need for TRIM. It's important that you align partitions on a 4k or 8k boundary, though. (OCZ recommends 8k for the Vertex drives.) I think most current drives have between a 128k and 512k erase block size, which is another alignment point you can use.

-B
-- Brandon High : bhigh at freaks.com
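A quick way to sanity-check the alignment arithmetic Brandon mentions; the sector size and starting sector below are assumptions for the example, not values from the thread:

    # assumes 512-byte sectors; sector 16384 puts the partition at
    # 16384 * 512 = 8388608 bytes (8 MiB), a multiple of every
    # boundary mentioned above (4k, 8k, 128k, 512k)
    START_SECTOR=16384
    OFFSET=$((START_SECTOR * 512))
    for BOUNDARY in 4096 8192 131072 524288; do
        if [ $((OFFSET % BOUNDARY)) -eq 0 ]; then
            echo "offset $OFFSET is aligned to $BOUNDARY bytes"
        fi
    done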
On Jun 4, 2010, at 14:28, zfsnoob4 wrote:

> Does anyone know if opensolaris supports Trim?

Not at this time. Are you referring to a read cache or a write cache?
On Fri, Jun 4, 2010 at 2:59 PM, David Magda <dmagda at ee.ryerson.ca> wrote:

> Are you referring to a read cache or a write cache?

A cache vdev is an L2ARC, used for reads. A log vdev is a slog/ZIL, used for writes. Oh, how we overload our terms.

-B
-- Brandon High : bhigh at freaks.com
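To make the distinction concrete, a minimal sketch with a hypothetical pool and made-up device names:

    # read cache (L2ARC): a "cache" vdev
    zpool add tank cache c3t0d0

    # separate intent log (slog/ZIL): a "log" vdev, commonly mirrored
    zpool add tank log mirror c3t1d0 c3t2d0

zpool status then lists the log vdev under "logs" and the cache vdev under "cache", which helps keep the two roles straight.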
I was talking about a write cache (slog/ZIL I suppose). This is just a media server for home. The idea is that when I copy an HD video from my camera to the network drive, it is always several GBs. So if it could copy the file to the SSD first and then slowly copy it to the normal HDs, that would be very good, since it would essentially saturate the GigE network.

Having a read cache is not useful for me, because all the files are huge and you can't really predict which one someone will watch.

I guess it's not that useful to begin with, and without TRIM the write performance will start to drop off anyway.

Thanks for the clarification.
-- This message posted from opensolaris.org
On Jun 5, 2010, at 1:30 PM, zfsnoob4 wrote:

> I guess it's not that useful to begin with, and without TRIM the write performance will start to drop off anyway.

At the risk of sounding like a broken record: file systems which are not COW unquestionably gain benefits from TRIM. ZFS uses COW, and it has yet to be demonstrated that the performance degradation seen in other file systems affects ZFS. If you would care to undertake such a study, please share the results with us.

-- richard
-- ZFS and NexentaStor training, Rotterdam, July 13-15, 2010 http://nexenta-rotterdam.eventbrite.com/
On Sat, 5 Jun 2010, zfsnoob4 wrote:

> I guess it's not that useful to begin with, and without TRIM the write performance will start to drop off anyway.

It is not necessarily true that SSD write performance will drop off over time without TRIM. It depends on how the SSD is designed.

Bob
-- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On 6/5/2010 1:30 PM, zfsnoob4 wrote:

> I was talking about a write cache (slog/ZIL I suppose). This is just a media server for home. The idea is that when I copy an HD video from my camera to the network drive, it is always several GBs. So if it could copy the file to the SSD first and then slowly copy it to the normal HDs, that would be very good, since it would essentially saturate the GigE network.
>
> Having a read cache is not useful for me, because all the files are huge and you can't really predict which one someone will watch.
>
> I guess it's not that useful to begin with, and without TRIM the write performance will start to drop off anyway.
>
> Thanks for the clarification.

Now, wait a minute. Are you copying the data from the camera to a local hard drive first, then to a network drive? If that's the scenario, then you'll likely be bottlenecked on the speed of your camera->PC connection, which is likely 400 Mb/s or 800 Mb/s FireWire. 3-4 hard drives can easily keep up with a large sequential write such as that. In this case, a local SSD isn't going to be any faster than a hard drive. Sequential write for large files has no real difference in speed between an SSD and a HD.

If you're copying the data from your camera straight to a network-mounted drive, then Gigabit Ethernet is your bottleneck, and there's no real benefit to an SSD on the server side at all - even a single HD should be able to keep up with Gigabit speeds for a sequential write. Having somewhere locally to copy the data first doesn't really buy you anything - you still have to push it through the GigE bottleneck. And ZFS isn't a network file system - it's not going to be able to cache something on the client side. That's up to the client.

-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA
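For the bandwidth argument above, the back-of-the-envelope numbers (the overhead estimate is mine, not from the thread): 1 Gbit/s is 125 MB/s of raw payload, and after TCP and CIFS/NFS overhead roughly 100-115 MB/s is typical in practice, which the ~100 MB/s or better sequential write rate of a single modern hard drive can more or less keep up with.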
> Sequential write for large files has no real difference in speed between an SSD and a HD.

That's not true. Indilinx-based SSDs can write up to 200 MB/s sequentially, and SandForce-based drives even more. I don't know of any HD that can do that. Most HDs are considered good if they do half of that.
-- This message posted from opensolaris.org
FWIW, I use 4 Intel 32 GB SSDs as read cache for each pool of 10 Patriot Torx drives, which are running in a raidz2 configuration. No slogs, as I haven't seen a compliant SSD drive yet. I am pleased with the results. The bottleneck really turns out to be the 24-port RAID card they are plugged into.

Bonnie++ local results, if memory serves: read about 750 MB/sec, rewrite about 450 MB/sec, write about 600 MB/sec.

A SQLIO test run from a fibre-connected VMware guest reached over 16,000 IOPS for 8k random reads. Because the VMware host only has a 4 Gb Fibre Channel card, max reads were limited to a hair under 400 MB/sec. Using several guests on two VMware hosts achieved 690 MB/sec reads combined.
-- This message posted from opensolaris.org
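If you want to see how much of a read load the cache devices are absorbing on a setup like this, zpool iostat can break the statistics out per vdev (the pool name here is hypothetical):

    # report per-vdev bandwidth and IOPS every 5 seconds;
    # the cache devices are listed in their own section of the output
    zpool iostat -v tank 5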
On Jun 7, 2010, at 00:15, Richard Jahnel wrote:

> I use 4 Intel 32 GB SSDs as read cache for each pool of 10 Patriot Torx drives, which are running in a raidz2 configuration. No slogs, as I haven't seen a compliant SSD drive yet.

Besides STEC's Zeus drives, you mean? (Which aren't available in retail.) There was a discussion about drives based on SandForce's SF-1500 a little while ago. I believe some of the OCZ drives with it have supercaps.
I'll have to take your word on the Zeus drives. I don't see anything in their literature that explicitly states that cache flushes are obeyed or otherwise protected against power loss.

As for OCZ, they cancelled the Vertex 2 Pro, which was to be the one with the supercap. For the moment they are just selling the Vertex 2 and Vertex LE, neither of which has the supercap.
-- This message posted from opensolaris.org
On Mon, June 7, 2010 09:21, Richard Jahnel wrote:

> I'll have to take your word on the Zeus drives. I don't see anything in their literature that explicitly states that cache flushes are obeyed or otherwise protected against power loss.

The STEC units are what Oracle/Sun use in their 7000 series appliances, and I believe EMC and many others use them as well.
> No slogs, as I haven't seen a compliant SSD drive yet.

As the architect of the DDRdrive X1, I can state categorically that the X1 correctly implements the SCSI Synchronize Cache (flush cache) command.

Christopher George
Founder/CTO
www.ddrdrive.com
-- This message posted from opensolaris.org
And a very nice device it is indeed. However, for my purposes it doesn't work, as it doesn't fit into a 2.5" slot or use SATA/SAS connections. Unfortunately, all my PCI Express slots are in use: 2 RAID controllers, 1 Fibre Channel HBA, and 1 10 Gb Ethernet card.
-- This message posted from opensolaris.org
On Mon, 2010-06-07 at 07:51 -0700, Christopher George wrote:

>> No slogs, as I haven't seen a compliant SSD drive yet.
>
> As the architect of the DDRdrive X1, I can state categorically that the X1 correctly implements the SCSI Synchronize Cache (flush cache) command.
>
> Christopher George
> Founder/CTO
> www.ddrdrive.com

I can also confirm this, as the author of the first Solaris device driver for this hardware. (My driver is not what is in the X1 product, though, and that is a topic for a different day.)

Some notes about the X1:

1) It's a PCIe x1 form factor, so it isn't a typical SSD.

2) It is dependent on an external power source (a little wall wart provides low-voltage power to the card... I don't recall the voltage off hand).

3) The contents of the card's DDR RAM are never flushed to non-volatile storage automatically; an explicit action from the administrator is required to save or restore the contents of the DDR to NAND flash. (This operation takes 60 seconds, during which the card is not responsive to other commands.)

4) The cost of the device is significantly higher (ISTR $1800, but it may be less than that) than a typical SSD, with much smaller capacity (4 GB) than a typical SSD. But it offers much lower latencies and higher performance than any other SSD I've encountered.

If you have an extra PCIe slot, an available UPS, and the dollars to spend, this is a nice little device for use as a ZIL -- my driver was able to drive I/O all the way to the limit of the PCIe x1 slot easily. The various caveats for its usage and high per-unit cost probably make it not practical for the typical home user, though.

-- Garrett
On Mon, Jun 7, 2010 at 9:45 AM, David Magda <dmagda at ee.ryerson.ca> wrote:

> On Mon, June 7, 2010 09:21, Richard Jahnel wrote:
>> I'll have to take your word on the Zeus drives. I don't see anything in their literature that explicitly states that cache flushes are obeyed or otherwise protected against power loss.
>
> The STEC units are what Oracle/Sun use in their 7000 series appliances, and I believe EMC and many others use them as well.

When did that start? Every 7000 I've seen uses Intel drives.

--Tim
Thanks Garrett!

> 2) It is dependent on an external power source (a little wall wart provides low-voltage power to the card... I don't recall the voltage off hand).

9V DC.

> 3) The contents of the card's DDR RAM are never flushed to non-volatile storage automatically; an explicit action from the administrator is required to save or restore the contents of the DDR to NAND flash. (This operation takes 60 seconds, during which the card is not responsive to other commands.)

For the internally developed and RTM OpenSolaris/NexentaStor 3.0 device driver this is not the case, as automatic backup/restore is the default configuration. On host power down/failure the X1 automatically performs a backup, i.e. the DRAM is copied to the on-board NAND (flash). On the next boot, the NAND is automatically restored to DRAM. This process is seamless and doesn't require any user intervention.

*** The hardware support required for automatic backup/restore was not yet available when Garrett wrote the blk2scsa-based driver.

> 4) The cost of the device is significantly higher (ISTR $1800, but it may be less than that) than a typical SSD, with much smaller capacity (4 GB) than a typical SSD. But it offers much lower latencies and higher performance than any other SSD I've encountered.

The last I checked, the STEC SSD resold by Sun/Oracle, which also correctly implements cache flush, was $6,000. So for SSDs that fully comply with the POSIX requirements for synchronous write transactions and do not lose transactions on a host power failure, we are competitively priced at $1,995 SRP.

Christopher George
Founder/CTO
www.ddrdrive.com
-- This message posted from opensolaris.org
On Mon, June 7, 2010 12:56, Tim Cook wrote:

>> The STEC units are what Oracle/Sun use in their 7000 series appliances, and I believe EMC and many others use them as well.
>
> When did that start? Every 7000 I've seen uses Intel drives.

According to the Sun System Handbook for the 7310, the 18 GB SSD (ZIL?) is a STEC Zeus IOPS, and the 100 GB SSD (L2ARC?) is a STEC MACH8 IOPS. See Oracle/Sun parts 540-7763 and 540-7793, respectively.
Do you lose the data if you lose that 9V feed at the same time the computer loses power?
-- This message posted from opensolaris.org
On Mon, 2010-06-07 at 11:49 -0700, Richard Jahnel wrote:

> Do you lose the data if you lose that 9V feed at the same time the computer loses power?

Yes. Hence the need for a separate UPS.

- Garrett
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Garrett D'Amore
>
> On Mon, 2010-06-07 at 11:49 -0700, Richard Jahnel wrote:
>> Do you lose the data if you lose that 9V feed at the same time the computer loses power?
>
> Yes. Hence the need for a separate UPS.

Having worked with DDRdrive a lot during evaluation, I want to say something in their defense too: they are top-notch architects who have only released the first version of their first product. Their performance absolutely destroys everything else (or at least soundly beats everything else I've ever heard of), and they're aware of the obvious drawbacks of physical form factor and external power source. Duh. I don't know their product roadmap. Make your own decisions, and obviously for a lot of people form factor and power are show-stoppers. But please don't dismiss the whole company or future products on those grounds. Expect good things.