Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed.

If one made an all-SSD pool, would a log/cache drive just slow you down? Would a ZIL slow you down? Thinking of rotating MLC drives with SandForce controllers every few years to avoid losing a drive to a "sorry, no more writes allowed" scenario.

Thanks
Mark
On 10/28/11 07:04 PM, Mark Wolek wrote:
> Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed.
>
> If one made an all-SSD pool, would a log/cache drive just slow you down? Would a ZIL slow you down?

I would guess not; you would still be spreading your IOPS. I haven't tried an all-SSD pool, but I have tried adding a lump of spinning rust as a log to a pool of identical drives, and it did give a small improvement to NFS performance.

-- Ian.
On 10/28/11 00:04, Mark Wolek wrote:
> Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed. If one made an all-SSD pool, would a log/cache drive just slow you down? Would a ZIL slow you down? Thinking of rotating MLC drives with SandForce controllers every few years to avoid losing a drive to a "sorry, no more writes allowed" scenario. Thanks Mark

Interesting question. I don't think there's a straightforward answer. Oracle uses write-optimised log devices and read-optimised cache devices in its appliances. However, assuming all the SSDs are the same, I suspect neither a log nor a cache device would help:

Log: If there is a separate log, it is used exclusively for ZIL writes and can be written in parallel with the periodic TXG commit writes to the other pool devices. If that device were instead part of the pool, the ZIL code would spread the load among all pool devices, but would compete with TXG commit writes. My gut feeling is that this latter option would perform better, though. I think, a long time ago, I experimented with designating one disk out of the pool as a log and saw degraded synchronous performance. That seems to be the equivalent of your SSD question.

Cache: Similarly, for cache devices the reads would compete with TXG commit writes, but otherwise performance ought to be higher.

Neil.
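For readers unfamiliar with the terminology above, a minimal sketch of how log and cache devices are designated; the pool and device names here are placeholders, not anything from the thread:

    zpool create whirl c1t0d0 c1t1d0 log c1t2d0   # pool of two data disks plus a dedicated log (slog)
    zpool add whirl cache c1t3d0                  # add an L2ARC cache device to the existing pool
    zpool status whirl                            # shows separate "logs" and "cache" sections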
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Mark Wolek
>
> Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed.
>
> If one made an all-SSD pool, would a log/cache drive just slow you down? Would a ZIL slow you down? Thinking of rotating MLC drives with SandForce controllers every few years to avoid losing a drive to a "sorry, no more writes allowed" scenario.

Even if you have an all-HDD pool, you benefit by adding an HDD as a log device. Why? Because the log device is dedicated to ONLY sync-mode writes, and when you're doing sync-mode writes, you want low latency. If the primary disks in the pool are busy doing other things, that means additional latency before they can respond to a sync-mode write and stick something in the ZIL.

The same argument applies to SSDs. Even if your pool is all SSD, yes, you benefit by adding a dedicated log device. The benefit won't be as dramatic, of course, as if your pool were HDD with an SSD for the log... but it's something.

As for cache... It is conceivable that you might get some benefit from cache for the same reason: when other disks are busy, you might be able to get data out of the cache devices and see some acceleration. But cache devices carry a not insignificant maintenance overhead, keeping track of them and populating/expiring data in them. I think you probably wouldn't get much benefit from a cache device.

I think you would probably benefit more by parallelizing your main pool further. For example, instead of making your pool from a dozen 256G disks, you might use two dozen 128G disks. Or instead of using mirrors, you might use 3-way mirrors. Etc.
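As a rough illustration of the "dedicated to ONLY sync-mode writes" point, a hedged sketch of how one might experiment; pool and dataset names are placeholders, and the sync property assumes a build recent enough to have it:

    zpool add tank log c0t6d0       # attach a spare device to an existing pool as a dedicated log
    zfs set sync=always tank/test   # force every write on this dataset through the ZIL (for testing)
    zfs get sync tank/test          # confirm the setting
    zfs inherit sync tank/test      # revert to the default ("standard") behaviour afterwards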
On 10/28/11 00:54, Neil Perrin wrote:
> I think, a long time ago, I experimented with designating one disk out of the pool as a log and saw degraded synchronous performance. That seems to be the equivalent of your SSD question.

Did some quick tests with disks to check whether my memory was correct. 'sb' is a simple program that spawns a number of threads to fill a file of a given size using non-zero writes of a specified size. Bandwidth is also reported.

1. Simple 2-disk system: 32KB synchronous writes filling 1GB with 20 threads

zpool create whirl <2 disks>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 20
Elapsed time 95s, 10.8MB/s

zpool create whirl <disk> log <disk>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 20
Elapsed time 151s, 6.8MB/s

2. Higher-end 6-disk system: 32KB synchronous writes filling 1GB with 100 threads

zpool create whirl <6 disks>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 33s, 31MB/s

zpool create whirl <5 disks> log <1 disk>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 147s, 7.0MB/s

and, for interest:

zpool create whirl <5 disks> log <SSD>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 8s, 129MB/s

3. Higher-end system, smaller writes: 2KB synchronous writes filling 128MB with 100 threads

zpool create whirl <6 disks>; zfs set recordsize=1k whirl
st1 -n /whirl/f -f 134217728 -b 2048 -t 100
Elapsed time 16s, 8.2MB/s

zpool create whirl <5 disks> log <1 disk>; zfs set recordsize=1k whirl
ds8 -n /whirl/f -f 134217728 -b 2048 -t 100
Elapsed time 24s, 5.5MB/s
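For anyone re-running a test along these lines, one simple way (not mentioned in the thread) to confirm the slog is really absorbing the synchronous writes is to watch per-vdev activity while the benchmark runs; the pool name follows the examples above:

    zpool iostat -v whirl 1
    # the devices listed under "logs" should show nearly all of the write
    # operations during a purely synchronous workload, while the main vdevs
    # mostly see the periodic TXG commit bursts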
Having the log disk slowed it down a lot in your tests (when it wasn't an SSD): 30MB/s vs 7. Is this also a 100% write / 100% sequential workload? Forcing sync?

It's gotten to the point where I can buy a 120G SSD for less than or the same price as a 146G SAS disk... Sure, the MLC drives have a limited lifetime, but at $150 (and dropping) just replace them every few years to be safe and work out a rotation/rebuild cycle; it's tempting...

I suppose if we do end up buying all SSDs it becomes really easy to test whether we should use a log or not!
On 10/28/11 11:21, Mark Wolek wrote:
> Having the log disk slowed it down a lot in your tests (when it wasn't an SSD): 30MB/s vs 7. Is this also a 100% write / 100% sequential workload? Forcing sync?

100% synchronous writes. The writes are random, but ZFS will write them sequentially on disk.

> It's gotten to the point where I can buy a 120G SSD for less than or the same price as a 146G SAS disk... Sure, the MLC drives have a limited lifetime, but at $150 (and dropping) just replace them every few years to be safe and work out a rotation/rebuild cycle; it's tempting... I suppose if we do end up buying all SSDs it becomes really easy to test whether we should use a log or not!

I would highly recommend some form of zpool redundancy (mirroring or raidz).

Neil.
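A minimal sketch of what a redundant all-SSD layout with a mirrored log might look like; all names are placeholders and this is not a configuration anyone in the thread tested:

    zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 \
        log mirror c0t4d0 c0t5d0
    zpool status tank   # two data mirrors plus a mirrored "logs" section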
On Oct 27, 2011, at 11:04 PM, Mark Wolek wrote:
> Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed.
>
> If one made an all-SSD pool, would a log/cache drive just slow you down? Would a ZIL slow you down?

In general, a slog makes sense when its latency is significantly better than access to the main pool. For an order-of-magnitude difference, it is a no-brainer. For less significant differences, it can be debatable.

> Thinking of rotating MLC drives with SandForce controllers every few years to avoid losing a drive to a "sorry, no more writes allowed" scenario.

Different SSDs have different performance behaviours. If you don't pay attention to the details, you might find your pool is faster than your slog :-)

-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
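One hedged way to check whether a candidate slog really has significantly better latency than the pool devices is to watch per-device service times under a synchronous write load with plain Solaris iostat; the interpretation below is a general rule of thumb, not something stated in the thread:

    iostat -xn 1
    # compare the wsvc_t/asvc_t columns (wait and active service time, in ms)
    # for the candidate log device against the main pool devices; an
    # order-of-magnitude gap is what makes a dedicated slog clearly worthwhile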
On 10/28/2011 01:04 AM, Mark Wolek wrote:
> before the forum closed.

Did I miss something?

Karl
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Karl Rossing
>
> On 10/28/2011 01:04 AM, Mark Wolek wrote:
> > before the forum closed.
> Did I miss something?

Yes. The forums no longer exist. It's only mailman email now.
On Tue, 1 Nov 2011, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Karl Rossing
>>
>> On 10/28/2011 01:04 AM, Mark Wolek wrote:
>>> before the forum closed.
>> Did I miss something?
>
> Yes. The forums no longer exist. It's only mailman email now.

I notice that mail activity has diminished substantially since the forums were shut down. Apparently they were still in use.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> From: Bob Friesenhahn [mailto:bfriesen at simple.dallas.tx.us]
>
> I notice that mail activity has diminished substantially since the
> forums were shut down. Apparently they were still in use.

I'm sure nobody thought they were unused. I'm sure it was a cost-saving measure. Jive forums start at $20k/yr, assuming you want just a vanilla config (which opensolaris didn't), plus the man-hours to maintain it and keep it consistent with mailman. I recently looked into implementing a community portal modeled on the opensolaris community (forums + email coexisting blissfully), but all the competitors were in the same price range. You can do either one by itself extremely well for free. You can do both poorly for free, or you can do both very well for big bucks. That's what opensolaris was doing.

Seems coincidental that this change came very nearly one year after "the change," doesn't it? Somebody's budget got slashed, I bet... and the Jive renewal came due nearly a year later.
On Tue, Nov 01, 2011 at 06:17:57PM -0400, Edward Ned Harvey wrote:
> You can do both poorly for free, or you can do both very well for big bucks.
> That's what opensolaris was doing.

That mess was costing someone money and was considered very well done? Good riddance.

-- Dan.