I was wondering whether any work has been done on ZFS configurations running on 100% SSD disks.

L2ARC and ZIL were designed to work around the long seek times/latencies of rotational disks. Now if we use only SSDs (F5100 or F20) as back-end drives for ZFS, we should not need those additional log/cache mechanisms, or at least the algorithms managing those caches might need improvement.

In the same way, I guess, when running an OS on an SSD boot disk, do we still need the same memory-swapping mechanisms we have today, considering that in that case the swap device is (nearly) as fast as memory itself?

To some extent, the log journals found in databases would also no longer be relevant?

tia,
selim
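[For context, a minimal sketch of the two pool layouts under discussion; the pool and device names are made up for illustration:

    # Hybrid layout: rotational disks hold the data, SSDs serve as
    # dedicated log (ZIL/slog) and cache (L2ARC) devices
    zpool create tank mirror c1t0d0 c1t1d0 \
        log c2t0d0 \
        cache c2t1d0

    # All-SSD layout: every vdev is an SSD, no separate log/cache vdevs
    zpool create tank mirror c3t0d0 c3t1d0
]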
Hi,

What you say is probably right with respect to L2ARC, but logging (ZIL or database log) is required for consistency purposes.

Anurag.

Sent from my BlackBerry® smartphone from !DEA
Selim Daoud wrote:
> I was wondering whether any work has been done on ZFS configurations
> running on 100% SSD disks.
>
> L2ARC and ZIL were designed to work around the long seek
> times/latencies of rotational disks. Now if we use only SSDs (F5100
> or F20) as back-end drives for ZFS, we should not need those
> additional log/cache mechanisms, or at least the algorithms managing
> those caches might need improvement.

Given correct tuning, ZFS is already pretty solid (pun intended) on SSD. Any log-structured thing is going to be fast on SSD. ZFS has the unique property of being able to mix SSD and traditional SAS/SCSI drives for maximum bang for the buck. If you just want bang, and no buck left, go ahead and buy a zillion SSD drives instead?

If you don't want/need log or cache, disable these? You might want to run your ZIL (slog) on ramdisk. Beware that without a persisted ZIL there _will_ be data loss on unexpected shutdowns. I'd go for the default: without explicit log vdev(s), the ZIL will reside in the storage pool itself.

> In the same way, I guess, when running an OS on an SSD boot disk, do
> we still need the same memory-swapping mechanisms we have today,
> considering that in that case the swap device is (nearly) as fast as
> memory itself?

Is it? I think that when you look up the numbers (for server-grade hardware) you could find an order of magnitude difference. Now there are solid-state storage cards that employ RAM chips and backup power to persist the state. These are the fastest in the industry, but I know you will _never_ want to put your multi-terabyte ZFS pools on those (better to buy a couple of Ferraris instead).

> To some extent, the log journals found in databases would also no
> longer be relevant?

I beg your pardon? The crux with transaction logs is that they get _physically committed_ (and synced, that is) before the actual transaction is written, so that the log survives a reboot and the transaction can be rolled back at reboot. This is crucial for atomicity/integrity. So... the log is obviously still required to be on disk/SSD.
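[For concreteness, a rough sketch of the "disable these" and "slog on ramdisk" options on Solaris; the pool name (tank) and the sizes are illustrative only, and a ramdisk slog really does trade away durability exactly as warned above:

    # Create a 1 GB ramdisk and attach it as a (non-persistent!) slog
    ramdiskadm -a zilram 1g
    zpool add tank log /dev/ramdisk/zilram

    # Stop feeding the L2ARC for one filesystem (ARC is unaffected)
    zfs set secondarycache=none tank/data

    # Or remove an existing cache device outright
    zpool remove tank c2t1d0
]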
On 12/05/09 01:36, anurag at kqinfotech.com wrote:
> Hi,
>
> What you say is probably right with respect to L2ARC, but logging
> (ZIL or database log) is required for consistency purposes.

No, the ZIL is not required for consistency. The pool is fully consistent without the ZIL. See http://blogs.sun.com/perrin/entry/the_lumberjack for more details.

Neil.
On Sat, 5 Dec 2009, Seth Heeren wrote:
>> In the same way, I guess, when running an OS on an SSD boot disk, do
>> we still need the same memory-swapping mechanisms we have today,
>> considering that in that case the swap device is (nearly) as fast as
>> memory itself?
>
> Is it? I think that when you look up the numbers (for server-grade
> hardware) you could find an order of magnitude difference. Now there are

The difference is pretty huge. Consider 6GB+/second vs 140MB/second.

The interesting thing for the future will be non-volatile main memory, with the primary concern being how to firewall damage due to a bug. You would be able to turn your computer off and back on and be working again almost instantaneously.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> The interesting thing for the future will be non-volatile main
> memory, with the primary concern being how to firewall damage due to
> a bug. You would be able to turn your computer off and back on and be
> working again almost instantaneously.

Some of us are old enough (just) to have used computers back in the days when they all did this anyway... Funny how things go full circle...

--
Andrew
On Dec 5, 2009, at 8:09 AM, Andrew Gabriel wrote:
> Bob Friesenhahn wrote:
>> The interesting thing for the future will be non-volatile main
>> memory, with the primary concern being how to firewall damage due
>> to a bug. You would be able to turn your computer off and back on
>> and be working again almost instantaneously.
>
> Some of us are old enough (just) to have used computers back in the
> days when they all did this anyway...
> Funny how things go full circle...

:-) Get the power low enough and we'll never turn them off... you can even remove the on/off switch entirely.
-- richard
Bob Friesenhahn wrote:
> On Sat, 5 Dec 2009, Seth Heeren wrote:
>>> In the same way, I guess, when running an OS on an SSD boot disk,
>>> do we still need the same memory-swapping mechanisms we have today,
>>> considering that in that case the swap device is (nearly) as fast
>>> as memory itself?
>>
>> Is it? I think that when you look up the numbers (for server-grade
>> hardware) you could find an order of magnitude difference. Now there are
>
> The difference is pretty huge. Consider 6GB+/second vs 140MB/second.

Not to detract from the point (my own point, in fact), but my 2xSSD stripe delivers a peak read throughput of 350MB/s each time I boot :) My boot time lands at 11-13 seconds depending on weather conditions.

> The interesting thing for the future will be non-volatile main
> memory, with the primary concern being how to firewall damage due to
> a bug. You would be able to turn your computer off and back on and be
> working again almost instantaneously.
>
> Bob
Colin Raven wrote:
> On Sat, Dec 5, 2009 at 17:43, Seth Heeren <seth at zfs-fuse.net> wrote:
>
>     Not to detract from the point (my own point, in fact), but my
>     2xSSD stripe delivers a peak read throughput of 350MB/s each time
>     I boot :) My boot time lands at 11-13 seconds depending on
>     weather conditions.
>
> Goodness me, what the heck does weather have to do with the
> performance of an SSD?

It's a figure of speech... Nothing. But my network conditions, the speed at which I log in, the exact delay detecting the logical volume groups on my 6 SATA disks in the same system... these times will vary. Not to mention system updates that trigger actions at boot time, etc.

I'm on Ubuntu Karmic, btw, and the boot fs is (obviously) not on ZFS (but ext4 minus journalling and with tweaked block sizes/alignment).
>>>>> "sh" == Seth Heeren <seth at zfs-fuse.net> writes:sh> If you don''t want/need log or cache, disable these? You might sh> want to run your ZIL (slog) on ramdisk. seems quite silly. why would you do that instead of just disabling the ZIL? I guess it would give you a way to disable it pool-wide instead of system-wide. A per-filesystem ZIL knob would be awesome. sh> I beg your pardon? The crux with transaction logs is that they sh> get _physically committed_ (and synced, that is) before sh> writing the actual transaction so that the log will survive sh> reboot and the transaction can be rolled back at reboot. This sh> is crucial for atomicity/integrity. No, it''s crucial for durability. You''ll have the other two even without fsync(), so long as writes are never reordered. Unlike other filesystems, AIUI ZFS never reorders writes (it always restores to a TXG commit point, which are 5 - 30sec apart, but are point-in-time snapshots so there''s no reordering). Thus, even without the persistent ZIL, the recovered system should be crash-consistent and preserve ACI but not D of databases stored on it. For an MTA accepting mail and promising the sender, ``I''ve durably stored what you sent,'''' it''s a problem. For a distributed database where nodes need to be in sync with each other, it''s a problem. For an NFS server where you don''t want to have to reboot all the clients when the server reboots, it''s a problem. but for *corrupting* a database, it''s *not* a problem, and for two separate databases stored in the same ZFS filesystem being in sync wrt each other (ex., an sqlite3 database being in sync with the mbox it points to) it''s also *not* a problem. makes sense? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20091211/06cb5091/attachment.bin>
On Fri, 2009-12-11 at 13:49 -0500, Miles Nordin wrote:
>     sh> If you don't want/need log or cache, disable these? You might
>     sh> want to run your ZIL (slog) on ramdisk.
>
> seems quite silly. why would you do that instead of just disabling
> the ZIL? I guess it would give you a way to disable it pool-wide
> instead of system-wide.
>
> A per-filesystem ZIL knob would be awesome.

for what it's worth, there's already a per-filesystem ZIL knob: the "logbias" property. It can be set either to "latency" or "throughput".
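[A quick sketch of that knob in use; the pool and filesystem names are hypothetical:

    # Large streaming sync writes bypass the slog and go straight
    # to the main pool
    zfs set logbias=throughput tank/oradata

    # The default: small sync writes hit the slog for low latency
    zfs set logbias=latency tank/home

    zfs get logbias tank/oradata
]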
On 12/11/09 14:56, Bill Sommerfeld wrote:
> On Fri, 2009-12-11 at 13:49 -0500, Miles Nordin wrote:
>> A per-filesystem ZIL knob would be awesome.
>
> for what it's worth, there's already a per-filesystem ZIL knob: the
> "logbias" property. It can be set either to "latency" or
> "throughput".

That's a bit different. logbias controls whether the intent log blocks go to the main pool or to the log devices (if they exist). I think Miles was requesting a per-fs knob to disable writing any log blocks. A proposal for this exists that suggests a new "sync" property with three settings: everything synchronous; nothing synchronous (i.e. ZIL disabled on that fs); and the current behaviour (the default). The RFE is:

    6280630 zil synchronicity

My problem with implementing this is that people might actually use it! Well, actually my concern is more that it will be misused.

Neil.
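[A sketch of how the proposed property might be used, assuming the three settings described above map onto values named standard, always, and disabled; the dataset names are made up, and none of this is shipping syntax as of this thread:

    # The default: honour applications' sync requests (fsync, O_DSYNC)
    zfs set sync=standard tank/home

    # Treat every write as synchronous
    zfs set sync=always tank/nfs

    # The footgun Neil worries about: ZIL disabled for this dataset only;
    # sync writes are acknowledged before reaching stable storage
    zfs set sync=disabled tank/scratch
]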