Lutz Schumann
2010-Jan-10 18:22 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
A very interesting thread (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/) and some thinking about the design of SSDs led to an experiment I did with the Intel X25-M SSD. The question was: is my data safe once it has reached the disk and has been committed to my application?

All transactional safety in ZFS requires a correct implementation of the synchronize-cache command (see http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg27264.html, where someone used OpenSolaris within VirtualBox, which by default ignores the cache flush command). Thus qualified hardware is VERY essential (also see http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf).

What I did (for an Intel X25-M G2 (default settings = write cache on) and a Seagate SATA drive (ST3500418AS)):

a) Create a pool
b) Create a program that opens a file synchronously and writes to it. It also prints the latest record written successfully.
c) Pull the power of the SATA disk
d) Power cycle everything
e) Open the pool again and verify that the file content is the last record committed to the application
   e1) if it is the same - nice hardware
   e2) if it is NOT the same - BAD hardware

What I found out:

Intel X25-M G2:
- If I pull the power cable, much data is lost although committed to the app (some hundred transactions)
- If I pull the SATA cable, no data is lost

ST3500418AS:
- If I pull the power cable, almost no data is lost, but still the last write is lost (strange!)
- If I pull the SATA cable, no data is lost

Actually this result was partially expected. However, the one missing transaction on my SATA HDD (Seagate) is strange. Unfortunately I do not have "enterprise SAS hardware" handy to verify that my test procedure is correct. Maybe someone can run this test on a SAS test machine? (see script attached)

--- Attachments ---

--- script (call it with script.pl --file /mypool/testfile) ---

#!/usr/bin/env perl
# for O_SYNC
use Fcntl qw(:DEFAULT :flock SEEK_CUR SEEK_SET SEEK_END);
use IO::File;
use Getopt::Long;

my $pool      = "disk";
my $mountroot = "/volumes";
my $file;   # default path derives from --pool unless --file is given
my $abort   = 0;
my $count   = 0;

GetOptions(
    "pool=s"          => \$pool,
    "testfile|file=s" => \$file,
    "count=i"         => \$count,
);
$file = "$mountroot/$pool/testfile" unless defined $file;

# directory part of the target file
my $dir = $file;
$dir =~ s/[^\/]+$//g;

if (-e $file) {
    print "ERROR: File $file already exists\n";
    exit 1;
}
if (! -d "$dir") {
    print "ERROR: Directory $dir does not exist\n";
    exit 1;
}

# O_SYNC, O_CREAT: each syswrite should return only once the data has
# been committed to stable storage.
sysopen(FILE, "$file", O_RDWR | O_CREAT | O_EXCL | O_SYNC)
    or die "ERROR Opening file $file: $!\n";

$SIG{INT} = sub {
    print " ... signalling Abort ... (file: $file)\n";
    $abort = 1;
};

$| = 1;

my $lastok = undef;
my $i      = 0;
my $msg    = sprintf("This is round number %20s", $i);

while (!$abort) {
    $i++;
    if ($count && $i > $count) { last; }
    $msg = sprintf("This is round number %20s", $i);
    sysseek(FILE, 0, SEEK_SET);
    print "$msg";
    my $rc = syswrite FILE, $msg;
    if (!defined($rc)) {
        print "ERROR\n";
        print "ERROR While writing $msg\n";
        print "ERROR: $!\n";
        last;
    } else {
        print " DONE \n";
        $lastok = $msg;
    }
}
close(FILE);

print "\nTHE LAST MESSAGE WRITTEN to file $file was:\n\n\t\"$lastok\"\n\n";

---- Here's the logs of my tests ----

1) Test the SATA SSD (Intel X25-M)
----------------------------------
start write.pl

This is round number 67482
This is round number 67483
This is round number 67484
This is round number 67485
This is round number 67486
This is round number 67487
This is round number 67488
This is round number 67489
This is round number 67490

( .. I pull the POWER CABLE of the SATA SSD .. )

.. I/O hangs .. zpool status shows:

zpool status -v
  pool: ssd
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         UNAVAIL      0    11     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

errors: Permanent errors have been detected in the following files:

        ssd:<0x0>
        /volumes/ssd/
        /volumes/ssd/testfile

... now I power cycled the machine and put the power cable back ... let's see the pool status:

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

errors: No known data errors

.... let's look at the file content ...
.. remember, the last reported successful transaction was "67490"

root@nexenta:/volumes/ssd# cat testfile
This is round number 67246
root@nexenta:/volumes/ssd#

... UPS, 244 transactions missing - bummer ...

.. OK, repeat the test pulling the SATA cable only!!! (thus the device has time to write out the changes)

This is round number 39451
This is round number 39452
This is round number 39453
This is round number 39454
This is round number 39455
This is round number 39456
This is round number 39457
This is round number 39458
This is round number 39459

.. hangs .. reboot

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

... cat ssd/testfile (last committed = 39459)

This is round number 39459

.. this is OK

2) Test the SATA HDD (Seagate ST3500418AS)
------------------------------------------

..... same test with a HDD ...

This is round number 3548
This is round number 3549
This is round number 3550
This is round number 3551
This is round number 3552
This is round number 3553
This is round number 3554
This is round number 3555
This is round number 3556
This is round number 3557
This is round number 3558

.. hangs

  pool: disk
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        disk        UNAVAIL      0    10     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

.. reboot .. check file (last committed = 3558)

nmc@nexenta:/disk$ cat testfile
This is round number 3557

.. again one transaction missing, strange, test again ...

.. again (disk) ...

This is round number 1689 DONE
This is round number 1690 DONE
This is round number 1691 DONE
This is round number 1692 DONE
This is round number 1693 DONE
This is round number 1694 DONE
This is round number 1695

.. pull power cable .. reboot .. check

nmc@nexenta:/$ cat disk/testfile
This is round number 1693

... again just one missing

.. test the SATA cable pull ....

This is round number 1269 DONE
This is round number 1270 DONE
This is round number 1271 DONE
This is round number 1272 DONE
This is round number 1273 DONE
This is round number 1274 DONE
This is round number 1275 DONE
This is round number 1276 DONE
This is round number 1277

.. pull SATA cable (not power)

nmc@nexenta:/$ cat disk/testfile
This is round number 1276

.. this is OK
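If anyone wants to cross-check that the losses are not an artifact of O_SYNC handling, the same loop can be driven by an explicit fsync() per record instead. A minimal, untested sketch (it assumes IO::Handle's sync method, which calls fsync(2), is usable on the test box; the usage and file name are just examples):

#!/usr/bin/env perl
# fsync-per-record variant of the test loop above (hypothetical sketch)
use strict;
use warnings;
use Fcntl qw(:DEFAULT SEEK_SET);
use IO::Handle;   # provides $fh->sync, i.e. fsync(2)

my $file = shift @ARGV or die "usage: $0 <testfile>\n";

sysopen(my $fh, $file, O_RDWR | O_CREAT | O_EXCL)
    or die "ERROR opening $file: $!\n";

my $i = 0;
while (1) {
    $i++;
    my $msg = sprintf("This is round number %20s", $i);
    sysseek($fh, 0, SEEK_SET);
    defined(syswrite($fh, $msg)) or die "ERROR writing: $!\n";
    # The data must be on stable storage when sync() returns, so any
    # round reported DONE here should survive a power cut.
    $fh->sync or die "ERROR fsync: $!\n";
    print "$msg DONE\n";
}

If this variant loses the same number of transactions, the problem is clearly below the filesystem API, i.e. in the drive's cache handling.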
Lutz Schumann
2010-Jan-10 19:09 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
I managed to disable the write cache (I did not know a tool for this on Solaris; however, hdadm from the EON NAS binary_kit does the job).

Same power-disruption test with the Seagate HDD and write cache disabled
---------------------------------------------------------------------------------------

root@nexenta:/volumes# .sc/bin/hdadm write_cache display c3t5

 c3t5 write_cache> disabled

... pull power cable of the Seagate SATA disk

This is round number 4543 DONE
This is round number 4544 DONE
This is round number 4545 DONE
This is round number 4546 DONE
This is round number 4547 DONE
This is round number 4548 DONE
This is round number 4549 DONE
This is round number 4550 <... hangs here>

... power cycle everything

node1:/mnt/disk# cat testfile
This is round number 4549

... this looks good. So disabling the write cache helps, but it really limits performance (not for synchronous, but for async writes).

Test with Intel X25-M
--------------------------

... same with the SSD

root@nexenta:/volumes# hdadm write_cache off c3t5
 c3t5 write_cache> disabled
root@nexenta:/volumes# hdadm write_cache display c3t5
 c3t5 write_cache> disabled

.. pull SSD power cable

This is round number 9249 DONE
This is round number 9250 DONE
This is round number 9251 DONE
This is round number 9252 DONE
This is round number 9253 DONE
This is round number 9254 DONE
This is round number 9255 DONE
This is round number 9256 DONE
This is round number 9257 <... hangs here>

.. power cycle everything ... test

node1:/mnt/ssd# cat testfile
This is round number 9256

So without a write cache the device works correctly. However, be warned: on boot the cache is enabled again:

Device     Serial        Vendor  Model             Rev   Temperature
------     ------        ------  -----             ----  -----------
c3t5d0p0   7200Y5160AGN  ATA     INTEL SSDSA2M160  02HD  255 C (491 F)

root@nexenta:/volumes# hdadm write_cache display c3t5
 c3t5 write_cache> enabled

Question: does anyone know the impact of disabling the write cache on the write-amplification factor of the Intel SSDs? I would deploy the Intel X25-M only for "mostly read" workloads anyway, so the performance impact of disabling the write cache can be ignored. However, if the life expectancy of the device goes down without a write cache (I mean, it is MLC already!) - bummer.

And another question: how can I permanently disable the write cache on the Intel X25-M SSDs?

Regards
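Since the drive re-enables its cache on every power cycle, one workaround for the "permanently" part would be to re-run hdadm from a boot-time script. A minimal sketch (the script path, runlevel placement, and hdadm install location are assumptions; an SMF service would be the cleaner Solaris way):

#!/sbin/sh
# /etc/rc3.d/S99ssd_wcache (hypothetical): re-disable the SSD write
# cache after every boot, because the drive re-enables it itself.
case "$1" in
start)
        /usr/bin/hdadm write_cache off c3t5
        ;;
esac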
Lutz Schumann
2010-Jan-10 19:43 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Actually the performance decrease when disabling the write cache on the SSD is approx. 3x (aka 66%).

Setup:

node1  = Linux client with open-iscsi
server = COMSTAR (cache=write through) + zvol (recordsize=8k, compression=off)

--- with SSD disk write cache disabled:

node1:/mnt/ssd# iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I

        Iozone: Performance Test of File I/O
                Version $Revision: 3.327 $
                Compiled for 32 bit mode.
                Build: linux

        Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
                      Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
                      Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                      Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                      Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                      Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Sun Jan 10 20:14:46 2010

        Include fsync in write timing
        Include close in write timing
        Record Size 8 KB
        File size set to 131072 KB
        SYNC Mode.
        O_DIRECT feature enabled
        Command line used: iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I
        Output is in Kbytes/sec
        Time Resolution = 0.000002 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 2
        Max process = 2
        Throughput test with 2 processes
        Each process writes a 131072 Kbyte file in 8 Kbyte records

        Children see throughput for  2 initial writers =    1324.45 KB/sec
        Parent sees throughput for  2 initial writers  =    1291.27 KB/sec
        Min throughput per process                     =     646.07 KB/sec
        Max throughput per process                     =     678.38 KB/sec
        Avg throughput per process                     =     662.23 KB/sec
        Min xfer                                       =  124832.00 KB

        Children see throughput for  2 rewriters       =    4360.29 KB/sec
        Parent sees throughput for  2 rewriters        =    4360.08 KB/sec
        Min throughput per process                     =    2158.82 KB/sec
        Max throughput per process                     =    2201.47 KB/sec
        Avg throughput per process                     =    2180.15 KB/sec
        Min xfer                                       =  128536.00 KB

        Children see throughput for  2 random readers  =   43930.41 KB/sec
        Parent sees throughput for  2 random readers   =   43914.01 KB/sec
        Min throughput per process                     =   21768.16 KB/sec
        Max throughput per process                     =   22162.25 KB/sec
        Avg throughput per process                     =   21965.21 KB/sec
        Min xfer                                       =  128760.00 KB

        Children see throughput for  2 random writers  =    5561.01 KB/sec
        Parent sees throughput for  2 random writers   =    5560.41 KB/sec
        Min throughput per process                     =    2780.37 KB/sec
        Max throughput per process                     =    2780.64 KB/sec
        Avg throughput per process                     =    2780.50 KB/sec
        Min xfer                                       =  131064.00 KB

--- with SSD write cache enabled:

node1:/mnt/ssd# iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I

        Run began: Sun Jan 10 20:22:14 2010
        (iozone banner and test parameters identical to the run above)

        Children see throughput for  2 initial writers =    3387.15 KB/sec
        Parent sees throughput for  2 initial writers  =    3258.90 KB/sec
        Min throughput per process                     =    1621.62 KB/sec
        Max throughput per process                     =    1765.53 KB/sec
        Avg throughput per process                     =    1693.57 KB/sec
        Min xfer                                       =  120392.00 KB

        Children see throughput for  2 rewriters       =   11084.93 KB/sec
        Parent sees throughput for  2 rewriters        =   11083.10 KB/sec
        Min throughput per process                     =    5503.68 KB/sec
        Max throughput per process                     =    5581.25 KB/sec
        Avg throughput per process                     =    5542.46 KB/sec
        Min xfer                                       =  129256.00 KB

        Children see throughput for  2 random readers  =   46140.94 KB/sec
        Parent sees throughput for  2 random readers   =   46104.64 KB/sec
        Min throughput per process                     =   23002.35 KB/sec
        Max throughput per process                     =   23138.59 KB/sec
        Avg throughput per process                     =   23070.47 KB/sec
        Min xfer                                       =  130312.00 KB

        Children see throughput for  2 random writers  =   18500.58 KB/sec
        Parent sees throughput for  2 random writers   =   18492.31 KB/sec
        Min throughput per process                     =    9248.47 KB/sec
        Max throughput per process                     =    9252.11 KB/sec
        Avg throughput per process                     =    9250.29 KB/sec
        Min xfer                                       =  131032.00 KB

Difference for writes: 50-66% less performance with the cache disabled. Still much better than disks for writes.

One more question for understanding (I am thinking of MLC SSDs as data disks, for read performance): assuming a reliable ZIL disk (cache flush = working), the ZIL can guarantee data integrity; however, if the backend disks (aka pool disks) do not properly implement cache flush, a reliable ZIL device does not "work around" the bad backend disks, right?

(Meaning: having a reliable ZIL + some MLC SSDs with write cache enabled is not reliable in the end.)

Thanks
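For easier comparison, the aggregate ("children see") numbers from the two runs side by side:

    Test (2 procs, 8 KB records)   cache off (KB/s)   cache on (KB/s)
    initial writers                       1324.45            3387.15
    rewriters                             4360.29           11084.93
    random readers                       43930.41           46140.94
    random writers                        5561.01           18500.58

Reads are essentially unaffected; sequential and random writes drop to roughly a third of the cached rate.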
Bob Friesenhahn
2010-Jan-10 20:43 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
On Sun, 10 Jan 2010, Lutz Schumann wrote:

> One more question for understanding (I am thinking of MLC SSDs as
> data disks, for read performance): assuming a reliable ZIL disk
> (cache flush = working), the ZIL can guarantee data integrity;
> however, if the backend disks (aka pool disks) do not properly
> implement cache flush, a reliable ZIL device does not "work around"
> the bad backend disks, right?
>
> (Meaning: having a reliable ZIL + some MLC SSDs with write cache
> enabled is not reliable in the end.)

As soon as there is more than one disk in the pool, it is necessary for cache flush to work, or else the devices may contain content from entirely different transaction groups, resulting in a scrambled pool.

If you just had one disk which tended to ignore cache flush requests, then you should be OK as long as the disk writes the data in order. In that case any unwritten data would be lost, but the pool should not be lost. If the device ignores cache flush requests and writes data in some random order, then the pool is likely to eventually fail.

I think that ZFS mirrors should be safer than raidz when faced with devices which fail to flush (it should be similar to the single-disk case), but only if there is one mirror pair.

A scary thing about SSDs is that they may re-write old data while writing new data, which could result in corruption of the old data if the power fails while it is being re-written.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
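To make the multi-device failure mode concrete, a simplified timeline (two pool devices, one of which ignores flushes; ditto-block and uberblock details omitted):

    txg 100: ZFS writes the txg-100 blocks to disk A and disk B,
             then issues SYNCHRONIZE CACHE to both
             disk A honors the flush  -> txg-100 blocks are persistent
             disk B ignores the flush -> txg-100 blocks sit in volatile cache
    ZFS then commits txg 100, believing both devices are stable
    -- power loss --
    on import: disk A holds txg 100, disk B effectively holds txg 99,
               yet the pool metadata references txg-100 data on both;
               the on-disk state is no longer a self-consistent snapshot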
Kjetil Torgrim Homme
2010-Jan-11 11:24 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Lutz Schumann <presales at storageconcepts.de> writes:

> Actually the performance decrease when disabling the write cache on
> the SSD is approx. 3x (aka 66%).

For this reason you want a controller with battery-backed write cache. In practice this means a RAID controller, even if you don't use the RAID functionality. Of course you can buy SSDs with capacitors, too, but I think that will be more expensive, and it will restrict your choice of model severely.

(BTW, thank you for testing forceful removal of power. The result is as expected, but it's good to see that theory and practice match.)
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Lutz Schumann
2010-Jan-11 14:49 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Maybe it got lost in this much text :) .. thus this re-post:

Does anyone know the impact of disabling the write cache on the write-amplification factor of the Intel SSDs?

How can I permanently disable the write cache on the Intel X25-M SSDs?

Thanks, Robert
Bob Friesenhahn
2010-Jan-11 15:59 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
On Mon, 11 Jan 2010, Kjetil Torgrim Homme wrote:

> (BTW, thank you for testing forceful removal of power. The result is
> as expected, but it's good to see that theory and practice match.)

Actually, the result is not "as expected", since the device should not have lost any data preceding a cache flush request. These sorts of results should be cause for concern for anyone currently using one as a ZFS log device, or using it for any write-sensitive application at all.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Miles Nordin
2010-Jan-25 21:00 UTC
[zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
>>>>> "ca" == Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes: >>>>> "ls" == Lutz Schumann <presales at storageconcepts.de> writes:ca> X25-E drives and a converter from 3.5 to 2.5 inches. So far ca> two systems have shown pretty bad instabilities with that. instability after crashing or instability while running? Lutz Schumann 2010-01-10 seemed to find the x25m g2 was ignoring sync cache commands when its write cache was set to ``on'''', but it did indeed do uncached writing if you turn the write cache off for the whole drive albeit at half the performance advertised in the spec sheet: ls> Intel X25-M G2: - If I pull the power cable much data is lost, ls> altought commited to the app (some hundred) - If I pull the ls> sata cable no data is lost ls> ST3500418AS: - If I pull the power cable almost no data is ls> lost, but still the last write is lost (strange!) - If I pull ls> the sata cable no data is lost the test for it was to write a program that did ''write, sync, write, sync'' and notice when yaning x25m power connector with cache on <n> transactions were lost, while yanking SATA connector 0 or 1 transactions were lost. Therefore I suspect the x25e which also lacks a supercap might also be another deliberately broken to inflate specs drive, and if it''s instability after crashing you might try to disable the x25e write cache (1/2 performance) and try again? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100125/638ef061/attachment.bin>
Lutz Schumann
2010-Jan-25 22:56 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
One problem with the write cache is that I do not know whether it is needed to limit write wear (write amplification).

As mentioned, disabling the write cache might be OK in terms of performance (I want to use MLC SSDs as data disks, not as ZIL, to have an SSD-only appliance - I'm looking for read speed for dedup, zfs send and all the other things ZFS tends to do a lot of random reads for). I could not live with a degradation in write endurance with a disabled write cache.

Unfortunately nobody was able to answer this, and I guess only Intel can -- and won't. However, I don't want to ruin two Postville SSDs at 200 EUR each to find out :).
Miles Nordin
2010-Feb-05 19:45 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
>>>>> "pr" == Peter Radig <peter at radig.de> writes: >>>>> "ls" == Lutz Schumann <presales at storageconcepts.de> writes:pr> I was expecting a good performance from the X25-E, but was pr> really suprised that it is that good (only 1.7 times slower pr> than it takes with ZIL completely disabled). So I will use the pr> X25-E as ZIL device on my box and will not consider disabling pr> ZIL at all to improve NFS performance. According to Lutz posting here ~2010-01-10, the X25-M may not actually be functioning as a ZIL unless you disable its write cache with ''hdadm''. He said he found normal hard drives respect cache flush commands in stream, but Intel X25-M does not. however both do respect disabling the write cache. ls> root at nexenta:/volumes# hdadm write_cache off c3t5 ls> c3t5 write_cache> disabled You might want to repeat his test with X25-E. If the X25-E is also dropping cache flush commands (it might!), you may be, compared to disabling the ZIL, slowing down your pool for no reason, and making it more fragile as well since an exported pool with a dead ZIL cannot be imported. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100205/8b25bb75/attachment.bin>
Bob Friesenhahn
2010-Feb-05 19:55 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, 5 Feb 2010, Miles Nordin wrote:

>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>    ls>  c3t5 write_cache> disabled
>
> You might want to repeat his test with the X25-E. If the X25-E is also
> dropping cache flush commands (it might!), then compared to disabling
> the ZIL you may be slowing down your pool for no reason, and making it
> more fragile as well, since an exported pool with a dead ZIL cannot be
> imported.

Others have tested the X25-E and found that with its cache enabled it does drop flushed writes, but it is clearly not such a gaping chasm as the X25-M. Some time has passed, so there is the possibility that the X25-E firmware has improved (or will). If Sun offers an X25-E-based device for use as an slog, you can be sure that it has been qualified for this purpose, and it may contain modified firmware.

The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Andrey Kuzmin
2010-Feb-05 20:01 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 5, 2010 at 10:55 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Fri, 5 Feb 2010, Miles Nordin wrote:
>>
>>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>>    ls>  c3t5 write_cache> disabled
>>
>> You might want to repeat his test with the X25-E. If the X25-E is also
>> dropping cache flush commands (it might!), then compared to disabling
>> the ZIL you may be slowing down your pool for no reason, and making it
>> more fragile as well, since an exported pool with a dead ZIL cannot be
>> imported.
>
> Others have tested the X25-E and found that with its cache enabled it
> does drop flushed writes, but it is clearly not such a gaping chasm as
> the X25-M. Some time has passed, so there is the possibility that the
> X25-E firmware has improved (or will). If Sun offers an X25-E-based
> device for use as an slog, you can be sure that it has been qualified
> for this purpose, and it may contain modified firmware.
>
> The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

Exactly. It would therefore be very interesting to hear about performance from anyone using a (real) enterprise SSD (which now spells STEC) as slog.

Regards,
Andrey
Ray Van Dolson
2010-Feb-05 20:26 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 05, 2010 at 11:55:12AM -0800, Bob Friesenhahn wrote:

> On Fri, 5 Feb 2010, Miles Nordin wrote:
>>
>>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>>    ls>  c3t5 write_cache> disabled
>>
>> You might want to repeat his test with the X25-E. If the X25-E is also
>> dropping cache flush commands (it might!), then compared to disabling
>> the ZIL you may be slowing down your pool for no reason, and making it
>> more fragile as well, since an exported pool with a dead ZIL cannot be
>> imported.
>
> Others have tested the X25-E and found that with its cache enabled it
> does drop flushed writes, but it is clearly not such a gaping chasm as
> the X25-M. Some time has passed, so there is the possibility that the
> X25-E firmware has improved (or will). If Sun offers an X25-E-based
> device for use as an slog, you can be sure that it has been qualified
> for this purpose, and it may contain modified firmware.
>
> The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

I missed out on this thread. How would these dropped flushed writes manifest themselves? Something in the logs, or just worsened performance?

Ray
Miles Nordin
2010-Feb-05 21:33 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
>>>>> "rvd" == Ray Van Dolson <rvandolson at esri.com> writes: >>>>> "ak" == Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:rvd> I missed out on this thread. How would these dropped flushed rvd> writes manifest themselves? presumably corrupted databases, lost mail, or strange NFS behavior after the server reboots when the clients do not. But the actual test to which I referred is benchmark-like and didn''t observe any of those things. If you read my post I gave you Lutz''s name and the date he posted and also linked to the msgid in my message''s header, so go read for yourself! A good point, though, is that drives with lying write caches are still okay if your box reboots because of a kernel panic, just not if it loses power, so they''re not worthless. ak> performance from anyone using (real) enterprise SSD (which now ak> spells STEC) as slog. I wonder how ACARD would do also since it is 1/5th the cost, or if Seagate Pulsar will behave correctly. STEC coming in at more expensive than DRAM is like a sucker-premium you pay because no one else has their act together. And according to the test Lutz did the X25-M (and probably also -E?) are okay so long as you disable the write cache, though you have to do it at every boot, and ''hdadm'' is not bundled. It would also be nice to convince anandtech and friends to yank power cords, too, to confirm that write flushes issued in their tests are actually obeyed, and to redo the io/s test with write cache disabled if the device lies, so that we actually have comparable numbers. If they would do that, the $ value of a supercap would become obvious. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100205/cbff2d8b/attachment.bin>