Lutz Schumann
2010-Jan-10 18:22 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
A very interesting thread (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/) and some thinking about the design of SSDs led to an experiment I did with the Intel X25-M SSD. The question was: is my data safe once it has reached the disk and has been committed to my application?

All transactional safety in ZFS requires a correct implementation of the synchronize-cache command (see http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg27264.html, where someone used OpenSolaris within VirtualBox, which by default ignores the cache flush command). Thus qualified hardware is VERY essential (also see http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf).

What I did (for an Intel X25-M G2 (default settings = write cache on) and a Seagate SATA drive (ST3500418AS)):

a) Create a pool
b) Create a program that opens a file synchronously and writes to it. It also prints the latest record written successfully.
c) Pull the power of the SATA disk
d) Power cycle everything
e) Open the pool again and verify that the file content is the last record committed to the application
   e1) if it is the same - nice hardware
   e2) if it is NOT the same - BAD hardware

What I found out:

Intel X25-M G2:
- If I pull the power cable, much data is lost although committed to the app (some hundred transactions)
- If I pull the SATA cable, no data is lost

ST3500418AS:
- If I pull the power cable, almost no data is lost, but still the last write is lost (strange!)
- If I pull the SATA cable, no data is lost

Actually this result was partially expected. However, the one missing transaction on my SATA HDD (Seagate) is strange. Unfortunately I do not have "enterprise SAS hardware" handy to verify that my test procedure is correct. Maybe someone can run this test on a SAS test machine? (see script attached)

--- Attachments ---

--- script (call it with script.pl --file /mypool/testfile) ---

#!/usr/bin/env perl
# for O_SYNC
use Fcntl qw(:DEFAULT :flock SEEK_CUR SEEK_SET SEEK_END);
use IO::File;
use Getopt::Long;

my $pool      = "disk";
my $mountroot = "/volumes";
my $file;   # default path derives from --pool unless --file is given
my $abort   = 0;
my $count   = 0;

GetOptions(
    "pool=s"          => \$pool,
    "testfile|file=s" => \$file,
    "count=i"         => \$count,
);
$file = "$mountroot/$pool/testfile" unless defined $file;

# directory part of the target file
my $dir = $file;
$dir =~ s/[^\/]+$//g;

if (-e $file) {
    print "ERROR: File $file already exists\n";
    exit 1;
}
if (! -d "$dir") {
    print "ERROR: Directory $dir does not exist\n";
    exit 1;
}

# O_SYNC, O_CREAT: each syswrite should return only once the data has
# been committed to stable storage.
sysopen(FILE, "$file", O_RDWR | O_CREAT | O_EXCL | O_SYNC)
    or die "ERROR Opening file $file: $!\n";

$SIG{INT} = sub {
    print " ... signalling Abort ... (file: $file)\n";
    $abort = 1;
};

$| = 1;

my $lastok = undef;
my $i      = 0;
my $msg    = sprintf("This is round number %20s", $i);

while (!$abort) {
    $i++;
    if ($count && $i > $count) { last; }
    $msg = sprintf("This is round number %20s", $i);
    sysseek(FILE, 0, SEEK_SET);
    print "$msg";
    my $rc = syswrite FILE, $msg;
    if (!defined($rc)) {
        print "ERROR\n";
        print "ERROR While writing $msg\n";
        print "ERROR: $!\n";
        last;
    } else {
        print " DONE \n";
        $lastok = $msg;
    }
}
close(FILE);

print "\nTHE LAST MESSAGE WRITTEN to file $file was:\n\n\t\"$lastok\"\n\n";

---- Here's the logs of my tests ----

1) Test the SATA SSD (Intel X25-M)
----------------------------------
start write.pl

This is round number 67482
This is round number 67483
This is round number 67484
This is round number 67485
This is round number 67486
This is round number 67487
This is round number 67488
This is round number 67489
This is round number 67490

( .. I pull the POWER CABLE of the SATA SSD .. )

.. I/O hangs .. zpool status shows:

zpool status -v
  pool: ssd
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         UNAVAIL      0    11     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

errors: Permanent errors have been detected in the following files:

        ssd:<0x0>
        /volumes/ssd/
        /volumes/ssd/testfile

... now I power cycled the machine and put the power cable back ... let's see the pool status:

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

errors: No known data errors

.... let's look at the file content ...
.. remember, the last reported successful transaction was "67490"

root@nexenta:/volumes/ssd# cat testfile
This is round number 67246
root@nexenta:/volumes/ssd#

... UPS, 244 transactions missing - bummer ...

.. OK, repeat the test pulling the SATA cable only!!! (thus the device has time to write out the changes)

This is round number 39451
This is round number 39452
This is round number 39453
This is round number 39454
This is round number 39455
This is round number 39456
This is round number 39457
This is round number 39458
This is round number 39459

.. hangs .. reboot

  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

... cat ssd/testfile (last committed = 39459)

This is round number 39459

.. this is OK

2) Test the SATA HDD (Seagate ST3500418AS)
------------------------------------------

..... same test with a HDD ...

This is round number 3548
This is round number 3549
This is round number 3550
This is round number 3551
This is round number 3552
This is round number 3553
This is round number 3554
This is round number 3555
This is round number 3556
This is round number 3557
This is round number 3558

.. hangs

  pool: disk
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        disk        UNAVAIL      0    10     0  insufficient replicas
          c3t5d0    UNAVAIL      3     2     0  cannot open

.. reboot .. check file (last committed = 3558)

nmc@nexenta:/disk$ cat testfile
This is round number 3557

.. again one transaction missing, strange, test again ...

.. again (disk) ...

This is round number 1689 DONE
This is round number 1690 DONE
This is round number 1691 DONE
This is round number 1692 DONE
This is round number 1693 DONE
This is round number 1694 DONE
This is round number 1695

.. pull power cable .. reboot .. check

nmc@nexenta:/$ cat disk/testfile
This is round number 1693

... again just one missing

.. test the SATA cable pull ....

This is round number 1269 DONE
This is round number 1270 DONE
This is round number 1271 DONE
This is round number 1272 DONE
This is round number 1273 DONE
This is round number 1274 DONE
This is round number 1275 DONE
This is round number 1276 DONE
This is round number 1277

.. pull SATA cable (not power)

nmc@nexenta:/$ cat disk/testfile
This is round number 1276

.. this is OK
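If anyone wants to cross-check that the losses are not an artifact of O_SYNC handling, the same loop can be driven by an explicit fsync() per record instead. A minimal, untested sketch (it assumes IO::Handle's sync method, which calls fsync(2), is usable on the test box; the usage and file name are just examples):

#!/usr/bin/env perl
# fsync-per-record variant of the test loop above (hypothetical sketch)
use strict;
use warnings;
use Fcntl qw(:DEFAULT SEEK_SET);
use IO::Handle;   # provides $fh->sync, i.e. fsync(2)

my $file = shift @ARGV or die "usage: $0 <testfile>\n";

sysopen(my $fh, $file, O_RDWR | O_CREAT | O_EXCL)
    or die "ERROR opening $file: $!\n";

my $i = 0;
while (1) {
    $i++;
    my $msg = sprintf("This is round number %20s", $i);
    sysseek($fh, 0, SEEK_SET);
    defined(syswrite($fh, $msg)) or die "ERROR writing: $!\n";
    # The data must be on stable storage when sync() returns, so any
    # round reported DONE here should survive a power cut.
    $fh->sync or die "ERROR fsync: $!\n";
    print "$msg DONE\n";
}

If this variant loses the same number of transactions, the problem is clearly below the filesystem API, i.e. in the drive's cache handling.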
Lutz Schumann
2010-Jan-10 19:09 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
I managed to disable the write cache (I did not know a tool for this on Solaris; however, hdadm from the EON NAS binary_kit does the job).

Same power-disruption test with the Seagate HDD and write cache disabled
---------------------------------------------------------------------------------------

root@nexenta:/volumes# .sc/bin/hdadm write_cache display c3t5

 c3t5 write_cache> disabled

... pull power cable of the Seagate SATA disk

This is round number 4543 DONE
This is round number 4544 DONE
This is round number 4545 DONE
This is round number 4546 DONE
This is round number 4547 DONE
This is round number 4548 DONE
This is round number 4549 DONE
This is round number 4550 <... hangs here>

... power cycle everything

node1:/mnt/disk# cat testfile
This is round number 4549

... this looks good. So disabling the write cache helps, but it really limits performance (not for synchronous, but for async writes).

Test with Intel X25-M
--------------------------

... same with the SSD

root@nexenta:/volumes# hdadm write_cache off c3t5
 c3t5 write_cache> disabled
root@nexenta:/volumes# hdadm write_cache display c3t5
 c3t5 write_cache> disabled

.. pull SSD power cable

This is round number 9249 DONE
This is round number 9250 DONE
This is round number 9251 DONE
This is round number 9252 DONE
This is round number 9253 DONE
This is round number 9254 DONE
This is round number 9255 DONE
This is round number 9256 DONE
This is round number 9257 <... hangs here>

.. power cycle everything ... test

node1:/mnt/ssd# cat testfile
This is round number 9256

So without a write cache the device works correctly. However, be warned: on boot the cache is enabled again:

Device     Serial        Vendor  Model             Rev   Temperature
------     ------        ------  -----             ----  -----------
c3t5d0p0   7200Y5160AGN  ATA     INTEL SSDSA2M160  02HD  255 C (491 F)

root@nexenta:/volumes# hdadm write_cache display c3t5
 c3t5 write_cache> enabled

Question: does anyone know the impact of disabling the write cache on the write-amplification factor of the Intel SSDs? I would deploy the Intel X25-M only for "mostly read" workloads anyway, so the performance impact of disabling the write cache can be ignored. However, if the life expectancy of the device goes down without a write cache (I mean, it is MLC already!) - bummer.

And another question: how can I permanently disable the write cache on the Intel X25-M SSDs?

Regards
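Since the drive re-enables its cache on every power cycle, one workaround for the "permanently" part would be to re-run hdadm from a boot-time script. A minimal sketch (the script path, runlevel placement, and hdadm install location are assumptions; an SMF service would be the cleaner Solaris way):

#!/sbin/sh
# /etc/rc3.d/S99ssd_wcache (hypothetical): re-disable the SSD write
# cache after every boot, because the drive re-enables it itself.
case "$1" in
start)
        /usr/bin/hdadm write_cache off c3t5
        ;;
esac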
Lutz Schumann
2010-Jan-10 19:43 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Actually the performance decrease when disabling the write cache on the SSD is approx. 3x (aka 66%).

Setup:

node1  = Linux client with open-iscsi
server = COMSTAR (cache=write through) + zvol (recordsize=8k, compression=off)

--- with SSD disk write cache disabled:

node1:/mnt/ssd# iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I

        Iozone: Performance Test of File I/O
                Version $Revision: 3.327 $
                Compiled for 32 bit mode.
                Build: linux

        Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
                      Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
                      Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                      Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                      Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                      Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Sun Jan 10 20:14:46 2010

        Include fsync in write timing
        Include close in write timing
        Record Size 8 KB
        File size set to 131072 KB
        SYNC Mode.
        O_DIRECT feature enabled
        Command line used: iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I
        Output is in Kbytes/sec
        Time Resolution = 0.000002 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 2
        Max process = 2
        Throughput test with 2 processes
        Each process writes a 131072 Kbyte file in 8 Kbyte records

        Children see throughput for  2 initial writers =    1324.45 KB/sec
        Parent sees throughput for  2 initial writers  =    1291.27 KB/sec
        Min throughput per process                     =     646.07 KB/sec
        Max throughput per process                     =     678.38 KB/sec
        Avg throughput per process                     =     662.23 KB/sec
        Min xfer                                       =  124832.00 KB

        Children see throughput for  2 rewriters       =    4360.29 KB/sec
        Parent sees throughput for  2 rewriters        =    4360.08 KB/sec
        Min throughput per process                     =    2158.82 KB/sec
        Max throughput per process                     =    2201.47 KB/sec
        Avg throughput per process                     =    2180.15 KB/sec
        Min xfer                                       =  128536.00 KB

        Children see throughput for  2 random readers  =   43930.41 KB/sec
        Parent sees throughput for  2 random readers   =   43914.01 KB/sec
        Min throughput per process                     =   21768.16 KB/sec
        Max throughput per process                     =   22162.25 KB/sec
        Avg throughput per process                     =   21965.21 KB/sec
        Min xfer                                       =  128760.00 KB

        Children see throughput for  2 random writers  =    5561.01 KB/sec
        Parent sees throughput for  2 random writers   =    5560.41 KB/sec
        Min throughput per process                     =    2780.37 KB/sec
        Max throughput per process                     =    2780.64 KB/sec
        Avg throughput per process                     =    2780.50 KB/sec
        Min xfer                                       =  131064.00 KB

--- with SSD write cache enabled:

node1:/mnt/ssd# iozone -ec -r 8k -s 128m -l 2 -i 0 -i 2 -o -I

        Run began: Sun Jan 10 20:22:14 2010
        (iozone banner and test parameters identical to the run above)

        Children see throughput for  2 initial writers =    3387.15 KB/sec
        Parent sees throughput for  2 initial writers  =    3258.90 KB/sec
        Min throughput per process                     =    1621.62 KB/sec
        Max throughput per process                     =    1765.53 KB/sec
        Avg throughput per process                     =    1693.57 KB/sec
        Min xfer                                       =  120392.00 KB

        Children see throughput for  2 rewriters       =   11084.93 KB/sec
        Parent sees throughput for  2 rewriters        =   11083.10 KB/sec
        Min throughput per process                     =    5503.68 KB/sec
        Max throughput per process                     =    5581.25 KB/sec
        Avg throughput per process                     =    5542.46 KB/sec
        Min xfer                                       =  129256.00 KB

        Children see throughput for  2 random readers  =   46140.94 KB/sec
        Parent sees throughput for  2 random readers   =   46104.64 KB/sec
        Min throughput per process                     =   23002.35 KB/sec
        Max throughput per process                     =   23138.59 KB/sec
        Avg throughput per process                     =   23070.47 KB/sec
        Min xfer                                       =  130312.00 KB

        Children see throughput for  2 random writers  =   18500.58 KB/sec
        Parent sees throughput for  2 random writers   =   18492.31 KB/sec
        Min throughput per process                     =    9248.47 KB/sec
        Max throughput per process                     =    9252.11 KB/sec
        Avg throughput per process                     =    9250.29 KB/sec
        Min xfer                                       =  131032.00 KB

Difference for writes: 50-66% less performance with the cache disabled. Still much better than disks for writes.

One more question for understanding (I am thinking of MLC SSDs as data disks, for read performance): assuming a reliable ZIL disk (cache flush = working), the ZIL can guarantee data integrity; however, if the backend disks (aka pool disks) do not properly implement cache flush, a reliable ZIL device does not "work around" the bad backend disks, right?

(Meaning: having a reliable ZIL + some MLC SSDs with write cache enabled is not reliable in the end.)

Thanks
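For easier comparison, the aggregate ("children see") numbers from the two runs side by side:

    Test (2 procs, 8 KB records)   cache off (KB/s)   cache on (KB/s)
    initial writers                       1324.45            3387.15
    rewriters                             4360.29           11084.93
    random readers                       43930.41           46140.94
    random writers                        5561.01           18500.58

Reads are essentially unaffected; sequential and random writes drop to roughly a third of the cached rate.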
Bob Friesenhahn
2010-Jan-10 20:43 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
On Sun, 10 Jan 2010, Lutz Schumann wrote:

> One more question for understanding (I am thinking of MLC SSDs as
> data disks, for read performance): assuming a reliable ZIL disk
> (cache flush = working), the ZIL can guarantee data integrity;
> however, if the backend disks (aka pool disks) do not properly
> implement cache flush, a reliable ZIL device does not "work around"
> the bad backend disks, right?
>
> (Meaning: having a reliable ZIL + some MLC SSDs with write cache
> enabled is not reliable in the end.)

As soon as there is more than one disk in the pool, it is necessary for cache flush to work, or else the devices may contain content from entirely different transaction groups, resulting in a scrambled pool.

If you just had one disk which tended to ignore cache flush requests, then you should be OK as long as the disk writes the data in order. In that case any unwritten data would be lost, but the pool should not be lost. If the device ignores cache flush requests and writes data in some random order, then the pool is likely to eventually fail.

I think that ZFS mirrors should be safer than raidz when faced with devices which fail to flush (it should be similar to the single-disk case), but only if there is one mirror pair.

A scary thing about SSDs is that they may re-write old data while writing new data, which could result in corruption of the old data if the power fails while it is being re-written.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
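To make the multi-device failure mode concrete, a simplified timeline (two pool devices, one of which ignores flushes; ditto-block and uberblock details omitted):

    txg 100: ZFS writes the txg-100 blocks to disk A and disk B,
             then issues SYNCHRONIZE CACHE to both
             disk A honors the flush  -> txg-100 blocks are persistent
             disk B ignores the flush -> txg-100 blocks sit in volatile cache
    ZFS then commits txg 100, believing both devices are stable
    -- power loss --
    on import: disk A holds txg 100, disk B effectively holds txg 99,
               yet the pool metadata references txg-100 data on both;
               the on-disk state is no longer a self-consistent snapshot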
Kjetil Torgrim Homme
2010-Jan-11 11:24 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Lutz Schumann <presales at storageconcepts.de> writes:

> Actually the performance decrease when disabling the write cache on
> the SSD is approx. 3x (aka 66%).

For this reason you want a controller with battery-backed write cache. In practice this means a RAID controller, even if you don't use the RAID functionality. Of course you can buy SSDs with capacitors, too, but I think that will be more expensive, and it will restrict your choice of model severely.

(BTW, thank you for testing forceful removal of power. The result is as expected, but it's good to see that theory and practice match.)
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Lutz Schumann
2010-Jan-11 14:49 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
Maybe it got lost in this much text :) .. thus this re-post:

Does anyone know the impact of disabling the write cache on the write-amplification factor of the Intel SSDs?

How can I permanently disable the write cache on the Intel X25-M SSDs?

Thanks, Robert
Bob Friesenhahn
2010-Jan-11 15:59 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
On Mon, 11 Jan 2010, Kjetil Torgrim Homme wrote:

> (BTW, thank you for testing forceful removal of power. The result is
> as expected, but it's good to see that theory and practice match.)

Actually, the result is not "as expected", since the device should not have lost any data preceding a cache flush request. These sorts of results should be cause for concern for anyone currently using one as a ZFS log device, or using it for any write-sensitive application at all.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Miles Nordin
2010-Jan-25 21:00 UTC
[zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
>>>>> "ca" == Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes: >>>>> "ls" == Lutz Schumann <presales at storageconcepts.de> writes:ca> X25-E drives and a converter from 3.5 to 2.5 inches. So far ca> two systems have shown pretty bad instabilities with that. instability after crashing or instability while running? Lutz Schumann 2010-01-10 seemed to find the x25m g2 was ignoring sync cache commands when its write cache was set to ``on'''', but it did indeed do uncached writing if you turn the write cache off for the whole drive albeit at half the performance advertised in the spec sheet: ls> Intel X25-M G2: - If I pull the power cable much data is lost, ls> altought commited to the app (some hundred) - If I pull the ls> sata cable no data is lost ls> ST3500418AS: - If I pull the power cable almost no data is ls> lost, but still the last write is lost (strange!) - If I pull ls> the sata cable no data is lost the test for it was to write a program that did ''write, sync, write, sync'' and notice when yaning x25m power connector with cache on <n> transactions were lost, while yanking SATA connector 0 or 1 transactions were lost. Therefore I suspect the x25e which also lacks a supercap might also be another deliberately broken to inflate specs drive, and if it''s instability after crashing you might try to disable the x25e write cache (1/2 performance) and try again? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100125/638ef061/attachment.bin>
Lutz Schumann
2010-Jan-25 22:56 UTC
[zfs-discuss] ZFS cache flush ignored by certain devices ?
One problem with the write cache is that I do not know whether it is needed to limit write wear (write amplification).

As mentioned, disabling the write cache might be OK in terms of performance (I want to use MLC SSDs as data disks, not as ZIL, to have an SSD-only appliance - I'm looking for read speed for dedup, zfs send and all the other things ZFS tends to do a lot of random reads for). I could not live with a degradation in write endurance with a disabled write cache.

Unfortunately nobody was able to answer this, and I guess only Intel can -- and won't. However, I don't want to ruin two Postville SSDs at 200 EUR each to find out :).
Miles Nordin
2010-Feb-05 19:45 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
>>>>> "pr" == Peter Radig <peter at radig.de> writes: >>>>> "ls" == Lutz Schumann <presales at storageconcepts.de> writes:pr> I was expecting a good performance from the X25-E, but was pr> really suprised that it is that good (only 1.7 times slower pr> than it takes with ZIL completely disabled). So I will use the pr> X25-E as ZIL device on my box and will not consider disabling pr> ZIL at all to improve NFS performance. According to Lutz posting here ~2010-01-10, the X25-M may not actually be functioning as a ZIL unless you disable its write cache with ''hdadm''. He said he found normal hard drives respect cache flush commands in stream, but Intel X25-M does not. however both do respect disabling the write cache. ls> root at nexenta:/volumes# hdadm write_cache off c3t5 ls> c3t5 write_cache> disabled You might want to repeat his test with X25-E. If the X25-E is also dropping cache flush commands (it might!), you may be, compared to disabling the ZIL, slowing down your pool for no reason, and making it more fragile as well since an exported pool with a dead ZIL cannot be imported. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100205/8b25bb75/attachment.bin>
Bob Friesenhahn
2010-Feb-05 19:55 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, 5 Feb 2010, Miles Nordin wrote:

>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>    ls>  c3t5 write_cache> disabled
>
> You might want to repeat his test with the X25-E. If the X25-E is also
> dropping cache flush commands (it might!), then compared to disabling
> the ZIL you may be slowing down your pool for no reason, and making it
> more fragile as well, since an exported pool with a dead ZIL cannot be
> imported.

Others have tested the X25-E and found that with its cache enabled it does drop flushed writes, but it is clearly not such a gaping chasm as the X25-M. Some time has passed, so there is the possibility that the X25-E firmware has improved (or will). If Sun offers an X25-E-based device for use as an slog, you can be sure that it has been qualified for this purpose, and it may contain modified firmware.

The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Andrey Kuzmin
2010-Feb-05 20:01 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 5, 2010 at 10:55 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Fri, 5 Feb 2010, Miles Nordin wrote:
>>
>>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>>    ls>  c3t5 write_cache> disabled
>>
>> You might want to repeat his test with the X25-E. If the X25-E is also
>> dropping cache flush commands (it might!), then compared to disabling
>> the ZIL you may be slowing down your pool for no reason, and making it
>> more fragile as well, since an exported pool with a dead ZIL cannot be
>> imported.
>
> Others have tested the X25-E and found that with its cache enabled it
> does drop flushed writes, but it is clearly not such a gaping chasm as
> the X25-M. Some time has passed, so there is the possibility that the
> X25-E firmware has improved (or will). If Sun offers an X25-E-based
> device for use as an slog, you can be sure that it has been qualified
> for this purpose, and it may contain modified firmware.
>
> The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

Exactly. It would therefore be very interesting to hear about performance from anyone using a (real) enterprise SSD (which now spells STEC) as slog.

Regards,
Andrey
Ray Van Dolson
2010-Feb-05 20:26 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 05, 2010 at 11:55:12AM -0800, Bob Friesenhahn wrote:

> On Fri, 5 Feb 2010, Miles Nordin wrote:
>>
>>    ls> root at nexenta:/volumes# hdadm write_cache off c3t5
>>    ls>  c3t5 write_cache> disabled
>>
>> You might want to repeat his test with the X25-E. If the X25-E is also
>> dropping cache flush commands (it might!), then compared to disabling
>> the ZIL you may be slowing down your pool for no reason, and making it
>> more fragile as well, since an exported pool with a dead ZIL cannot be
>> imported.
>
> Others have tested the X25-E and found that with its cache enabled it
> does drop flushed writes, but it is clearly not such a gaping chasm as
> the X25-M. Some time has passed, so there is the possibility that the
> X25-E firmware has improved (or will). If Sun offers an X25-E-based
> device for use as an slog, you can be sure that it has been qualified
> for this purpose, and it may contain modified firmware.
>
> The 'E' stands for "Extreme" and not "Enterprise", as some tend to believe.

I missed out on this thread. How would these dropped flushed writes manifest themselves? Something in the logs, or just worsened performance?

Ray
Miles Nordin
2010-Feb-05 21:33 UTC
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
>>>>> "rvd" == Ray Van Dolson <rvandolson at esri.com> writes: >>>>> "ak" == Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:rvd> I missed out on this thread. How would these dropped flushed rvd> writes manifest themselves? presumably corrupted databases, lost mail, or strange NFS behavior after the server reboots when the clients do not. But the actual test to which I referred is benchmark-like and didn''t observe any of those things. If you read my post I gave you Lutz''s name and the date he posted and also linked to the msgid in my message''s header, so go read for yourself! A good point, though, is that drives with lying write caches are still okay if your box reboots because of a kernel panic, just not if it loses power, so they''re not worthless. ak> performance from anyone using (real) enterprise SSD (which now ak> spells STEC) as slog. I wonder how ACARD would do also since it is 1/5th the cost, or if Seagate Pulsar will behave correctly. STEC coming in at more expensive than DRAM is like a sucker-premium you pay because no one else has their act together. And according to the test Lutz did the X25-M (and probably also -E?) are okay so long as you disable the write cache, though you have to do it at every boot, and ''hdadm'' is not bundled. It would also be nice to convince anandtech and friends to yank power cords, too, to confirm that write flushes issued in their tests are actually obeyed, and to redo the io/s test with write cache disabled if the device lies, so that we actually have comparable numbers. If they would do that, the $ value of a supercap would become obvious. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100205/cbff2d8b/attachment.bin>