Hi,

as I have learned from the discussion about which SSD to use as ZIL drives, I stumbled across this article, which discusses short stroking for increasing IOPS on SAS and SATA drives:

http://www.tomshardware.com/reviews/short-stroking-hdd,2157.html

Now, I am wondering if using a mirror of such 15k SAS drives would be a good-enough fit for a ZIL on a zpool that is mainly used for file services via AFP and SMB.
I'd particularly like to know whether someone has already used such a solution and how it has worked out.

Cheers,
budy

--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.budach at jvm.de
Internet: http://www.jvm.com

Geschäftsführer: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380
Great question. In "good enough" computing, beauty is in the eye of the beholder. My home NAS appliance uses IDE and SATA drives without a dedicated ZIL

http://dtrace.org/blogs/ahl/2010/11/15/zil-analysis-from-chris-george/

"if HDDs and commodity SSDs continue to be target ZIL devices, ZFS could and should do more to ensure that writes are sequential."

On 23 Dec 2010, at 10:25, Stephan Budach <stephan.budach at jvm.de> wrote:

> Hi,
>
> as I have learned from the discussion about which SSD to use as ZIL drives, I stumbled across this article, which discusses short stroking for increasing IOPS on SAS and SATA drives:
>
> http://www.tomshardware.com/reviews/short-stroking-hdd,2157.html
>
> Now, I am wondering if using a mirror of such 15k SAS drives would be a good-enough fit for a ZIL on a zpool that is mainly used for file services via AFP and SMB.
> I'd particularly like to know whether someone has already used such a solution and how it has worked out.
>
> Cheers,
> budy
Sent from my iPhone (which has a lousy user interface that makes it all too easy for a clumsy oaf like me to touch "Send" before I'm done)...

On 23 Dec 2010, at 11:07, Phil Harman <phil.harman at gmail.com> wrote:

> Great question. In "good enough" computing, beauty is in the eye of the beholder. My home NAS appliance uses mirrored IDE and SATA drives without a dedicated ZIL

device. And for my home SMB and NFS, that's good enough.

I'm sure that even a 7200rpm SATA ZIL would improve things in my case.

The random I/O requirement for the ZIL is discussed by Adam (and Chris) here ...

> http://dtrace.org/blogs/ahl/2010/11/15/zil-analysis-from-chris-george/

What I find most encouraging is this statement:

> "if HDDs and commodity SSDs continue to be target ZIL devices, ZFS could and should do more to ensure that writes are sequential."

It's not broken, but it is suboptimal, and fixable (apparently) ;)
On 23.12.10 12:18, Phil Harman wrote:
> Sent from my iPhone (which has a lousy user interface that makes it
> all too easy for a clumsy oaf like me to touch "Send" before I'm done)...
>
> On 23 Dec 2010, at 11:07, Phil Harman <phil.harman at gmail.com> wrote:
>
>> Great question. In "good enough" computing, beauty is in the eye of
>> the beholder. My home NAS appliance uses mirrored IDE and SATA
>> drives without a dedicated ZIL
>
> device. And for my home SMB and NFS, that's good enough.
>
> I'm sure that even a 7200rpm SATA ZIL would improve things in my case.
>
> The random I/O requirement for the ZIL is discussed by Adam (and
> Chris) here ...
>
>> http://dtrace.org/blogs/ahl/2010/11/15/zil-analysis-from-chris-george/
>
> What I find most encouraging is this statement:
>
>> "if HDDs and commodity SSDs continue to be target ZIL devices, ZFS
>> could and should do more to ensure that writes are sequential."
>
> It's not broken, but it is suboptimal, and fixable (apparently) ;)

Yeah - I read through Christopher's article already, and it clearly shows the shortcomings of current flash SSDs as ZIL devices. On the other hand, if you were using a DDRdrive as a ZIL device, you'd pretty much lock that zpool to that particular host, since you can't easily move the zpool onto another host without moving the DDRdrive as well, or without detaching the ZIL device(s) from the zpool - which I find a little bit odd.

I am not actually running a SOHO scenario with my ZFS file server, since it has to serve up to 200 users on up to 200 ZFS volumes in one zpool, but the actual data traffic is not that high either. The traffic is more like small peaks when someone writes back to a file.

Cheers,
budy
On 23 Dec 2010, at 11:53, Stephan Budach <stephan.budach at jvm.de> wrote:

> Yeah - I read through Christopher's article already, and it clearly shows the shortcomings of current flash SSDs as ZIL devices. On the other hand, if you were using a DDRdrive as a ZIL device, you'd pretty much lock that zpool to that particular host, since you can't easily move the zpool onto another host without moving the DDRdrive as well, or without detaching the ZIL device(s) from the zpool - which I find a little bit odd.
>
> I am not actually running a SOHO scenario with my ZFS file server, since it has to serve up to 200 users on up to 200 ZFS volumes in one zpool, but the actual data traffic is not that high either. The traffic is more like small peaks when someone writes back to a file.
>
> Cheers,
> budy

Well, your proposed config will improve what each user sees during their own private burst, and short stroking can only improve things in the worst-case scenario (although it may not be measurable). So why not give it a spin and report back to the list in the new year?

All the best,
Phil
On 23.12.10 13:09, Phil Harman wrote:
> Well, your proposed config will improve what each user sees during
> their own private burst, and short stroking can only improve things in
> the worst-case scenario (although it may not be measurable). So why
> not give it a spin and report back to the list in the new year?

Ha ha - if no one else has some more input on this, I will definitely give it a try in January.

Cheers,
budy
On Thu, Dec 23 at 11:25, Stephan Budach wrote:
> Hi,
>
> as I have learned from the discussion about which SSD to use as ZIL
> drives, I stumbled across this article, which discusses short stroking for
> increasing IOPS on SAS and SATA drives:
>
> http://www.tomshardware.com/reviews/short-stroking-hdd,2157.html
>
> Now, I am wondering if using a mirror of such 15k SAS drives would be a
> good-enough fit for a ZIL on a zpool that is mainly used for file services
> via AFP and SMB.
> I'd particularly like to know whether someone has already used such a
> solution and how it has worked out.

Haven't personally used it, but the worst-case steady-state IOPS of
the Vertex 2 EX, from the DDRdrive presentation, is 6k IOPS assuming a
full-pack random workload.

To achieve that through SAS disks in the same workload, you'll
probably spend significantly more money, and it will consume a LOT more
space and power.

According to that Tom's article, a typical 15k SAS enterprise drive is
in the 600 IOPS ballpark when short-stroked and consumes about 15W
active. Thus you're going to need ten of these devices to equal the
"degraded" steady-state IOPS of an SSD. I just don't think the math
works out. At that point, you're probably better off not having a
dedicated ZIL, instead of burning 10 slots and 150W.

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
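[A quick back-of-the-envelope check of the numbers quoted above - a sketch only; the 6,000 IOPS, 600 IOPS and 15W figures are the ones cited in this thread, not independent measurements:]

    # short-stroked 15k SAS vs. one SSD as a dedicated log, per the figures above
    ssd_iops=6000        # worst-case steady-state IOPS quoted for the Vertex 2 EX
    hdd_iops=600         # short-stroked 15k SAS ballpark from the Tom's article
    hdd_watts=15         # active power per 15k SAS drive

    drives=$(( (ssd_iops + hdd_iops - 1) / hdd_iops ))   # round up
    echo "drives needed: $drives, power: $(( drives * hdd_watts )) W"
    # -> drives needed: 10, power: 150 W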
On 23.12.10 19:05, Eric D. Mudama wrote:
> Haven't personally used it, but the worst-case steady-state IOPS of
> the Vertex 2 EX, from the DDRdrive presentation, is 6k IOPS assuming a
> full-pack random workload.
>
> To achieve that through SAS disks in the same workload, you'll
> probably spend significantly more money, and it will consume a LOT more
> space and power.
>
> According to that Tom's article, a typical 15k SAS enterprise drive is
> in the 600 IOPS ballpark when short-stroked and consumes about 15W
> active. Thus you're going to need ten of these devices to equal the
> "degraded" steady-state IOPS of an SSD. I just don't think the math
> works out. At that point, you're probably better off not having a
> dedicated ZIL, instead of burning 10 slots and 150W.

Good - that was actually the information I had been missing. So I will go with the Vertex 2 EX instead and save myself the hassle of short stroking entirely.

Thanks, and Merry Christmas to all on this list.

Cheers,
budy
On Thu, Dec 23, 2010 at 11:25:43AM +0100, Stephan Budach wrote:
> as I have learned from the discussion about which SSD to use as ZIL
> drives, I stumbled across this article, which discusses short
> stroking for increasing IOPS on SAS and SATA drives:

There was a thread on this a while back. I forget when, or the subject.

But yes, you could even use 7200 rpm drives to make a fast ZIL device. The trick is the on-disk format, and the pseudo-device driver that you would have to layer on top of the actual device(s) to get such performance. The key is that sustained sequential I/O rates for disks can be quite large, so if you organize the disk in a log form and use the outer tracks only, then you can pretend to have awesome write IOPS for a disk (but NOT read IOPS).

But it's not necessarily as cheap as you might think. You'd be making very inefficient use of an expensive disk (in the case of a SAS 15k rpm disk), or disks, and if plural then you are also using more ports (oops). Disks used this way probably also consume more power than SSDs (OK, this part of my analysis is very iffy), and you still need to do something about ensuring syncs to disk on power failure (such as just disabling the cache on the disk, but this would lower performance, increasing the cost).

When you factor all the costs in, I suspect you'll find that SSDs are priced reasonably well. That's not to say that one could not put together a disk-based log device that could eat SSDs' lunch, but SSD prices would then just come down to match that -- and you can expect SSD prices to come down anyway, as with any new technology.

I don't mean to discourage you, just to point out that there's plenty of work to do to make "short-stroked disks as ZILs" a workable reality, while the economics of doing that work versus waiting for SSD prices to come down don't seem appealing.

Caveat emptor: my analysis is off-the-cuff; I could be wrong.

Nico
--
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Stephan Budach
>
> Now, I am wondering if using a mirror of such 15k SAS drives would be a
> good-enough fit for a ZIL on a zpool that is mainly used for file services
> via AFP and SMB.

For supporting AFP and SMB, most likely you would be perfectly happy simply disabling the ZIL. You will get maximum performance... even higher than with the world's fastest SSD or DDRdrive or any other type of storage device as a dedicated log. To decide whether this is OK for you, be aware of the argument *against* disabling the ZIL:

In the event of an ungraceful crash, with ZIL enabled, you lose up to 30 sec of async data, but you do not lose any sync data.

In the event of an ungraceful crash, with ZIL disabled, you lose up to 30 sec of async and sync data.

In neither case do you have data corruption or a corrupt filesystem. The only question is about 30 seconds of sync data. You must protect this type of data if you're running a database, an iSCSI target for virtual hosts, or some other types of data services... but if you're doing just AFP and SMB, it's pretty likely you don't need to worry about it.
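[For reference, the usual ways to disable the ZIL - a sketch only; whether the per-dataset sync property is available depends on how recent the ZFS build is, and the pool/filesystem names below are placeholders:]

    # newer builds: per-dataset property, takes effect immediately
    zfs set sync=disabled tank/shares
    zfs get sync tank/shares

    # older builds: system-wide tunable in /etc/system, needs a reboot
    echo "set zfs:zil_disable = 1" >> /etc/system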
2010/12/24 Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com>:
> For supporting AFP and SMB, most likely you would be perfectly happy simply
> disabling the ZIL. You will get maximum performance... even higher than with
> the world's fastest SSD or DDRdrive or any other type of storage device as a
> dedicated log. To decide whether this is OK for you, be aware of the argument
> *against* disabling the ZIL:
>
> In the event of an ungraceful crash, with ZIL enabled, you lose up to 30 sec
> of async data, but you do not lose any sync data.
>
> In the event of an ungraceful crash, with ZIL disabled, you lose up to 30
> sec of async and sync data.
>
> In neither case do you have data corruption or a corrupt filesystem. The
> only question is about 30 seconds of sync data. You must protect this type
> of data if you're running a database, ...

With Netatalk for AFP he _is_ running a database: any AFP server needs to maintain a consistent mapping between _not reused_ catalog node IDs (CNIDs) and filesystem objects. Luckily for Apple, HFS[+] and their Cocoa/Carbon APIs provide such a mapping, making direct use of HFS+ CNIDs. Unfortunately, most UNIX filesystems reuse inodes and have no API for mapping inodes to filesystem objects. Therefore all AFP servers running on non-Apple OSen maintain a database providing this mapping; in the case of Netatalk it's `cnid_dbd`, using a BerkeleyDB database.

-f
> From: Frank Lahm [mailto:franklahm at googlemail.com]
>
> With Netatalk for AFP he _is_ running a database: any AFP server needs
> to maintain a consistent mapping between _not reused_ catalog node IDs
> (CNIDs) and filesystem objects. Luckily for Apple, HFS[+] and their
> Cocoa/Carbon APIs provide such a mapping, making direct use of HFS+
> CNIDs. Unfortunately, most UNIX filesystems reuse inodes and have no API
> for mapping inodes to filesystem objects. Therefore all AFP servers
> running on non-Apple OSen maintain a database providing this mapping;
> in the case of Netatalk it's `cnid_dbd`, using a BerkeleyDB database.

Don't all of those concerns disappear in the event of a reboot?

If you stop AFP, you could completely obliterate the BDB database, and restart AFP, and functionally continue from where you left off. Right?
On Dec 23, 2010, at 2:25 AM, Stephan Budach wrote:
> as I have learned from the discussion about which SSD to use as ZIL drives, I stumbled across this article, which discusses short stroking for increasing IOPS on SAS and SATA drives:
>
> http://www.tomshardware.com/reviews/short-stroking-hdd,2157.html
>
> Now, I am wondering if using a mirror of such 15k SAS drives would be a good-enough fit for a ZIL on a zpool that is mainly used for file services via AFP and SMB.

SMB does not create much of a synchronous load. I haven't explored AFP directly, but if they do use Berkeley DB, then we do have a lot of experience tuning ZFS for Berkeley DB performance.

> I'd particularly like to know whether someone has already used such a solution and how it has worked out.

Latency is what matters most. While there is a loose relationship between IOPS and latency, you really want low latency. For 15krpm drives, the average rotational latency is 2ms for zero seeks. A decent SSD will beat that by an order of magnitude.
 -- richard
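[The 2ms figure is just the average rotational latency of a 15k rpm spindle, i.e. half a revolution - a quick sketch of the arithmetic:]

    # average rotational latency = time for half a revolution
    rpm=15000
    awk -v rpm=$rpm 'BEGIN { printf "avg rotational latency: %.1f ms\n", 0.5 * 60000 / rpm }'
    # -> avg rotational latency: 2.0 ms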
On 24/12/2010 18:21, Richard Elling wrote:
> Latency is what matters most. While there is a loose relationship
> between IOPS and latency, you really want low latency. For 15krpm
> drives, the average rotational latency is 2ms for zero seeks. A decent
> SSD will beat that by an order of magnitude.

And the closer you get to the CPU, the lower the latency. For example, the DDRdrive X1 is yet another order of magnitude faster because it sits directly on the PCI bus, without the overhead of the SAS protocol.

Yet the humble old 15k drive with 2ms sequential latency is still an order of magnitude faster than a busy drive delivering 20ms latencies under a random workload.
On Dec 24, 2010, at 1:21 PM, Richard Elling <richard.elling at gmail.com> wrote:

> Latency is what matters most. While there is a loose relationship between IOPS
> and latency, you really want low latency. For 15krpm drives, the average latency
> is 2ms for zero seeks. A decent SSD will beat that by an order of magnitude.

Actually, I'd say that latency has a direct relationship to IOPS, because it's the time it takes to perform an IO that determines how many IOs Per Second can be performed.

Ever notice how storage vendors list their max IOPS for 512-byte sequential IO workloads and sustained throughput for 1MB+ sequential IO workloads? Only SSD makers list their random IOPS numbers and their 4K IO workload numbers.

-Ross
On Dec 25, 2010, at 5:37 PM, Ross Walker wrote:
> Actually, I'd say that latency has a direct relationship to IOPS, because it's the time it takes to perform an IO that determines how many IOs Per Second can be performed.

That is only true when there is one queue and one server (in the queueing context). This is not the case where there are multiple concurrent I/Os that can be completed out of order by multiple servers working in parallel (eg. disk subsystems).

For an extreme example, the Sun Storage F5100 Array specifications show 1.6 million random read IOPS @ 4KB. But instead of an average latency of 625 nanoseconds, it shows an average latency of 0.378 milliseconds. The analogy we've used in parallel computing for many years is "nine women cannot make a baby in one month."

> Ever notice how storage vendors list their max IOPS for 512-byte sequential IO workloads and sustained throughput for 1MB+ sequential IO workloads? Only SSD makers list their random IOPS numbers and their 4K IO workload numbers.

The vendor will present the number that makes them look best, often without regard for practical application... the "curse of marketing" :-)
 -- richard
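[Little's Law makes the point concrete: implied concurrency = throughput x average latency. A sketch using the F5100 figures quoted above:]

    # outstanding I/Os = IOPS * average latency (in seconds)
    awk 'BEGIN { iops = 1600000; lat_s = 0.000378;
                 printf "implied outstanding I/Os: %.0f\n", iops * lat_s }'
    # -> about 605 I/Os in flight, i.e. many servers working in parallel, not one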
On Sat, Dec 25, 2010 at 08:37:42PM -0500, Ross Walker wrote:
> Actually, I'd say that latency has a direct relationship to IOPS, because it's the time it takes to perform an IO that determines how many IOs Per Second can be performed.

Assuming you have enough synchronous writes, and that you can organize them so as to keep the drive at max sustained sequential write bandwidth, then IOPS == bandwidth / logical I/O size. Latency doesn't enter into that formula.

Latency does remain, though, and will be noticeable to apps doing synchronous operations. Thus with, say, 100MB/s sustained sequential write bandwidth and, say, 2KB average ZIL entries, you'd get 51200 logical sync write operations per second. The latency for each such operation would still be 2ms (or whatever it is for the given disk). Since you'd likely have to batch many ZIL writes, you'd end up making the latency for some ops longer than 2ms and others shorter, but if you can keep the drive at max sustained sequential write bandwidth then the average latency will be 2ms.

SSDs are clearly a better choice.

BTW, a parallelized tar would greatly help reduce the impact of high-latency open()/close() (over NFS) operations...

Nico
--
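[The 51200/s figure is just that formula evaluated - a sketch, treating 1 MB as 1024 KB:]

    # IOPS == bandwidth / logical I/O size (latency does not appear)
    awk 'BEGIN { bw_kb_s = 100 * 1024; io_kb = 2;
                 printf "logical sync writes/s: %d\n", bw_kb_s / io_kb }'
    # -> logical sync writes/s: 51200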
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Nicolas Williams
>
> > Actually, I'd say that latency has a direct relationship to IOPS, because it's the
> > time it takes to perform an IO that determines how many IOs Per Second
> > can be performed.
>
> Assuming you have enough synchronous writes, and that you can organize
> them so as to keep the drive at max sustained sequential write
> bandwidth, then IOPS == bandwidth / logical I/O size. Latency doesn't

Ok, what we've hit here is two people using the same word to talk about different things. Apples to oranges, as it were. Both meanings of "IOPS" are OK, but context is everything.

There are drive random IOPS, which depend on latency and seek time, and there are also measured random IOPS above the filesystem layer, which are not always related to latency or seek time, as described above.
On Dec 27, 2010, at 6:06 PM, Edward Ned Harvey wrote:
> Ok, what we've hit here is two people using the same word to talk about
> different things. Apples to oranges, as it were. Both meanings of "IOPS"
> are OK, but context is everything.
>
> There are drive random IOPS, which depend on latency and seek time,
> and there are also measured random IOPS above the filesystem layer,
> which are not always related to latency or seek time, as described above.

The small, random read model can assume no cache hits. Adding caches makes the model too complicated for simple analysis, and arguably too complicated for modeling at all. For such systems, empirical measurements are possible, but can be overly optimistic. For example, it is relatively trivial to demonstrate 500,000 small, random read IOPS at the application using a file system that caches to RAM. Achieving that performance level for the general case is much less common.
 -- richard
On Mon, Dec 27, 2010 at 09:06:45PM -0500, Edward Ned Harvey wrote:
> Ok, what we've hit here is two people using the same word to talk about
> different things. Apples to oranges, as it were. Both meanings of "IOPS"
> are OK, but context is everything.
>
> There are drive random IOPS, which depend on latency and seek time,
> and there are also measured random IOPS above the filesystem layer,
> which are not always related to latency or seek time, as described above.

Clearly the application cares about _synchronous_ operations that are meaningful to it. In the case of an NFS application, that would be open() with O_CREAT (and particularly O_EXCL), close(), fsync() and so on. For a POSIX (but not NFS) application the number of synchronous operations is smaller. The rate of asynchronous operations is less important to the application because those are subject to caching, thus less predictable.

But to the filesystem, IOPS are not just about synchronous I/O but about how many distinct I/O operations can be completed per unit of time.

I tried to keep this clear; sorry for any confusion.

Nico
--
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> Ok, what we've hit here is two people using the same word to talk about
> different things. Apples to oranges, as it were. Both meanings of "IOPS"
> are OK, but context is everything.
>
> There are drive random IOPS, which depend on latency and seek time,
> and there are also measured random IOPS above the filesystem layer,
> which are not always related to latency or seek time, as described above.

In any event, the relevant points are:

The question of IOPS here is relevant to the conversation because of the ZIL dedicated log. If you had advanced short-stroking to get the write latency of a log device down to zero, then it could compete against an SSD for purposes of a log device, but nobody seems to believe such technology currently exists, and it certainly couldn't compete against an SSD for random reads. (The ZIL log is the only situation I know of where the write performance of a drive matters and read performance does not matter.)

If using ZFS for AFP (and consequently BDB)... If you disable the ZIL you will have maximum performance, but maybe you're not comfortable with that because you're not convinced of stability with the ZIL disabled, or for other reasons.

* If you put your BDB or ZIL on a dedicated spindle device, it will perform better than having no dedicated device, but the difference might be anything from 1x to 10x, depending on where your bottlenecks are. That is, no improvement is guaranteed, but probably you get at least a little bit.

* If you put your BDB or ZIL on an SSD dedicated log device, it will perform still better, and again, the difference could be anywhere from 1x to 10x depending on your bottlenecks.

* If you disable your ZIL, it will perform still better, and again, the difference could be anywhere from 1x to 10x. Realistically, at some point you'll hit a network bottleneck, and you won't notice the improved performance.

If you're just doing small numbers of large files, none of the above will probably be noticeable, because in that case latency is pretty much irrelevant. But assuming you have at least a bunch of reasonably small files, IMHO that threshold is at the SSD, because the latency of the SSD is insignificant compared to the latency of the network. But even with short-stroking getting the latency down to 2ms, that's still significant compared to network latency, so there's probably still room for improvement over the short-stroking techniques. At least, until somebody creates a more advanced short-stroking which gets latency down to near-zero, if that will ever happen.
On Tue, 28 Dec 2010, Edward Ned Harvey wrote:
>
> In any event, the relevant points are:
>
> The question of IOPS here is relevant to the conversation because of the ZIL
> dedicated log. If you had advanced short-stroking to get the write latency
> of a log device down to zero, then it could compete against an SSD for purposes
> of a log device, but nobody seems to believe such technology currently
> exists, and it certainly couldn't compete against an SSD for random reads.
> (The ZIL log is the only situation I know of where the write performance of a
> drive matters and read performance does not matter.)

It seems that you may be confused. For the ZIL, the drive's rotational latency (based on RPM) is the dominating factor, and not the lateral head seek time on the media. In this case, the "short-stroking" you are talking about does not help any. The ZIL is already effectively "short-stroking" since it writes in order.

The (possibly) worthy optimizations I have heard about are writing the log data in a different pattern on disk (via a special device driver), with the goal that when the drive sync request comes in, the drive is quite likely to be able to write immediately. Since such optimizations are quite device- and write-load-dependent, it is not worthwhile for a large company to develop the feature (but it would make for an interesting project).

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> From: Bob Friesenhahn [mailto:bfriesen at simple.dallas.tx.us]
> Sent: Tuesday, December 28, 2010 9:23 PM
>
> It seems that you may be confused. For the ZIL, the drive's rotational
> latency (based on RPM) is the dominating factor, and not the lateral
> head seek time on the media. In this case, the "short-stroking" you
> are talking about does not help any. The ZIL is already effectively
> "short-stroking" since it writes in order.

Nope. I'm not confused at all. I'm making a distinction between "short stroking" and "advanced short stroking", where simple "short stroking" does as you said - it eliminates the head seek time but remains susceptible to rotational latency. As you said, the ZIL already effectively accomplishes that end result, provided a dedicated spindle disk for the log device, but it does not do that if your ZIL is on the pool storage. What I'm calling "advanced short stroking" are techniques that effectively eliminate, or minimize, both seek and latency, down to zero or near-zero. What I'm calling "advanced short stroking" doesn't exist as far as I know, but is theoretically possible through either special disk hardware or special drivers.
You do seem to misunderstand ZIL.

ZIL is quite simply write cache, and using a short-stroked rotating drive is never going to provide a performance increase that is worth talking about; more importantly, ZIL was designed to be used with a RAM/Solid State Disk.

We use SATA2 HyperDrive5 RAM disks in mirrors and they work well and are far cheaper than STEC or other enterprise SSDs, and have none of the issues related to TRIM...

Highly recommended... ;-)

http://www.hyperossystems.co.uk/

Kevin

On 29 December 2010 13:40, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> Nope. I'm not confused at all. I'm making a distinction between "short
> stroking" and "advanced short stroking", where simple "short stroking" does
> as you said - it eliminates the head seek time but remains susceptible to
> rotational latency. As you said, the ZIL already effectively accomplishes
> that end result, provided a dedicated spindle disk for the log device, but it
> does not do that if your ZIL is on the pool storage. What I'm calling
> "advanced short stroking" are techniques that effectively eliminate, or
> minimize, both seek and latency, down to zero or near-zero. What I'm calling
> "advanced short stroking" doesn't exist as far as I know, but is
> theoretically possible through either special disk hardware or special
> drivers.
HyperDrive5 = ACard ANS9010

I have personally been wanting to try one of these for some time as a ZIL device.

On 12/29/2010 06:35 PM, Kevin Walker wrote:
> You do seem to misunderstand ZIL.
>
> ZIL is quite simply write cache, and using a short-stroked rotating
> drive is never going to provide a performance increase that is worth
> talking about; more importantly, ZIL was designed to be used with a
> RAM/Solid State Disk.
>
> We use SATA2 HyperDrive5 RAM disks in mirrors and they work well
> and are far cheaper than STEC or other enterprise SSDs, and have none
> of the issues related to TRIM...
>
> Highly recommended... ;-)
>
> http://www.hyperossystems.co.uk/
>
> Kevin
I do the same with ACARD. Works well enough.

Fred

From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jason Warr
Sent: Thursday, December 30, 2010 8:56
To: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL

HyperDrive5 = ACard ANS9010

I have personally been wanting to try one of these for some time as a ZIL device.
On 12/29/2010 4:55 PM, Jason Warr wrote:
> HyperDrive5 = ACard ANS9010
>
> I have personally been wanting to try one of these for some time as a
> ZIL device.

Yes, but do remember these require a half-height 5.25" drive bay, and you really, really should buy the extra CF card for backup.

Also, stay away from the ANS-9010S with LVD SCSI interface. As (I think) Bob pointed out a long time ago, parallel SCSI isn't good for a high-IOPS interface. It (the LVD interface) will throttle long before the drive does...

I've been waiting for them to come out with a 3.5" version, one which I can plug directly into a standard 3.5" SAS/SATA hotswap bay...

And, of course, the ANS9010 is limited to the SATA2 interface speed, so it is cheaper and lower-performing (but still better than an SSD) than the DDRdrive.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Had not even noticed the LVD version. The biggest issue for me is not the form factor but how hard it would be to get the client I work for to accept them in the environment, given support issues.

----- Reply message -----
From: "Erik Trimble" <erik.trimble at oracle.com>
Date: Wed, Dec 29, 2010 19:52
Subject: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL
To: "Jason Warr" <jason at warr.net>
Cc: <zfs-discuss at opensolaris.org>
> From: Kevin Walker [mailto:indigoskywalker at gmail.com]
>
> You do seem to misunderstand ZIL.

Wrong.

> ZIL is quite simply write cache

The ZIL is not simply write cache, but it enables certain types of operations to use write cache which otherwise would have been ineligible. The Intent Log is where ZFS immediately writes sync-write requests, so it can unblock the process which called write(). Once the data has been committed to nonvolatile ZIL storage, the process can continue processing, and ZFS can treat the write requests as async writes. Which means, after ZFS has written the ZIL, the data is able to stay a while in the RAM write buffer along with all the async writes. Which means ZFS is able to aggregate and optimize all the writes for best performance. This means the ZIL is highly sensitive to access times (seek + latency).

> using a short stroked rotating drive is
> never going to provide a performance increase that is worth talking about

If you don't add a dedicated log device, then the ZIL utilizes blocks from the main storage pool, and all sync writes suddenly get higher priority than all the queued reads and async writes. If you have a busy storage pool, your sync writes might see something like 20ms access times (seek + latency) before they can hit nonvolatile storage, and every time this happens, some other operation gets delayed.

If you add a spindle-drive dedicated log device, then that drive is always idle except when writing ZIL for sync writes, and also the head will barely move over the platter, because all the ZIL blocks will be clustered tightly together. So the ZIL might require typically 2ms or 3ms access times (negligible seek, or 1ms seek + 2ms latency), which is an order of magnitude better than before. Plus the sync writes in this case don't take away performance from the main pool reads and writes.

If you replace your spindle drive with an SSD, then you get another order of magnitude smaller access time. (Tens of thousands of IOPS effectively compares to <<1ms access time per op.)

If you disable your ZIL completely, then you get another order of magnitude smaller access time. (Some ns to think about putting the data directly into the RAM write buffer and entirely bypassing the ZIL.)

> and more importantly ZIL was designed to be used with a RAM/Solid State
> Disk.

I hope you mean NVRAM or battery-backed RAM of some kind. Because if you use volatile RAM for the ZIL, then you have disabled the ZIL from being able to function correctly. The ZFS Best Practices Guide specifically mentions "Better performance might be possible by using [...], or even a dedicated spindle disk."
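[For anyone following along, the dedicated log configurations discussed above are set up with zpool(1M) - a minimal sketch, where the pool and device names are placeholders for your own:]

    # add a mirrored dedicated log (slog) from two devices
    zpool add tank log mirror c4t0d0 c4t1d0

    # verify the log vdev and watch its latency/utilization
    zpool status tank
    zpool iostat -v tank 5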
2010/12/24 Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com>:
> Don't all of those concerns disappear in the event of a reboot?
>
> If you stop AFP, you could completely obliterate the BDB database, and restart AFP, and functionally continue from where you left off. Right?

No. Apple's APIs provide semantics by which you can reference filesystem objects by their parent directory CNID + object name. More importantly in this context: these references can be stored, retrieved and reused, e.g. Finder aliases, Adobe InDesign and many more applications use these semantics to store references to files.

If you nuke the CNID database, then upon re-enumeration of the volumes all filesystem objects are likely to be assigned new and different CNIDs, and thus all stored references are broken.

-f
> From: Frank Lahm [mailto:franklahm at googlemail.com]
>
> No. Apple's APIs provide semantics by which you can reference
> filesystem objects by their parent directory CNID + object name. More
> importantly in this context: these references can be stored, retrieved
> and reused, e.g. Finder aliases, Adobe InDesign and many more
> applications use these semantics to store references to files.
> If you nuke the CNID database, then upon re-enumeration of the volumes
> all filesystem objects are likely to be assigned new and different CNIDs,
> and thus all stored references are broken.

Just like... if you shut down your Apple OS X AFP file server, move all the files to a new, upgraded file server, reassign the old IP address and DNS name to the new server, and enable AFP file services on the new file server.

How do people handle the broken-links issue when they upgrade their Apple server? If they don't bother doing anything about it, I would conclude it's no big deal. If there is instead some process you're supposed to follow when you upgrade/replace your Apple AFP file server, I wonder if that process is applicable to the present thread of discussion as well.
On 02.01.11 16:52, Edward Ned Harvey wrote:
> Just like... if you shut down your Apple OS X AFP file server, move all the files to a new, upgraded file server, reassign the old IP address and DNS name to the new server, and enable AFP file services on the new file server.
>
> How do people handle the broken-links issue when they upgrade their Apple server? If they don't bother doing anything about it, I would conclude it's no big deal. If there is instead some process you're supposed to follow when you upgrade/replace your Apple AFP file server, I wonder if that process is applicable to the present thread of discussion as well.

Well, on the Apple platform HFS+ (the Mac's default fs) takes care of that, so you'd never have to worry about this issue there. On the *nix side of things, when running Netatalk, you have to store this information in some kind of extra database, which is BDB in this case.

Initially, I only wanted to check what hardware to get for my ZIL, and by now I have already decided - and ordered - two Vertex 2 EX 50GB SSDs to handle the ZIL for my zpool, since I am already serving 50 AFP sharepoints which are accessed by 120 clients. The number of sharepoints will eventually rise to 250 and the number of clients to 450, and that would cause some real random workload on the zpool and the ZIL, I guess.

The technical discussion about short stroking is nevertheless very interesting. ;)