thr3ads.net - zfs discuss - [zfs-discuss] Performance of the ZIL [May 2010]


Tony MacDoodle

2010-May-04 17:19 UTC

[zfs-discuss] Performance of the ZIL

How would one determine if I should have a separate ZIL disk? We are using
ZFS as the backend for our Guest Domains' boot drives using LDoms, and we are
seeing very slow write performance.

Thanks

Brandon High

2010-May-04 17:34 UTC

[zfs-discuss] Performance of the ZIL

On Tue, May 4, 2010 at 10:19 AM, Tony MacDoodle <tpsdoodle at gmail.com> wrote:
> How would one determine if I should have a separate ZIL disk? We are using
> ZFS as the backend of our Guest Domains boot drives using LDom's. And we are
> seeing bad/very slow write performance?

There's a DTrace script that Richard Elling wrote called zilstat.ksh.
It's available at
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

I'm not sure what the numbers mean (there's info at the address) but
anything other than lots of 0s indicates that the ZIL is being used.

-B

-- 
Brandon High : bhigh at freaks.com

Robert Milkowski

2010-May-04 19:44 UTC

[zfs-discuss] Performance of the ZIL

On 04/05/2010 18:19, Tony MacDoodle wrote:
> How would one determine if I should have a separate ZIL disk? We are
> using ZFS as the backend of our Guest Domains boot drives using
> LDom's. And we are seeing bad/very slow write performance?

If you can disable the ZIL and compare the performance with it off against
the performance with it on, that will give you an estimate of the absolute
maximum performance increase (if any) you could get from a dedicated ZIL
device.
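
A minimal sketch of that comparison, assuming you have measured the same workload twice (once with the ZIL enabled, once with it disabled). The function name and the sample numbers are hypothetical and do not come from this thread; the point is only that the ratio is a ceiling, not a prediction.

```python
def max_zil_speedup(throughput_zil_on: float, throughput_zil_off: float) -> float:
    """Upper bound on the speedup a dedicated log device could give.

    Both arguments are the throughput of the same workload in the same
    unit (for example MB/s or ops/s).
    """
    if throughput_zil_on <= 0 or throughput_zil_off <= 0:
        raise ValueError("throughput values must be positive")
    return throughput_zil_off / throughput_zil_on


# Hypothetical measurements, not taken from this thread.
zil_on_mb_s = 12.0
zil_off_mb_s = 48.0
speedup = max_zil_speedup(zil_on_mb_s, zil_off_mb_s)
print(f"a separate log device can help by at most {speedup:.1f}x")
```

A ratio close to 1.0 would suggest that the ZIL is not the bottleneck and that a separate log device is unlikely to help.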

-- 
Robert Milkowski
http://milek.blogspot.com

Edward Ned Harvey

2010-May-06 03:32 UTC

[zfs-discuss] Performance of the ZIL

> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Milkowski
>
> if you can disable ZIL and compare the performance to when it is off it
> will give you an estimate of what's the absolute maximum performance
> increase (if any) by having a dedicated ZIL device.

I'll second this suggestion.  It'll cost you nothing to disable the ZIL
temporarily.  (You have to dismount the filesystem twice: once to disable
the ZIL, and once to re-enable it.)  Then you can see if performance is
good.  If performance is good with the ZIL disabled, then you'll know you
need to accelerate your ZIL.  (Because a disabled ZIL is the fastest thing
you could possibly ever do.)

Generally speaking, you should not disable your ZIL for the long run.  But
in some cases, it makes sense.

Here's how you determine if you want to disable your ZIL permanently:

First, understand that with the ZIL disabled, all sync writes are treated as
async writes.  They are buffered in RAM before being written to disk, so the
kernel can optimize and aggregate the write operations into one big chunk.

No matter what, if you have an ungraceful system shutdown, you will lose all
the async writes that were waiting in RAM.

If you have the ZIL disabled, you will also lose the sync writes that were
waiting in RAM (because those are being handled as async).

In neither case do you have data or filesystem corruption.

The risk of running with no ZIL is:  in the case of an ungraceful shutdown, in
addition to the (up to 30 sec of) async writes that will be lost, you will also
lose up to 30 sec of sync writes.
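
The rules above can be sketched as a toy model. This is an illustration under stated assumptions, not how ZFS is implemented: it assumes buffered writes are committed to the pool at fixed 30-second boundaries, and the `Write` class and `lost_after_crash` function are names made up for the example.

```python
from dataclasses import dataclass

TXG_INTERVAL = 30.0  # assumed seconds between commits; the "up to 30 sec" figure


@dataclass
class Write:
    time: float  # seconds since boot at which the write was acknowledged
    sync: bool   # True if the application asked for a synchronous write


def lost_after_crash(writes, crash_time, zil_enabled):
    """Return the writes that are gone after an ungraceful shutdown."""
    # Everything up to the last completed commit is safely in the pool.
    last_commit = (crash_time // TXG_INTERVAL) * TXG_INTERVAL
    lost = []
    for w in writes:
        if w.time > crash_time or w.time <= last_commit:
            continue  # never issued, or already committed
        if w.sync and zil_enabled:
            continue  # recorded in the intent log before the ack
        lost.append(w)
    return lost


writes = [Write(5.0, False), Write(31.0, True), Write(40.0, False), Write(44.0, True)]
with_zil = lost_after_crash(writes, 45.0, zil_enabled=True)
without_zil = lost_after_crash(writes, 45.0, zil_enabled=False)
print(len(with_zil), len(without_zil))
```

With the ZIL enabled only the async write at t=40 is lost; with it disabled, both sync writes since the last commit are lost as well.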

Pasi Kärkkäinen

2010-May-06 06:17 UTC

[zfs-discuss] Performance of the ZIL

On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
> > From: zfs-discuss-bounces at opensolaris.org
> > [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Milkowski
> >
> > if you can disable ZIL and compare the performance to when it is off it
> > will give you an estimate of what's the absolute maximum performance
> > increase (if any) by having a dedicated ZIL device.
>
> I'll second this suggestion.  It'll cost you nothing to disable the ZIL
> temporarily.  (You have to dismount the filesystem twice.  Once to disable
> the ZIL, and once to re-enable it.)  Then you can see if performance is
> good.  If performance is good, then you'll know you need to accelerate your
> ZIL.  (Because disabled ZIL is the fastest thing you could possibly ever
> do.)
>
> Generally speaking, you should not disable your ZIL for the long run.  But
> in some cases, it makes sense.
>
> Here's how you determine if you want to disable your ZIL permanently:
>
> First, understand that with the ZIL disabled, all sync writes are treated as
> async writes.  This is buffered in ram before being written to disk, so the
> kernel can optimize and aggregate the write operations into one big chunk.
>
> No matter what, if you have an ungraceful system shutdown, you will lose all
> the async writes that were waiting in ram.
>
> If you have ZIL disabled, you will also lose the sync writes that were
> waiting in ram (because those are being handled as async.)
>
> In neither case do you have data or filesystem corruption.

ZFS probably is still OK, since it's designed to handle this (?),
but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
that have been ack'd as being done to the servers/applications..

> The risk of running with no ZIL is:  In the case of ungraceful shutdown, in
> addition to the (up to 30 sec) async writes that will be lost, you will also
> lose up to 30 sec of sync writes.

-- Pasi

Ragnar Sundblad

2010-May-06 07:42 UTC

[zfs-discuss] Performance of the ZIL

On 6 May 2010, at 08:17, Pasi Kärkkäinen wrote:
> On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
>>> From: zfs-discuss-bounces at opensolaris.org
>>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Milkowski
>>>
>>> if you can disable ZIL and compare the performance to when it is off it
>>> will give you an estimate of what's the absolute maximum performance
>>> increase (if any) by having a dedicated ZIL device.
>>
>> I'll second this suggestion.  It'll cost you nothing to disable the ZIL
>> temporarily.  (You have to dismount the filesystem twice.  Once to disable
>> the ZIL, and once to re-enable it.)  Then you can see if performance is
>> good.  If performance is good, then you'll know you need to accelerate your
>> ZIL.  (Because disabled ZIL is the fastest thing you could possibly ever
>> do.)
>>
>> Generally speaking, you should not disable your ZIL for the long run.  But
>> in some cases, it makes sense.
>>
>> Here's how you determine if you want to disable your ZIL permanently:
>>
>> First, understand that with the ZIL disabled, all sync writes are treated as
>> async writes.  This is buffered in ram before being written to disk, so the
>> kernel can optimize and aggregate the write operations into one big chunk.
>>
>> No matter what, if you have an ungraceful system shutdown, you will lose all
>> the async writes that were waiting in ram.
>>
>> If you have ZIL disabled, you will also lose the sync writes that were
>> waiting in ram (because those are being handled as async.)
>>
>> In neither case do you have data or filesystem corruption.
>
> ZFS probably is still OK, since it's designed to handle this (?),
> but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
> that have been ack'd being done to the servers/applications..

Entirely right!

This is the case for many local user writes anyway, since many
applications don't sync the written data to disk.

But if you have an application, protocol and/or user that demands
or expects persistent storage, disabling the ZIL of course could be fatal
in case of a crash. Examples are mail servers and NFS servers.

/ragge

Edward Ned Harvey

2010-May-06 12:34 UTC

[zfs-discuss] Performance of the ZIL

> From: Pasi Kärkkäinen [mailto:pasik at iki.fi]
>
> > In neither case do you have data or filesystem corruption.
>
> ZFS probably is still OK, since it's designed to handle this (?),
> but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
> that have been ack'd being done to the servers/applications..

What I meant was:  Yes, there's data loss.  But no corruption.  In other
filesystems, if you have an ungraceful shutdown while the filesystem is
writing, since filesystems such as EXT3 perform file-based (or inode-based)
block write operations, you can have files whose contents have been
corrupted:  some sectors of the file still in their "old" state, and some
sectors of the file in their "new" state.  Likewise, in something like EXT3,
you could have some file fully written, while another one hasn't been
written yet, but should have been.  (AKA, some files written out of order.)

In the case of EXT3, since it is a journaled filesystem, the journal only
keeps the *filesystem* consistent after a crash.  It's still possible to
have corrupted data in the middle of a file.

These things don't happen in ZFS.  ZFS takes journaling to a whole new
level.  Instead of just keeping your filesystem consistent, it also keeps
your data consistent.  Yes, data loss is possible when a system crashes, but
the filesystem will never have any corruption.  These are separate things
now, and never were before.

In ZFS, losing n seconds of writes leading up to the crash will never result
in files partially written, or written out of order.  Every atomic write to
the filesystem results in a filesystem-consistent and data-consistent view
of *some* valid form of all the filesystem and data within it.

Edward Ned Harvey

2010-May-06 12:39 UTC

[zfs-discuss] Performance of the ZIL

> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Ragnar Sundblad
>
> But if you have an application, protocol and/or user that demands
> or expects persistant storage, disabling ZIL of course could be fatal
> in case of a crash. Examples are mail servers and NFS servers.

Basically, anything which writes to disk based on requests from something
across a network.  Because if your system goes down and comes back up,
believing itself to be consistent, but there's one client thinking "A" and
another client thinking "B" ... even though your server is consistent, the
world isn't.

Another great example would be if your server handles credit card
transactions.  If a user clicks "buy now" in a web interface, and the
server contacts Visa or MasterCard, which records the transaction, and then
the server crashes before it records the transaction to its own disks ...
then the server would come up and have no recollection of that transaction.
But the user, and Visa/MasterCard, certainly would remember it.
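
For illustration, this is roughly what "record it before you acknowledge it" looks like from the application side. It is a sketch only: the file name, record format and `record_then_ack` helper are hypothetical. The `os.fsync` call is the kind of synchronous request the thread is talking about, and with the ZIL disabled it would return before the data is actually on stable storage.

```python
import os
import tempfile


def record_then_ack(path: str, record: bytes) -> str:
    """Append a record, ask for it to be forced to stable storage, then ack."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)  # blocks until the OS reports the data as written out
    finally:
        os.close(fd)
    return "OK"  # only now tell the client that the record is stored


with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, "transactions.log")  # hypothetical file name
    reply = record_then_ack(journal, b"order=1 amount=10.00\n")
    with open(journal, "rb") as fh:
        stored = fh.read()
    print(reply)
```

A mail server or NFS server follows the same pattern: the reply to the remote side is a promise that the data survives a crash.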

Ross Walker

2010-May-06 13:55 UTC

[zfs-discuss] Performance of the ZIL

On May 6, 2010, at 8:34 AM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>> From: Pasi Kärkkäinen [mailto:pasik at iki.fi]
>>
>>> In neither case do you have data or filesystem corruption.
>>
>> ZFS probably is still OK, since it's designed to handle this (?),
>> but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
>> that have been ack'd being done to the servers/applications..
>
> What I meant was:  Yes there's data loss.  But no corruption.  In other
> filesystems, if you have an ungraceful shutdown while the filesystem is
> writing, since filesystems such as EXT3 perform file-based (or inode-based)
> block write operations, then you can have files whose contents have been
> corrupted...  Some sectors of the file still in their "old" state, and some
> sectors of the file in their "new" state.  Likewise, in something like EXT3,
> you could have some file fully written, while another one hasn't been
> written yet, but should have been.  (AKA, some files written out of order.)
>
> In the case of EXT3, since it is a journaled filesystem, the journal only
> keeps the *filesystem* consistent after a crash.  It's still possible to
> have corrupted data in the middle of a file.

I believe ext3 has an option to journal data as well as metadata; it
just defaults to metadata.

I don't believe out-of-order writes are so much of an issue any more
since Linux gained write barrier support (and most file systems and
block devices now support it).

> These things don't happen in ZFS.  ZFS takes journaling to a whole new
> level.  Instead of just keeping your filesystem consistent, it also keeps
> your data consistent.  Yes, data loss is possible when a system crashes, but
> the filesystem will never have any corruption.  These are separate things
> now, and never were before.

ZFS does NOT have a journal; it has an intent log, which is completely
different. A journal logs operations that are to be performed later
(the journal is read, then the operation is performed). An intent log logs
operations that are being performed now; when the disk flushes, the
intent entry is marked complete.

ZFS is consistent by the nature of COW, which means a partial write
will not become part of the file system (the old block pointer isn't
updated until the new block completes the write).
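
The copy-on-write point can be shown with a toy model. This is only a sketch of the idea (write the new block elsewhere, then switch the pointer), not ZFS's on-disk format; the `CowFile` class and its `crash_before_commit` flag are invented for the example.

```python
class CowFile:
    """Toy copy-on-write file: live blocks are never overwritten in place."""

    def __init__(self, data: bytes):
        self._blocks = {0: data}  # block id -> contents (the "disk")
        self._next_id = 1
        self._root = 0            # the live block pointer

    def read(self) -> bytes:
        return self._blocks[self._root]

    def write(self, data: bytes, crash_before_commit: bool = False) -> None:
        new_id = self._next_id
        self._next_id += 1
        if crash_before_commit:
            # Simulated crash: only half of the new block reached the disk,
            # and the pointer was never switched.
            self._blocks[new_id] = data[: len(data) // 2]
            return
        self._blocks[new_id] = data  # step 1: write the new block elsewhere
        self._root = new_id          # step 2: switch the pointer in one step


f = CowFile(b"old contents")
f.write(b"new contents", crash_before_commit=True)
after_crash = f.read()   # still the complete old version
f.write(b"new contents")
after_commit = f.read()  # now the complete new version
print(after_crash, after_commit)
```

The half-written block exists on the "disk" but nothing points to it, so a reader sees either the old contents or the new contents, never a mix.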

> In ZFS, losing n-seconds of writes leading up to the crash will never result
> in files partially written, or written out of order.  Every atomic write to
> the filesystem results in a filesystem-consistent and data-consistent view
> of *some* valid form of all the filesystem and data within it.

The ZFS file system will always be consistent, but if an application
doesn't flush its data, then it can definitely have partially written
data.

-Ross

Marc Moreau

2010-May-06 22:50 UTC

[zfs-discuss] Performance of the ZIL

On Tue, May 4, 2010 at 11:34 AM, Brandon High <bhigh at freaks.com> wrote:
> On Tue, May 4, 2010 at 10:19 AM, Tony MacDoodle <tpsdoodle at gmail.com> wrote:
> > How would one determine if I should have a separate ZIL disk? We are using
> > ZFS as the backend of our Guest Domains boot drives using LDom's. And we are
> > seeing bad/very slow write performance?
>
> There's a dtrace script that Richard Elling wrote called zilstat.ksh.
> It's available at
> http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
>
> I'm not sure what the numbers mean (there's info at the address) but
> anything other than lots of 0s indicates that the ZIL is being used.

On my workstation, I peg my IOPS when using VirtualBox set to run on zvols.
The zilstat line comes back with about 3000 total synchronous writes per
30 sec, which means that my disks are doing about 90 sync write IOPS.  That
is about the upper limit for 7200 rpm disks (from what I understand).

This 3000 number doesn't really change much over time when running with IO
load.

Disabling the ZIL, I get much better performance in terms of IO throughput.
This tells me that the ZIL is the bottleneck.  I will be getting an SSD
soon.
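
As a back-of-the-envelope check of those figures: 3000 writes over 30 seconds is 100 per second, in the same ballpark as the "about 90" quoted. The single-disk ceiling below is a rough estimate only. The 8.5 ms average seek time is an assumed value, not something measured in this thread, and both function names are made up for the sketch.

```python
def sync_iops(total_sync_writes: int, interval_s: float) -> float:
    """Average synchronous writes per second over a sampling interval."""
    return total_sync_writes / interval_s


def disk_iops_estimate(rpm: int, avg_seek_ms: float) -> float:
    """Rough random-IOPS ceiling for one spinning disk."""
    rotation_ms = 60_000 / rpm                    # one full revolution
    avg_rotational_latency_ms = rotation_ms / 2   # on average, half a turn
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms)


observed = sync_iops(3000, 30)                       # figures quoted above
ceiling = disk_iops_estimate(7200, avg_seek_ms=8.5)  # 8.5 ms seek is assumed
print(f"observed ~{observed:.0f} sync writes/s, single-disk estimate ~{ceiling:.0f} IOPS")
```

Under these assumptions the observed rate is already at or above what one 7200 rpm disk can deliver for small random writes, which fits the conclusion that the ZIL is the bottleneck.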

-- Marc