thr3ads.net - zfs discuss - [zfs-discuss] size of slog device [Jun 2010]

Arne Jansen

2010-Jun-14 08:41 UTC

[zfs-discuss] size of slog device

Hi,

I know it's been discussed here more than once, and I read the
Evil Tuning Guide, but I didn't find a definitive statement:

There is absolutely no sense in having slog devices larger than
main memory, because it will never be used, right?
ZFS will rather flush the txg to disk than reading back from
zil?
So there is a guideline to have enough slog to hold about 10
seconds of zil, but the absolute maximum value is the size of
main memory. Is this correct?

Thanks,
Arne

Thomas Burgess

2010-Jun-14 09:41 UTC

[zfs-discuss] size of slog device

On Mon, Jun 14, 2010 at 4:41 AM, Arne Jansen <sensille at gmx.net> wrote:
> Hi,
>
> I know it's been discussed here more than once, and I read the
> Evil Tuning Guide, but I didn't find a definitive statement:
>
> There is absolutely no sense in having slog devices larger than
> main memory, because it will never be used, right?
> ZFS will rather flush the txg to disk than reading back from
> zil?
> So there is a guideline to have enough slog to hold about 10
> seconds of zil, but the absolute maximum value is the size of
> main memory. Is this correct?
>
>

I thought it was half the size of memory.

Roy Sigurd Karlsbakk

2010-Jun-14 10:15 UTC

[zfs-discuss] size of slog device

> There is absolutely no sense in having slog devices larger than
> main memory, because it will never be used, right?
> ZFS will rather flush the txg to disk than reading back from
> zil? So there is a guideline to have enough slog to hold about 10
> seconds of zil, but the absolute maximum value is the size of
> main memory. Is this correct?

ZFS uses at most RAM/2 for ZIL.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.

Edward Ned Harvey

2010-Jun-14 12:51 UTC

[zfs-discuss] size of slog device

> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Arne Jansen
>
> There is absolutely no sense in having slog devices larger than
> main memory, because it will never be used, right?

Also:  A TXG is guaranteed to flush within 30 sec.  Let's suppose you have a
super fast device, which is able to log 8 Gbit/sec (which is unrealistic).
That's 1 Gbyte/sec, which is only theoretically possible, at best.  You do
the math.  ;-)
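
For anyone who wants the math spelled out, here is a rough sketch in Python.
It takes the 30-second flush guarantee and the 1 Gbyte/sec rate from the
paragraph above at face value. The function name is made up for illustration,
and the result is only a back-of-envelope ceiling under those assumptions,
not a statement about ZFS internals.

```python
def slog_ceiling_bytes(log_rate_bytes_per_sec, txg_flush_sec=30):
    """Most data a log device could absorb before the next TXG flush.

    Assumes a constant write rate and a single flush interval, which is
    a simplification (hypothetical helper, not a ZFS function).
    """
    return log_rate_bytes_per_sec * txg_flush_sec


# 8 Gbit/sec is 1 Gbyte/sec in decimal units: the unrealistic best case.
rate = 8 * 10**9 // 8
ceiling = slog_ceiling_bytes(rate)
print(ceiling / 10**9, "GB")
```

Under these assumptions the ceiling comes out at 30 GB, so even the
unrealistic best case fits on a 32G SSD.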

That being said, it's difficult to buy an SSD smaller than 32G.  So what are
you going to do?  Slice it and use the remaining space for cache?  Some
people do.  Some people may even get a performance benefit by doing so.  But
if you do, now you've got a cache and a log both competing for IO on the
same device.  The performance benefit degrades for sure.

My advice is to simply acknowledge wasted space in your log device, forget
about it and move on.  Same thing you did with all the wasted space on your
mirrored OS boot device, which can't (or shouldn't) be used by your data
pool.

Arne Jansen

2010-Jun-14 13:02 UTC

[zfs-discuss] size of slog device

Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Arne Jansen
>>
>> There is absolutely no sense in having slog devices larger than
>> main memory, because it will never be used, right?
>
> Also:  A TXG is guaranteed to flush within 30 sec.  Let's suppose you have a
> super fast device, which is able to log 8 Gbit/sec (which is unrealistic).
> That's 1 Gbyte/sec, which is only theoretically possible, at best.  You do
> the math.  ;-)
>
> That being said, it's difficult to buy an SSD smaller than 32G.  So what are
> you going to do?

I'm still building my rotational-write-delay-eliminating driver and am trying
to figure out how much space I can waste on the underlying device without ever
running into problems. I need half the physical memory, or, under the assumption
that it might be tunable, a maximum of my physical memory. It's good to know
a hard upper limit. The more I can waste, the faster the device will be.

Also, to stay in your line of argumentation, this super-fast slog is most
probably a DRAM-based, battery backed solution. In this case it will make
a difference if you buy 8 or 32GB ;)

--Arne

Arne Jansen

2010-Jun-14 13:07 UTC

[zfs-discuss] size of slog device

Roy Sigurd Karlsbakk wrote:
>> There is absolutely no sense in having slog devices larger than
>> main memory, because it will never be used, right?
>> ZFS will rather flush the txg to disk than reading back from
>> zil? So there is a guideline to have enough slog to hold about 10
>> seconds of zil, but the absolute maximum value is the size of
>> main memory. Is this correct?
>
> ZFS uses at most RAM/2 for ZIL

Thanks!

Bob Friesenhahn

2010-Jun-14 16:49 UTC

[zfs-discuss] size of slog device

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>> There is absolutely no sense in having slog devices larger than
>> main memory, because it will never be used, right?
>> ZFS will rather flush the txg to disk than reading back from
>> zil? So there is a guideline to have enough slog to hold about 10
>> seconds of zil, but the absolute maximum value is the size of
>> main memory. Is this correct?
>
> ZFS uses at most RAM/2 for ZIL

It is good to keep in mind that only small writes go to the dedicated
slog.  Large writes go to main store.  A succession of that many small
writes (to fill RAM/2) is highly unlikely.  Also, the zil is not
read back unless the system is improperly shut down.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Roy Sigurd Karlsbakk

2010-Jun-14 17:50 UTC

[zfs-discuss] size of slog device

----- Original Message -----
> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>
> >> There is absolutely no sense in having slog devices larger than
> >> main memory, because it will never be used, right?
> >> ZFS will rather flush the txg to disk than reading back from
> >> zil? So there is a guideline to have enough slog to hold about 10
> >> seconds of zil, but the absolute maximum value is the size of
> >> main memory. Is this correct?
> >
> > ZFS uses at most RAM/2 for ZIL
>
> It is good to keep in mind that only small writes go to the dedicated
> slog. Large writes go to main store. A succession of that many small
> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
> read back unless the system is improperly shut down.

I thought all sync writes, meaning everything NFS and iSCSI, went into the slog
- IIRC the docs say so.
 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/

Bob Friesenhahn

2010-Jun-14 18:29 UTC

[zfs-discuss] size of slog device

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>> It is good to keep in mind that only small writes go to the dedicated
>> slog. Large writes go to main store. A succession of that many small
>> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
>> read back unless the system is improperly shut down.
>
> I thought all sync writes, meaning everything NFS and iSCSI, went
> into the slog - IIRC the docs say so.

Check a month or two back in the archives for a post by Matt Ahrens.
It seems that larger writes (>32k?) are written directly to main
store.  This is probably a change from the original zfs design.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Neil Perrin

2010-Jun-14 19:10 UTC

[zfs-discuss] size of slog device

On 06/14/10 12:29, Bob Friesenhahn wrote:
> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>
>>> It is good to keep in mind that only small writes go to the dedicated
>>> slog. Large writes go to main store. A succession of that many small
>>> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
>>> read back unless the system is improperly shut down.
>>
>> I thought all sync writes, meaning everything NFS and iSCSI, went
>> into the slog - IIRC the docs say so.
>
> Check a month or two back in the archives for a post by Matt Ahrens.
> It seems that larger writes (>32k?) are written directly to main
> store.  This is probably a change from the original zfs design.
>
> Bob

If there's a slog then the data, regardless of size, gets written to the
slog.

If there's no slog and if the data size is greater than
zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K) then
the data is written as a block into the pool and the block pointer is
written into the log record. This is the WR_INDIRECT write type.

So Matt and Roy are both correct.

But wait, there's more complexity!

If logbias=throughput is set we always use WR_INDIRECT.

If we just wrote more than 1MB for a single zil commit and there's more
than 2MB waiting then we start using the main pool.

Clear as mud?  This is likely to change again...

Neil.
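
The rules in this post can be read as a small decision function. The Python
sketch below only restates the post; it is not taken from the ZFS source.
The order in which the rules are checked is an assumption, and the function,
parameter, and constant names are invented for illustration (only
WR_INDIRECT, logbias, and the 32K/1MB/2MB figures come from the post).

```python
KB = 1024
MB = 1024 * KB

# Default of zfs_immediate_write_sz / zvol_immediate_write_sz per the post.
IMMEDIATE_WRITE_SZ = 32 * KB

# Hypothetical labels for where the data of a sync write ends up.
SLOG = "slog"
POOL_INDIRECT = "main pool block (WR_INDIRECT)"
POOL_LOG = "main pool log"


def sync_write_target(size, has_slog, logbias="latency",
                      just_written=0, waiting=0):
    """Return where a synchronous write's data lands, per the rules above.

    just_written: bytes written for the current zil commit (assumed name).
    waiting:      bytes still queued for the log (assumed name).
    """
    # logbias=throughput: always WR_INDIRECT.
    if logbias == "throughput":
        return POOL_INDIRECT
    if has_slog:
        # Heavy burst: more than 1MB just written and more than 2MB waiting.
        if just_written > 1 * MB and waiting > 2 * MB:
            return POOL_LOG
        # Otherwise the data goes to the slog regardless of size.
        return SLOG
    # No slog: large writes become a pool block plus a pointer in the log.
    if size > IMMEDIATE_WRITE_SZ:
        return POOL_INDIRECT
    return POOL_LOG


print(sync_write_target(128 * KB, has_slog=True))
print(sync_write_target(128 * KB, has_slog=False))
```

The first call lands on the slog and the second becomes a WR_INDIRECT write,
which is the distinction between what Roy and Bob each described.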

Erik Trimble

2010-Jun-15 01:35 UTC

[zfs-discuss] size of slog device

On 6/14/2010 12:10 PM, Neil Perrin wrote:
> On 06/14/10 12:29, Bob Friesenhahn wrote:
>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>
>>>> It is good to keep in mind that only small writes go to the dedicated
>>>> slog. Large writes go to main store. A succession of that many small
>>>> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
>>>> read back unless the system is improperly shut down.
>>>
>>> I thought all sync writes, meaning everything NFS and iSCSI, went
>>> into the slog - IIRC the docs say so.
>>
>> Check a month or two back in the archives for a post by Matt Ahrens.
>> It seems that larger writes (>32k?) are written directly to main
>> store.  This is probably a change from the original zfs design.
>>
>> Bob
>
> If there's a slog then the data, regardless of size, gets written to
> the slog.
>
> If there's no slog and if the data size is greater than
> zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K)
> then the data is written as a block into the pool and the block pointer
> is written into the log record. This is the WR_INDIRECT write type.
>
> So Matt and Roy are both correct.
>
> But wait, there's more complexity!
>
> If logbias=throughput is set we always use WR_INDIRECT.
>
> If we just wrote more than 1MB for a single zil commit and there's
> more than 2MB waiting then we start using the main pool.
>
> Clear as mud?  This is likely to change again...
>
> Neil.

How do I monitor the amount of live (i.e. non-committed) data in the
slog?  I'd like to spend some time with my setup, seeing exactly how
much I tend to use.

I'd suspect that very few use cases call for more than a couple (2-4) GB
of slog...

I'm trying to get hard numbers as I'm working on building a
DRAM/battery/flash slog device in one of my friend's electronics
prototyping shops.  It would be really nice if I could solve 99% of the
need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB thumb
drive...

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

Richard Elling

2010-Jun-15 04:55 UTC

[zfs-discuss] size of slog device

On Jun 14, 2010, at 6:35 PM, Erik Trimble wrote:
> On 6/14/2010 12:10 PM, Neil Perrin wrote:
>> On 06/14/10 12:29, Bob Friesenhahn wrote:
>>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>>
>>>>> It is good to keep in mind that only small writes go to the dedicated
>>>>> slog. Large writes go to main store. A succession of that many small
>>>>> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
>>>>> read back unless the system is improperly shut down.
>>>>
>>>> I thought all sync writes, meaning everything NFS and iSCSI,
>>>> went into the slog - IIRC the docs say so.
>>>
>>> Check a month or two back in the archives for a post by Matt Ahrens.
>>> It seems that larger writes (>32k?) are written directly to main
>>> store.  This is probably a change from the original zfs design.
>>>
>>> Bob
>>
>> If there's a slog then the data, regardless of size, gets written to
>> the slog.
>>
>> If there's no slog and if the data size is greater than
>> zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K)
>> then the data is written as a block into the pool and the block pointer
>> is written into the log record. This is the WR_INDIRECT write type.
>>
>> So Matt and Roy are both correct.
>>
>> But wait, there's more complexity!
>>
>> If logbias=throughput is set we always use WR_INDIRECT.
>>
>> If we just wrote more than 1MB for a single zil commit and there's
>> more than 2MB waiting then we start using the main pool.
>>
>> Clear as mud?  This is likely to change again...
>>
>> Neil.
>
> How do I monitor the amount of live (i.e. non-committed) data in the slog?
> I'd like to spend some time with my setup, seeing exactly how much I
> tend to use.

zilstat
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

> I'd suspect that very few use cases call for more than a couple
> (2-4) GB of slog...

I'd suspect few real cases need more than 1GB.
 -- richard

-- 
Richard Elling
richard at nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/

Neil Perrin

2010-Jun-15 05:00 UTC

[zfs-discuss] size of slog device

On 06/14/10 19:35, Erik Trimble wrote:
> On 6/14/2010 12:10 PM, Neil Perrin wrote:
>> On 06/14/10 12:29, Bob Friesenhahn wrote:
>>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>>
>>>>> It is good to keep in mind that only small writes go to the dedicated
>>>>> slog. Large writes go to main store. A succession of that many small
>>>>> writes (to fill RAM/2) is highly unlikely. Also, the zil is not
>>>>> read back unless the system is improperly shut down.
>>>>
>>>> I thought all sync writes, meaning everything NFS and iSCSI, went
>>>> into the slog - IIRC the docs say so.
>>>
>>> Check a month or two back in the archives for a post by Matt Ahrens.
>>> It seems that larger writes (>32k?) are written directly to main
>>> store.  This is probably a change from the original zfs design.
>>>
>>> Bob
>>
>> If there's a slog then the data, regardless of size, gets written to
>> the slog.
>>
>> If there's no slog and if the data size is greater than
>> zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K)
>> then the data is written as a block into the pool and the block pointer
>> is written into the log record. This is the WR_INDIRECT write type.
>>
>> So Matt and Roy are both correct.
>>
>> But wait, there's more complexity!
>>
>> If logbias=throughput is set we always use WR_INDIRECT.
>>
>> If we just wrote more than 1MB for a single zil commit and there's
>> more than 2MB waiting then we start using the main pool.
>>
>> Clear as mud?  This is likely to change again...
>>
>> Neil.
>
> How do I monitor the amount of live (i.e. non-committed) data in the
> slog?  I'd like to spend some time with my setup, seeing exactly how
> much I tend to use.

I think monitoring the capacity when running "zpool iostat -v <pool> 1"
should be fairly accurate.
A simple d script can be written to determine how often the ZIL (code)
fails to get a slog block and has to resort to the allocation in the
main pool.
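
One way to turn that into a sizing number is to sample the log device's
capacity column once a second and keep the peak. A minimal Python sketch,
assuming the values have already been copied out of the "zpool iostat -v"
output by hand or by another script, and assuming they use K/M/G/T suffixes
as powers of 1024. Both helper names are hypothetical and the sample values
are made up.

```python
# Suffix multipliers, assumed to be powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size string such as '1.5M' or '0' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1].upper()])
    return int(float(text))


def peak_usage(samples):
    """Largest value, in bytes, seen across the sampled capacity strings."""
    return max(parse_size(s) for s in samples)


# Made-up samples standing in for the slog's used-capacity column.
samples = ["128K", "1.5M", "0", "900K"]
print(peak_usage(samples))
```

The peak over a representative busy period, plus some headroom, gives a
rough lower bound for how large the log device needs to be.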

One recent change reduced the amount of data written and possibly the
slog block fragmentation. This is zpool version 23: "Slim ZIL". So be sure
to experiment with that.

> I'd suspect that very few use cases call for more than a couple (2-4)
> GB of slog...

I agree this is typically true. Of course it depends on your workload.
The amount of slog data will reflect the uncommitted synchronous txg data,
and the size of each txg will depend on memory size.
This area is also undergoing tuning.

> I'm trying to get hard numbers as I'm working on building a
> DRAM/battery/flash slog device in one of my friend's electronics
> prototyping shops.  It would be really nice if I could solve 99% of
> the need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB
> thumb drive...

Sounds like fun. Good luck.

Neil.

Edward Ned Harvey

2010-Jun-16 03:13 UTC

[zfs-discuss] size of slog device

> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Bob Friesenhahn
>
> It is good to keep in mind that only small writes go to the dedicated
> slog.  Large writes go to main store.  A succession of that many small
> writes (to fill RAM/2) is highly unlikely.  Also, the zil is not
> read back unless the system is improperly shut down.

Can anyone verify this?  I thought the decision for small vs large sync
writes to go to log vs main store was determined by zfs_immediate_write_sz
and logbias.

logbias was introduced in snv_122, which is zpool 18 or 19.
zfs_immediate_write_sz seems to have been around forever (I see comments
about it as early as 2006).

Then again, I can't seem to find my zfs_immediate_write_sz, via either zpool
or zfs.  Can anybody say what version zpool introduced
zfs_immediate_write_sz, or perhaps I'm using the wrong commands to try and
see mine?  zpool get all rpool | grep zfs_immediate_write_sz ; zfs get all
rpool | grep zfs_immediate_write_sz

I thought, if you didn't explicitly tune these, all sync writes go to ZIL
before the main store.  Can't seem to find any way to verify this.

Richard Elling

2010-Jun-16 03:51 UTC

[zfs-discuss] size of slog device

On Jun 15, 2010, at 8:13 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org
>> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Bob Friesenhahn
>>
>> It is good to keep in mind that only small writes go to the dedicated
>> slog.  Large writes go to main store.  A succession of that many small
>> writes (to fill RAM/2) is highly unlikely.  Also, the zil is not
>> read back unless the system is improperly shut down.
>
> Can anyone verify this?  I thought the decision for small vs large sync
> writes to go to log vs main store was determined by zfs_immediate_write_sz
> and logbias.
>
> logbias was introduced in snv_122, which is zpool 18 or 19.
> zfs_immediate_write_sz seems to have been around forever (I see comments
> about it as early as 2006).
>
> Then again, I can't seem to find my zfs_immediate_write_sz, via either zpool
> or zfs.  Can anybody say what version zpool introduced
> zfs_immediate_write_sz, or perhaps I'm using the wrong commands to try and
> see mine?  zpool get all rpool | grep zfs_immediate_write_sz ; zfs get all
> rpool | grep zfs_immediate_write_sz

It is an int, as in C, not a parameter tunable by zpool or zfs commands.
For NFS service, it can be tuned by the client via wsize.

> I thought, if you didn't explicitly tune these, all sync writes go to ZIL
> before the main store.  Can't seem to find any way to verify this.

Cake.  All sync writes go to the ZIL.  The ZIL may be in the pool or in
the separate log device :-)
 -- richard

-- 
Richard Elling
richard at nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/

Richard Elling

2010-Jun-16 05:43 UTC

[zfs-discuss] size of slog device

On Jun 15, 2010, at 8:51 PM, Richard Elling wrote:
>> I thought, if you didn't explicitly tune these, all sync writes go to ZIL
>> before the main store.  Can't seem to find any way to verify this.
>
> Cake.  All sync writes go to the ZIL.  The ZIL may be in the pool or in
> the separate log device :-)

"go to" may be too confusing.  s/go to/are handled by/
 -- richard

-- 
Richard Elling
richard at nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
