I am currently doing research on how much memory ZFS should have for a storage server. I came across this blog:

http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance

It recommends that for every TB of storage you have, you want 1GB of RAM just for the metadata.

Is this really the case that ZFS metadata consumes so much RAM? I'm currently building a storage server which will eventually hold up to 20TB of storage, and I can't fit 20GB of RAM on the motherboard!
On Wed, 19 May 2010, Deon Cui wrote:

> http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance
>
> It recommends that for every TB of storage you have you want 1GB of
> RAM just for the metadata.

Interesting conclusion.

> Is this really the case that ZFS metadata consumes so much RAM?
> I'm currently building a storage server which will eventually hold
> up to 20TB of storage, I can't fit 20GB of RAM on the motherboard!

Unless you do something like enable dedup (which is still risky to use), then there is no rule of thumb that I know of. ZFS will take advantage of available RAM. You should have at least 1GB of RAM available for ZFS to use. Beyond that, it depends entirely on the size of your expected working set. The size of accessed files, the randomness of the access, the number of simultaneous accesses, and the maximum number of files per directory all make a difference to how much RAM you should have for good performance. If you have 200TB of stored data but only actually access 2GB of it at any one time, then the caching requirements are not very high.

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
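A quick sketch of Bob's point above that the working set, not the raw pool size, is what drives cache sizing. The 1GB baseline is the figure from his message; the headroom factor and the example working-set sizes are assumptions for illustration only:

def suggested_arc_gb(working_set_gb, baseline_gb=1.0, headroom=1.25):
    """Rough ARC sizing from the expected working set, not pool size.

    baseline_gb: minimum RAM to leave for ZFS (per the thread, ~1GB).
    headroom:    assumed fudge factor for metadata and read-ahead.
    """
    return baseline_gb + working_set_gb * headroom

# 200TB pool, but only ~2GB actively touched at any one time:
print(suggested_arc_gb(working_set_gb=2))    # ~3.5GB is plenty
# Same pool, but a 50GB hot set of frequently re-read data:
print(suggested_arc_gb(working_set_gb=50))   # ~63.5GB

By this reasoning the 20TB pool in the original question says nothing by itself about how much RAM is needed; the access pattern does.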
----- "Deon Cui" <deon.cui at gmail.com> skrev:> I am currently doing research on how much memory ZFS should have for a > storage server. > > I came across this blog > > http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance > > It recommends that for every TB of storage you have you want 1GB of > RAM just for the metadata.That''s for dedup, 150 bytes per block, meaning approx 1GB per 1TB if all (or most) are 128kB blocks, and way more memory (or L2ARC) if you have small files. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 roy at karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et element?rt imperativ for alle pedagoger ? unng? eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer p? norsk.
Bob Friesenhahn wrote:

> On Wed, 19 May 2010, Deon Cui wrote:
>>
>> http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance
>>
>> It recommends that for every TB of storage you have you want 1GB of
>> RAM just for the metadata.
>
> Interesting conclusion.
>
>> Is this really the case that ZFS metadata consumes so much RAM?
>> I'm currently building a storage server which will eventually hold up
>> to 20TB of storage, I can't fit 20GB of RAM on the motherboard!
>
> Unless you do something like enable dedup (which is still risky to
> use), then there is no rule of thumb that I know of. ZFS will take
> advantage of available RAM. You should have at least 1GB of RAM
> available for ZFS to use. Beyond that, it depends entirely on the
> size of your expected working set.
>
> Bob

I'd second Bob's notes here. For non-dedup purposes, you need a very bare minimum of 512MB of RAM just for ZFS (Bob's recommendation of 1GB is much better; I'm quoting a real basement level beyond which you're effectively crippling ZFS).

The primary determinant of RAM consumption for pools without dedup is the size of your active working set (as Bob mentioned). It's unrealistic to expect to cache /all/ metadata for every file in a large pool, and I can't really see the worth in it anyhow (you end up with very infrequently-used metadata sitting in RAM, which in most cases gets evicted to make room for other things). Storing any more metadata than what you need for your working set isn't going to bring much of a performance bonus.

What you need is sufficient RAM to cache your async writes (remember, this amount is relatively small in most cases - it's 3 pending transactions per pool), plus enough RAM to hold all the files (plus metadata) you expect to use (i.e. read more than once or write to) within about 5 minutes.

Here are three examples to show the differences (all without dedup):

(1) A 100TB system which contains scientific data used in a data-mining app. The system needs to frequently access very large amounts of the available data, but seldom writes much. As it is doing data mining, any specific piece of data is read seldom, though the system reads large aggregate amounts continuously. In this case, you're pretty much out of luck for caching. You'll need enough RAM to cache your maximum write size, and a little bit for read-ahead, but since you're accessing the pool almost at random for large amounts of data which aren't re-used, caching isn't going to help at all. Here, 1-2GB of RAM is likely all that can really be used.

(2) 1TB of data being used for a Virtual Machine disk server. That is, the machine exports iSCSI (or FCoE, or NFS, or whatever) volumes for use on client hardware to run a VM. Typically in this case, there are lots of effectively random read requests coming in for a bunch of "hot" files (which tend to be OS files in the VM-hosted OSes). There are also fairly frequent write requests. However, the VMs will do a fair amount of read-caching of their own, so the amount of read requests is lower than one would think.
For performance and administrative reasons, it is likely that you will want multiple pools rather than a single large pool. In that case, you need a reasonable amount of write cache for *each* pool, plus enough RAM to cache the OS files frequently used across ALL the VMs. Here dedup would actually help RAM consumption, since it is highly likely that frequently-accessed files from multiple VMs are in fact identical, and thus with dedup you'd only need to store one copy in the cache. In any case, you'd need a few GB for the write caching, plus likely a dozen or more GB for read caching, as your working set is moderately large and frequently re-used.

(3) 100TB of data for NFS home-directory serving. The access pattern here is likely highly random, with only small amounts of re-used data. However, you'll often have non-trivial write sizes. Having a ZIL is probably a good idea, but in any case you'll want a couple of GB (call it 3-4) for write caching per pool, and then several dozen MB per active user as read cache. That is, in this case your determining factor is likely not total data size but the number of simultaneous users, since the latter dictates your frequency of file access.

I'd say all of the recommendations/insights in the referenced link are good, except for #1. The base amount of RAM is highly variable based on the factors discussed above, and the blanket assumption that you need to cache all pool metadata isn't valid.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
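A rough sketch of the arithmetic behind Erik's example (3). The "couple of GB (call it 3-4) per pool" and "several dozen MB per active user" figures come from his message; the concrete defaults, the 1GB base, and the example pool/user counts are assumptions for illustration:

def nfs_homedir_ram_gb(pools, active_users,
                       write_cache_gb_per_pool=3.5,
                       read_cache_mb_per_user=50,
                       base_gb=1.0):
    """Estimate RAM for an NFS home-directory server, per the thread's
    reasoning: write cache scales with the number of pools, read cache
    with the number of simultaneous users, not with total pool size."""
    write_cache = pools * write_cache_gb_per_pool
    read_cache = active_users * read_cache_mb_per_user / 1024
    return base_gb + write_cache + read_cache

# One pool, 100 simultaneous users on a 100TB pool:
print(nfs_homedir_ram_gb(pools=1, active_users=100))   # ~9.4GB
# Four pools, 500 simultaneous users:
print(nfs_homedir_ram_gb(pools=4, active_users=500))   # ~39.4GB

Note that the 100TB figure never appears in the calculation; only pool count and user count do.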
>>>>> "et" == Erik Trimble <erik.trimble at oracle.com> writes:

    et> frequently-accessed files from multiple VMs are in fact
    et> identical, and thus with dedup, you'd only need to store one
    et> copy in the cache.

Although counterintuitive, I thought this wasn't part of the initial release. Maybe I'm wrong altogether or maybe it got added later?

http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup#comment-1257191094000
Miles Nordin wrote:

>     et> frequently-accessed files from multiple VMs are in fact
>     et> identical, and thus with dedup, you'd only need to store one
>     et> copy in the cache.
>
> Although counterintuitive, I thought this wasn't part of the initial
> release. Maybe I'm wrong altogether or maybe it got added later?
>
> http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup#comment-1257191094000

No, you're reading that blog right - dedup is on a per-pool basis. What I was talking about was inside a single pool. Without dedup enabled on a pool, if I have 2 VM images, both of which are, say, WinXP, then I'd have to cache identical files twice. With dedup, I'd only have to cache those blocks once, even if they were being accessed by both VMs.

So dedup is both hard on RAM (you need the DDT) and easier on it (it lowers the amount of actual data blocks which have to be stored in cache).

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
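A toy illustration of the trade-off Erik describes: with dedup, the cache needs only one copy of blocks shared between VM images, but RAM must also hold DDT entries. The 128kB record size and 150-bytes-per-DDT-entry figure are reused from earlier in the thread; the block counts and sharing ratio are made-up numbers for illustration:

def cache_footprint_gb(vm_images, blocks_per_image, shared_fraction,
                       block_kb=128, ddt_bytes_per_entry=150, dedup=False):
    """Compare rough ARC footprint for a hot set of VM-image blocks
    with and without dedup (ignores ARC headers and other metadata)."""
    total_blocks = vm_images * blocks_per_image
    if not dedup:
        return total_blocks * block_kb / 2**20
    shared = int(blocks_per_image * shared_fraction)      # cached only once
    unique = total_blocks - (vm_images - 1) * shared      # per-VM leftovers
    data = unique * block_kb / 2**20
    # DDT entries for these blocks only; a real DDT covers the whole pool.
    ddt = total_blocks * ddt_bytes_per_entry / 2**30
    return data + ddt

# Ten WinXP images, ~16k hot blocks (~2GB) each, 80% identical across VMs:
print(cache_footprint_gb(10, 16384, 0.8))               # ~20GB without dedup
print(cache_footprint_gb(10, 16384, 0.8, dedup=True))   # ~5.6GB with dedup

In this toy case the DDT overhead for the hot blocks is tiny compared to the data-block savings, which is the "easier on RAM" half of Erik's point; the "hard on RAM" half is that the full pool-wide DDT still has to fit somewhere.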
>>>>> "et" == Erik Trimble <erik.trimble at oracle.com> writes:

    et> No, you're reading that blog right - dedup is on a per-pool
    et> basis.

The way I'm reading that blog is that deduped data is expanded in the ARC.
Miles Nordin wrote:

>     et> No, you're reading that blog right - dedup is on a per-pool
>     et> basis.
>
> The way I'm reading that blog is that deduped data is expanded in the
> ARC.

What I think is being done is this: pools A and B each have a separate DDT. That is, if there is an identical block in both pool A and pool B, then BOTH keep a copy of that block - removal of duplicates is only done *inside* a pool. However, if that identical block happens to be read from both pools and stored in the ARC, the new feature will detect that you have an identical block in the ARC, and thus keep only a single copy of it cached.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)