Chris Forgeron
2011-May-09 19:29 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
Hello,

I'm on FreeBSD 9 with ZFS v28, and it's possible this combination is causing my issue, but I thought I'd start here first and will cross-post to the FreeBSD ZFS threads if the Solaris crowd thinks this is a FreeBSD problem.

The issue: from carefully watching my ARC/L2ARC size and activity when I set primarycache=metadata and secondarycache=all, the secondary cache isn't acting like "all", it's just metadata.

I've read a lot on the functionality of the ARC/L2ARC, and I know that the L2ARC is filled by scanning the ARC for soon-to-expire objects and copying them to the L2ARC. That brings me to my first question:

Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them? Is setting primary to metadata and secondary to all an impossible request? I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.

I should back up and explain what I have, and what I'm trying to do.

I'm running a 24 gig ZFS system with a 20 TB pool, roughly 12 TB full. It's serving NFS data to an ESX server for my various VMs, and it's running great, albeit a bit slow. I have four 120 gig SSDs as my L2ARC.

I'm estimating my deduplication table (DDT) needs well over 24 gigs, and with the default primary and secondary cache settings of "all" there is a lot of churning of both the ARC and L2ARC. I can see it on my L2ARC SSDs, as they all fill up within a day and stay chock full as the server continues to run 50+ VMs. However, there is so much user data to be accessed that it keeps constant pressure on the metadata, and eventually the user data flowing in erodes most of the metadata.

I'm trying to make sure my DDT is as available as possible without putting more RAM in this server. If I can dedicate my ARC to holding as much of the DDT as possible, then use my L2ARC for any DDT overflow, and lastly for data, I'd be happy with the performance. I understand there's a performance hit here compared to RAM.

When I set primarycache and secondarycache both to metadata, I find that my L2ARC drives fill up to around 20 gigs each, for a total of 80 gigs of L2ARC, and it won't go any further than that no matter how hard I beat on it. However, when I set primarycache=metadata and secondarycache=all, I see the same behavior as setting both to metadata: my L2ARCs don't budge past 20 gigs each.

I'm familiar with arc_meta_used and arc_meta_limit. I've increased my arc_meta_limit to 75% of my ARC, as I'm not interested in caching data in the ARC. I can watch arc_meta_used drop when I set primarycache=all, due to the pressure of the data objects pushing out the metadata, and setting primarycache=metadata allows arc_meta_used to grow back to its limit.

Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?

Hopefully I'm not dismissed with "just buy more RAM", which I can see as a valid point in a lot of situations where people are running on 4 and 8 gig systems and trying to access 12+ TB of data in a dedupe situation. Keeping my RAM at 24 gigs isn't just a budget request; it's also the max RAM you can get in most workstation boards these days, and it keeps energy use and heat down, as my 120 gig SSDs burn a fraction of the power that another 24 gigs of DDR3 would.

Even if I added more RAM, I'd still have this problem. Say I have 196 gigs of RAM in this server, and I want to dedicate the ARC just to metadata and the L2ARC to user data. From my experiments, this wouldn't work. As I keep scaling up this server, user data pouring through the cache system will keep eroding the metadata.

I think the ultimate would be cache priority: setting metadata as the most important and user data as secondary importance, so that user data never evicts metadata from the ARC/L2ARC. However, I don't think this is possible today, and I'm unsure if it's on the drawing board.

Your input is appreciated. Thanks.
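For a rough sense of scale on that DDT estimate - back-of-the-envelope only, assuming an average block size around 128K and the ~320 bytes of core per DDT entry that usually gets quoted (zdb -D on the pool should report the actual entry counts and per-entry in-core sizes):

    12 TB unique data / 128 KB average block size  ≈ 100 million DDT entries
    100 million entries x ~320 bytes in core       ≈ 30 GB of DDT

That is already bigger than the 24 gigs of RAM in the box before the ARC holds anything else, and VM workloads tend toward smaller effective block sizes, which only pushes the number higher.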
Richard Elling
2011-May-09 20:28 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On May 9, 2011, at 12:29 PM, Chris Forgeron wrote:

> The issue: from carefully watching my ARC/L2ARC size and activity when I set primarycache=metadata and secondarycache=all, the secondary cache isn't acting like "all", it's just metadata.
>
> I've read a lot on the functionality of the ARC/L2ARC, and I know that the L2ARC is filled by scanning the ARC for soon-to-expire objects and copying them to the L2ARC.
>
> Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them?

Yes.

> Is setting primary to metadata and secondary to all an impossible request?

Today, this is not how it is implemented. However, it is open source :-) The current architecture allows the L2ARC fill to not impact the normal ARC operations. If you implement a policy that says "only use the L2ARC to cache data" then you are implicitly requiring the write bandwidth of the L2ARC to be faster than the read bandwidth of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because it has memory bandwidth, which is, of course, always greater than I/O bandwidth.

> I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.

Correct.

> [...]
>
> Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?
>
> Hopefully I'm not dismissed with "just buy more RAM", which I can see as a valid point in a lot of situations where people are running on 4 and 8 gig systems and trying to access 12+ TB of data in a dedupe situation.

:-)

> Keeping my RAM at 24 gigs isn't just a budget request; it's also the max RAM you can get in most workstation boards these days, and it keeps energy use and heat down, as my 120 gig SSDs burn a fraction of the power that another 24 gigs of DDR3 would.

Perhaps use the tool designed for the task?
 -- richard
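To make that fill path concrete, here is a stripped-down model of the behavior Richard describes. This is only a sketch, not the real arc.c feed thread, and the struct and function names are invented for illustration: the feed scan only ever sees buffers resident in the ARC lists (and eligible per the secondarycache property), so a data block that is read but not kept because primarycache=metadata never becomes a candidate for the cache device.

    /*
     * Simplified model of the L2ARC feed path -- an illustration, not the
     * real arc.c.  The feed scan considers buffers retained in the ARC
     * lists; with primarycache=metadata, data buffers are never retained,
     * so they are never copied to the cache device regardless of the
     * secondarycache setting.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { BUF_DATA, BUF_METADATA } buf_type_t;

    typedef struct arc_buf {
        buf_type_t      type;
        bool            in_arc_list;    /* retained in ARC MRU/MFU?      */
        bool            l2arc_eligible; /* allowed by secondarycache?    */
        struct arc_buf *next;
    } arc_buf_t;

    /* One pass of the feed loop over recently accessed buffers. */
    static size_t
    l2arc_feed(arc_buf_t *bufs)
    {
        size_t copied = 0;

        for (arc_buf_t *b = bufs; b != NULL; b = b->next) {
            if (!b->in_arc_list)    /* never kept in ARC: invisible      */
                continue;
            if (!b->l2arc_eligible) /* filtered by secondarycache        */
                continue;
            copied++;               /* real code: issue write to L2ARC   */
        }
        return (copied);
    }

    int
    main(void)
    {
        /*
         * With primarycache=metadata a data buffer is read, handed to the
         * consumer and dropped, so in_arc_list stays false even though the
         * dataset has secondarycache=all.
         */
        arc_buf_t data = { BUF_DATA,     false, true, NULL  };
        arc_buf_t meta = { BUF_METADATA, true,  true, &data };

        printf("buffers fed to L2ARC: %zu\n", l2arc_feed(&meta));
        return (0);
    }

Compiled and run, it reports a single buffer fed to the L2ARC: the metadata one.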
Tomas Ögren
2011-May-09 21:54 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On 09 May, 2011 - Richard Elling sent me these 5,0K bytes:

> of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because
> it has memory bandwidth, which is, of course, always greater than I/O bandwidth.

Slightly off topic, but we had an IBM RS/6000 43P with a PowerPC 604e CPU, which had about 60 MB/s memory bandwidth (which is kind of bad for a 332 MHz CPU), and its disks could do 70-80 MB/s or so.. in some other machine..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Brandon High
2011-May-10 16:53 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On Mon, May 9, 2011 at 2:54 PM, Tomas Ögren <stric at acc.umu.se> wrote:

> Slightly off topic, but we had an IBM RS/6000 43P with a PowerPC 604e
> CPU, which had about 60 MB/s memory bandwidth (which is kind of bad for a
> 332 MHz CPU), and its disks could do 70-80 MB/s or so.. in some other
> machine..

It wasn't that long ago when 66 MB/s ATA was considered a waste because no drive could use that much bandwidth. These days a "slow" drive has max throughput greater than 110 MB/s.

(OK, looking at some online reviews, it was about 13 years ago. Maybe I'm just old.)

-B
--
Brandon High : bhigh at freaks.com
Chris Forgeron
2011-May-14 18:28 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
>On May 9, 2011 at 5:29 PM, Richard Elling wrote:
>> On May 9, 2011, at 12:29 PM, Chris Forgeron wrote:
>> [..]
>> Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them?
>
> Yes.
>
>> Is setting primary to metadata and secondary to all an impossible request?
>
> Today, this is not how it is implemented. However, it is open source :-) The current architecture allows the L2ARC fill to not impact the normal ARC operations. If you implement a policy that says "only use the L2ARC to cache data" then you are implicitly requiring the write bandwidth of the L2ARC to be faster than the read bandwidth of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because it has memory bandwidth, which is, of course, always greater than I/O bandwidth.
>
>> I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.
>
> Correct.

Ah, thanks, that saves me a lot of time banging my head against the desk trying to figure out why I can't make this setting work.

ZFS is open source, but we're in a bit of a stall with ZFS development until the behind-the-scenes committee finishes deliberation and we have new direction to start working with. IMO, anything other than a very small patch is dangerous to work on, as we don't know what's in the cooker for the next open-source ZFS release. I do recognize my status as a ZFS leech at the moment (not committing any code, but asking for changes), but it's not easy to transition.

I have gone through the ARC code in detail - I'll have to review it again and see if there are any quick tweaks I can make that may give me the options (and thus the performance) I'm looking for. Of course, I suspect there's nothing small and quick to be done to the ZFS code, or it would have already been done. :-)

>> Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?
>>
>> Hopefully I'm not dismissed with "just buy more RAM" [..]
>
> :-)

Thanks for not saying it. :-)

>> Keeping my RAM at 24 gigs isn't just a budget request [..]
>
> Perhaps use the tool designed for the task?
> -- richard

Which tool is this?

>> [..]
>> I think the ultimate would be cache priority: setting metadata as the most important and user data as secondary importance, so that user data never evicts metadata from the ARC/L2ARC. However, I don't think this is possible today, and I'm unsure if it's on the drawing board.

After looking at the ARC code, I wonder if the easier method wouldn't be an arc_data_limit, much like the existing arc_meta_limit. It would work exactly the same way as the meta limit, but we'd be able to reserve as much of the precious ARC for metadata as we wanted.

When primarycache=all, the ARC would still contain user data, so user data would still be available to the L2ARC for caching; but because we could cap the amount of user data allowed to live in the ARC, we could control the erosion of the ARC's metadata. With secondarycache=all, both metadata and user data would still enter the L2ARC. You'd get fine-grained control over the caching at several levels.

As it stands right now, from my tests, user data will _always_ erode the metadata in the ARC, and that can't be good for speed in a dedup situation, no matter how much RAM you have. It means the general assertion that dedup costs only a small lookup penalty assumes you're not going to use your ARC for much besides metadata, and that the entire dedup table fits in the ARC. The reality is that the DDT has to be fetched from L2ARC, or even from disk, more often than you'd expect.
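To sketch what I mean - purely hypothetical: arc_data_limit and the eviction rule below are invented for illustration, loosely modeled on the way arc_meta_limit caps metadata today, and this is not the real arc.c eviction code:

    /*
     * Hypothetical sketch: an arc_data_limit to mirror arc_meta_limit.
     * arc_data_limit does not exist in ZFS today; this only models the
     * proposed policy: eviction pressure lands on whichever class is
     * over its own cap, instead of data unconditionally displacing
     * metadata.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t arc_meta_used, arc_meta_limit; /* exist today           */
    static uint64_t arc_data_used, arc_data_limit; /* proposed counterparts */

    static bool
    evict_data_first(void)
    {
        if (arc_data_limit != 0 && arc_data_used > arc_data_limit)
            return (true);                   /* data is over its budget     */
        if (arc_meta_used > arc_meta_limit)
            return (false);                  /* metadata is over its budget */
        return (arc_data_used >= arc_meta_used); /* otherwise, largest class */
    }

    int
    main(void)
    {
        /* 24 gig ARC: cap data at 4 gigs so ~20 gigs stay free for the DDT. */
        arc_meta_limit = 20ULL << 30;
        arc_data_limit = 4ULL << 30;
        arc_meta_used  = 18ULL << 30;
        arc_data_used  = 6ULL << 30;

        printf("evict data first: %s\n", evict_data_first() ? "yes" : "no");
        return (0);
    }

The point is simply that with a data cap in place, streaming user data could no longer squeeze the DDT out of the ARC, while still flowing through it and remaining visible to the L2ARC feed.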