Chris Forgeron
2011-May-09 19:29 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
Hello,

I'm on FreeBSD 9 with ZFS v28, and it's possible this combination is causing my issue, but I thought I'd start here first and will cross-post to the FreeBSD ZFS threads if the Solaris crowd thinks this is a FreeBSD problem.

The issue: from carefully watching my ARC/L2ARC size and activity when I set primarycache=metadata and secondarycache=all, the secondary cache isn't acting like "all", it's just metadata.

I've read a lot on the functionality of the ARC/L2ARC, and I know that the L2ARC is filled by scanning the ARC for soon-to-expire objects and copying them to the L2ARC. That brings me to my first question:

Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them? Is setting primary to metadata and secondary to all an impossible request? I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.

I should back up and explain what I have, and what I'm trying to do.

I'm running a 24 gig ZFS system with a 20 TB pool, roughly 12 TB full. It's serving NFS data to an ESX server for my various VMs, and it's running great, albeit a bit slow. I have four 120 gig SSDs as my L2ARC.

I'm estimating my deduplication table (DDT) needs well over 24 gigs, and with the default primary and secondary cache settings of "all" there is a lot of churning of both the ARC and L2ARC. I can see it on my L2ARC SSDs, as they all fill up within a day and stay chock full as the server continues to run 50+ VMs. However, there is so much user data to be accessed that it keeps constant pressure on the metadata, and eventually the user data flowing in erodes most of the metadata.

I'm trying to make sure my DDT is as available as possible without putting more RAM in this server. If I can dedicate my ARC to holding as much of the DDT as possible, then use my L2ARC for any DDT overflow, and lastly for data, I'd be happy with the performance. I understand there's a performance hit here compared to RAM.

When I set primarycache and secondarycache both to metadata, I find that my L2ARC drives fill up to around 20 gigs each, for a total of 80 gigs of L2ARC, and it won't go any further than that no matter how hard I beat on it. However, when I set primarycache=metadata and secondarycache=all, I see the same behavior as setting both to metadata: my L2ARCs don't budge past 20 gigs each.

I'm familiar with arc_meta_used and arc_meta_limit. I've increased my arc_meta_limit to 75% of my ARC, as I'm not interested in caching data in the ARC. I can watch arc_meta_used drop when I set primarycache=all, due to the pressure of the data objects pushing out the metadata, and setting primarycache=metadata allows arc_meta_used to grow back to its limit.

Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?

Hopefully I'm not dismissed with "just buy more RAM", which I can see as a valid point in a lot of situations where people are running on 4 and 8 gig systems and trying to access 12+ TB of data in a dedupe situation. Keeping my RAM at 24 gigs isn't just a budget request; it's also the max RAM you can get in most workstation boards these days, and it keeps energy use and heat down, as my 120 gig SSDs burn a fraction of the power that another 24 gigs of DDR3 would.

Even if I added more RAM, I'd still have this problem. Say I have 196 gigs of RAM in this server, and I want to dedicate the ARC just to metadata and the L2ARC to user data. From my experiments, this wouldn't work. As I keep scaling up this server, user data pouring through the cache system will keep eroding the metadata.

I think the ultimate would be cache priority: setting metadata as the most important and user data as secondary importance, so that user data never evicts metadata from the ARC/L2ARC. However, I don't think this is possible today, and I'm unsure if it's on the drawing board.

Your input is appreciated. Thanks.
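For a rough sense of scale on that DDT estimate - back-of-the-envelope only, assuming an average block size around 128K and the ~320 bytes of core per DDT entry that usually gets quoted (zdb -D on the pool should report the actual entry counts and per-entry in-core sizes):

    12 TB unique data / 128 KB average block size  ≈ 100 million DDT entries
    100 million entries x ~320 bytes in core       ≈ 30 GB of DDT

That is already bigger than the 24 gigs of RAM in the box before the ARC holds anything else, and VM workloads tend toward smaller effective block sizes, which only pushes the number higher.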
Richard Elling
2011-May-09 20:28 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On May 9, 2011, at 12:29 PM, Chris Forgeron wrote:

> The issue: from carefully watching my ARC/L2ARC size and activity when I set primarycache=metadata and secondarycache=all, the secondary cache isn't acting like "all", it's just metadata.
>
> I've read a lot on the functionality of the ARC/L2ARC, and I know that the L2ARC is filled by scanning the ARC for soon-to-expire objects and copying them to the L2ARC.
>
> Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them?

Yes.

> Is setting primary to metadata and secondary to all an impossible request?

Today, this is not how it is implemented. However, it is open source :-) The current architecture allows the L2ARC fill to not impact the normal ARC operations. If you implement a policy that says "only use the L2ARC to cache data" then you are implicitly requiring the write bandwidth of the L2ARC to be faster than the read bandwidth of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because it has memory bandwidth, which is, of course, always greater than I/O bandwidth.

> I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.

Correct.

> [...]
>
> Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?
>
> Hopefully I'm not dismissed with "just buy more RAM", which I can see as a valid point in a lot of situations where people are running on 4 and 8 gig systems and trying to access 12+ TB of data in a dedupe situation.

:-)

> Keeping my RAM at 24 gigs isn't just a budget request; it's also the max RAM you can get in most workstation boards these days, and it keeps energy use and heat down, as my 120 gig SSDs burn a fraction of the power that another 24 gigs of DDR3 would.

Perhaps use the tool designed for the task?
 -- richard
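To make that fill path concrete, here is a stripped-down model of the behavior Richard describes. This is only a sketch, not the real arc.c feed thread, and the struct and function names are invented for illustration: the feed scan only ever sees buffers resident in the ARC lists (and eligible per the secondarycache property), so a data block that is read but not kept because primarycache=metadata never becomes a candidate for the cache device.

    /*
     * Simplified model of the L2ARC feed path -- an illustration, not the
     * real arc.c.  The feed scan considers buffers retained in the ARC
     * lists; with primarycache=metadata, data buffers are never retained,
     * so they are never copied to the cache device regardless of the
     * secondarycache setting.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { BUF_DATA, BUF_METADATA } buf_type_t;

    typedef struct arc_buf {
        buf_type_t      type;
        bool            in_arc_list;    /* retained in ARC MRU/MFU?      */
        bool            l2arc_eligible; /* allowed by secondarycache?    */
        struct arc_buf *next;
    } arc_buf_t;

    /* One pass of the feed loop over recently accessed buffers. */
    static size_t
    l2arc_feed(arc_buf_t *bufs)
    {
        size_t copied = 0;

        for (arc_buf_t *b = bufs; b != NULL; b = b->next) {
            if (!b->in_arc_list)    /* never kept in ARC: invisible      */
                continue;
            if (!b->l2arc_eligible) /* filtered by secondarycache        */
                continue;
            copied++;               /* real code: issue write to L2ARC   */
        }
        return (copied);
    }

    int
    main(void)
    {
        /*
         * With primarycache=metadata a data buffer is read, handed to the
         * consumer and dropped, so in_arc_list stays false even though the
         * dataset has secondarycache=all.
         */
        arc_buf_t data = { BUF_DATA,     false, true, NULL  };
        arc_buf_t meta = { BUF_METADATA, true,  true, &data };

        printf("buffers fed to L2ARC: %zu\n", l2arc_feed(&meta));
        return (0);
    }

Compiled and run, it reports a single buffer fed to the L2ARC: the metadata one.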
Tomas Ögren
2011-May-09 21:54 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On 09 May, 2011 - Richard Elling sent me these 5,0K bytes:

> of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because
> it has memory bandwidth, which is, of course, always greater than I/O bandwidth.

Slightly off topic, but we had an IBM RS/6000 43P with a PowerPC 604e CPU, which had about 60 MB/s memory bandwidth (which is kind of bad for a 332 MHz CPU), and its disks could do 70-80 MB/s or so.. in some other machine..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Brandon High
2011-May-10 16:53 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On Mon, May 9, 2011 at 2:54 PM, Tomas Ögren <stric at acc.umu.se> wrote:

> Slightly off topic, but we had an IBM RS/6000 43P with a PowerPC 604e
> CPU, which had about 60 MB/s memory bandwidth (which is kind of bad for a
> 332 MHz CPU), and its disks could do 70-80 MB/s or so.. in some other
> machine..

It wasn't that long ago when 66 MB/s ATA was considered a waste because no drive could use that much bandwidth. These days a "slow" drive has max throughput greater than 110 MB/s.

(OK, looking at some online reviews, it was about 13 years ago. Maybe I'm just old.)

-B
--
Brandon High : bhigh at freaks.com
Chris Forgeron
2011-May-14 18:28 UTC
[zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
>On May 9, 2011 at 5:29 PM, Richard Elling wrote:
>> On May 9, 2011, at 12:29 PM, Chris Forgeron wrote:
>> [..]
>> Q1 - Doesn't this behavior mean that the L2ARC can never get data objects if the ARC doesn't hold them?
>
> Yes.
>
>> Is setting primary to metadata and secondary to all an impossible request?
>
> Today, this is not how it is implemented. However, it is open source :-) The current architecture allows the L2ARC fill to not impact the normal ARC operations. If you implement a policy that says "only use the L2ARC to cache data" then you are implicitly requiring the write bandwidth of the L2ARC to be faster than the read bandwidth of the pool -- not likely to be a winning combination. This isn't a problem for the ARC because it has memory bandwidth, which is, of course, always greater than I/O bandwidth.
>
>> I believe the user data still goes through the ARC, it's just not kept when primarycache=metadata.
>
> Correct.

Ah, thanks, that saves me a lot of time banging my head against the desk trying to figure out why I can't make this setting work.

ZFS is open source, but we're in a bit of a stall with ZFS development until the behind-the-scenes committee finishes deliberation and we have new direction to start working with. IMO, anything other than a very small patch is dangerous to work on, as we don't know what's in the cooker for the next open-source ZFS release. I do recognize my status as a ZFS leech at the moment (not committing any code, but asking for changes), but it's not easy to transition.

I have gone through the ARC code in detail - I'll have to review it again and see if there are any quick tweaks I can make that may give me the options (and thus the performance) I'm looking for. Of course, I suspect there's nothing small and quick to be done to the ZFS code, or it would have already been done. :-)

>> Q2 is basically: what's the best way to keep as much of the DDT live in ARC and L2ARC without buying more RAM?
>>
>> Hopefully I'm not dismissed with "just buy more RAM" [..]
>
> :-)

Thanks for not saying it. :-)

>> Keeping my RAM at 24 gigs isn't just a budget request [..]
>
> Perhaps use the tool designed for the task?
> -- richard

Which tool is this?

>> [..]
>> I think the ultimate would be cache priority: setting metadata as the most important and user data as secondary importance, so that user data never evicts metadata from the ARC/L2ARC. However, I don't think this is possible today, and I'm unsure if it's on the drawing board.

After looking at the ARC code, I wonder if the easier method wouldn't be an arc_data_limit, much like the existing arc_meta_limit. It would work exactly the same way as the meta limit, but we'd be able to reserve as much of the precious ARC for metadata as we wanted.

When primarycache=all, the ARC would still contain user data, so user data would still be available to the L2ARC for caching; but because we could cap the amount of user data allowed to live in the ARC, we could control the erosion of the ARC's metadata. With secondarycache=all, both metadata and user data would still enter the L2ARC. You'd get fine-grained control over the caching at several levels.

As it stands right now, from my tests, user data will _always_ erode the metadata in the ARC, and that can't be good for speed in a dedup situation, no matter how much RAM you have. It means the general assertion that dedup costs only a small lookup penalty assumes you're not going to use your ARC for much besides metadata, and that the entire dedup table fits in the ARC. The reality is that the DDT has to be fetched from L2ARC, or even from disk, more often than you'd expect.
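To sketch what I mean - purely hypothetical: arc_data_limit and the eviction rule below are invented for illustration, loosely modeled on the way arc_meta_limit caps metadata today, and this is not the real arc.c eviction code:

    /*
     * Hypothetical sketch: an arc_data_limit to mirror arc_meta_limit.
     * arc_data_limit does not exist in ZFS today; this only models the
     * proposed policy: eviction pressure lands on whichever class is
     * over its own cap, instead of data unconditionally displacing
     * metadata.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t arc_meta_used, arc_meta_limit; /* exist today           */
    static uint64_t arc_data_used, arc_data_limit; /* proposed counterparts */

    static bool
    evict_data_first(void)
    {
        if (arc_data_limit != 0 && arc_data_used > arc_data_limit)
            return (true);                   /* data is over its budget     */
        if (arc_meta_used > arc_meta_limit)
            return (false);                  /* metadata is over its budget */
        return (arc_data_used >= arc_meta_used); /* otherwise, largest class */
    }

    int
    main(void)
    {
        /* 24 gig ARC: cap data at 4 gigs so ~20 gigs stay free for the DDT. */
        arc_meta_limit = 20ULL << 30;
        arc_data_limit = 4ULL << 30;
        arc_meta_used  = 18ULL << 30;
        arc_data_used  = 6ULL << 30;

        printf("evict data first: %s\n", evict_data_first() ? "yes" : "no");
        return (0);
    }

The point is simply that with a data cap in place, streaming user data could no longer squeeze the DDT out of the ARC, while still flowing through it and remaining visible to the L2ARC feed.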