Running an mmap-intensive workload on ZFS on an X4500, Solaris 10 11/06
(update 3). All file IO is mmap(file), read memory segment, unmap, close.

Tweaked the arc size down via mdb to 1GB. I used that value because
c_min was also 1GB, and I was not sure if c_max could be larger than
c_min....Anyway, I set c_max to 1GB.

After a workload run....:

> arc::print -tad
{
. . .
    ffffffffc02e29e8 uint64_t size = 0t3099832832
    ffffffffc02e29f0 uint64_t p = 0t16540761088
    ffffffffc02e29f8 uint64_t c = 0t1070318720
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t1070318720
. . .

"size" is at 3GB, with c_max at 1GB.

What gives? I'm looking at the code now, but was under the impression
c_max would limit ARC growth. Granted, it's not a factor of 10, and
it's certainly much better than the out-of-the-box growth to 24GB
(this is a 32GB x4500), so clearly ARC growth is being limited, but it
still grew to 3X c_max.

Thanks,
/jim
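For readers who want to reproduce the access pattern, a minimal sketch of the
mmap/read/unmap/close loop described above follows. The command-line file list
and the 8KB touch stride are assumptions for illustration, not the actual
customer test case.

/*
 * Minimal sketch of the "mmap(file), read memory segment, unmap, close"
 * pattern described above.  Files are named on the command line; the
 * 8K stride just ensures every page is touched.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        int i;
        unsigned long sum = 0;

        for (i = 1; i < argc; i++) {
                int fd = open(argv[i], O_RDONLY);
                struct stat st;

                if (fd == -1)
                        continue;
                if (fstat(fd, &st) == 0 && st.st_size > 0) {
                        char *p = mmap(NULL, st.st_size, PROT_READ,
                            MAP_SHARED, fd, 0);
                        if (p != MAP_FAILED) {
                                off_t off;
                                for (off = 0; off < st.st_size; off += 8192)
                                        sum += (unsigned char)p[off];
                                (void) munmap(p, st.st_size);
                        }
                }
                (void) close(fd);
        }
        printf("checksum %lu\n", sum);
        return (0);
}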
FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:

> arc::print -tad
{
. . .
    ffffffffc02e29e8 uint64_t size = 0t10527883264
    ffffffffc02e29f0 uint64_t p = 0t16381819904
    ffffffffc02e29f8 uint64_t c = 0t1070318720
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t1070318720
. . .

Perhaps c_max does not do what I think it does?

Thanks,
/jim

Jim Mauro wrote:
> Running an mmap-intensive workload on ZFS on an X4500, Solaris 10 11/06
> (update 3). All file IO is mmap(file), read memory segment, unmap, close.
> [...]
Hi Jim,

My understanding is that the DNLC can consume quite a bit of memory too,
and the ARC limitations (and memory culler) don't clean the DNLC yet.
So if you're working with a lot of smaller files, you can still go way
over your ARC limit.

Anyone, please correct me if I've got that wrong.

-J

On 3/15/07, Jim Mauro <James.Mauro at sun.com> wrote:
> FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
> [...]
This seems a bit strange.  What's the workload, and also, what's the
output for:

> ARC_mru::print size lsize
> ARC_mfu::print size lsize

and

> ARC_anon::print size

For obvious reasons, the ARC can't evict buffers that are in use.
Buffers that are available to be evicted should be on the mru or mfu
list, so this output should be instructive.

-j

On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:
> FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
> [...]
> ARC_mru::print -d size lsize
size = 0t10224433152
lsize = 0t10218960896
> ARC_mfu::print -d size lsize
size = 0t303450112
lsize = 0t289998848
> ARC_anon::print -d size
size = 0

So it looks like the MRU is running at 10GB...

What does this tell us?

Thanks,
/jim

johansen-osdev at sun.com wrote:
> This seems a bit strange.  What's the workload, and also, what's the
> output for:
> [...]
Gar.  This isn't what I was hoping to see.  Buffers that aren't
available for eviction aren't listed in the lsize count.  It looks like
the MRU has grown to 10GB and most of this could be successfully
evicted.

The calculation for determining if we evict from the MRU is in
arc_adjust() and looks something like:

        top_sz = ARC_anon.size + ARC_mru.size

Then if top_sz > arc.p and ARC_mru.lsize > 0, we evict the smaller of
ARC_mru.lsize and top_sz - arc.p.

In your previous message it looks like arc.p is > (ARC_mru.size +
ARC_anon.size).  It might make sense to double-check these numbers
together, so when you check the size and lsize again, also check arc.p.

How/when did you configure arc_c_max?  arc.p is supposed to be
initialized to half of arc.c.  Also, I assume that there's a reliable
test case for reproducing this problem?

Thanks,

-j

On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
> > ARC_mru::print -d size lsize
> size = 0t10224433152
> lsize = 0t10218960896
> [...]
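Plugging the numbers from the earlier mdb output into that check shows why
nothing is being evicted: ARC_anon.size + ARC_mru.size is about 10GB while
arc.p is about 16GB, so the eviction branch is never taken. Below is a small
userland sketch of just that comparison; the struct names and layout are
illustrative stand-ins, not the kernel's definitions, and the constants are
copied from the mdb output above.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define MIN(a, b)       ((a) < (b) ? (a) : (b))

/* Stand-ins for the kernel's ARC state; values from the thread above. */
typedef struct arc_state { int64_t size; int64_t lsize; } arc_state_t;

static arc_state_t ARC_anon = { 0, 0 };
static arc_state_t ARC_mru  = { 10224433152LL, 10218960896LL };
static int64_t arc_p        = 16381819904LL;

int
main(void)
{
        int64_t top_sz = ARC_anon.size + ARC_mru.size;

        if (top_sz > arc_p && ARC_mru.lsize > 0) {
                int64_t toevict = MIN(ARC_mru.lsize, top_sz - arc_p);
                printf("would evict %" PRId64 " bytes from the MRU\n",
                    toevict);
        } else {
                printf("no MRU eviction: top_sz %" PRId64 " <= arc.p %"
                    PRId64 "\n", top_sz, arc_p);
        }
        return (0);
}

Compiled and run, this takes the "no MRU eviction" branch, which matches the
behavior Jim is seeing: with arc.p stuck near 16GB, the 10GB MRU is never
trimmed back toward c_max.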
Something else to consider: depending upon how you set arc_c_max, you
may just want to set arc_c and arc_p at the same time.  If you try
setting arc_c_max, and then setting arc_c to arc_c_max, and then set
arc_p to arc_c / 2, do you still get this problem?

-j

On Thu, Mar 15, 2007 at 05:18:12PM -0700, johansen-osdev at sun.com wrote:
> Gar.  This isn't what I was hoping to see.  Buffers that aren't
> available for eviction aren't listed in the lsize count.
> [...]
> How/when did you configure arc_c_max?

Immediately following a reboot, I set arc.c_max using mdb,
then verified by reading the arc structure again.

> arc.p is supposed to be initialized to half of arc.c.  Also, I assume
> that there's a reliable test case for reproducing this problem?

Yep. I'm using an x4500 in-house to sort out performance of a customer test
case that uses mmap. We acquired the new DIMMs to bring the x4500 to 32GB,
since the workload has a 64GB working set size, and we were clobbering a
16GB thumper. We wanted to see how doubling memory may help.

I'm trying to clamp the ARC size because for mmap-intensive workloads,
it seems to hurt more than help (although, based on experiments up to this
point, it's not hurting a lot).

I'll do another reboot, and run it all down for you serially...

/jim
I suppose I should have been more forward about making my last point.
If arc_c_max isn't set in /etc/system, I don't believe that the ARC
will initialize arc.p to the correct value.  I could be wrong about
this; however, next time you set c_max, set c to the same value as c_max
and set p to half of c.  Let me know if this addresses the problem or
not.

-j

> > How/when did you configure arc_c_max?
> Immediately following a reboot, I set arc.c_max using mdb,
> then verified by reading the arc structure again.
> [...]
Following a reboot:

> arc::print -tad
{
. . .
    ffffffffc02e29e8 uint64_t size = 0t299008
    ffffffffc02e29f0 uint64_t p = 0t16588228608
    ffffffffc02e29f8 uint64_t c = 0t33176457216
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t33176457216
. . .
}
> ffffffffc02e2a08 /Z 0x20000000        <------------------- set c_max to 512MB
arc+0x48:       0x7b9789000     =       0x20000000
> arc::print -tad
{
. . .
    ffffffffc02e29e8 uint64_t size = 0t299008
    ffffffffc02e29f0 uint64_t p = 0t16588228608
    ffffffffc02e29f8 uint64_t c = 0t33176457216
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t536870912  <--------- c_max is 512MB
. . .
}
> ARC_mru::print -d size lsize
size = 0t294912
lsize = 0t32768

Run the workload a couple times...

    ffffffffc02e29e8 uint64_t size = 0t27121205248 <------- ARC size is 27GB
    ffffffffc02e29f0 uint64_t p = 0t10551351442
    ffffffffc02e29f8 uint64_t c = 0t27121332576
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t536870912  <--------- c_max is 512MB

> ARC_mru::print -d size lsize
size = 0t223985664
lsize = 0t221839360
> ARC_mfu::print -d size lsize
size = 0t26897219584  <---------------------- MFU list is almost 27GB ...
lsize = 0t26869121024

Thanks,
/jim
Will try that now...

/jim

johansen-osdev at sun.com wrote:
> I suppose I should have been more forward about making my last point.
> If arc_c_max isn't set in /etc/system, I don't believe that the ARC
> will initialize arc.p to the correct value.  I could be wrong about
> this; however, next time you set c_max, set c to the same value as c_max
> and set p to half of c.  Let me know if this addresses the problem or
> not.
> [...]
All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...

> arc::print -tad
{
...
    ffffffffc02e29e8 uint64_t size = 0t299008
    ffffffffc02e29f0 uint64_t p = 0t16588228608
    ffffffffc02e29f8 uint64_t c = 0t33176457216
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t33176457216
...
}
> ffffffffc02e2a08 /Z 0x20000000
arc+0x48:       0x7b9789000     =       0x20000000
> ffffffffc02e29f8 /Z 0x20000000
arc+0x38:       0x7b9789000     =       0x20000000
> ffffffffc02e29f0 /Z 0x10000000
arc+0x30:       0x3dcbc4800     =       0x10000000
> arc::print -tad
{
...
    ffffffffc02e29e8 uint64_t size = 0t299008
    ffffffffc02e29f0 uint64_t p = 0t268435456      <------ p is 256MB
    ffffffffc02e29f8 uint64_t c = 0t536870912      <------ c is 512MB
    ffffffffc02e2a00 uint64_t c_min = 0t1070318720
    ffffffffc02e2a08 uint64_t c_max = 0t536870912  <------- c_max is 512MB
...
}

After a few runs of the workload ...

> arc::print -d size
size = 0t536788992
>

Ah - looks like we're out of the woods. The ARC remains clamped at 512MB.

Thanks!
/jim

johansen-osdev at sun.com wrote:
> I suppose I should have been more forward about making my last point.
> If arc_c_max isn't set in /etc/system, I don't believe that the ARC
> will initialize arc.p to the correct value.
> [...]
I've been seeing this failure to cap on a number of (Solaris 10 update 2
and 3) machines since the script came out (arc hogging is a huge problem
for me, esp on Oracle). This is probably a red herring, but my v490
testbed seemed to actually cap on 3 separate tests, but my t2000 testbed
doesn't even pretend to cap - kernel memory (as identified in Orca) sails
right to the top, leaves me maybe 2GB free on a 32GB machine and shoves
Oracle data into swap. This isn't as amusing as one Stage and one
Production Oracle machine which have 128GB and 96GB respectively. Sending
in 92GB core dumps to support is an impressive gesture taking 2-3 days to
complete.
johansen-osdev at sun.com replied (2007-Mar-16 20:36 UTC, [zfs-discuss] Re: C'mon ARC, stay small...):
> I've been seeing this failure to cap on a number of (Solaris 10 update
> 2 and 3) machines since the script came out (arc hogging is a huge
> problem for me, esp on Oracle). This is probably a red herring, but my
> v490 testbed seemed to actually cap on 3 separate tests, but my t2000
> testbed doesn't even pretend to cap - kernel memory (as identified in
> Orca) sails right to the top, leaves me maybe 2GB free on a 32GB
> machine and shoves Oracle data into swap.

What method are you using to cap this memory?  Jim and I just discussed
the required steps for doing this by hand using MDB.

> This isn't as amusing as one Stage and one Production Oracle machine
> which have 128GB and 96GB respectively. Sending in 92GB core dumps to
> support is an impressive gesture taking 2-3 days to complete.

This is solved by CR 4894692, which is in snv_56 and s10u4.

-j
On Mar 16, 2007, at 1:29 PM, JS wrote:

> I've been seeing this failure to cap on a number of (Solaris 10
> update 2 and 3) machines since the script came out (arc hogging is
> a huge problem for me, esp on Oracle).
> [...]

hey Jeff,

For the ARC using lots of memory, is this a problem for you just on
the startup of Oracle or throughout?

If the ARC didn't cache user data (still would cache metadata), do
you foresee that as a win in your tests?  This could be set per-dataset.

eric
My biggest concern has been making sure that Oracle doesn't have to fight
to get memory, which it does now. There's a definite uptick in latency
while the ARC releases cache memory to let Oracle get what it's asking
for, and that is passed on to the application. The problem has been that,
while limiting the ARC to a percentage of total memory makes sense with
8 or 16GB of RAM, so the ARC keeps a reasonable local cache, I can't
imagine giving it that much kernel memory on a 128GB machine. I'm still
working with the older paradigm that it's better to leave 80% of my RAM
doing nothing 80% of the time because of the performance benefit I get
the 20% of the time I need to grab it quickly.
Jim Mauro wrote:
> All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...
> [...]
> Ah - looks like we're out of the woods. The ARC remains clamped at 512MB.

Is there a way to set these fields using /etc/system?
Or does this require a new or modified init script to
run and do the above with each boot?

Darren
With latest Nevada, setting zfs_arc_max in /etc/system is
sufficient. Playing with mdb on a live system is more
tricky and is what caused the problem here.

-r

Darren.Reed at Sun.COM writes:
> Jim Mauro wrote:
> > All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...
> > [...]
>
> Is there a way to set these fields using /etc/system?
> Or does this require a new or modified init script to
> run and do the above with each boot?
>
> Darren
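For reference, the /etc/system syntax for that tunable looks like the lines
below; the 512MB value is just carried over from Jim's experiment (pick a
size for your own workload), and the setting only takes effect at the next
boot, on builds that support zfs_arc_max.

* Cap the ZFS ARC at 512MB (0x20000000 bytes); takes effect at next boot.
set zfs:zfs_arc_max = 0x20000000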
Hi Guys,

Rather than starting a new thread I thought I'd continue this thread.
I've been running Build 54 on a Thumper since mid-January and wanted
to ask a question about the zfs_arc_max setting. We set it to
"0x100000000  #4GB", however it's creeping over that till our kernel
memory usage is nearly 7GB (::memstat inserted below).

This is a database server, so I was curious if the DNLC would have this
effect over time, as it does quite quickly when dealing with small
files? Would it be worth upgrading to Build 59?

Thank you in advance!

Best Regards,
Jason

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1750044              6836   42%
Anon                      1211203              4731   29%
Exec and libs                7648                29    0%
Page cache                 220434               861    5%
Free (cachelist)           318625              1244    8%
Free (freelist)            659607              2576   16%

Total                     4167561             16279
Physical                  4078747             15932

On 3/23/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
> With latest Nevada, setting zfs_arc_max in /etc/system is
> sufficient. Playing with mdb on a live system is more
> tricky and is what caused the problem here.
> [...]
So you're not really sure it's the ARC growing, but only that the kernel
is growing to 6.8GB.

Print the arc values via mdb:

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc scsi_vhci ufs ip
hook neti sctp arp usba nca lofs zfs random sppp crypto ptm ipc ]
> arc::print -t size c p c_max
uint64_t size = 0x2a8000
uint64_t c = 0x1cdfe800
uint64_t p = 0xe707400
uint64_t c_max = 0x1cdfe800
>

Is size <= c_max?

Assuming it is, you need to look through kmastats and see where the
kernel memory is being used (again, inside mdb):

> ::kmastat

The above generates a LOT of output that's not completely painless to
parse, but it's not too bad either.

If you think it's DNLC related, you can monitor the number of entries with:

# kstat -p unix:0:dnlcstats:dir_entries_cached_current
unix:0:dnlcstats:dir_entries_cached_current     9374
#

You can also monitor kernel memory for the dnlc (just using grep with
the kmastat in mdb):

> ::kmastat ! grep dnlc
dnlc_space_cache        16      104     254     4096    104     0
>

The 5th column starting from the left is "mem in use", in this example
4096. I'm not sure if dnlc_space_cache represents all of the kernel
memory used for the dnlc. It might, but I need to look at the code to
be sure...

Let's start with this...

/jim

Jason J. W. Williams wrote:
> Rather than starting a new thread I thought I'd continue this thread.
> I've been running Build 54 on a Thumper since mid-January and wanted
> to ask a question about the zfs_arc_max setting.
> [...]
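Since ::kmastat prints one line per cache, one rough way to surface the
biggest consumers is to sort on the mem-in-use column Jim describes (column
5). This one-liner assumes the same column layout as the dnlc example above
and will also sweep a few header/total lines into the output, so treat it as
a starting point rather than an exact report:

# echo ::kmastat | mdb -k | sort -n -k 5 | tail -15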
Jason J. W. Williams writes:
> Rather than starting a new thread I thought I'd continue this thread.
> I've been running Build 54 on a Thumper since mid-January and wanted
> to ask a question about the zfs_arc_max setting. We set it to
> "0x100000000  #4GB", however it's creeping over that till our kernel
> memory usage is nearly 7GB (::memstat inserted below).
>
> This is a database server, so I was curious if the DNLC would have this
> effect over time, as it does quite quickly when dealing with small
> files? Would it be worth upgrading to Build 59?

Another possibility is that there is a portion of memory that might be
in the kmem caches, ready to be reclaimed and returned to the OS free
space. Such reclaims currently only occur on memory shortage. I think
we should do it under some more conditions...

This might fall under:

	CR Number: 6416757
	Synopsis: zfs should return memory eventually
	http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6416757

If you induce some temporary memory pressure, it would be nice to see
if your kernel shrinks down to ~4GB.

-r
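One simple way to create that kind of temporary memory pressure, if you
don't already have a tool for it, is a throwaway program that allocates and
touches a few GB of anonymous memory, holds it briefly, and releases it on
exit. This is just a hypothetical sketch, not something prescribed in the
thread; build it 64-bit (e.g. cc -m64) if you want to hold more than ~3GB.

/*
 * Allocate and touch N GB of anonymous memory to create temporary
 * memory pressure, then release it all on exit.  Usage: ./pressure 8
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
        size_t gb = (argc > 1) ? (size_t)atoi(argv[1]) : 4;
        size_t chunk = 1024UL * 1024 * 1024;    /* 1GB at a time */
        size_t i;

        for (i = 0; i < gb; i++) {
                char *p = malloc(chunk);
                if (p == NULL) {
                        fprintf(stderr, "malloc failed at %zu GB\n", i);
                        break;
                }
                memset(p, 0xa5, chunk);         /* touch every page */
        }
        printf("holding %zu GB; press Enter to release and exit\n", i);
        (void) getchar();
        return (0);
}

While it holds the memory, ::memstat and the arc size can be rechecked to
see whether the kernel caches shrink back toward the configured 4GB cap.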