Hi. We're running OmniOS as a ZFS storage server. For some reason, our ARC
grows to a certain point and then suddenly drops. I used arcstat to catch it
in action, but I was not able to capture what else was going on in the system
at the time. I'll do that next.

read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size
 166   166     0   100       0       0       0       0    85G    225G
5.9K  5.9K     0   100       0       0       0       0    85G    225G
 755   715    40    94      40       0      40       0    84G    225G
 17K   17K     0   100       0       0       0       0    67G    225G
 409   395    14    96      14       0      14       0    49G    225G
 388   364    24    93      24       0      24       0    41G    225G
 37K   37K    20    99      20       6      14      30    40G    225G

For reference, it's a 12TB pool with a 512GB SSD L2ARC and 198GB of RAM. We
have nothing else running on the system except NFS, and we are not using
dedup. Here is the output of ::memstat at one point:

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                   19061902             74460   38%
ZFS File Data            28237282            110301   56%
Anon                        43112               168    0%
Exec and libs                1522                 5    0%
Page cache                  13509                52    0%
Free (cachelist)             6366                24    0%
Free (freelist)           2958527             11556    6%

Total                    50322220            196571
Physical                 50322219            196571

According to "prstat -s rss", nothing else is consuming the memory:

 592 root       33M   26M sleep   59    0   0:00:33 0.0% fmd/27
  12 root       13M   11M sleep   59    0   0:00:08 0.0% svc.configd/21
 641 root       12M   11M sleep   59    0   0:04:48 0.0% snmpd/1
  10 root       14M   10M sleep   59    0   0:00:03 0.0% svc.startd/16
 342 root       12M 9084K sleep   59    0   0:00:15 0.0% hald/5
 321 root       14M 8652K sleep   59    0   0:03:00 0.0% nscd/52

So far I can't figure out what could be causing this. The only other thing I
can think of is that we have a bunch of zfs send/receive operations running
as backups across 10 datasets in the pool. I am not sure how snapshots and
send/receive affect the ARC. Does anyone else have any ideas?

Thanks,
Chris
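Something like the following logging loop should be enough to capture the next drop with context (a rough sketch; it assumes the ARC counters live under the zfs:0:arcstats kstat, which is where stock illumos puts them, and the log path is arbitrary):

  #!/bin/sh
  # Record ARC size plus memory and process state once a minute so that
  # whatever is happening around a sudden ARC drop gets captured.
  OUT=/var/tmp/arcwatch.log
  while true; do
      date >> $OUT
      # Current ARC size and target size, in bytes.
      kstat -p zfs:0:arcstats:size zfs:0:arcstats:c >> $OUT
      # Kernel vs. ZFS file data vs. free memory.
      echo ::memstat | mdb -k >> $OUT
      # Top memory consumers at that moment.
      prstat -s rss -n 10 1 1 >> $OUT
      echo "----" >> $OUT
      sleep 60
  done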
Hi,

If it stays at the smaller size after it decreases, it might be similar to:

  7111576 arc shrinks in the absence of memory pressure

Also, see the document:

  ZFS ARC can shrink down without memory pressure result in slow
  performance [ID 1404581.1]

Specifically, check whether arc_no_grow is set to 1 after the cache size has
decreased, and whether it stays that way.

The fix is in one of the SRUs, and I think it should be in 11.1. I don't know
if it was fixed in Illumos, or even if Illumos was affected by this at all.

--
Robert Milkowski
http://milek.blogspot.com
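One quick way to check that flag on a live system is with mdb (a sketch; it assumes the arc_no_grow kernel symbol is visible to mdb -k, as it normally is on Solaris/illumos, and that you run it periodically so you see the value both before and after a drop):

  # Print arc_no_grow as a decimal: 0 means the ARC may grow, 1 means growth
  # is disabled.
  echo "arc_no_grow/D" | mdb -k

  # For context, also dump the ARC target size and its limits from the
  # arcstats kstat.
  kstat -p zfs:0:arcstats:c zfs:0:arcstats:c_max zfs:0:arcstats:c_min

If arc_no_grow stays at 1 even though ::memstat shows plenty of free memory, that matches the behaviour described in the bug above.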
On 22 October, 2012 - Robert Milkowski sent me these 3,6K bytes:

> The fix is in one of the SRUs and I think it should be in 11.1.
> I don't know if it was fixed in Illumos or even if Illumos was affected by
> this at all.

The code that affects bug 7111576 was introduced between s10 and s11.

/Tomas
--
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> If after it decreases in size it stays there it might be similar to:
>
> 7111576 arc shrinks in the absence of memory pressure

After it dropped, it did build back up. Today is the first day that these
servers are working under real production load, and it is looking much
better. arcstat is showing some nice numbers for the ARC, but the L2ARC is
still building.

read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size
 19K   17K  2.5K    87    2.5K     490    2.0K      19   148G    371G
 41K   39K  2.3K    94    2.3K     184    2.1K       7   148G    371G
 34K   34K   694    98     694      17     677       2   148G    371G
 16K   15K  1.0K    93    1.0K      16    1.0K       1   148G    371G
 39K   36K  2.3K    94    2.3K      20    2.3K       0   148G    371G
 23K   22K   746    96     746      76     670      10   148G    371G
 49K   47K  1.7K    96    1.7K     249    1.5K      14   148G    371G
 23K   21K  1.4K    93    1.4K      38    1.4K       2   148G    371G

My only guess is that the large zfs send/recv streams were affecting the
cache when they started and finished.

Thanks for the responses and help.

Chris
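One way to confirm that guess would be to log each ARC shrink as it happens and line it up with when the backup jobs start and finish. A rough DTrace sketch, assuming the running illumos kernel still has an arc_shrink() function for the fbt provider to hook (the probe name would need adjusting otherwise):

  # Print a timestamp and kernel stack every time the ARC is asked to shrink,
  # and note whenever a zfs command starts, to correlate the two.
  dtrace -qn '
  fbt::arc_shrink:entry
  {
      printf("%Y  arc_shrink\n", walltimestamp);
      stack();
  }
  proc:::exec-success
  /execname == "zfs"/
  {
      printf("%Y  zfs started: %s\n", walltimestamp,
          stringof(curpsinfo->pr_psargs));
  }'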
On Oct 22, 2012, at 6:52 AM, Chris Nagele <nagele at wildbit.com> wrote:

> My only guess is that the large zfs send/recv streams were affecting the
> cache when they started and finished.

There are other cases where data is evicted from the ARC, though I don't
have a complete list at my fingertips. For example, if a zvol is closed,
then the data for that zvol is evicted.
 -- richard

--
Richard.Elling at RichardElling.com
+1-760-896-4422
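To see what kind of eviction activity goes with a drop, without having to guess at individual counter names, it is enough to dump the whole arcstats kstat periodically and diff the snapshots (a sketch; the file paths are arbitrary):

  # Take two snapshots of all ARC statistics a minute apart and compare them;
  # any counters related to eviction or deletion that moved will show up in
  # the diff.
  kstat -p zfs:0:arcstats > /var/tmp/arcstats.1
  sleep 60
  kstat -p zfs:0:arcstats > /var/tmp/arcstats.2
  diff /var/tmp/arcstats.1 /var/tmp/arcstats.2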