Peter Pickford
2009-Nov-06 01:09 UTC
[zfs-discuss] Problem with memory recovery from arc cache
Hi All,

Has anyone seen problems with the ARC holding on to memory under
memory pressure?

We have several Oracle DB servers running ZFS for the root file
systems, with the databases on VxFS. An unexpected number of clients
connected and caused a memory shortage such that some processes were
swapped out.

The system recovered partially, with around 1G free, however the ARC
was still around 9-10G. It appears that the ARC didn't release memory
as fast as memory was reclaimed from processes.

As a workaround we have limited the maximum ARC size (zfs_arc_max) to 2G.

Shouldn't the ARC be reclaimed in preference to active process memory?
Having two competing systems reclaiming memory does not make sense to
me, and it seems to result in the strange situation of a memory
shortage alongside a large ARC.

Also, would it be better if the ARC minimum were based on the size of
the ZFS file systems rather than a percentage of total memory? 3 or 4G
minimums seem huge!

Thanks

Peter
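For reference, a minimal sketch of how the ARC size and overall memory
breakdown described above can be sampled on a live system, assuming the
standard Solaris kstat and mdb tools; the arcstats statistic names used
here are the commonly documented ones and may vary by release:

  # current ARC size, its adaptive target (c) and hard cap (c_max), in bytes
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

  # overall page breakdown (kernel, anon, free; some releases break out ZFS file data)
  echo ::memstat | mdb -k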
Richard Elling
2009-Nov-06 19:02 UTC
[zfs-discuss] Problem with memory recovery from arc cache
On Nov 5, 2009, at 5:09 PM, Peter Pickford wrote:

> Hi All,
>
> Has anyone seen problems with the ARC holding on to memory under
> memory pressure?

The ARC can also contain uncommitted data, which obviously can't be
evicted until it is committed. However, that tends to be a very small
amount of data, especially if it is just used for root.

> We have several Oracle DB servers running ZFS for the root file
> systems, with the databases on VxFS.

It seems odd that root would use 9-10 GB of memory for the ARC. Are
you sure there is not something else going on, or that the
configuration is different than you expect?

> An unexpected number of clients connected and caused a memory
> shortage such that some processes were swapped out.
>
> The system recovered partially, with around 1G free, however the ARC
> was still around 9-10G.

How was this measured?

> It appears that the ARC didn't release memory as fast as memory was
> reclaimed from processes.
>
> As a workaround we have limited the maximum ARC size (zfs_arc_max) to 2G.
>
> Shouldn't the ARC be reclaimed in preference to active process memory?

Yes, it is. However, there can be seemingly odd behaviour when a
sudden, large memory shortfall occurs, due to the multithreaded nature
of Solaris.

> Having two competing systems reclaiming memory does not make sense to
> me, and it seems to result in the strange situation of a memory
> shortage alongside a large ARC.
>
> Also, would it be better if the ARC minimum were based on the size of
> the ZFS file systems rather than a percentage of total memory? 3 or 4G
> minimums seem huge!

The min is set at boot, likely before the ZFS file systems are imported
(bootstrap issues notwithstanding). It can be hardwired if needed.
 -- richard
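If hardwiring is needed, a minimal sketch of the usual /etc/system
tuning on Solaris; the 2 GB cap mirrors the workaround mentioned above,
the 512 MB floor is only an illustrative value, and a reboot is
required for the settings to take effect:

  * /etc/system: cap the ARC at 2 GB and lower its floor to 512 MB (values in bytes)
  set zfs:zfs_arc_max = 0x80000000
  set zfs:zfs_arc_min = 0x20000000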
Peter Pickford
2009-Nov-06 23:47 UTC
[zfs-discuss] Problem with memory recovery from arc cache
Hi Richard,

Thanks for your help looking at this.

How can I find out how much uncommitted data there is in the ARC?

What other things could be going on, and what configuration do you
think I should look at?

[root@cad2updb007 ~]# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
rpool               43.3G  90.5G    94K  /rpool
rpool/ROOT          14.8G  90.5G    18K  legacy
rpool/ROOT/zfs      14.8G  90.5G  9.48G  /
rpool/ROOT/zfs/var  5.37G  90.5G  5.37G  /var
rpool/app           2.66G  90.5G  2.66G  /opt/app
rpool/core          1.51G  90.5G  1.51G  /var/core
rpool/crash           22K  90.5G    22K  /var/crash
rpool/dump          8.03G  90.5G  8.03G  -
rpool/export         111M  90.5G    20K  /export
rpool/export/home    111M  90.5G   111M  /export/home
rpool/swap          16.2G  90.5G  16.2G  -

[root@cad2updb007 ~]# df -h -F zfs
Filesystem          size  used  avail  capacity  Mounted on
rpool/ROOT/zfs      134G  9.5G    91G       10%  /
rpool/ROOT/zfs/var  134G  5.4G    91G        6%  /var
rpool/export        134G   20K    91G        1%  /export
rpool/export/home   134G  111M    91G        1%  /export/home
rpool/app           134G  2.7G    91G        3%  /opt/app
rpool               134G   94K    91G        1%  /rpool
rpool/core          134G  1.5G    91G        2%  /var/core
rpool/crash         134G   22K    91G        1%  /var/crash

[root@cad2updb007 ~]# df -k -F zfs | nawk '{tot=tot+$3}END {print tot/1024/1024}'
19.0784

Currently, with no database running:

[root@cad2updb007 ~]# /net/imageserver/install/misc/bin/arc_summary.pl | grep 'Current Size'
Current Size: 15682 MB (arcsize)

That's one big cache, at 80% of allocated space; great if the memory
is not needed for anything else.

The ARC size was observed with arc_summary.pl, which I believe uses
kstats. Free memory was observed with top, with mdb ::memstat, and
with CAT via an online savecore, which unfortunately is not fully
self-consistent.

The server was confirmed by Sun to be recovering from a memory
shortage condition, and interactive performance was very sluggish.

CAT's meminfo reports pages_locked at 23.3G; I assume a huge chunk of
this is the ARC.

Total locked shared memory is 7.81G, tmpfs is only around 120M across
all file systems, and only 23M is on the swap device. CAT reported 4
threads swapped (it was not able to fully traverse the thread list);
vmstat reported 98 threads waiting for memory. It currently reports 64
threads waiting for memory with no DB running, so I guess they are not
used much :)

I'm trying to work out what happened. If I have misconfigured
something, please let me know where to take a look, but I can't think
of anything that should have been tweaked, and Sun recommends not
tuning ZFS.

I take your point about Solaris being multithreaded. I wonder if that
amounts to saying that conventional memory reclaim and ARC reclaim run
on separate threads; I also wonder if it would be better if they were
more aware of each other.

I'm sure that if a severe memory shortage persists, the ARC will come
down to its minimum, but in these circumstances it appears not to
have, and the server was not stabilizing even though there was oodles
of memory in the ARC. Kernel time remained high, perhaps because
memory was constantly being freed and re-referenced.

How do I look at this further, or set up a good test case for this
scenario without actively running a huge database?

I still have the server running, so I could look at the contents of
the ARC, given the knowledge, but the DB has had to be moved to
another machine. It would be good to know how much of the ARC is
locked, for instance.

I don't mind the ARC being huge; that's great if there's free memory.
But it seems a bit odd that it is not shrinking enough, and quickly
enough, to allow the machine to function without soft paging.
Thanks

Peter
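A minimal sketch of how the ARC state on the still-running server could
be inspected further, assuming the arcstats kstats and the mdb ::arc
dcmd are available on this Solaris release:

  # full arcstats dump: size, target (c), MRU/MFU balance (p), hit/miss counters
  kstat -n arcstats

  # the same counters summarized by the kernel debugger
  echo ::arc | mdb -k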
Richard Elling
2009-Nov-07 00:19 UTC
[zfs-discuss] Problem with memory recovery from arc cache
On Nov 6, 2009, at 3:47 PM, Peter Pickford wrote:

> Hi Richard,
>
> Thanks for your help looking at this.
>
> How can I find out how much uncommitted data there is in the ARC?

This is not easy, because it is constantly changing and commits occur
every 30 seconds or so. I think your efforts are better spent looking
at more traditional memory usage issues.

> What other things could be going on, and what configuration do you
> think I should look at?
>
> CAT's meminfo reports pages_locked at 23.3G; I assume a huge chunk of
> this is the ARC.

I don't think this is a good assumption. Current size of the ARC is
represented in the kstats as c:

  kstat -n arcstats -s c

> Total locked shared memory is 7.81G, tmpfs is only around 120M across
> all file systems, and only 23M is on the swap device. CAT reported 4
> threads swapped (it was not able to fully traverse the thread list);
> vmstat reported 98 threads waiting for memory. It currently reports 64
> threads waiting for memory with no DB running, so I guess they are not
> used much :)
>
> I'm trying to work out what happened. If I have misconfigured
> something, please let me know where to take a look, but I can't think
> of anything that should have been tweaked, and Sun recommends not
> tuning ZFS.
>
> I take your point about Solaris being multithreaded. I wonder if that
> amounts to saying that conventional memory reclaim and ARC reclaim run
> on separate threads; I also wonder if it would be better if they were
> more aware of each other.
>
> I'm sure that if a severe memory shortage persists, the ARC will come
> down to its minimum, but in these circumstances it appears not to
> have, and the server was not stabilizing even though there was oodles
> of memory in the ARC. Kernel time remained high, perhaps because
> memory was constantly being freed and re-referenced.

Did the database restart at least once since boot?
If so, then you may be seeing large page stealing instead. It will
behave similarly to a memory shortfall, with lots of time spent
managing memory, but the cause is not a shortage of memory, but a
shortage of large pages. This effect is one of the (few) good reasons
for capping the ARC size.

> How do I look at this further, or set up a good test case for this
> scenario without actively running a huge database?

It will be more fruitful to examine the database system under load.

> I still have the server running, so I could look at the contents of
> the ARC, given the knowledge, but the DB has had to be moved to
> another machine. It would be good to know how much of the ARC is
> locked, for instance.

It is much easier to examine the resources consumed by the database.
This can be a deep discussion, so I'll point you to Allan Packer's
excellent book and website:

  http://www.solarisdatabases.com/

If you want to talk more, perhaps we should move the conversation off
of the alias?
 -- richard
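Following up on the large-page point above, a minimal sketch of how to
check whether a process (for example the Oracle SGA) actually received
large pages; the ora_pmon process name is only illustrative, substitute
a real pid for your instance:

  # the Pgsz column shows which page size backs each mapping (8K vs 4M/256M)
  pmap -xs `pgrep -f ora_pmon`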