Peter Pickford
2009-Nov-06 01:09 UTC
[zfs-discuss] Problem with memory recovery from arc cache
Hi All,

Has anyone seen problems with the ARC holding on to memory under
memory pressure?

We have several Oracle DB servers running ZFS for the root file
systems, with the databases on VxFS. An unexpected number of clients
connected and caused a memory shortage such that some processes were
swapped out.

The system recovered partially, with around 1G free, however the ARC
was still around 9-10G. It appears that the ARC didn't release memory
as fast as memory was reclaimed from processes.

As a workaround we have limited the maximum ARC size (zfs_arc_max) to 2G.

Shouldn't the ARC be reclaimed in preference to active process memory?
Having two competing systems reclaiming memory does not make sense to
me, and it seems to result in the strange situation of a memory
shortage alongside a large ARC.

Also, would it be better if the ARC minimum were based on the size of
the ZFS file systems rather than a percentage of total memory? 3 or 4G
minimums seem huge!

Thanks

Peter
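For reference, a minimal sketch of how the ARC size and overall memory
breakdown described above can be sampled on a live system, assuming the
standard Solaris kstat and mdb tools; the arcstats statistic names used
here are the commonly documented ones and may vary by release:

  # current ARC size, its adaptive target (c) and hard cap (c_max), in bytes
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

  # overall page breakdown (kernel, anon, free; some releases break out ZFS file data)
  echo ::memstat | mdb -k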
Richard Elling
2009-Nov-06 19:02 UTC
[zfs-discuss] Problem with memory recovery from arc cache
On Nov 5, 2009, at 5:09 PM, Peter Pickford wrote:

> Hi All,
>
> Has anyone seen problems with the ARC holding on to memory under
> memory pressure?

The ARC can also contain uncommitted data, which obviously can't be
evicted until it is committed. However, that tends to be a very small
amount of data, especially if it is just used for root.

> We have several Oracle DB servers running ZFS for the root file
> systems, with the databases on VxFS.

It seems odd that root would use 9-10 GB of memory for the ARC. Are
you sure there is not something else going on, or that the
configuration is different than you expect?

> An unexpected number of clients connected and caused a memory
> shortage such that some processes were swapped out.
>
> The system recovered partially, with around 1G free, however the ARC
> was still around 9-10G.

How was this measured?

> It appears that the ARC didn't release memory as fast as memory was
> reclaimed from processes.
>
> As a workaround we have limited the maximum ARC size (zfs_arc_max) to 2G.
>
> Shouldn't the ARC be reclaimed in preference to active process memory?

Yes, it is. However, there can be seemingly odd behaviour when a
sudden, large memory shortfall occurs, due to the multithreaded nature
of Solaris.

> Having two competing systems reclaiming memory does not make sense to
> me, and it seems to result in the strange situation of a memory
> shortage alongside a large ARC.
>
> Also, would it be better if the ARC minimum were based on the size of
> the ZFS file systems rather than a percentage of total memory? 3 or 4G
> minimums seem huge!

The min is set at boot, likely before the ZFS file systems are imported
(bootstrap issues notwithstanding). It can be hardwired if needed.
 -- richard
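If hardwiring is needed, a minimal sketch of the usual /etc/system
tuning on Solaris; the 2 GB cap mirrors the workaround mentioned above,
the 512 MB floor is only an illustrative value, and a reboot is
required for the settings to take effect:

  * /etc/system: cap the ARC at 2 GB and lower its floor to 512 MB (values in bytes)
  set zfs:zfs_arc_max = 0x80000000
  set zfs:zfs_arc_min = 0x20000000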
Peter Pickford
2009-Nov-06 23:47 UTC
[zfs-discuss] Problem with memory recovery from arc cache
Hi Richard,

Thanks for your help looking at this.

How can I find out how much uncommitted data there is in the ARC?

What other things could be going on, and what configuration do you
think I should look at?

[root@cad2updb007 ~]# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
rpool               43.3G  90.5G    94K  /rpool
rpool/ROOT          14.8G  90.5G    18K  legacy
rpool/ROOT/zfs      14.8G  90.5G  9.48G  /
rpool/ROOT/zfs/var  5.37G  90.5G  5.37G  /var
rpool/app           2.66G  90.5G  2.66G  /opt/app
rpool/core          1.51G  90.5G  1.51G  /var/core
rpool/crash           22K  90.5G    22K  /var/crash
rpool/dump          8.03G  90.5G  8.03G  -
rpool/export         111M  90.5G    20K  /export
rpool/export/home    111M  90.5G   111M  /export/home
rpool/swap          16.2G  90.5G  16.2G  -

[root@cad2updb007 ~]# df -h -F zfs
Filesystem          size  used  avail  capacity  Mounted on
rpool/ROOT/zfs      134G  9.5G    91G       10%  /
rpool/ROOT/zfs/var  134G  5.4G    91G        6%  /var
rpool/export        134G   20K    91G        1%  /export
rpool/export/home   134G  111M    91G        1%  /export/home
rpool/app           134G  2.7G    91G        3%  /opt/app
rpool               134G   94K    91G        1%  /rpool
rpool/core          134G  1.5G    91G        2%  /var/core
rpool/crash         134G   22K    91G        1%  /var/crash

[root@cad2updb007 ~]# df -k -F zfs | nawk '{tot=tot+$3}END {print tot/1024/1024}'
19.0784

Currently, with no database running:

[root@cad2updb007 ~]# /net/imageserver/install/misc/bin/arc_summary.pl | grep 'Current Size'
Current Size: 15682 MB (arcsize)

That's one big cache, at 80% of allocated space; great if the memory
is not needed for anything else.

The ARC size was observed with arc_summary.pl, which I believe uses
kstats. Free memory was observed with top, with mdb ::memstat, and
with CAT via an online savecore, which unfortunately is not fully
self-consistent.

The server was confirmed by Sun to be recovering from a memory
shortage condition, and interactive performance was very sluggish.

CAT's meminfo reports pages_locked at 23.3G; I assume a huge chunk of
this is the ARC.

Total locked shared memory is 7.81G, tmpfs is only around 120M across
all file systems, and only 23M is on the swap device. CAT reported 4
threads swapped (it was not able to fully traverse the thread list);
vmstat reported 98 threads waiting for memory. It currently reports 64
threads waiting for memory with no DB running, so I guess they are not
used much :)

I'm trying to work out what happened. If I have misconfigured
something, please let me know where to take a look, but I can't think
of anything that should have been tweaked, and Sun recommends not
tuning ZFS.

I take your point about Solaris being multithreaded. I wonder if that
amounts to saying that conventional memory reclaim and ARC reclaim run
on separate threads; I also wonder if it would be better if they were
more aware of each other.

I'm sure that if a severe memory shortage persists, the ARC will come
down to its minimum, but in these circumstances it appears not to
have, and the server was not stabilizing even though there was oodles
of memory in the ARC. Kernel time remained high, perhaps because
memory was constantly being freed and re-referenced.

How do I look at this further, or set up a good test case for this
scenario without actively running a huge database?

I still have the server running, so I could look at the contents of
the ARC, given the knowledge, but the DB has had to be moved to
another machine. It would be good to know how much of the ARC is
locked, for instance.

I don't mind the ARC being huge; that's great if there's free memory.
But it seems a bit odd that it is not shrinking enough, and quickly
enough, to allow the machine to function without soft paging.
Thanks

Peter
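A minimal sketch of how the ARC state on the still-running server could
be inspected further, assuming the arcstats kstats and the mdb ::arc
dcmd are available on this Solaris release:

  # full arcstats dump: size, target (c), MRU/MFU balance (p), hit/miss counters
  kstat -n arcstats

  # the same counters summarized by the kernel debugger
  echo ::arc | mdb -k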
Richard Elling
2009-Nov-07 00:19 UTC
[zfs-discuss] Problem with memory recovery from arc cache
On Nov 6, 2009, at 3:47 PM, Peter Pickford wrote:

> Hi Richard,
>
> Thanks for your help looking at this.
>
> How can I find out how much uncommitted data there is in the ARC?

This is not easy, because it is constantly changing and commits occur
every 30 seconds or so. I think your efforts are better spent looking
at more traditional memory usage issues.

> What other things could be going on, and what configuration do you
> think I should look at?
>
> CAT's meminfo reports pages_locked at 23.3G; I assume a huge chunk of
> this is the ARC.

I don't think this is a good assumption. Current size of the ARC is
represented in the kstats as c:

  kstat -n arcstats -s c

> Total locked shared memory is 7.81G, tmpfs is only around 120M across
> all file systems, and only 23M is on the swap device. CAT reported 4
> threads swapped (it was not able to fully traverse the thread list);
> vmstat reported 98 threads waiting for memory. It currently reports 64
> threads waiting for memory with no DB running, so I guess they are not
> used much :)
>
> I'm trying to work out what happened. If I have misconfigured
> something, please let me know where to take a look, but I can't think
> of anything that should have been tweaked, and Sun recommends not
> tuning ZFS.
>
> I take your point about Solaris being multithreaded. I wonder if that
> amounts to saying that conventional memory reclaim and ARC reclaim run
> on separate threads; I also wonder if it would be better if they were
> more aware of each other.
>
> I'm sure that if a severe memory shortage persists, the ARC will come
> down to its minimum, but in these circumstances it appears not to
> have, and the server was not stabilizing even though there was oodles
> of memory in the ARC. Kernel time remained high, perhaps because
> memory was constantly being freed and re-referenced.

Did the database restart at least once since boot?
If so, then you may be seeing large page stealing instead. It will
behave similarly to a memory shortfall, with lots of time spent
managing memory, but the cause is not a shortage of memory, but a
shortage of large pages. This effect is one of the (few) good reasons
for capping the ARC size.

> How do I look at this further, or set up a good test case for this
> scenario without actively running a huge database?

It will be more fruitful to examine the database system under load.

> I still have the server running, so I could look at the contents of
> the ARC, given the knowledge, but the DB has had to be moved to
> another machine. It would be good to know how much of the ARC is
> locked, for instance.

It is much easier to examine the resources consumed by the database.
This can be a deep discussion, so I'll point you to Allan Packer's
excellent book and website:

  http://www.solarisdatabases.com/

If you want to talk more, perhaps we should move the conversation off
of the alias?
 -- richard
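Following up on the large-page point above, a minimal sketch of how to
check whether a process (for example the Oracle SGA) actually received
large pages; the ora_pmon process name is only illustrative, substitute
a real pid for your instance:

  # the Pgsz column shows which page size backs each mapping (8K vs 4M/256M)
  pmap -xs `pgrep -f ora_pmon`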