Andrea Rucks
2009-Jun-04 22:52 UTC
[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???
Hi there, I'm experiencing some issues with Lustre and WebSphere Portal Server 6.1 (WPS) co-existing on the Lustre client application server. WPS likes to use a lot of memory. The server was originally allocated 16 GB of RAM. These servers are Xen-virtualized on RHEL 5.3 running Lustre 1.6.7 with patch 16895 applied (but not the full 1.6.7.1). What I'm seeing is that WPS eventually takes all 15.5 GB of available memory (or tries to), and then my server hangs and shows an out-of-memory error on the console:

Call Trace:
 [<ffffffff802bc998>] out_of_memory+0x8b/0x203
 [<ffffffff8020f657>] __alloc_pages+0x245/0x2ce
 [<ffffffff8021336e>] __do_page_cache_readahead+0xd0/0x21c
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023efff>] lock_timer_base+0x1b/0x3c
 [<ffffffff88081d4d>] :dm_mod:dm_any_congested+0x38/0x3f
 [<ffffffff80213c47>] filemap_nopage+0x148/0x322
 [<ffffffff80208db9>] __handle_mm_fault+0x440/0x11f6
 [<ffffffff802666ef>] do_page_fault+0xf7b/0x12e0
 [<ffffffff80207141>] kmem_cache_free+0x80/0xd3
 [<ffffffff8025f82b>] error_exit+0x0/0x6e
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:73
cpu 0 cold: high 62, batch 15 used:61
cpu 1 hot: high 186, batch 31 used:164
cpu 1 cold: high 62, batch 15 used:59
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 6040kB (0kB HighMem)
Active:2087085 inactive:1984455 dirty:0 writeback:0 unstable:0 free:1510 slab:9371 mapped-file:992 mapped-anon:4054270 pagetables:11073
DMA free:6040kB min:16384kB low:20480kB high:24576kB active:8348340kB inactive:7937820kB present:16785408kB pages_scanned:375028407 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 3*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 6040kB
DMA32: empty
Normal: empty
HighMem: empty
17533 pagecache pages
Swap cache: add 6497852, delete 6497594, find 1492942/1923882, race 4+83
Free swap = 0kB
Total swap = 4194296kB
uptimeagent invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

I have 6 filesystems of varying size (1.5 TB down to 4 GB), and we use Lustre to share them amongst the WebSphere cluster; our use of Lustre is commercial in nature, non-HPC, and uses legacy filesystem structures (it still uses Linux-HA, though). If we stop WPS as it begins chewing through RAM, we still see a lot of memory in use (Lustre client cache). As I unmount each Lustre filesystem, I gain back a significant portion of memory (about 7 GB in total). For grins, we ripple-stopped each WPS server, adjusted the maxmem Xen value, and gave each server an additional 6 GB of RAM for a total of 22 GB.
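For reference, here is a minimal sketch of how the current per-filesystem cache caps and overall memory use can be checked on a client before changing anything; the lusfs0* pattern just matches our mount names, and the dotted lctl parameter syntax is simply shorthand for the /proc paths used below:

  # Current client cache cap for each mounted filesystem
  lctl get_param llite.lusfs0*.max_cached_mb

  # The same values read directly from /proc
  cat /proc/fs/lustre/llite/lusfs0*/max_cached_mb

  # Rough view of how much client RAM is sitting in page cache
  free -m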
I'd like to now limit the Lustre clients to the following, but I'm not sure if doing so will mess things up:

lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs02*/max_cached_mb 2048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs03*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs04*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs05*/max_cached_mb 1048   # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs06*/max_cached_mb 512    # Lustre default is 12288

So, here are my questions:

Why is 75% the default for max_cached_mb?

What will happen if I run "lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048" instead of leaving it at the 12288 it is today, for each of the filesystems mentioned above?

How am I affecting the performance of the client by making that change? Is this a bad thing to do, or no big deal?

Some filesystems are more heavily used than others; should I give them more memory?

Some filesystems have large files that I'm sure end up sitting in memory; should I give them more memory?

I know the ldlm lru_size can be used to flush cache, but I don't think that's a wise thing to do; people might lose... locks (true/false?) on files they're downloading or something, right?

Is there another cache tunable where I can flush cached things that are two hours or more old, but leave the newer stuff (a max_cache_time parameter perhaps)?

Cheers,

Ms. Andrea D. Rucks
Sr. Unix Systems Administrator, Lawson ITS Unix Server Team
_____________________________
Lawson
380 St. Peter Street
St. Paul, MN 55102
Tel: 651-767-6252
http://www.lawson.com
Andreas Dilger
2009-Jun-11 00:10 UTC
[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???
On Jun 04, 2009 17:52 -0500, Andrea Rucks wrote:
> What I'm seeing is that WPS eventually takes all 15.5 GB of available
> memory (or tries to) and then my server will hang and show an out of
> memory error on the console:
>
> Call Trace:
> [<ffffffff802bc998>] out_of_memory+0x8b/0x203
> Free pages: 6040kB (0kB HighMem)
> active:8348340kB inactive:7937820kB present:16785408kB

So, about 8GB is just in cached memory, but is inactive so it should be able to be released under memory pressure.

> pages_scanned:375028407 all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no

This _should_ mean that some pages are reclaimable, not sure why they are not.

> If we stop WPS as it begins chewing through RAM, we still see a lot of
> memory in use (Lustre client cache). As I unmount each Lustre filesystem,
> I gain back a significant portion of memory (about 7 GB back total).

That isn't indicative of anything, because Linux/Lustre caches data that isn't in use (inactive) in case it might be used later.

> I'd like to now limit the Lustre clients to the following, but I'm not
> sure if doing so will mess things up:
>
> lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048  # Lustre
> default is 12288

Note: you can use "lctl set_param llite.*.max_cached_mb=2048" as a shortcut for this. Note that having many separate caches (i.e. multiple filesystems) is less efficient than a single large filesystem.

> So, here are my questions. Why is 75% the default for max_cached_mb?

Just a reasonable maximum amount of cached data. Something has to be kept available for application use.

> What will happen if I "lctl set_param llite.*.max_cached_mb 2048" instead
> of 12288 where it is today for each of those filesystems mentioned above?

It should cap the cached data at 2GB per filesystem.

> How am I affecting the performance of the client by making that change?

Depends on how much they re-use data.

> Is this a bad thing to do or no big deal?

For Lustre, no big deal. Depends again on how much cached data affects your application performance.

> Some filesystems are more heavily used than others, should I give them
> more memory?

Seems reasonable.

> Some filesystems have large files that I'm sure end up sitting in
> memory, should I give them more memory?

Depends if your application re-uses files or not.

> I know the ldlm lru_size can be used to flush cache, but I don't
> think that's a wise thing to do, people might lose...locks (true/false?)

Clearing all of the locks will in turn flush all of your caches, so it is only a short-term fix unless you put a hard limit on the number of locks for each filesystem. Getting that right is hard.

> on files they're downloading or something, right? Is there another cache
> tunable where I can flush cached things that are two hours or more old,
> but leave the newer stuff (a max_cache_time parameter perhaps)?

Yes, there is the ldlm.namespaces.*.lru_max_age parameter you could tune.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
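As a concrete follow-up to that last point, a minimal sketch of inspecting and tuning the lock-ageing knobs on a 1.6.x client; the two-hour figure and the assumption that this client reports lru_max_age in milliseconds are examples to verify, not values taken from the thread:

  # Current lock LRU age limit per namespace (the units have differed
  # between releases -- check what the running client reports first)
  lctl get_param ldlm.namespaces.*.lru_max_age

  # Example only: age out unused locks (and the cached pages they cover)
  # after roughly two hours, assuming the value is in milliseconds
  lctl set_param ldlm.namespaces.*.lru_max_age=7200000

  # lru_size=0 keeps the default dynamic LRU sizing; a non-zero value puts
  # a hard cap on the number of cached locks per namespace
  lctl get_param ldlm.namespaces.*.lru_size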