Andrea Rucks
2009-Jun-04  22:52 UTC
[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???
Hi there,
I'm experiencing some issues with Lustre and WebSphere Portal Server 6.1
(WPS) co-existing on the Lustre client application server.  WPS likes to
use a lot of memory.  The server was originally allocated 16 GB of RAM.
These servers are Xen-virtualized on RHEL 5.3 running Lustre 1.6.7 with
patch 16895 applied (but not fully 1.6.7.1).
What I'm seeing is that WPS eventually takes all 15.5 GB of available
memory (or tries to), and then my server hangs and shows an out-of-memory
error on the console:
Call Trace:
 [<ffffffff802bc998>] out_of_memory+0x8b/0x203
 [<ffffffff8020f657>] __alloc_pages+0x245/0x2ce
 [<ffffffff8021336e>] __do_page_cache_readahead+0xd0/0x21c
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023efff>] lock_timer_base+0x1b/0x3c
 [<ffffffff88081d4d>] :dm_mod:dm_any_congested+0x38/0x3f
 [<ffffffff80213c47>] filemap_nopage+0x148/0x322
 [<ffffffff80208db9>] __handle_mm_fault+0x440/0x11f6
 [<ffffffff802666ef>] do_page_fault+0xf7b/0x12e0
 [<ffffffff80207141>] kmem_cache_free+0x80/0xd3
 [<ffffffff8025f82b>] error_exit+0x0/0x6e
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:73
cpu 0 cold: high 62, batch 15 used:61
cpu 1 hot: high 186, batch 31 used:164
cpu 1 cold: high 62, batch 15 used:59
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        6040kB (0kB HighMem)
Active:2087085 inactive:1984455 dirty:0 writeback:0 unstable:0 free:1510 slab:9371 mapped-file:992 mapped-anon:4054270 pagetables:11073
DMA free:6040kB min:16384kB low:20480kB high:24576kB active:8348340kB inactive:7937820kB present:16785408kB pages_scanned:375028407 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 3*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 6040kB
DMA32: empty
Normal: empty
HighMem: empty
17533 pagecache pages
Swap cache: add 6497852, delete 6497594, find 1492942/1923882, race 4+83
Free swap  = 0kB
Total swap = 4194296kB
uptimeagent invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
I have 6 filesystems of varying size (from 1.5 TB down to 4 GB), and we use
Lustre to share them amongst the WebSphere cluster; our use of Lustre is
commercial in nature, non-HPC, and uses legacy filesystem structures (we
still use Linux-HA, though).
If we stop WPS as it begins chewing through RAM, we still see a lot of
memory in use (Lustre client cache).  As I unmount each Lustre filesystem,
I gain back a significant portion of memory (about 7 GB in total).  For
grins, we ripple-stopped each WPS server, adjusted the Xen maxmem value,
and gave each server an additional 6 GB of RAM for a total of 22 GB.  I'd
like to now limit the Lustre clients to the following, but I'm not sure
whether doing so will mess things up:
lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048  # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs02*/max_cached_mb 2048  # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs03*/max_cached_mb 1048  # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs04*/max_cached_mb 1048  # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs05*/max_cached_mb 1048  # Lustre default is 12288
lctl set_param /proc/fs/lustre/llite/lusfs06*/max_cached_mb 512   # Lustre default is 12288
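For what it's worth, I was planning to sanity-check the current and new
values with the shorter dotted parameter syntax before and after making the
change; I believe the following is equivalent to the /proc paths above, but
please correct me if I have the form wrong:

lctl get_param llite.*.max_cached_mb              # show the current limit for every Lustre mount
lctl set_param llite.lusfs01*.max_cached_mb=2048  # same change as the first line above, dotted form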
So, here are my questions.  Why is 75% of RAM the default for
max_cached_mb?  What will happen if I run "lctl set_param
/proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048" instead of leaving it
at the 12288 it is today for each of the filesystems mentioned above?
How am I affecting the performance of the client by making that change?
Is this a bad thing to do, or no big deal?  Some filesystems are more
heavily used than others; should I give them more memory?  Some
filesystems have large files that I'm sure end up sitting in memory;
should I give them more memory?  I know the ldlm lru_size can be used to
flush the cache, but I don't think that's a wise thing to do; people
might lose... locks (true/false?) on files they're downloading or
something, right?  Is there another cache tunable that would let me flush
cached data that is two hours or more old but leave the newer entries
alone (a max_cache_time parameter, perhaps)?
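For reference, the lru_size flush I had in mind is along these lines; I
haven't tried it on these clients, so treat it as my best guess rather than
something I know to be safe:

lctl set_param ldlm.namespaces.*.lru_size=clear   # cancel unused locks, which also drops the cached pages they cover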
Cheers,
Ms. Andrea D. Rucks
Sr. Unix Systems Administrator,
Lawson ITS Unix Server Team
_____________________________
Lawson
380 St. Peter Street
St. Paul, MN 55102
Tel: 651-767-6252
http://www.lawson.com
Andreas Dilger
2009-Jun-11  00:10 UTC
[Lustre-discuss] lctl set_param /proc/fs/lustre/llite/lusfs0*/max_cached_mb ???
On Jun 04, 2009  17:52 -0500, Andrea Rucks wrote:
> What I'm seeing is that WPS eventually takes all 15.5 GB of available
> memory (or tries to) and then my server will hang and show an out of
> memory error on the console:
>
> Call Trace:
>  [<ffffffff802bc998>] out_of_memory+0x8b/0x203
> Free pages:        6040kB (0kB HighMem)
> active:8348340kB inactive:7937820kB present:16785408kB

So, about 8GB is just in cached memory, but is inactive so it should be
able to be released under memory pressure.

> pages_scanned:375028407 all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no

This _should_ mean that some pages are reclaimable, not sure why they
are not.

> If we stop WPS as it begins chewing through RAM, we still see a lot of
> memory in use (Lustre client cache).  As I unmount each Lustre filesystem,
> I gain back a significant portion of memory (about 7 GB back total).

That isn't indicative of anything, because Linux/Lustre caches data that
isn't in use (inactive) in case it might be used later.

> I'd like to now limit the Lustre clients to the following, but I'm not
> sure if doing so will mess things up:
>
> lctl set_param /proc/fs/lustre/llite/lusfs01*/max_cached_mb 2048  # Lustre
> Default is 12288

Note: you can use "lctl set_param llite.*.max_cached_mb=2048" as a
shortcut for this.  Note that having many separate caches (i.e. multiple
filesystems) is less efficient than a single large filesystem.

> So, here are my questions.  Why is 75% the default for max_cached_mb?

Just a reasonable maximum amount of cached data.  Something has to be
kept available for application use.

> What will happen if I "lctl set_param llite.*.max_cached_mb 2048" instead
> of 12288 where it is today for each of those filesystems mentioned above?

It should cap the cached data at 2GB per filesystem.

> How am I affecting the performance of the client by making that change?

Depends on how much they re-use data.

> Is this a bad thing to do or no big deal?

For Lustre, no big deal.  Depends again on how much cached data affects
your application performance.

> Some filesystems are more heavily used than others, should I give them
> more memory?

Seems reasonable.

> Some filesystems have large files that I'm sure end up sitting in
> memory, should I give them more memory?

Depends if your application re-uses files or not.

> I know the ldlm lru_size can be used to flush cache, but I don't
> think that's a wise thing to do, people might lose... locks (true/false?)

Clearing all of the locks will in turn flush all of your caches, so it
is only a short-term fix unless you put a hard limit on the number of
locks for each filesystem.  Getting that right is hard.

> on files they're downloading or something, right?  Is there another cache
> tunable where I can flush cached things that are two hours or more old,
> but leave the newer stuff (a max_cache_time parameter perhaps)?

Yes, there is the ldlm.namespaces.*.lru_max_age parameter you could tune.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
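A minimal sketch of checking and adjusting that tunable with the dotted
syntax used above; the numeric value is only an illustrative placeholder,
since the units expected by lru_max_age have differed between Lustre
releases, so look at the current setting first:

lctl get_param ldlm.namespaces.*.lru_max_age           # current maximum lock age per namespace
lctl set_param ldlm.namespaces.*.lru_max_age=7200000   # e.g. two hours, if the value is in milliseconds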