Guillaume Demillecamps
2009-Jul-22 09:45 UTC
[Lustre-discuss] Lustre client memory usage very high
Hello people,

Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10 SP2 with an un-patched kernel on the clients. I have, however, put the same kernel revision (downloaded from suse.com) on the clients as the version used in the Lustre-patched MGS/MDS/OSS servers. The file system is only several GBs, with ~500000 files. All inter-connections are through TCP.

We have some "manual" replication of an active Lustre file system to a passive Lustre file system. We have "sync" clients that basically just mount both file systems and run large sync jobs from the active Lustre to the passive Lustre. So far, so good (apart from it being quite a slow process). However, my issue is that Lustre's memory usage rises so high that rsync cannot get enough RAM to finish its job before kswapd kicks in and slows things down drastically.

Up to now, I have succeeded in fine-tuning things using the following steps in my rsync script:

########
umount /opt/lustre_a
umount /opt/lustre_z
mount /opt/lustre_a
mount /opt/lustre_z
for i in `ls /proc/fs/lustre/osc/*/max_dirty_mb`; do echo 4 > $i ; done
for i in `ls /proc/fs/lustre/ldlm/namespaces/*/lru_max_age`; do echo 30 > $i ; done
for i in `ls /proc/fs/lustre/llite/*/max_cached_mb`; do echo 64 > $i ; done
echo 64 > /proc/sys/lustre/max_dirty_mb
lctl set_param ldlm.namespaces.*osc*.lru_size=100
sysctl -w lnet.debug=0
########

What I still don't understand is that even when I put a max limit of a few MB on the read cache (max_cached_mb / max_dirty_mb) and set the write cache (lru_max_age -- is that correct?) to a very small value, the counter in /proc/sys/lustre/memused still sky-rockets to several GBs. As soon as I un-mount the file systems, it drops. The memused number, however, will not decrease even if the client remains idle for several days with no I/O from/to any Lustre file systems. Note that cutting the rsync work into smaller but more numerous jobs is not helping -- unless I start un-mounting and re-mounting the Lustre file systems between each job (which is nevertheless what I may have to plan for if there is no further parameter which would help me)!

Any help/guidance/hint/... is very much appreciated.

Thank you,

Guillaume Demillecamps
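As a rough illustration of that last fallback, here is a minimal sketch of splitting the sync into per-directory batches with a remount in between. The mount points are the ones from the script above; the per-directory split and the rsync flags are assumptions and untested:

######## sketch: batched rsync with a remount between jobs
#!/bin/bash
# /opt/lustre_a (active) and /opt/lustre_z (passive) are the mounts from
# the script above; splitting by top-level directory is an assumption.
for dir in /opt/lustre_a/*/; do
    name=$(basename "$dir")
    rsync -a --delete "/opt/lustre_a/$name/" "/opt/lustre_z/$name/"
    # Cycle the mounts so the client releases its cached inodes/locks
    # before the next batch starts (relies on fstab entries, as above).
    umount /opt/lustre_a /opt/lustre_z
    mount /opt/lustre_a
    mount /opt/lustre_z
done
########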
Andreas Dilger
2009-Jul-29 22:46 UTC
[Lustre-discuss] Lustre client memory usage very high
On Jul 22, 2009 11:45 +0200, Guillaume Demillecamps wrote:
> Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
> SP2 with an un-patched kernel on the clients. I have, however, put the same
> kernel revision downloaded from suse.com on the clients as the version
> used in the Lustre-patched MGS/MDS/OSS servers. The file system is only
> several GBs, with ~500000 files. All inter-connections are through TCP.
>
> We have some "manual" replication of an active Lustre file system to a
> passive Lustre file system. We have "sync" clients that basically just
> mount both file systems and run large sync jobs from the active Lustre
> to the passive Lustre. So far, so good (apart from it being quite a slow
> process). However, my issue is that Lustre's memory usage rises so high
> that rsync cannot get enough RAM to finish its job before kswapd kicks
> in and slows things down drastically.
> Up to now, I have succeeded in fine-tuning things using the following
> steps in my rsync script:
> ########
> umount /opt/lustre_a
> umount /opt/lustre_z
> mount /opt/lustre_a
> mount /opt/lustre_z
> for i in `ls /proc/fs/lustre/osc/*/max_dirty_mb`; do echo 4 > $i ; done
> for i in `ls /proc/fs/lustre/ldlm/namespaces/*/lru_max_age`; do echo 30 > $i ; done
> for i in `ls /proc/fs/lustre/llite/*/max_cached_mb`; do echo 64 > $i ; done
> echo 64 > /proc/sys/lustre/max_dirty_mb

Note that you can do these more easily with

    lctl set_param osc.*.max_dirty_mb=4
    lctl set_param ldlm.namespaces.*.lru_max_age=30
    lctl set_param llite.*.max_cached_mb=64
    lctl set_param max_dirty_mb=64

> lctl set_param ldlm.namespaces.*osc*.lru_size=100
> sysctl -w lnet.debug=0

This can also be "lctl set_param debug=0".

> What I still don't understand is that even when putting a max limit of
> a few MB of read-cache (max_cached_mb / max_dirty_mb) and putting the
> write-cache (lru_max_age -- is that correct?) to a very limited number,
> it still sky-rockets to several GBs in /proc/sys/lustre/memused.

Can you please check /proc/slabinfo to see what kind of memory is being
allocated the most?  The max_cached_mb/max_dirty_mb are only limits on
the cached/dirty data pages, and not for metadata structures.  Also,
in 30s I expect you can have a LOT of inodes traversed, so that might
be your problem, and even then lock cancellation does not necessarily
force the kernel dentry/inode out of memory.

Getting total lock counts would also help:

    lctl get_param ldlm.namespaces.*.resource_count

You might be able to tweak some of the "normal" (not Lustre-specific)
/proc parameters to flush the inodes from cache more quickly, or increase
the rate at which kswapd is trying to flush unused inodes.

> And as soon as I un-mount the disks, it drops. The memused number however
> will not decrease even if the client remains idle for several days
> with no I/O from/to any Lustre file systems. Note that cutting the
> rsync jobs into smaller but more numerous jobs is not helping.

There is a test program called "memhog" that could force memory to be
flushed between jobs, but that is a sub-standard solution.

> Unless
> I'd start un-mounting and re-mounting the Lustre file systems between
> each job (which is nevertheless what I may have to plan if there is no
> further parameter which would help me)!

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
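As a starting point for the slabinfo check and the VM tuning suggested above, a minimal sketch; the column positions assume the slabinfo 2.1 layout shown later in the thread, and vm.vfs_cache_pressure=200 is only an illustrative value to experiment with, not a recommendation from this thread:

######## sketch: biggest slab consumers + faster inode reclaim
# List the 20 caches using the most memory (num_objs * objsize);
# NR > 2 skips the two slabinfo header lines.
awk 'NR > 2 { printf "%-24s %10.0f KB\n", $1, $3 * $4 / 1024 }' /proc/slabinfo | sort -k2 -rn | head -20

# Ask the VM to reclaim dentries/inodes more aggressively than the
# default of 100; 200 is only an example value and needs testing.
sysctl -w vm.vfs_cache_pressure=200
########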
Guillaume Demillecamps
2009-Jul-30 07:52 UTC
[Lustre-discuss] Lustre client memory usage very high
Hello,

First of all, thank you for your time. You can find attached the information you asked for. If you can keep on spending some more of your time on this... your help is greatly appreciated!

Best regards,

Guillaume Demillecamps

----- Message from adilger at sun.com ---------
   Date: Wed, 29 Jul 2009 16:46:27 -0600
   From: Andreas Dilger <adilger at sun.com>
Subject: Re: [Lustre-discuss] Lustre client memory usage very high
     To: Guillaume Demillecamps <guillaume at multipurpose.be>
     Cc: lustre-discuss at lists.lustre.org

> On Jul 22, 2009 11:45 +0200, Guillaume Demillecamps wrote:
>> Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
>> SP2 with an un-patched kernel on the clients. I have, however, put the same
>> kernel revision downloaded from suse.com on the clients as the version
>> used in the Lustre-patched MGS/MDS/OSS servers. The file system is only
>> several GBs, with ~500000 files. All inter-connections are through TCP.
>>
>> We have some "manual" replication of an active Lustre file system to a
>> passive Lustre file system. We have "sync" clients that basically just
>> mount both file systems and run large sync jobs from the active Lustre
>> to the passive Lustre. So far, so good (apart from it being quite a slow
>> process). However, my issue is that Lustre's memory usage rises so high
>> that rsync cannot get enough RAM to finish its job before kswapd kicks
>> in and slows things down drastically.
>> Up to now, I have succeeded in fine-tuning things using the following
>> steps in my rsync script:
>> ########
>> umount /opt/lustre_a
>> umount /opt/lustre_z
>> mount /opt/lustre_a
>> mount /opt/lustre_z
>> for i in `ls /proc/fs/lustre/osc/*/max_dirty_mb`; do echo 4 > $i ; done
>> for i in `ls /proc/fs/lustre/ldlm/namespaces/*/lru_max_age`; do echo 30 > $i ; done
>> for i in `ls /proc/fs/lustre/llite/*/max_cached_mb`; do echo 64 > $i ; done
>> echo 64 > /proc/sys/lustre/max_dirty_mb
>
> Note that you can do these more easily with
>
>     lctl set_param osc.*.max_dirty_mb=4
>     lctl set_param ldlm.namespaces.*.lru_max_age=30
>     lctl set_param llite.*.max_cached_mb=64
>     lctl set_param max_dirty_mb=64
>
>> lctl set_param ldlm.namespaces.*osc*.lru_size=100
>> sysctl -w lnet.debug=0
>
> This can also be "lctl set_param debug=0".
>
>> What I still don't understand is that even when putting a max limit of
>> a few MB of read-cache (max_cached_mb / max_dirty_mb) and putting the
>> write-cache (lru_max_age -- is that correct?) to a very limited number,
>> it still sky-rockets to several GBs in /proc/sys/lustre/memused.
>
> Can you please check /proc/slabinfo to see what kind of memory is being
> allocated the most?  The max_cached_mb/max_dirty_mb are only limits on
> the cached/dirty data pages, and not for metadata structures.  Also,
> in 30s I expect you can have a LOT of inodes traversed, so that might
> be your problem, and even then lock cancellation does not necessarily
> force the kernel dentry/inode out of memory.
>
> Getting total lock counts would also help:
>
>     lctl get_param ldlm.namespaces.*.resource_count
>
> You might be able to tweak some of the "normal" (not Lustre-specific)
> /proc parameters to flush the inodes from cache more quickly, or increase
> the rate at which kswapd is trying to flush unused inodes.
>
>> And as soon as I un-mount the disks, it drops. The memused number however
>> will not decrease even if the client remains idle for several days
>> with no I/O from/to any Lustre file systems. Note that cutting the
>> rsync jobs into smaller but more numerous jobs is not helping.
>
> There is a test program called "memhog" that could force memory to be
> flushed between jobs, but that is a sub-standard solution.
>
>> Unless
>> I'd start un-mounting and re-mounting the Lustre file systems between
>> each job (which is nevertheless what I may have to plan if there is no
>> further parameter which would help me)!
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

----- End of message from adilger at sun.com -----

-------------- next part --------------
=======================================================
BEESPBESXSNC08:~ # cat /proc/sys/lustre/memused
1338178530
=======================================================
BEESPBESXSNC08:~ # cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ll_async_page 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
ll_file_data 20 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
lustre_inode_cache 385652 385652 960 4 1 : tunables 54 27 8 : slabdata 96413 96413 0
lov_oinfo 2929548 2929548 320 12 1 : tunables 54 27 8 : slabdata 244129 244129 0
osc_quota_info 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
ll_qunit_cache 0 0 112 34 1 : tunables 120 60 8 : slabdata 0 0 0
llcd_cache 0 0 3952 1 1 : tunables 24 12 8 : slabdata 0 0 0
ptlrpc_cbdatas 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
interval_node 89964 209790 128 30 1 : tunables 120 60 8 : slabdata 6993 6993 480
ldlm_locks 136262 254424 512 8 1 : tunables 54 27 8 : slabdata 31803 31803 216
ldlm_resources 136183 256120 384 10 1 : tunables 54 27 8 : slabdata 25612 25612 216
ll_import_cache 0 0 984 4 1 : tunables 54 27 8 : slabdata 0 0 0
ll_obdo_cache 16 19 208 19 1 : tunables 120 60 8 : slabdata 1 1 0
ll_obd_dev_cache 22 22 5600 1 2 : tunables 8 4 0 : slabdata 22 22 0
obd_lvfs_ctxt_cache 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
ip_fib_alias 23 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 23 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
dm_events 16 92 40 92 1 : tunables 120 60 8 : slabdata 1 1 0
dm_tio 768 864 24 144 1 : tunables 120 60 8 : slabdata 6 6 0
dm_io 768 828 40 92 1 : tunables 120 60 8 : slabdata 9 9 0
ext3_inode_cache 8981 8985 800 5 1 : tunables 54 27 8 : slabdata 1797 1797 0
ext3_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
journal_handle 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
journal_head 15 80 96 40 1 : tunables 120 60 8 : slabdata 2 2 0
revoke_table 8 202 16 202 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_cmd_cache 1 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
sgpool-256 32 32 8192 1 2 : tunables 8 4 0 : slabdata 32 32 0
sgpool-128 32 32 4096 1 1 : tunables 24 12 8 : slabdata 32 32 0
sgpool-64 32 32 2048 2 1 : tunables 24 12 8 : slabdata 16 16 0
sgpool-32 32 32 1024 4 1 : tunables 54 27 8 : slabdata 8 8 0
sgpool-16 32 32 512 8 1 : tunables 54 27 8 : slabdata 4 4 0
sgpool-8 32 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
scsi_io_context 0 0 112 34 1 : tunables 120 60 8 : slabdata 0 0 0
UNIX 33 33 704 11 2 : tunables 54 27 8 : slabdata 3 3 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
tcp_bind_bucket 7 112 32 112 1 : tunables 120 60 8 : slabdata 1 1 0
inet_peer_cache 1 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0
secpath_cache 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
ip_dst_cache 59 70 384 10 1 : tunables 54 27 8 : slabdata 7 7 0
arp_cache 13 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
RAW 2 5 768 5 1 : tunables 54 27 8 : slabdata 1 1 0
UDP 5 5 768 5 1 : tunables 54 27 8 : slabdata 1 1 0
tw_sock_TCP 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCP 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
TCP 41 45 1536 5 2 : tunables 24 12 8 : slabdata 9 9 0
flow_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_ioc_pool 18 23 168 23 1 : tunables 120 60 8 : slabdata 1 1 0
cfq_pool 18 24 160 24 1 : tunables 120 60 8 : slabdata 1 1 0
crq_pool 16 44 88 44 1 : tunables 120 60 8 : slabdata 1 1 0
deadline_drq 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0
as_arq 0 0 112 34 1 : tunables 120 60 8 : slabdata 0 0 0
mqueue_inode_cache 1 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0
isofs_inode_cache 0 0 640 6 1 : tunables 54 27 8 : slabdata 0 0 0
minix_inode_cache 0 0 656 6 1 : tunables 54 27 8 : slabdata 0 0 0
hugetlbfs_inode_cache 1 6 608 6 1 : tunables 54 27 8 : slabdata 1 1 0
ext2_inode_cache 5 5 752 5 1 : tunables 54 27 8 : slabdata 1 1 0
ext2_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 1 92 40 92 1 : tunables 120 60 8 : slabdata 1 1 0
dquot 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_epi 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_event_cache 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_watch_cache 1 53 72 53 1 : tunables 120 60 8 : slabdata 1 1 0
kioctx 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fasync_cache 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 424 450 816 5 1 : tunables 54 27 8 : slabdata 90 90 0
posix_timers_cache 0 0 152 25 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 3 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
blkdev_ioc 17 67 56 67 1 : tunables 120 60 8 : slabdata 1 1 0
blkdev_queue 32 35 1608 5 2 : tunables 24 12 8 : slabdata 7 7 0
blkdev_requests 16 26 288 13 1 : tunables 54 27 8 : slabdata 2 2 0
biovec-(256) 268 268 4096 1 1 : tunables 24 12 8 : slabdata 268 268 0
biovec-128 280 280 2048 2 1 : tunables 24 12 8 : slabdata 140 140 0
biovec-64 304 304 1024 4 1 : tunables 54 27 8 : slabdata 76 76 0
biovec-16 304 315 256 15 1 : tunables 120 60 8 : slabdata 21 21 0
biovec-4 304 354 64 59 1 : tunables 120 60 8 : slabdata 6 6 0
biovec-1 304 808 16 202 1 : tunables 120 60 8 : slabdata 4 4 0
bio 304 390 128 30 1 : tunables 120 60 8 : slabdata 13 13 0
sock_inode_cache 90 90 704 5 1 : tunables 54 27 8 : slabdata 18 18 0
skbuff_fclone_cache 47 91 512 7 1 : tunables 54 27 8 : slabdata 12 13 0
skbuff_head_cache 346 405 256 15 1 : tunables 120 60 8 : slabdata 27 27 1
file_lock_cache 1 22 176 22 1 : tunables 120 60 8 : slabdata 1 1 0
acpi_operand 448 530 72 53 1 : tunables 120 60 8 : slabdata 10 10 0
acpi_parse_ext 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
acpi_parse 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
acpi_state 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
delayacct_cache 142 236 64 59 1 : tunables 120 60 8 : slabdata 4 4 0
taskstats_cache 14 14 272 14 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 454 462 624 6 1 : tunables 54 27 8 : slabdata 77 77 0
sigqueue 16 24 160 24 1 : tunables 120 60 8 : slabdata 1 1 0
radix_tree_node 7938 8001 536 7 1 : tunables 54 27 8 : slabdata 1143 1143 18
bdev_cache 29 32 832 4 1 : tunables 54 27 8 : slabdata 8 8 0
sysfs_dir_cache 3079 3120 80 48 1 : tunables 120 60 8 : slabdata 65 65 0
mnt_cache 27 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
inode_cache 693 828 592 6 1 : tunables 54 27 8 : slabdata 138 138 0
dentry_cache 68754 75791 208 19 1 : tunables 120 60 8 : slabdata 3989 3989 420
filp 510 510 256 15 1 : tunables 120 60 8 : slabdata 34 34 0
names_cache 3 3 4096 1 1 : tunables 24 12 8 : slabdata 3 3 0
idr_layer_cache 103 105 528 7 1 : tunables 54 27 8 : slabdata 15 15 0
buffer_head 16476 16500 88 44 1 : tunables 120 60 8 : slabdata 375 375 0
mm_struct 45 45 832 9 2 : tunables 54 27 8 : slabdata 5 5 0
vm_area_struct 1313 1323 184 21 1 : tunables 120 60 8 : slabdata 63 63 0
fs_cache 59 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
files_cache 52 52 896 4 1 : tunables 54 27 8 : slabdata 13 13 0
signal_cache 85 85 768 5 1 : tunables 54 27 8 : slabdata 17 17 0
sighand_cache 87 87 2112 3 2 : tunables 24 12 8 : slabdata 29 29 0
task_struct 88 88 1888 2 1 : tunables 24 12 8 : slabdata 44 44 0
anon_vma 558 576 24 144 1 : tunables 120 60 8 : slabdata 4 4 0
shared_policy_node 0 0 56 67 1 : tunables 120 60 8 : slabdata 0 0 0
numa_policy 30 144 24 144 1 : tunables 120 60 8 : slabdata 1 1 0
size-131072(DMA) 2 2 131072 1 32 : tunables 8 4 0 : slabdata 2 2 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 2 2 32768 1 8 : tunables 8 4 0 : slabdata 2 2 0
size-32768 5 5 32768 1 8 : tunables 8 4 0 : slabdata 5 5 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 9 9 16384 1 4 : tunables 8 4 0 : slabdata 9 9 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 395 395 8192 1 2 : tunables 8 4 0 : slabdata 395 395 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 122 122 4096 1 1 : tunables 24 12 8 : slabdata 122 122 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 648 758 2048 2 1 : tunables 24 12 8 : slabdata 379 379 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 1198 1560 1024 4 1 : tunables 54 27 8 : slabdata 390 390 81
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 691 880 512 8 1 : tunables 54 27 8 : slabdata 110 110 162
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 425400 425400 256 15 1 : tunables 120 60 8 : slabdata 28360 28360 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
size-64 98679 267211 64 59 1 : tunables 120 60 8 : slabdata 4529 4529 480
size-32(DMA) 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
size-128 44478 47610 128 30 1 : tunables 120 60 8 : slabdata 1587 1587 444
size-32 2742 3024 32 112 1 : tunables 120 60 8 : slabdata 27 27 480
kmem_cache 140 140 1664 2 1 : tunables 24 12 8 : slabdata 70 70 0
=======================================================
BEESPBESXSNC08:~ # lctl get_param ldlm.namespaces.*.resource_count
ldlm.namespaces.MGC172.16.0.54 at tcp.resource_count=1
ldlm.namespaces.MGC172.16.0.64 at tcp.resource_count=1
ldlm.namespaces.l.ap-a-MDT0000-mdc-ffff81001b243c00.resource_count=28862
ldlm.namespaces.l.ap-a-OST0000-osc-ffff81001b243c00.resource_count=100
ldlm.namespaces.l.ap-a-OST0001-osc-ffff81001b243c00.resource_count=23306
ldlm.namespaces.l.ap-a-OST0002-osc-ffff81001b243c00.resource_count=23278
ldlm.namespaces.l.ap-a-OST0003-osc-ffff81001b243c00.resource_count=100
ldlm.namespaces.l.ap-a-OST0004-osc-ffff81001b243c00.resource_count=100
ldlm.namespaces.l.ap-a-OST0005-osc-ffff81001b243c00.resource_count=18027
ldlm.namespaces.l.ap-a-OST0006-osc-ffff81001b243c00.resource_count=17212
ldlm.namespaces.l.ap-a-OST0007-osc-ffff81001b243c00.resource_count=100
ldlm.namespaces.l.ap-z-MDT0000-mdc-ffff81001ba73400.resource_count=10138
ldlm.namespaces.l.ap-z-OST0000-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0001-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0002-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0003-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0004-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0005-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0006-osc-ffff81001ba73400.resource_count=100
ldlm.namespaces.l.ap-z-OST0007-osc-ffff81001ba73400.resource_count=100
========================================================
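For turning the listing above into a single total, a small sketch; it assumes lctl get_param accepts -n to print values without the parameter names, which may not hold on every 1.8 build:

######## sketch: total lock resources across all namespaces
# Sum the per-namespace resource_count values printed above.
lctl get_param -n ldlm.namespaces.*.resource_count | awk '{ total += $1 } END { print total }'
########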
Andreas Dilger
2009-Jul-30 22:45 UTC
[Lustre-discuss] Lustre client memory usage very high
On Jul 30, 2009 09:52 +0200, Guillaume Demillecamps wrote:
>> On Jul 22, 2009 11:45 +0200, Guillaume Demillecamps wrote:
>>> Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
>>> SP2 with an un-patched kernel on the clients. I have, however, put the same
>>> kernel revision downloaded from suse.com on the clients as the version
>>> used in the Lustre-patched MGS/MDS/OSS servers. The file system is only
>>> several GBs, with ~500000 files. All inter-connections are through TCP.
>>>
>>> We have some "manual" replication of an active Lustre file system to a
>>> passive Lustre file system. We have "sync" clients that basically just
>>> mount both file systems and run large sync jobs from the active Lustre
>>> to the passive Lustre. So far, so good (apart from it being quite a slow
>>> process). However, my issue is that Lustre's memory usage rises so high
>>> that rsync cannot get enough RAM to finish its job before kswapd kicks
>>> in and slows things down drastically.

> # name               <active> <total> <size> <obj/slab> : slabdata <active> <num>
> lustre_inode_cache     385652  385652    960      4     : slabdata  96413  96413
> lov_oinfo             2929548 2929548    320     12     : slabdata 244129 244129
> ldlm_locks             136262  254424    512      8     : slabdata  31803  31803
> ldlm_resources         136183  256120    384     10     : slabdata  25612  25612

This shows that we have 385k Lustre inodes, yet there are 2.9M "lov_oinfo"
structs (there should only be a single one per inode).  I'm not sure
why that is happening, but that is consuming about 1GB of RAM.  The 385k
inode count is reasonable, given you have 500k files, per above.  There
are 136k locks, which is also fine (probably so much lower than the inode
count because of your short lock expiry time).

So, it seems like a problem of some kind, and is probably deserving of
filing a bug.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
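Since the figure above comes straight from num_objs * objsize, here is a small sketch that reproduces it from /proc/slabinfo for the caches quoted (assuming the slabinfo 2.1 column layout); lov_oinfo works out to 2929548 * 320 bytes, roughly 894 MB, and lustre_inode_cache to 385652 * 960 bytes, roughly 353 MB:

######## sketch: memory used per slab cache (num_objs * objsize)
awk '$1 ~ /^(lustre_inode_cache|lov_oinfo|ldlm_locks|ldlm_resources)$/ { printf "%-20s %8.1f MB\n", $1, $3 * $4 / (1024*1024) }' /proc/slabinfo
########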
Guillaume Demillecamps
2009-Jul-31 06:56 UTC
[Lustre-discuss] Lustre client memory usage very high
Hello again,

Not sure whether it is worth noting, but if I use the following command, my memory is freed:

sync; echo 3 > /proc/sys/vm/drop_caches

What is surprising, though, is that the cache never expires on its own (at least it remains in memory for several days at the least).

Regards,

Guillaume Demillecamps

----- Message from adilger at sun.com ---------
   Date: Thu, 30 Jul 2009 16:45:47 -0600
   From: Andreas Dilger <adilger at sun.com>
Subject: Re: [Lustre-discuss] Lustre client memory usage very high
     To: Guillaume Demillecamps <guillaume at multipurpose.be>
     Cc: lustre-discuss at lists.lustre.org

> On Jul 30, 2009 09:52 +0200, Guillaume Demillecamps wrote:
>>> On Jul 22, 2009 11:45 +0200, Guillaume Demillecamps wrote:
>>>> Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
>>>> SP2 with an un-patched kernel on the clients. I have, however, put the same
>>>> kernel revision downloaded from suse.com on the clients as the version
>>>> used in the Lustre-patched MGS/MDS/OSS servers. The file system is only
>>>> several GBs, with ~500000 files. All inter-connections are through TCP.
>>>>
>>>> We have some "manual" replication of an active Lustre file system to a
>>>> passive Lustre file system. We have "sync" clients that basically just
>>>> mount both file systems and run large sync jobs from the active Lustre
>>>> to the passive Lustre. So far, so good (apart from it being quite a slow
>>>> process). However, my issue is that Lustre's memory usage rises so high
>>>> that rsync cannot get enough RAM to finish its job before kswapd kicks
>>>> in and slows things down drastically.
>
>> # name               <active> <total> <size> <obj/slab> : slabdata <active> <num>
>> lustre_inode_cache     385652  385652    960      4     : slabdata  96413  96413
>> lov_oinfo             2929548 2929548    320     12     : slabdata 244129 244129
>> ldlm_locks             136262  254424    512      8     : slabdata  31803  31803
>> ldlm_resources         136183  256120    384     10     : slabdata  25612  25612
>
> This shows that we have 385k Lustre inodes, yet there are 2.9M "lov_oinfo"
> structs (there should only be a single one per inode).  I'm not sure
> why that is happening, but that is consuming about 1GB of RAM.  The 385k
> inode count is reasonable, given you have 500k files, per above.  There
> are 136k locks, which is also fine (probably so much lower than the inode
> count because of your short lock expiry time).
>
> So, it seems like a problem of some kind, and is probably deserving of
> filing a bug.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

----- End of message from adilger at sun.com -----
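A small sketch of using that same command between rsync batches instead of un-mounting and re-mounting; placing it inside a batch loop is an assumption, not something tested in this thread:

######## sketch: free the client cache between rsync batches
sync
echo 3 > /proc/sys/vm/drop_caches   # 3 = drop page cache plus dentries and inodes
########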
Guillaume Demillecamps
2009-Jul-31 07:15 UTC
[Lustre-discuss] 1.8: recurrent LBUGs on clients
Hello,

All servers and clients are running Lustre 1.8, on SLES 10 SP2. Clients use patchless kernels, with the same base revision as the patched server kernels. We recurrently encounter this error:

Server log:
-----------
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(mds_open.c:1665:mds_close()) @@@ no handle for file close ino 5606195: cookie 0x5ed7d8c3d1299f40 req at ffff810065a60400 x1308791892785337/t0 o35->4f104403-eb03-83be-2910-2fd7cc26087c at NET_0x20000c0a84410_UUID:0/0 lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc 0/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-116) req at ffff810065a60400 x1308791892785337/t0 o35->4f104403-eb03-83be-2910-2fd7cc26087c at NET_0x20000c0a84410_UUID:0/0 lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc -116/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(mds_open.c:1665:mds_close()) @@@ no handle for file close ino 5606200: cookie 0x5ed7d8c3d129a361 req at ffff810071b28400 x1308791892785342/t0 o35->4f104403-eb03-83be-2910-2fd7cc26087c at NET_0x20000c0a84410_UUID:0/0 lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc 0/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(mds_open.c:1665:mds_close()) Skipped 4 previous similar messages
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-116) req at ffff810071b28400 x1308791892785342/t0 o35->4f104403-eb03-83be-2910-2fd7cc26087c at NET_0x20000c0a84410_UUID:0/0 lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc -116/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError: 22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) Skipped 4 previous similar messages

Client log:
-----------
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 11-0: an error occurred while communicating with 172.16.0.55 at tcp. The mds_close operation failed with -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(file.c:114:ll_close_inode_openhandle()) inode 5606195 mdc close failed: rc = -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(file.c:114:ll_close_inode_openhandle()) Skipped 1 previous similar message
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(file.c:114:ll_close_inode_openhandle()) inode 5606155 mdc close failed: rc = -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(file.c:114:ll_close_inode_openhandle()) Skipped 3 previous similar messages
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 11-0: an error occurred while communicating with 172.16.0.55 at tcp. The mds_close operation failed with -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: Skipped 7 previous similar messages
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(ldlm_lock.c:602:ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_writers > 0) failed
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 13298:0:(ldlm_lock.c:602:ldlm_lock_decref_internal_nolock()) LBUG
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
Jul 30 06:11:47 BEESPDESXAPP06 kernel: Call Trace: <ffffffff88257aea>{:libcfs:lbug_with_loc+122}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8825fe00>{:libcfs:tracefile_init+0} <ffffffff8835d566>{:ptlrpc:ldlm_lock_decref_internal_nolock+182}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8838533b>{:ptlrpc:ldlm_process_flock_lock+4139}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff883864ef>{:ptlrpc:ldlm_flock_completion_ast+2111}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8835f4a9>{:ptlrpc:ldlm_lock_enqueue+2169} <ffffffff88377ca0>{:ptlrpc:ldlm_cli_enqueue_fini+2624}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff88376fd3>{:ptlrpc:ldlm_prep_elc_req+755} <ffffffff8835bc0d>{:ptlrpc:ldlm_lock_create+2541}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8012c668>{default_wake_function+0} <ffffffff88379ae2>{:ptlrpc:ldlm_cli_enqueue+1666}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff88523fcf>{:lustre:ll_file_flock+1407} <ffffffff88385cb0>{:ptlrpc:ldlm_flock_completion_ast+0}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8019ae2e>{locks_remove_posix+132} <ffffffff80147fdc>{bit_waitqueue+56}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff80190241>{flush_old_exec+2729} <ffffffff80186fc1>{__fput+355}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8018455b>{filp_close+84} <ffffffff801360b7>{put_files_struct+107}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8010aecb>{sysret_signal+28} <ffffffff8013725c>{do_exit+684}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff80137995>{sys_exit_group+0} <ffffffff8014083c>{get_signal_to_deliver+1394}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8010aecb>{sysret_signal+28} <ffffffff8010a19c>{do_signal+118}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8012c668>{default_wake_function+0} <ffffffff8014b227>{do_futex+104}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff801743b2>{sys_mprotect+1742} <ffffffff8010aecb>{sysret_signal+28}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: <ffffffff8010b14f>{ptregscall_common+103}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: dumping log to /tmp/lustre-log.1248927107.13298
Jul 30 06:11:47 BEESPDESXAPP06 kernel: Fixing recursive fault but reboot is needed!

Then indeed a reboot of the client is required. What does this mean? Could it be related to sys.timeouts and/or ldlm_timeouts being too short?

Regards,

Guillaume Demillecamps
Hello!

On Jul 31, 2009, at 3:15 AM, Guillaume Demillecamps wrote:
> All servers and clients are running Lustre 1.8, on SLES 10 SP2. Clients
> use patchless kernels, with the same base revision as the patched
> server kernels.
> We recurrently encounter this error:

Chances are you are hitting bug 17046. There is a patch with a fix that will also be included in the 1.8.1 release.

Bye,

Oleg