Hi,
The cluster was made idle over the weekend to look at the Lustre
RAM consumption issue. The RAM used during yesterday's rsync is still not
freed up. Here is the output from free:
             total       used       free     shared    buffers     cached
Mem:       4041880    3958744      83136          0     876132     144276
-/+ buffers/cache:    2938336    1103544
Swap:      4096564        240    4096324
Looking at vmstat -m, there is something odd: ext3_inode_cache and
dentry_cache seem to be the biggest occupants of RAM, while
ldiskfs_inode_cache is comparatively smaller.
-
Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
osc_quota_info 0 0 32 119
lustre_dquot_cache 0 0 144 27
fsfilt_ldiskfs_fcb 0 0 56 69
ldiskfs_inode_cache 430199 440044 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 14 38 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 10509 12005 512 7
ldlm_resources 10291 11325 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 40 40 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 45 88 45
ext3_inode_cache 1636505 1636556 856 4
ext3_xattr 0 0 88 45
journal_handle 8 81 48 81
journal_head 460 855 88 45
revoke_table 38 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 2 14 512 7
unix_sock 105 155 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 14 238 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 40 80 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 14 45 832 9
tcp_sock 56 60 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 376 405 816 5
posix_timers_cache 0 0 184 21
uid_cache 5 62 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 32 32 1024 4
sgpool-16 32 32 512 8
sgpool-8 32 45 256 15
cfq_pool 66 207 56 69
crq_pool 64 324 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 364 476 32 119
blkdev_queue 33 81 856 9
blkdev_requests 64 120 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 256 450 16 225
bio 256 279 128 31
file_lock_cache 3 75 160 25
sock_inode_cache 209 220 704 5
skbuff_head_cache 16443 22008 320 12
sock 6 12 640 6
proc_inode_cache 2670 2670 616 6
sigqueue 40 230 168 23
radix_tree_node 68531 68880 536 7
bdev_cache 45 60 832 4
mnt_cache 60 80 192 20
inode_cache 927 1176 584 7
dentry_cache 1349923 1361216 240 16
filp 717 924 320 12
names_cache 3 3 4096 1
avc_node 12 648 72 54
key_jar 10 60 192 20
idr_layer_cache 110 133 528 7
buffer_head 230970 393300 88 45
mm_struct 47 105 1152 7
vm_area_struct 1573 2904 176 22
fs_cache 422 549 64 61
files_cache 58 171 832 9
signal_cache 529 585 256 15
sighand_cache 522 528 2112 3
task_struct 550 554 2000 2
anon_vma 601 1404 24 156
shared_policy_node 0 0 56 69
numa_policy 82 675 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 205 205 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 936 936 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 4911 4911 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 676 676 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 8753 8782 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 86 104 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 15228 15900 1024 4
size-512(DMA) 0 0 512 8
size-512 1189 2752 512 8
size-256(DMA) 0 0 256 15
size-256 10235 10560 256 15
size-128(DMA) 0 0 128 31
size-128 200934 211916 128 31
size-64(DMA) 0 0 64 61
size-64 712970 735416 64 61
size-32(DMA) 0 0 32 119
size-32 2338 94486 32 119
kmem_cache 210 210 256 15
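A side note for anyone reading along: since vmstat -m reports the Total
object count and the per-object Size in bytes, a rough way to rank these
caches by memory footprint is simply to multiply the two columns. A
minimal sketch, assuming the standard five-column layout shown above:

  vmstat -m | awk 'NR>1 {printf "%-24s %10.1f MB\n", $1, $3*$4/1048576}' | sort -k2 -rn | head

On the numbers above that puts ext3_inode_cache at roughly
1636556 x 856 B ~ 1.3 GB and dentry_cache at roughly 1361216 x 240 B ~
0.3 GB, which is why the non-Lustre caches stand out here.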
On the second OSS, here is the vmstat output. Again, dentry_cache,
ldiskfs_inode_cache, and ext3_inode_cache seem to be the biggest users
of RAM.
ll_fmd_cache 0 0 56 69
ldiskfs_inode_cache 987664 987668 920 4
lustre_inode_cache 0 0 896 4
ll_qunit_cache 0 0 72 54
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 10 10 5328 1
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ext3_inode_cache 392316 392328 856 4
scsi_cmd_cache 41 42 512 7
ip_mrt_cache 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 39 80 384 10
arp_cache 16 30 256 15
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 370 400 816 5
posix_timers_cache 0 0 184 21
uid_cache 7 31 128 31
file_lock_cache 7 75 160 25
sock_inode_cache 216 235 704 5
skbuff_head_cache 16500 21768 320 12
proc_inode_cache 2260 2262 616 6
bdev_cache 56 56 832 4
mnt_cache 46 60 192 20
inode_cache 944 1218 584 7
dentry_cache 1387440 1387440 240 16
names_cache 10 10 4096 1
idr_layer_cache 91 98 528 7
fs_cache 366 549 64 61
files_cache 69 153 832 9
signal_cache 462 585 256 15
sighand_cache 453 453 2112 3
kmem_cache 180 180 256 15
Is there a way to flush out the cache so that the RAM is freed up?
The same issue is reported here - http://lkml.org/lkml/2006/8/3/376
But both OSS run CentOS 4 with a 2.6.9 kernel, so drop_caches doesn't
seem to be available in /proc. Is there anything in /proc, as explained in
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html
that can force the kernel to flush out the dentry_cache and
ext3_inode_cache when the rsync is over and the cache is not needed anymore?
Thanks very much.
Regards
Balagopal
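A side note on the /proc question above, for later readers: the
drop_caches interface only appeared in kernel 2.6.16, which is why it is
missing on a CentOS 4 / 2.6.9 kernel. On 2.6.9 the closest tunable is
probably vm.vfs_cache_pressure, which only biases reclaim towards
dentries and inodes rather than forcing a flush. Roughly (the value 200
is just an illustrative choice):

  # 2.6.16 and later only (not present on CentOS 4 / 2.6.9):
  sync
  echo 2 > /proc/sys/vm/drop_caches    # drop dentries and inodes
  echo 3 > /proc/sys/vm/drop_caches    # drop pagecache, dentries and inodes

  # closest knob on 2.6.9: prefer reclaiming dentry/inode caches under pressure
  sysctl -w vm.vfs_cache_pressure=200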
On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
> The cluster was made idle over the weekend to look at the Lustre
> RAM consumption issue. The RAM used during yesterday's rsync is still not
> freed up. Here is the output from free
>
>              total       used       free     shared    buffers     cached
> Mem:       4041880    3958744      83136          0     876132     144276
> -/+ buffers/cache:    2938336    1103544
> Swap:      4096564        240    4096324

Note that this is normal behaviour for Linux. RAM that is unused provides
no value, so all available RAM is used for cache until something else
needs to use this memory.

> Looking at vmstat -m, there is something odd: ext3_inode_cache and
> dentry_cache seem to be the biggest occupants of RAM, while
> ldiskfs_inode_cache is comparatively smaller.
>
> Cache                       Num    Total   Size   Pages
> ldiskfs_inode_cache      430199   440044    920       4
> ldlm_locks                10509    12005    512       7
> ldlm_resources            10291    11325    256      15
> buffer_head              230970   393300     88      45
> ext3_inode_cache        1636505  1636556    856       4
> dentry_cache            1349923  1361216    240      16

This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
(which is ext3 renamed + patches), so it is some non-Lustre filesystem
usage which is consuming most of your memory.

> Is there anything in /proc, as explained in
> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html
> that can force the kernel to flush out the dentry_cache and
> ext3_inode_cache when the rsync is over and the cache is not needed anymore?
> Thanks very much.

Only to unmount and remount the filesystem, on the server. On Lustre
clients there is a mechanism to flush Lustre cache, but that doesn't
help you here.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
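To put numbers on that with the free output quoted above: the
"-/+ buffers/cache" line already backs the reclaimable page cache out of
the totals,

  used: 2938336 KB = 3958744 - 876132 (buffers) - 144276 (cached)
  free: 1103544 KB =   83136 + 876132 (buffers) + 144276 (cached)

so roughly 1 GB is immediately available even before any dentry/inode
slab gets reclaimed.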
If you're really interested in tracking memory utilization, collectl -
see http://collectl.sourceforge.net/ - when run as a daemon will
collect/log all slab data once a minute, and you can change the frequency
to anything you like. You can then later play it back and see exactly
what is happening over time. As another approach you can run it
interactively, and if you specify the -oS switch you'll only see changes
as they occur. Including the 'T' will time stamp them, as in the example
below:

[root@cag-dl380-01 root]# collectl -sY -oST -i:1
# SLAB DETAIL
#                           <-----------Objects----------><---------Slab Allocation------>
# Name                       InUse   Bytes   Alloc   Bytes   InUse   Bytes   Total   Bytes
11:02:02 size-512              146   74752     208  106496      21   86016      26  106496
11:02:07 sigqueue              319   42108     319   42108      11   45056      11   45056
11:02:07 size-512              208  106496     208  106496      26  106496      26  106496

Since this isn't a lustre system there isn't a whole lot of activity...

-mark
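For what it's worth, a minimal sketch of the record-and-playback workflow
Mark describes (the directory and raw file name below are only
placeholders; collectl writes raw files named hostname-yyyymmdd-hhmmss.raw.gz):

  # record slab detail (plus the default subsystems) to raw files
  collectl -sY -f /var/log/collectl &

  # later: play a recorded file back, showing only slab changes with timestamps
  collectl -p /var/log/collectl/oss1-20071223-000100.raw.gz -sY -oST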
Hi Andreas,
Thanks. The two OSS also export two ext3 volumes each via NFS,
which are used to back up the 4 smaller Lustre volumes. One possibility,
as you mentioned, is that the memory consumption is not Lustre related
but ext3 related, since the destination ext3 volumes also come from the
same OSS servers, but are mounted over NFS on the Lustre client that does
the rsync.
        I upgraded the RAM on both OSS this morning from 4 GB to 8 GB
and hope that is enough for both Lustre operations and the rsync backup.

Regards
Balagopal
Thanks Mark. This looks handy. I was about to put a cron job with vmstat
in place to see how the memory utilization progresses with the early
morning rsync. Since I put another 4 GB in both OSS this morning,
hopefully that should be enough for their operation.

Regards
Balagopal
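For completeness, the kind of quick cron job being described might look
something like this (the log path and the five-minute interval are
arbitrary choices):

  # /etc/cron.d/slab-snapshot: append a timestamped vmstat -m snapshot every 5 minutes
  */5 * * * * root (date; vmstat -m) >> /var/log/slab-snapshot.log 2>&1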
In my opinion there are a couple of problems with cron jobs that do
monitoring. On the positive side they're quick and easy, but on the
downside you have extra work to do if you want timestamps, and then
there's the issue of all the other potential system metrics you're
missing out on. The neat thing about collectl is it essentially does it
all! In the case of lustre that means if you run it with the defaults
you'll get cpu, memory, network, and more in addition to the slab data.
However, if you really want to get crazy, you can get the performance by
ost and even the rpc stats. The one negative with collectl is that while
it can do a lot, that translates into a lot of options, which can be
confusing at first.

-mark
Hi Andreas,
Here is the current memory usage after doubling the RAM on both
OSS yesterday. The rsync completed successfully last night, but almost
5.4 GB of RAM is used up!
             total       used       free     shared    buffers     cached
Mem:       8166408    8094468      71940          0    2597688      48124
-/+ buffers/cache:    5448656    2717752
Swap:      4096564        224    4096340
Here is the vmstat -m. This time ldiskfs_inode_cache is the
biggest occupant; ext3_inode_cache is smaller and dentry_cache is still
quite big. If this is ext3 related, I can get around the problem by
exporting the backup volume from the OSS via iSCSI and mounting it on the
Lustre client with an iSCSI initiator. The nodes have 16 GB and that
should be enough for all the caches. But ldiskfs_inode_cache is also
becoming quite big. The only difference between last time and this time
is that I have re-enabled all the needed rsyncs, with one copy of the
data going to an NFS-mounted ext3 volume and another copy to another big
Lustre volume. That could explain ldiskfs_inode_cache growing so much
this time. The current vmstat -m output from both OSS is pasted below -
1st OSS + MDS -
Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
osc_quota_info 0 0 32 119
lustre_dquot_cache 0 0 144 27
fsfilt_ldiskfs_fcb 0 0 56 69
ldiskfs_inode_cache 3969899 3969960 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 5536 5662 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 86258 110698 512 7
ldlm_resources 85847 103725 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 40 40 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 45 88 45
ext3_inode_cache 6104 20520 856 4
ext3_xattr 0 0 88 45
journal_handle 20 81 48 81
journal_head 482 2610 88 45
revoke_table 38 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 7 7 512 7
unix_sock 103 150 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 14 119 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 38 90 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 14 54 832 9
tcp_sock 56 60 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 379 415 816 5
posix_timers_cache 0 0 184 21
uid_cache 5 31 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 36 36 1024 4
sgpool-16 32 32 512 8
sgpool-8 45 45 256 15
cfq_pool 98 207 56 69
crq_pool 80 324 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 360 476 32 119
blkdev_queue 33 63 856 9
blkdev_requests 80 120 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 332 450 16 225
bio 310 310 128 31
file_lock_cache 3 75 160 25
sock_inode_cache 207 210 704 5
skbuff_head_cache 16465 21900 320 12
sock 6 12 640 6
proc_inode_cache 2637 2658 616 6
sigqueue 45 46 168 23
radix_tree_node 182213 186375 536 7
bdev_cache 52 52 832 4
mnt_cache 60 100 192 20
inode_cache 917 1239 584 7
dentry_cache 2880362 2882112 240 16
filp 731 816 320 12
names_cache 4 5 4096 1
avc_node 12 432 72 54
key_jar 10 40 192 20
idr_layer_cache 111 119 528 7
buffer_head 650238 742680 88 45
mm_struct 45 112 1152 7
vm_area_struct 1626 2904 176 22
fs_cache 427 549 64 61
files_cache 48 126 832 9
signal_cache 534 615 256 15
sighand_cache 530 543 2112 3
task_struct 555 560 2000 2
anon_vma 679 1248 24 156
shared_policy_node 0 0 56 69
numa_policy 82 675 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 229 229 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 1286 1286 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 4884 4884 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 744 786 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 9114 9120 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 86 100 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 15217 16132 1024 4
size-512(DMA) 0 0 512 8
size-512 1213 2752 512 8
size-256(DMA) 0 0 256 15
size-256 10441 11310 256 15
size-128(DMA) 0 0 128 31
size-128 205487 218488 128 31
size-64(DMA) 0 0 64 61
size-64 777658 891088 64 61
size-32(DMA) 0 0 32 119
size-32 43033 86632 32 119
kmem_cache 225 225 256 15
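Working the same back-of-envelope numbers (Total x Size) for the table
above:

  ldiskfs_inode_cache   3969960 x 920 B  ~ 3.4 GiB
  dentry_cache          2882112 x 240 B  ~ 0.64 GiB
  buffer_head            742680 x  88 B  ~ 0.06 GiB

so the ldiskfs inode and dentry slabs alone account for roughly 4 GB of
the ~5.4 GB reported as used after buffers/cache, which matches
ldiskfs_inode_cache now being the biggest occupant.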
             total       used       free     shared    buffers     cached
Mem:       8166340    5462540    2703800          0    1515664     448516
-/+ buffers/cache:    3498360    4667980
Swap:      4096440          0    4096440
[root@lustre2 ~]# vmstat -m
Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
fsfilt_ldiskfs_fcb 4 69 56 69
ldiskfs_inode_cache 1971539 1971548 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 9090 9120 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 228 1253 512 7
ldlm_resources 226 2235 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 10 10 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 90 88 45
ext3_inode_cache 393257 393260 856 4
ext3_xattr 0 0 88 45
journal_handle 8 81 48 81
journal_head 653 2295 88 45
revoke_table 24 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 10 49 512 7
unix_sock 106 150 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 17 119 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 38 80 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 15 36 832 9
tcp_sock 56 65 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 369 390 816 5
posix_timers_cache 0 0 184 21
uid_cache 6 62 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 32 32 1024 4
sgpool-16 33 40 512 8
sgpool-8 45 90 256 15
cfq_pool 95 207 56 69
crq_pool 87 216 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 300 357 32 119
blkdev_queue 35 72 856 9
blkdev_requests 95 135 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 324 450 16 225
bio 305 372 128 31
file_lock_cache 3 50 160 25
sock_inode_cache 211 230 704 5
skbuff_head_cache 16556 21324 320 12
sock 6 12 640 6
proc_inode_cache 2361 2364 616 6
sigqueue 33 46 168 23
radix_tree_node 212941 212954 536 7
bdev_cache 45 56 832 4
mnt_cache 46 60 192 20
inode_cache 2730 2779 584 7
dentry_cache 2373901 2374016 240 16
filp 718 804 320 12
names_cache 5 10 4096 1
avc_node 12 378 72 54
key_jar 12 40 192 20
idr_layer_cache 88 91 528 7
buffer_head 387120 387180 88 45
mm_struct 52 119 1152 7
vm_area_struct 1707 2706 176 22
fs_cache 349 488 64 61
files_cache 47 153 832 9
signal_cache 452 630 256 15
sighand_cache 448 465 2112 3
task_struct 476 492 2000 2
anon_vma 665 1248 24 156
shared_policy_node 0 0 56 69
numa_policy 82 450 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 126 126 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 1210 1210 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 2615 2616 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 488 496 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 9050 9102 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 88 108 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 13138 14816 1024 4
size-512(DMA) 0 0 512 8
size-512 854 2752 512 8
size-256(DMA) 0 0 256 15
size-256 6495 7770 256 15
size-128(DMA) 0 0 128 31
size-128 198380 198586 128 31
size-64(DMA) 0 0 64 61
size-64 20477 36478 64 61
size-32(DMA) 0 0 32 119
size-32 43283 50932 32 119
kmem_cache 180 180 256 15
The collectl stats from during the rsync are available at
http://cluster.mathstat.dal.ca/lustre2-20071225-000104.raw.gz
They show the cache getting built up after 4 am. Thanks very much for any
recommendations and help. We still have a bit of headroom in the available
RAM; I hope these caches don't continue to build every day and crash the
OSS again.

Regards
Balagopal