Hi,

The cluster was made idle over the weekend to look at the Lustre RAM
consumption issue. The RAM used during yesterday's rsync is still not
freed up. Here is the output from free:

             total       used       free     shared    buffers     cached
Mem:       4041880    3958744      83136          0     876132     144276
-/+ buffers/cache:    2938336    1103544
Swap:      4096564        240    4096324

Looking at vmstat -m, there is something odd: ext3_inode_cache and
dentry_cache seem to be the biggest occupants of RAM, while
ldiskfs_inode_cache is comparatively smaller.

Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
osc_quota_info 0 0 32 119
lustre_dquot_cache 0 0 144 27
fsfilt_ldiskfs_fcb 0 0 56 69
ldiskfs_inode_cache 430199 440044 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 14 38 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 10509 12005 512 7
ldlm_resources 10291 11325 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 40 40 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 45 88 45
ext3_inode_cache 1636505 1636556 856 4
ext3_xattr 0 0 88 45
journal_handle 8 81 48 81
journal_head 460 855 88 45
revoke_table 38 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 2 14 512 7
unix_sock 105 155 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 14 238 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 40 80 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 14 45 832 9
tcp_sock 56 60 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 376 405 816 5
posix_timers_cache 0 0 184 21
uid_cache 5 62 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 32 32 1024 4
sgpool-16 32 32 512 8
sgpool-8 32 45 256 15
cfq_pool 66 207 56 69
crq_pool 64 324 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 364 476 32 119
blkdev_queue 33 81 856 9
blkdev_requests 64 120 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 256 450 16 225
bio 256 279 128 31
file_lock_cache 3 75 160 25
sock_inode_cache 209 220 704 5
skbuff_head_cache 16443 22008 320 12
sock 6 12 640 6
proc_inode_cache 2670 2670 616 6
sigqueue 40 230 168 23
radix_tree_node 68531 68880 536 7
bdev_cache 45 60 832 4
mnt_cache 60 80 192 20
inode_cache 927 1176 584 7
dentry_cache 1349923 1361216 240 16
filp 717 924 320 12
names_cache 3 3 4096 1
avc_node 12 648 72 54
key_jar 10 60 192 20
idr_layer_cache 110 133 528 7
buffer_head 230970 393300 88 45
mm_struct 47 105 1152 7
vm_area_struct 1573 2904 176 22
fs_cache 422 549 64 61
files_cache 58 171 832 9
signal_cache 529 585 256 15
sighand_cache 522 528 2112 3
task_struct 550 554 2000 2
anon_vma 601 1404 24 156
shared_policy_node 0 0 56 69
numa_policy 82 675 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 205 205 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 936 936 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 4911 4911 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 676 676 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 8753 8782 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 86 104 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 15228 15900 1024 4
size-512(DMA) 0 0 512 8
size-512 1189 2752 512 8
size-256(DMA) 0 0 256 15
size-256 10235 10560 256 15
size-128(DMA) 0 0 128 31
size-128 200934 211916 128 31
size-64(DMA) 0 0 64 61
size-64 712970 735416 64 61
size-32(DMA) 0 0 32 119
size-32 2338 94486 32 119
kmem_cache 210 210 256 15

On the second OSS, here is the vmstat output. Again dentry_cache,
ldiskfs_inode_cache and ext3_inode_cache seem to be the biggest users of
RAM.

Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
ldiskfs_inode_cache 987664 987668 920 4
lustre_inode_cache 0 0 896 4
ll_qunit_cache 0 0 72 54
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 10 10 5328 1
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ext3_inode_cache 392316 392328 856 4
scsi_cmd_cache 41 42 512 7
ip_mrt_cache 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 39 80 384 10
arp_cache 16 30 256 15
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 370 400 816 5
posix_timers_cache 0 0 184 21
uid_cache 7 31 128 31
file_lock_cache 7 75 160 25
sock_inode_cache 216 235 704 5
skbuff_head_cache 16500 21768 320 12
proc_inode_cache 2260 2262 616 6
bdev_cache 56 56 832 4
mnt_cache 46 60 192 20
inode_cache 944 1218 584 7
dentry_cache 1387440 1387440 240 16
names_cache 10 10 4096 1
idr_layer_cache 91 98 528 7
fs_cache 366 549 64 61
files_cache 69 153 832 9
signal_cache 462 585 256 15
sighand_cache 453 453 2112 3
kmem_cache 180 180 256 15

Is there a way to flush out the cache so that the RAM is freed up? The
same issue is reported here: http://lkml.org/lkml/2006/8/3/376
But both OSS run CentOS 4 with a 2.6.9 kernel, so drop_caches doesn't
seem to be available in /proc. Is there anything in /proc, as explained in
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
that can force the kernel to flush out the dentry_cache and
ext3_inode_cache when the rsync is over and the cache is not needed
anymore? Thanks very much.

Regards
Balagopal
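A minimal sketch of the one related knob that should exist even on a
CentOS 4 / 2.6.9 kernel: vm.vfs_cache_pressure. It does not flush the
dentry/inode caches outright; it only biases how aggressively they are
reclaimed when memory is needed, and the value shown here is only an
illustration.

    # check the current setting (the default is 100)
    cat /proc/sys/vm/vfs_cache_pressure

    # values above 100 make reclaim prefer dentries/inodes over page cache,
    # e.g. applied after the nightly rsync finishes
    sysctl -w vm.vfs_cache_pressure=200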
On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
> The cluster was made idle over the weekend to look at the Lustre
> RAM consumption issue. The RAM used during yesterday's rsync is still
> not freed up. Here is the output from free:
>
>              total       used       free     shared    buffers     cached
> Mem:       4041880    3958744      83136          0     876132     144276
> -/+ buffers/cache:    2938336    1103544
> Swap:      4096564        240    4096324

Note that this is normal behaviour for Linux. RAM that sits unused
provides no value, so all available RAM is used for cache until something
else needs the memory.

> Looking at vmstat -m, there is something odd: ext3_inode_cache and
> dentry_cache seem to be the biggest occupants of RAM, while
> ldiskfs_inode_cache is comparatively smaller.
>
> Cache Num Total Size Pages
> ldiskfs_inode_cache 430199 440044 920 4
> ldlm_locks 10509 12005 512 7
> ldlm_resources 10291 11325 256 15
> buffer_head 230970 393300 88 45

> ext3_inode_cache 1636505 1636556 856 4
> dentry_cache 1349923 1361216 240 16

This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
(which is ext3 renamed + patches), so it is some non-Lustre filesystem
usage which is consuming most of your memory.

> Is there anything in /proc, as explained in
> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
> that can force the kernel to flush out the dentry_cache and
> ext3_inode_cache when the rsync is over and the cache is not needed
> anymore? Thanks very much.

Only to unmount and remount the filesystem, on the server. On Lustre
clients there is a mechanism to flush the Lustre cache, but that doesn't
help you here.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
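For completeness, a sketch of what the client-side flush mentioned above
typically looks like on 1.6-era Lustre. The /proc path and the "clear"
keyword are stated from memory and should be treated as an assumption to
verify against your Lustre version; clearing the client's DLM lock LRU
also releases the cached inodes/dentries those locks pin, on the client
only.

    # on a Lustre client: drop cached locks and the data/metadata they pin
    for ns in /proc/fs/lustre/ldlm/namespaces/*; do
        echo clear > $ns/lru_size
    done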
If you're really interested in tracking memory utilization, have a look at
collectl (see http://collectl.sourceforge.net/). When run as a daemon it
will collect/log all slab data once a minute, and you can change the
frequency to anything you like. You can then later play it back and see
exactly what is happening over time. As another approach you can run it
interactively, and if you specify the -oS switch you'll only see changes
as they occur. Including the 'T' will timestamp them, as in the example
below:

[root at cag-dl380-01 root]# collectl -sY -oST -i:1
# SLAB DETAIL
#                        <-----------Objects----------><---------Slab Allocation------>
# Name                   InUse   Bytes  Alloc   Bytes  InUse   Bytes  Total   Bytes
11:02:02 size-512          146   74752    208  106496     21   86016     26  106496
11:02:07 sigqueue          319   42108    319   42108     11   45056     11   45056
11:02:07 size-512          208  106496    208  106496     26  106496     26  106496

Since this isn't a lustre system there isn't a whole lot of activity...

-mark

Andreas Dilger wrote:
> On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
>
>> The cluster was made idle over the weekend to look at the Lustre
>> RAM consumption issue. The RAM used during yesterday's rsync is still
>> not freed up. Here is the output from free:
>>
>>              total       used       free     shared    buffers     cached
>> Mem:       4041880    3958744      83136          0     876132     144276
>> -/+ buffers/cache:    2938336    1103544
>> Swap:      4096564        240    4096324
>>
> Note that this is normal behaviour for Linux. RAM that sits unused
> provides no value, so all available RAM is used for cache until
> something else needs the memory.
>
>> Looking at vmstat -m, there is something odd: ext3_inode_cache and
>> dentry_cache seem to be the biggest occupants of RAM, while
>> ldiskfs_inode_cache is comparatively smaller.
>>
>> Cache Num Total Size Pages
>> ldiskfs_inode_cache 430199 440044 920 4
>> ldlm_locks 10509 12005 512 7
>> ldlm_resources 10291 11325 256 15
>> buffer_head 230970 393300 88 45
>>
>> ext3_inode_cache 1636505 1636556 856 4
>> dentry_cache 1349923 1361216 240 16
>>
> This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
> usage which is consuming most of your memory.
>
>> Is there anything in /proc, as explained in
>> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
>> that can force the kernel to flush out the dentry_cache and
>> ext3_inode_cache when the rsync is over and the cache is not needed
>> anymore? Thanks very much.
>>
> Only to unmount and remount the filesystem, on the server. On Lustre
> clients there is a mechanism to flush the Lustre cache, but that doesn't
> help you here.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>
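A rough illustration of the record-and-playback flow described above. Only
-sY, -oST and -i:N appear in the example; the -f (log to file) and -p
(playback) flags are my recollection of collectl's interface, so treat
them as assumptions and check the collectl documentation.

    # record slab detail once a minute into a log file for later playback
    collectl -sY -i:60 -f /var/log/collectl &

    # afterwards, play the log back showing only slab changes, timestamped
    collectl -p /var/log/collectl/<host>-<date>.raw.gz -sY -oST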
Hi Andreas,

Thanks. The two OSS also export two ext3 volumes each via NFS, which are
used to back up the 4 smaller Lustre volumes. One possibility, as you
mentioned, is that the memory consumption is not Lustre related but ext3
related, since the destination ext3 volumes also come from the same OSS
servers, just mounted over NFS on the Lustre client that does the rsync.
I upgraded the RAM this morning on both OSS from 4 GB to 8 GB and hope
that is enough for both Lustre operations and the rsync backup.

Regards
Balagopal

Andreas Dilger wrote:
> On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
>
>> The cluster was made idle over the weekend to look at the Lustre
>> RAM consumption issue. The RAM used during yesterday's rsync is still
>> not freed up. Here is the output from free:
>>
>>              total       used       free     shared    buffers     cached
>> Mem:       4041880    3958744      83136          0     876132     144276
>> -/+ buffers/cache:    2938336    1103544
>> Swap:      4096564        240    4096324
>>
> Note that this is normal behaviour for Linux. RAM that sits unused
> provides no value, so all available RAM is used for cache until
> something else needs the memory.
>
>> Looking at vmstat -m, there is something odd: ext3_inode_cache and
>> dentry_cache seem to be the biggest occupants of RAM, while
>> ldiskfs_inode_cache is comparatively smaller.
>>
>> Cache Num Total Size Pages
>> ldiskfs_inode_cache 430199 440044 920 4
>> ldlm_locks 10509 12005 512 7
>> ldlm_resources 10291 11325 256 15
>> buffer_head 230970 393300 88 45
>>
>> ext3_inode_cache 1636505 1636556 856 4
>> dentry_cache 1349923 1361216 240 16
>>
> This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
> usage which is consuming most of your memory.
>
>> Is there anything in /proc, as explained in
>> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
>> that can force the kernel to flush out the dentry_cache and
>> ext3_inode_cache when the rsync is over and the cache is not needed
>> anymore? Thanks very much.
>>
> Only to unmount and remount the filesystem, on the server. On Lustre
> clients there is a mechanism to flush the Lustre cache, but that doesn't
> help you here.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
Thanks Mark. This looks handy. I was about to put a cron job with vmstat
in place to see how the memory utilization progresses with the
early-morning rsync. Since I put another 4 GB in both OSS this morning,
hopefully it will be enough for their operation.

Regards
Balagopal


Mark Seger wrote:
> If you're really interested in tracking memory utilization, have a look
> at collectl (see http://collectl.sourceforge.net/). When run as a daemon
> it will collect/log all slab data once a minute, and you can change the
> frequency to anything you like. You can then later play it back and see
> exactly what is happening over time. As another approach you can run it
> interactively, and if you specify the -oS switch you'll only see changes
> as they occur. Including the 'T' will timestamp them, as in the example
> below:
>
> [root at cag-dl380-01 root]# collectl -sY -oST -i:1
> # SLAB DETAIL
> #                        <-----------Objects----------><---------Slab Allocation------>
> # Name                   InUse   Bytes  Alloc   Bytes  InUse   Bytes  Total   Bytes
> 11:02:02 size-512          146   74752    208  106496     21   86016     26  106496
> 11:02:07 sigqueue          319   42108    319   42108     11   45056     11   45056
> 11:02:07 size-512          208  106496    208  106496     26  106496     26  106496
>
> Since this isn't a lustre system there isn't a whole lot of activity...
>
> -mark
>
> Andreas Dilger wrote:
>> On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
>>
>>> The cluster was made idle over the weekend to look at the Lustre
>>> RAM consumption issue. The RAM used during yesterday's rsync is still
>>> not freed up. Here is the output from free:
>>>
>>>              total       used       free     shared    buffers     cached
>>> Mem:       4041880    3958744      83136          0     876132     144276
>>> -/+ buffers/cache:    2938336    1103544
>>> Swap:      4096564        240    4096324
>>>
>> Note that this is normal behaviour for Linux. RAM that sits unused
>> provides no value, so all available RAM is used for cache until
>> something else needs the memory.
>>
>>> Looking at vmstat -m, there is something odd: ext3_inode_cache and
>>> dentry_cache seem to be the biggest occupants of RAM, while
>>> ldiskfs_inode_cache is comparatively smaller.
>>>
>>> Cache Num Total Size Pages
>>> ldiskfs_inode_cache 430199 440044 920 4
>>> ldlm_locks 10509 12005 512 7
>>> ldlm_resources 10291 11325 256 15
>>> buffer_head 230970 393300 88 45
>>>
>>> ext3_inode_cache 1636505 1636556 856 4
>>> dentry_cache 1349923 1361216 240 16
>>>
>> This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
>> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
>> usage which is consuming most of your memory.
>>
>>> Is there anything in /proc, as explained in
>>> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
>>> that can force the kernel to flush out the dentry_cache and
>>> ext3_inode_cache when the rsync is over and the cache is not needed
>>> anymore? Thanks very much.
>>>
>> Only to unmount and remount the filesystem, on the server. On Lustre
>> clients there is a mechanism to flush the Lustre cache, but that doesn't
>> help you here.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>
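For the cron-based approach, a minimal sketch (file locations are
arbitrary) that timestamps each vmstat -m sample so slab growth across the
nightly rsync can be compared afterwards; this is an /etc/crontab-style
entry with a user field.

    # sample the slab caches every 10 minutes with a timestamp
    */10 * * * *  root  (date; vmstat -m) >> /var/log/slab-watch.log 2>&1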
In my opinion there are a couple of problems with cron jobs that do
monitoring. On the positive side they're quick and easy, but on the
downside you have extra work to do if you want timestamps, and then
there's the issue of all the other potential system metrics you're missing
out on. The neat thing about collectl is it essentially does it all! In
the case of Lustre that means if you run it with the defaults you'll get
CPU, memory, network, and more in addition to the slab data. However, if
you really want to get crazy, you can also get the performance by OST and
even the RPC stats. The one negative with collectl is that while it can do
a lot, that translates into a lot of options, which can be confusing at
first.

-mark

Balagopal Pillai wrote:
> Thanks Mark. This looks handy. I was about to put a cron job with vmstat
> in place to see how the memory utilization progresses with the
> early-morning rsync. Since I put another 4 GB in both OSS this morning,
> hopefully it will be enough for their operation.
>
> Regards
> Balagopal
>
>
> Mark Seger wrote:
>
>> If you're really interested in tracking memory utilization, have a look
>> at collectl (see http://collectl.sourceforge.net/). When run as a
>> daemon it will collect/log all slab data once a minute, and you can
>> change the frequency to anything you like. You can then later play it
>> back and see exactly what is happening over time. As another approach
>> you can run it interactively, and if you specify the -oS switch you'll
>> only see changes as they occur. Including the 'T' will timestamp them,
>> as in the example below:
>>
>> [root at cag-dl380-01 root]# collectl -sY -oST -i:1
>> # SLAB DETAIL
>> #                        <-----------Objects----------><---------Slab Allocation------>
>> # Name                   InUse   Bytes  Alloc   Bytes  InUse   Bytes  Total   Bytes
>> 11:02:02 size-512          146   74752    208  106496     21   86016     26  106496
>> 11:02:07 sigqueue          319   42108    319   42108     11   45056     11   45056
>> 11:02:07 size-512          208  106496    208  106496     26  106496     26  106496
>>
>> Since this isn't a lustre system there isn't a whole lot of activity...
>>
>> -mark
>>
>> Andreas Dilger wrote:
>>
>>> On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
>>>
>>>> The cluster was made idle over the weekend to look at the Lustre
>>>> RAM consumption issue. The RAM used during yesterday's rsync is still
>>>> not freed up. Here is the output from free:
>>>>
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       4041880    3958744      83136          0     876132     144276
>>>> -/+ buffers/cache:    2938336    1103544
>>>> Swap:      4096564        240    4096324
>>>>
>>> Note that this is normal behaviour for Linux. RAM that sits unused
>>> provides no value, so all available RAM is used for cache until
>>> something else needs the memory.
>>>
>>>> Looking at vmstat -m, there is something odd: ext3_inode_cache and
>>>> dentry_cache seem to be the biggest occupants of RAM, while
>>>> ldiskfs_inode_cache is comparatively smaller.
>>>>
>>>> Cache Num Total Size Pages
>>>> ldiskfs_inode_cache 430199 440044 920 4
>>>> ldlm_locks 10509 12005 512 7
>>>> ldlm_resources 10291 11325 256 15
>>>> buffer_head 230970 393300 88 45
>>>>
>>>> ext3_inode_cache 1636505 1636556 856 4
>>>> dentry_cache 1349923 1361216 240 16
>>>>
>>> This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
>>> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
>>> usage which is consuming most of your memory.
>>>
>>>> Is there anything in /proc, as explained in
>>>> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
>>>> that can force the kernel to flush out the dentry_cache and
>>>> ext3_inode_cache when the rsync is over and the cache is not needed
>>>> anymore? Thanks very much.
>>>>
>>> Only to unmount and remount the filesystem, on the server. On Lustre
>>> clients there is a mechanism to flush the Lustre cache, but that
>>> doesn't help you here.
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at clusterfs.com
>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>
On Mon, 24 Dec 2007, Andreas Dilger wrote:

Hi Andreas,

Here is the output of vmstat -m after doubling the RAM yesterday on both
OSS. The rsync completed successfully yesterday night, but almost 5.4 GB
of RAM is used up!

             total       used       free     shared    buffers     cached
Mem:       8166408    8094468      71940          0    2597688      48124
-/+ buffers/cache:    5448656    2717752
Swap:      4096564        224    4096340

Here is the vmstat -m. This time ldiskfs_inode_cache is the biggest
occupant; the ext3 inode cache is smaller and the dentry cache is quite
big. If this is ext3 related, I can get around the problem by exporting
the backup volume via iSCSI from the OSS and mounting it on the Lustre
client with an iSCSI initiator. The nodes have 16 GB and that should be
enough for all the caches. But ldiskfs_inode_cache is also becoming quite
big. The only difference between last time and this time is that I have
re-enabled all the needed rsyncs, with one copy of the data going to an
NFS-mounted ext3 volume and another copy to another big Lustre volume.
That could explain the growth of ldiskfs_inode_cache this time. The
current vmstat -m of both OSS is pasted below.

1st OSS + MDS -

Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
osc_quota_info 0 0 32 119
lustre_dquot_cache 0 0 144 27
fsfilt_ldiskfs_fcb 0 0 56 69
ldiskfs_inode_cache 3969899 3969960 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 5536 5662 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 86258 110698 512 7
ldlm_resources 85847 103725 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 40 40 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 45 88 45
ext3_inode_cache 6104 20520 856 4
ext3_xattr 0 0 88 45
journal_handle 20 81 48 81
journal_head 482 2610 88 45
revoke_table 38 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 7 7 512 7
unix_sock 103 150 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 14 119 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 38 90 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 14 54 832 9
tcp_sock 56 60 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 379 415 816 5
posix_timers_cache 0 0 184 21
uid_cache 5 31 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 36 36 1024 4
sgpool-16 32 32 512 8
sgpool-8 45 45 256 15
cfq_pool 98 207 56 69
crq_pool 80 324 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 360 476 32 119
blkdev_queue 33 63 856 9
blkdev_requests 80 120 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 332 450 16 225
bio 310 310 128 31
file_lock_cache 3 75 160 25
sock_inode_cache 207 210 704 5
skbuff_head_cache 16465 21900 320 12
sock 6 12 640 6
proc_inode_cache 2637 2658 616 6
sigqueue 45 46 168 23
radix_tree_node 182213 186375 536 7
bdev_cache 52 52 832 4
mnt_cache 60 100 192 20
inode_cache 917 1239 584 7
dentry_cache 2880362 2882112 240 16
filp 731 816 320 12
names_cache 4 5 4096 1
avc_node 12 432 72 54
key_jar 10 40 192 20
idr_layer_cache 111 119 528 7
buffer_head 650238 742680 88 45
mm_struct 45 112 1152 7
vm_area_struct 1626 2904 176 22
fs_cache 427 549 64 61
files_cache 48 126 832 9
signal_cache 534 615 256 15
sighand_cache 530 543 2112 3
task_struct 555 560 2000 2
anon_vma 679 1248 24 156
shared_policy_node 0 0 56 69
numa_policy 82 675 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 229 229 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 1286 1286 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 4884 4884 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 744 786 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 9114 9120 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 86 100 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 15217 16132 1024 4
size-512(DMA) 0 0 512 8
size-512 1213 2752 512 8
size-256(DMA) 0 0 256 15
size-256 10441 11310 256 15
size-128(DMA) 0 0 128 31
size-128 205487 218488 128 31
size-64(DMA) 0 0 64 61
size-64 777658 891088 64 61
size-32(DMA) 0 0 32 119
size-32 43033 86632 32 119
kmem_cache 225 225 256 15

             total       used       free     shared    buffers     cached
Mem:       8166340    5462540    2703800          0    1515664     448516
-/+ buffers/cache:    3498360    4667980
Swap:      4096440          0    4096440

[root at lustre2 ~]# vmstat -m
Cache Num Total Size Pages
ll_fmd_cache 0 0 56 69
fsfilt_ldiskfs_fcb 4 69 56 69
ldiskfs_inode_cache 1971539 1971548 920 4
ldiskfs_xattr 0 0 88 45
ldiskfs_prealloc_space 9090 9120 104 38
ll_file_data 0 0 128 31
lustre_inode_cache 0 0 896 4
lov_oinfo 0 0 256 15
ll_qunit_cache 0 0 72 54
ldlm_locks 228 1253 512 7
ldlm_resources 226 2235 256 15
ll_import_cache 0 0 440 9
ll_obdo_cache 0 0 208 19
ll_obd_dev_cache 10 10 5328 1
fib6_nodes 11 61 64 61
ip6_dst_cache 16 24 320 12
ndisc_cache 1 15 256 15
rawv6_sock 10 12 1024 4
udpv6_sock 1 4 1024 4
tcpv6_sock 3 4 1728 4
rpc_buffers 8 8 2048 2
rpc_tasks 8 12 320 12
rpc_inode_cache 6 8 832 4
msi_cache 4 4 5760 1
ip_fib_alias 10 119 32 119
ip_fib_hash 10 61 64 61
dm_tio 0 0 24 156
dm_io 0 0 40 96
dm-bvec-(256) 0 0 4096 1
dm-bvec-128 0 0 2048 2
dm-bvec-64 0 0 1024 4
dm-bvec-16 0 0 256 15
dm-bvec-4 0 0 64 61
dm-bvec-1 0 0 16 225
dm-bio 0 0 128 31
uhci_urb_priv 2 90 88 45
ext3_inode_cache 393257 393260 856 4
ext3_xattr 0 0 88 45
journal_handle 8 81 48 81
journal_head 653 2295 88 45
revoke_table 24 225 16 225
revoke_record 0 0 32 119
scsi_cmd_cache 10 49 512 7
unix_sock 106 150 768 5
ip_mrt_cache 0 0 128 31
tcp_tw_bucket 0 0 192 20
tcp_bind_bucket 17 119 32 119
tcp_open_request 0 0 128 31
inet_peer_cache 0 0 128 31
secpath_cache 0 0 192 20
xfrm_dst_cache 0 0 384 10
ip_dst_cache 38 80 384 10
arp_cache 16 30 256 15
raw_sock 9 9 832 9
udp_sock 15 36 832 9
tcp_sock 56 65 1536 5
flow_cache 0 0 128 31
mqueue_inode_cache 1 4 896 4
relayfs_inode_cache 0 0 592 13
isofs_inode_cache 0 0 632 6
hugetlbfs_inode_cache 1 6 624 6
ext2_inode_cache 0 0 752 5
ext2_xattr 0 0 88 45
dquot 0 0 224 17
eventpoll_pwq 3 54 72 54
eventpoll_epi 3 20 192 20
kioctx 0 0 384 10
kiocb 0 0 256 15
dnotify_cache 2 96 40 96
fasync_cache 1 156 24 156
shmem_inode_cache 369 390 816 5
posix_timers_cache 0 0 184 21
uid_cache 6 62 128 31
sgpool-256 32 32 8192 1
sgpool-128 32 32 4096 1
sgpool-64 32 32 2048 2
sgpool-32 32 32 1024 4
sgpool-16 33 40 512 8
sgpool-8 45 90 256 15
cfq_pool 95 207 56 69
crq_pool 87 216 72 54
deadline_drq 0 0 96 41
as_arq 0 0 112 35
blkdev_ioc 300 357 32 119
blkdev_queue 35 72 856 9
blkdev_requests 95 135 264 15
biovec-(256) 256 256 4096 1
biovec-128 256 256 2048 2
biovec-64 256 256 1024 4
biovec-16 256 270 256 15
biovec-4 256 305 64 61
biovec-1 324 450 16 225
bio 305 372 128 31
file_lock_cache 3 50 160 25
sock_inode_cache 211 230 704 5
skbuff_head_cache 16556 21324 320 12
sock 6 12 640 6
proc_inode_cache 2361 2364 616 6
sigqueue 33 46 168 23
radix_tree_node 212941 212954 536 7
bdev_cache 45 56 832 4
mnt_cache 46 60 192 20
inode_cache 2730 2779 584 7
dentry_cache 2373901 2374016 240 16
filp 718 804 320 12
names_cache 5 10 4096 1
avc_node 12 378 72 54
key_jar 12 40 192 20
idr_layer_cache 88 91 528 7
buffer_head 387120 387180 88 45
mm_struct 52 119 1152 7
vm_area_struct 1707 2706 176 22
fs_cache 349 488 64 61
files_cache 47 153 832 9
signal_cache 452 630 256 15
sighand_cache 448 465 2112 3
task_struct 476 492 2000 2
anon_vma 665 1248 24 156
shared_policy_node 0 0 56 69
numa_policy 82 450 16 225
size-131072(DMA) 0 0 131072 1
size-131072 12 12 131072 1
size-65536(DMA) 0 0 65536 1
size-65536 126 126 65536 1
size-32768(DMA) 0 0 32768 1
size-32768 0 0 32768 1
size-16384(DMA) 0 0 16384 1
size-16384 1210 1210 16384 1
size-8192(DMA) 0 0 8192 1
size-8192 2615 2616 8192 1
size-4096(DMA) 0 0 4096 1
size-4096 488 496 4096 1
size-2048(DMA) 0 0 2048 2
size-2048 9050 9102 2048 2
size-1620(DMA) 0 0 1664 4
size-1620 88 108 1664 4
size-1024(DMA) 0 0 1024 4
size-1024 13138 14816 1024 4
size-512(DMA) 0 0 512 8
size-512 854 2752 512 8
size-256(DMA) 0 0 256 15
size-256 6495 7770 256 15
size-128(DMA) 0 0 128 31
size-128 198380 198586 128 31
size-64(DMA) 0 0 64 61
size-64 20477 36478 64 61
size-32(DMA) 0 0 32 119
size-32 43283 50932 32 119
kmem_cache 180 180 256 15

The collectl stats during the rsync are available at
http://cluster.mathstat.dal.ca/lustre2-20071225-000104.raw.gz
They show the cache getting built up after 4 am in the morning. Thanks
very much for any recommendations and help. We still have a bit of
headroom in the available RAM; I hope these caches don't continue to build
every day and crash the OSS again.

Regards
Balagopal

> On Dec 23, 2007 18:01 -0400, Balagopal Pillai wrote:
> > The cluster was made idle over the weekend to look at the Lustre
> > RAM consumption issue. The RAM used during yesterday's rsync is still
> > not freed up. Here is the output from free:
> >
> >              total       used       free     shared    buffers     cached
> > Mem:       4041880    3958744      83136          0     876132     144276
> > -/+ buffers/cache:    2938336    1103544
> > Swap:      4096564        240    4096324
>
> Note that this is normal behaviour for Linux. RAM that sits unused
> provides no value, so all available RAM is used for cache until something
> else needs the memory.
>
> > Looking at vmstat -m, there is something odd: ext3_inode_cache and
> > dentry_cache seem to be the biggest occupants of RAM, while
> > ldiskfs_inode_cache is comparatively smaller.
> >
> > Cache Num Total Size Pages
> > ldiskfs_inode_cache 430199 440044 920 4
> > ldlm_locks 10509 12005 512 7
> > ldlm_resources 10291 11325 256 15
> > buffer_head 230970 393300 88 45
>
> > ext3_inode_cache 1636505 1636556 856 4
> > dentry_cache 1349923 1361216 240 16
>
> This is odd, because Lustre doesn't use ext3 at all. It uses ldiskfs
> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
> usage which is consuming most of your memory.
>
> > Is there anything in /proc, as explained in
> > http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html,
> > that can force the kernel to flush out the dentry_cache and
> > ext3_inode_cache when the rsync is over and the cache is not needed
> > anymore? Thanks very much.
>
> Only to unmount and remount the filesystem, on the server. On Lustre
> clients there is a mechanism to flush the Lustre cache, but that doesn't
> help you here.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
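One way to keep an eye on whether these caches keep growing day over day
is to total up slab usage directly from /proc/slabinfo instead of eyeballing
the full vmstat -m listing. A small sketch, assuming the slabinfo 2.x
format used by 2.6 kernels (fields 2-4 are active objects, total objects
and object size); it ignores per-slab overhead, so it is an estimate only:

    # top 10 slab caches by approximate total bytes (objects x object size)
    awk 'NR>2 {printf "%-24s %12d\n", $1, $3*$4}' /proc/slabinfo | sort -k2 -rn | head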