Yes, the clients are doing lots of creates.

But my question is: if this is a memory leak, why does ocfs2 eat up the memory as soon as the clients start accessing the filesystem? Within about 5-10 minutes all physical RAM is consumed, but then the memory consumption stops. It does not go into swap.

Do you happen to know what version of ocfs2 has the fix?

If it was a leak, wouldn't the growth be more gradual and continuous? Wouldn't it keep eating into swap? And if it was a leak, would the RAM be freed when ocfs2 was unmounted?

Is there a command that shows what is using the kernel memory?

Here is what /proc/slabinfo shows (cut down for formatting). I don't understand how to read this, so maybe someone can indicate if something looks wrong?

======
# cat /proc/slabinfo

# name                 <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
nfsd4_delegations        0      0    596   13   2
nfsd4_stateids           0      0     72   53   1
nfsd4_files              0      0     36  101   1
nfsd4_stateowners        0      0    344   11   1
rpc_buffers              8      8   2048    2   1
rpc_tasks                8     15    256   15   1
rpc_inode_cache          0      0    512    7   1
ocfs2_lock             152    203     16  203   1
ocfs2_inode_cache    12484  12536    896    4   1
ocfs2_uptodate        1381   1469     32  113   1
ocfs2_em_ent         37005  37406     64   59   1
dlmfs_inode_cache        1      6    640    6   1
dlm_mle_cache           10     10    384   10   1
configfs_dir_cache      33     78     48   78   1
fib6_nodes               7    113     32  113   1
ip6_dst_cache            7     15    256   15   1
ndisc_cache              1     15    256   15   1
RAWv6                    5      6    640    6   1
UDPv6                    3      6    640    6   1
tw_sock_TCPv6            0      0    128   30   1
request_sock_TCPv6       0      0    128   30   1
TCPv6                    8      9   1280    3   1
ip_fib_alias            16    113     32  113   1
ip_fib_hash             16    113     32  113   1
dm_events               16    169     20  169   1
dm_tio                4157   7308     16  203   1
dm_io                 4155   6760     20  169   1
uhci_urb_priv            0      0     40   92   1
ext3_inode_cache      1062   2856    512    8   1
ext3_xattr               0      0     48   78   1
journal_handle          74    169     20  169   1
journal_head           583   1224     52   72   1
revoke_table             6    254     12  254   1
revoke_record            0      0     16  203   1
qla2xxx_srbs           244    360    128   30   1
scsi_cmd_cache         106    130    384   10   1
sgpool-256              32     32   4096    1   1
sgpool-128              42     42   2048    2   1
sgpool-64               44     44   1024    4   1
sgpool-32               48     48    512    8   1
sgpool-16               75     75    256   15   1
sgpool-8               153    210    128   30   1
scsi_io_context          0      0    104   37   1
UNIX                   377    399    512    7   1
ip_mrt_cache             0      0    128   30   1
tcp_bind_bucket         14    203     16  203   1
inet_peer_cache         81    118     64   59   1
secpath_cache            0      0    128   30   1
xfrm_dst_cache           0      0    384   10   1
ip_dst_cache           176    240    256   15   1
arp_cache                6     30    256   15   1
RAW                      3      7    512    7   1
UDP                     29     42    512    7   1
tw_sock_TCP              0      0    128   30   1
request_sock_TCP         0      0     64   59   1
TCP                     19     35   1152    7   2
flow_cache               0      0    128   30   1
cfq_ioc_pool           194    240     96   40   1
cfq_pool               185    240     96   40   1
crq_pool               312    468     48   78   1
deadline_drq             0      0     52   72   1
as_arq                   0      0     64   59   1
mqueue_inode_cache       1      6    640    6   1
isofs_inode_cache        0      0    384   10   1
minix_inode_cache        0      0    420    9   1
hugetlbfs_inode_cache    1     11    356   11   1
ext2_inode_cache         0      0    492    8   1
ext2_xattr               0      0     48   78   1
dnotify_cache            1    169     20  169   1
dquot                    0      0    128   30   1
eventpoll_pwq            1    101     36  101   1
eventpoll_epi            1     30    128   30   1
inotify_event_cache      0      0     28  127   1
inotify_watch_cache     40     92     40   92   1
kioctx                   0      0    256   15   1
kiocb                    0      0    128   30   1
fasync_cache             1    203     16  203   1
shmem_inode_cache      612    632    460    8   1
posix_timers_cache       0      0    100   39   1
uid_cache                7     59     64   59   1
blkdev_ioc             103    127     28  127   1
blkdev_queue            58     60    960    4   1
blkdev_requests        354    418    176   22   1
biovec-(256)           312    312   3072    2   2
biovec-128             368    370   1536    5   2
biovec-64              480    485    768    5   1
biovec-16              480    495    256   15   1
biovec-4               480    531     64   59   1
biovec-1              1104   5481     16  203   1
bio                   1140   2250    128   30   1
sock_inode_cache       456    483    512    7   1
skbuff_fclone_cache     36     40    384   10   1
skbuff_head_cache      655    825    256   15   1
file_lock_cache          5     42     92   42   1
acpi_operand           634    828     40   92   1
acpi_parse_ext           0      0     44   84   1
acpi_parse               0      0     28  127   1
acpi_state               0      0     48   78   1
delayacct_cache        183    390     48   78   1
taskstats_cache          9     32    236   16   1
proc_inode_cache        49    170    372   10   1
sigqueue                96    135    144   27   1
radix_tree_node      16046  16786    276   14   1
bdev_cache              56     56    512    7   1
sysfs_dir_cache       4831   4876     40   92   1
mnt_cache               30     60    128   30   1
inode_cache           1041   1276    356   11   1
dentry_cache         11588  13688    132   29   1
filp                  2734   2820    192   20   1
names_cache             25     25   4096    1   1
idr_layer_cache        204    232    136   29   1
buffer_head         456669 459936     52   72   1
mm_struct              109    126    448    9   1
vm_area_struct        5010   5632     88   44   1
fs_cache               109    177     64   59   1
files_cache             94    135    448    9   1
signal_cache           159    160    384   10   1
sighand_cache          147    147   1344    3   1
task_struct            175    175   1376    5   2
anon_vma              2355   2540     12  254   1
pgd                     81     81   4096    1   1
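A rough way to turn a dump like that into per-cache memory totals (this assumes the standard 2.6 slabinfo column order shown in the header above, and it ignores per-slab overhead, so treat the figures as approximate):

# awk 'NR > 2 { printf "%-24s %8.1f MB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo | sort -rn -k2 | head -15

The overall slab usage is the Slab: line in /proc/meminfo, and slabtop (from procps), if installed, gives a live view of the same data:

# grep Slab /proc/meminfo

To see whether the ocfs2/dlm caches in particular keep growing while the clients are active:

# watch -n 5 'grep -E "^(ocfs2|dlm)" /proc/slabinfo'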
On Thu, 2007-02-15 at 10:40 -0700, Robert Wipfel wrote:
> >>> On Thu, Feb 15, 2007 at 10:34 AM, in message
> <1171560898.4589.12.camel@ibmlaptop.darkcore.net>, John Lange
> <john.lange@open-it.ca> wrote:
> > System is SUSE SLES 10 running heartbeat, ocfs2, evms, and exporting the
> > file system via nfs.
> >
> > The ocfs2 partition is 12 terabytes and is being exported via nfs.
> >
> > What we see is that as soon as the nfs clients (80 NFS v2 clients) start
> > connecting, memory usage goes up and up and up until all the physical
> > RAM is consumed, but it levels off before hitting swap. With 1G of RAM,
> > 1G is used. With 2G of RAM, 2G is used. It just seems to consume
> > everything.
> >
> > The system seems to run happily for a while. Then something happens and
> > there is a RAM spike. Next thing you know, we see the dreaded kernel
> > oom-killer appear and start killing processes left and right, resulting
> > in a complete crash.
> >
> > I can confirm it is NOT nfs using the RAM, because when nfs is stopped
> > no RAM is recovered. But when the ocfs2 partition is unmounted, the RAM
> > is freed.
> >
> > Can someone shed some light on what is going on here? Any suggestions on
> > how to resolve this problem?
>
> Are your clients doing lots of creates? There was an OCFS2 bug
> that left DLM structures lying around for each file create, which iirc is
> now fixed.
>
> Hth,
> Robert

--
John Lange
Epic Information Solutions
p: (204) 975 7113
Fixed in 1.2.4. SUSE has the fix as a patch, and it has also been merged into mainline.
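If you want to double-check which OCFS2 a node is actually running before and after updating, the module version string is usually visible (assuming the build sets one, which the SLES/Oracle 1.2.x packages normally do):

# modinfo ocfs2 | grep -i version
# rpm -qa | grep -i ocfs2

OCFS2 also prints a version banner to the kernel log when the modules load and when a volume mounts, so grepping dmesg for "ocfs2" should show it as well.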
John Lange wrote:
> Yes, the clients are doing lots of creates.
>
> But my question is, if this is a memory leak, why does ocfs2 eat up the
> memory as soon as the clients start accessing the filesystem?
>
> Do you happen to know what version of ocfs2 has the fix?
>
> [...rest of the message, including the /proc/slabinfo dump, snipped; see
> the full text above...]
I saw this problem on a few of the SLES9 SP3 updates, but it is not an issue anymore.

----- Original Message -----
From: "John Lange" <j.lange@epic.ca>
To: <linux-ha@lists.linux-ha.org>; "ocfs2-users" <ocfs2-users@oss.oracle.com>
Sent: Thursday, February 15, 2007 10:48 AM
Subject: [Ocfs2-users] Re: [Linux-HA] OCFS2 - Memory hog?

> Yes, the clients are doing lots of creates.
>
> [...rest of the quoted message snipped; see the full text above...]