Alexander Finger
2006-Aug-11 06:08 UTC
[Ocfs2-users] out of memory... doing heavy IO on ocfs2 is wasting (low) memory?!
Hello,

my problem: when I create a large number of small files on any node of my ocfs2 cluster, after some time the OOM killer starts killing processes because low memory (LowMem) is exhausted. All error messages and memory stats are at the end of this mail.

The only way to avoid this behavior is to unmount the ocfs2 partition after some disk operations, because LowFree stays low until unmount. I searched the web and found many descriptions of this error, but no answer on how to handle it.

What I have and what I use:

CentOS release 4.3 (Final)
kernel-smp-2.6.9-34.0.2.EL
ocfs2-tools-1.2.1-1
ocfs2-2.6.9-34.0.2.EL-1.2.3-1
ocfs2-tools-debuginfo-1.2.1-1
ocfs2-2.6.9-34.0.2.ELsmp-1.2.3-1
ocfs2console-1.2.1-1

I have a SAN (IBM DS4300) that I boot from using the qla2xxx and Linux RDAC (mpp) drivers. The boot device (/dev/sda) works perfectly and does not cause any problems on any host. Doing large filesystem tests is no problem.

As I wrote above, as soon as I do heavy IO on the ocfs2 device, LowFree drops from 848624 kB to about 35000 kB and stays there until I remount the device. Is there any way to reclaim this memory without unmounting and remounting the device?
Best,
Alexander

Runtime info:

Aug 11 13:20:24 fooserver kernel: oom-killer: gfp_mask=0xd0
Aug 11 13:20:24 fooserver kernel: Mem-info:
Aug 11 13:20:24 fooserver kernel: DMA per-cpu:
Aug 11 13:20:24 fooserver kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 1 hot: low 2, high 6, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 1 cold: low 0, high 2, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 2 hot: low 2, high 6, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 2 cold: low 0, high 2, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 3 hot: low 2, high 6, batch 1
Aug 11 13:20:24 fooserver kernel: cpu 3 cold: low 0, high 2, batch 1
Aug 11 13:20:24 fooserver kernel: Normal per-cpu:
Aug 11 13:20:24 fooserver kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 1 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 1 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 2 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 2 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 3 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 3 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: HighMem per-cpu:
Aug 11 13:20:26 fooserver kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 1 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 1 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 2 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 2 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 3 hot: low 32, high 96, batch 16
Aug 11 13:20:26 fooserver kernel: cpu 3 cold: low 0, high 32, batch 16
Aug 11 13:20:26 fooserver kernel:
Aug 11 13:20:26 fooserver kernel: Free pages: 3223752kB (3201152kB HighMem)
Aug 11 13:20:26 fooserver kernel: Active:6099 inactive:12675 dirty:11398 writeback:0 unstable:0 free:805938 slab:208990 mapped:2921 pagetables:193
Aug 11 13:20:26 fooserver kernel: DMA free:12568kB min:180kB low:360kB high:540kB active:0kB inactive:0kB present:16384kB pages_scanned:1113 all_unreclaimable? yes
Aug 11 13:20:26 fooserver kernel: protections[]: 0 0 0
Aug 11 13:20:26 fooserver kernel: Normal free:10032kB min:10056kB low:20112kB high:30168kB active:2572kB inactive:3676kB present:901120kB pages_scanned:7095 all_unreclaimable? yes
Aug 11 13:20:26 fooserver kernel: protections[]: 0 0 0
Aug 11 13:20:26 fooserver kernel: HighMem free:3201152kB min:512kB low:1024kB high:1536kB active:21824kB inactive:47024kB present:3932160kB pages_scanned:0 all_unreclaimable? no
Aug 11 13:20:26 fooserver kernel: protections[]: 0 0 0
Aug 11 13:20:26 fooserver kernel: DMA: 4*4kB 3*8kB 3*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12568kB
Aug 11 13:20:26 fooserver kernel: Normal: 0*4kB 2*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 10032kB
Aug 11 13:20:26 fooserver kernel: HighMem: 0*4kB 0*8kB 128*16kB 2*32kB 101*64kB 316*128kB 217*256kB 156*512kB 124*1024kB 35*2048kB 688*4096kB = 3201152kB
Aug 11 13:20:26 fooserver kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Aug 11 13:20:26 fooserver kernel: 0 bounce buffer pages
Aug 11 13:20:26 fooserver kernel: Free swap: 2096472kB
Aug 11 13:20:26 fooserver kernel: 1212416 pages of RAM
Aug 11 13:20:26 fooserver kernel: 819120 pages of HIGHMEM
Aug 11 13:20:26 fooserver kernel: 174864 reserved pages
Aug 11 13:20:26 fooserver kernel: 22317 pages shared
Aug 11 13:20:26 fooserver kernel: 0 pages swap cached
Aug 11 13:20:26 fooserver kernel: Out of Memory: Killed process 3368 (sshd).
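
For readers of the report above, the page counts on the "Active:... slab:..." line can be converted to kB to see where the low memory went. The following is only a rough sketch: the 4 kB page size is an assumption (it is consistent with free:805938 pages matching "Free pages: 3223752kB"), and the helper name `parse_page_counts` is made up for this illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (805938 free pages * 4 = 3223752 kB, as reported)

def parse_page_counts(line):
    """Return the name:count pairs from an 'Active:... slab:...' summary line."""
    return {name.lower(): int(count) for name, count in re.findall(r"(\w+):(\d+)", line)}

summary = ("Active:6099 inactive:12675 dirty:11398 writeback:0 unstable:0 "
           "free:805938 slab:208990 mapped:2921 pagetables:193")
counts = parse_page_counts(summary)

slab_kb = counts["slab"] * PAGE_KB   # 835960 kB
normal_present_kb = 901120           # 'present' value of the Normal zone in the same report

print(f"slab: {slab_kb} kB")
print(f"slab relative to Normal zone size: {slab_kb / normal_present_kb:.0%}")
```

Under that assumption the slab allocator holds about 816 MB, roughly 93% of the size of the Normal zone, while HighMem is almost entirely free. On 32-bit highmem kernels slab memory generally comes from low memory, which fits the picture of LowFree being exhausted while MemFree stays large.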
MemFree

Normal output:

MemFree: 4040304 kB
HighFree: 3229824 kB
LowFree: 810480 kB
SwapFree: 2096472 kB
HugePages_Free: 0

Shortly before and after the OOM killer rampaged:

MemFree: 3223736 kB
HighFree: 3187840 kB
LowFree: 35896 kB
SwapFree: 2096472 kB
HugePages_Free: 0
Fri Aug 11 13:18:48 CEST 2006
13:18:48 up 55 min, 4 users, load average: 2.92, 2.82, 2.36

MemFree: 3199080 kB
HighFree: 3175808 kB
LowFree: 23272 kB
SwapFree: 2096472 kB
HugePages_Free: 0
Fri Aug 11 13:19:18 CEST 2006
13:19:18 up 56 min, 4 users, load average: 3.22, 2.90, 2.40

MemFree: 3242080 kB
HighFree: 3207296 kB
LowFree: 34784 kB
SwapFree: 2096472 kB
HugePages_Free: 0
Fri Aug 11 13:19:48 CEST 2006
13:19:48 up 56 min, 4 users, load average: 3.71, 3.05, 2.47

MemFree: 3225848 kB
HighFree: 3202368 kB
LowFree: 23480 kB
SwapFree: 2096472 kB
HugePages_Free: 0
Fri Aug 11 13:20:18 CEST 2006
13:20:18 up 57 min, 4 users, load average: 3.66, 3.09, 2.50

LowFree stays at 23480 until the filesystem is remounted.

--
Fotofinder GmbH USt-IdNr. DE812854514
Software Entwicklung Web: http://www.fotofinder.net/
Potsdamer Str. 96 Tel: +49 30 25792890
10785 Berlin Fax: +49 30 257928999

-------------- next part --------------
A non-text attachment was scrubbed...
Name: a.finger.vcf
Type: text/x-vcard
Size: 346 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060811/a9dd7090/a.finger.vcf
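
Samples like the ones in this mail can be collected by reading /proc/meminfo at intervals. The following is a minimal sketch, not the script that produced the output above; the function names are made up for illustration, and it assumes a kernel that exposes a LowFree field (32-bit kernels built with highmem support do, others may not).

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip lines that are not 'Name: value' pairs
        fields[name.strip()] = int(rest.split()[0])
    return fields

def low_free_kb(path="/proc/meminfo"):
    """Read LowFree from the given file; returns None if the field is absent."""
    with open(path) as f:
        return parse_meminfo(f.read()).get("LowFree")

# Sample taken from the figures quoted in this mail.
sample = "MemFree: 3223736 kB\nHighFree: 3187840 kB\nLowFree: 35896 kB\nHugePages_Free: 0\n"
print(parse_meminfo(sample)["LowFree"])
```

Calling `low_free_kb()` in a loop with a 30 second sleep would give a series comparable to the one above, which makes it easy to see whether LowFree recovers after the IO stops.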
Kurt Hackel
2006-Aug-11 11:52 UTC
[Ocfs2-users] out of memory... doing heavy IO on ocfs2 is wasting (low) memory?!
Hi,

Alexander Finger wrote:
> Hello,
>
> my problem: When I want to create a large number of small files on any
> node at my ocfs2 cluster, after some time the oom killer starts
> killing processes because of low LowMem. All error messages and memory
> stats are at the end of this mail.

This is a known issue that is currently being fixed for the next scheduled release. At this time, once a node masters a lock resource (from the filesystem's point of view, this happens when the node is the first node to access that file), it cannot drop mastery of that resource until it unmounts. The fix is nontrivial, but I'm almost done with it. Once the fix is done, it will need extensive testing.

> The only way to avoid this behavior is to unmount the ocfs2 partition
> after some disk operations, because LowMem (LowFree) stays low until
> unmount... I searched the web and found many descriptions of this
> error, but no answer how to handle this problem.

Correct. The only current workaround is to unmount, or to attempt to spread the lock resources out across all the nodes of the cluster (which may be impossible in your use case).

Thanks
-kurt
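
One way to picture the second workaround: since a node masters the lock resource for a file it is the first to access, splitting a batch of file creations across the nodes should spread the mastered resources as well. The sketch below only plans such a split; the node names, the mount point and the `assign_to_nodes` helper are hypothetical, and whether a given workload can be divided this way is a separate question.

```python
def assign_to_nodes(paths, nodes):
    """Round-robin paths over nodes so each node first-touches a similar share of files."""
    if not nodes:
        raise ValueError("need at least one node")
    plan = {node: [] for node in nodes}
    for i, path in enumerate(paths):
        plan[nodes[i % len(nodes)]].append(path)
    return plan

nodes = ["node1", "node2", "node3"]                            # hypothetical node names
paths = [f"/mnt/ocfs2/batch/file{i:04d}" for i in range(10)]   # hypothetical mount point
plan = assign_to_nodes(paths, nodes)

for node, files in plan.items():
    print(node, len(files))
```

Each node would then create only the files in its own list, so no single node ends up as master for the whole batch. This does not release anything that is already mastered; per the reply above, that still requires an unmount.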