Dear list,

I found these errors on the OSS — is this a dangerous signal?

Nov 20 10:43:16 boss02 kernel: ll_ost_io_126: page allocation failure. order:4, mode:0x50
Nov 20 10:43:16 boss02 kernel: ll_ost_io_42: page allocation failure. order:4, mode:0x50
Nov 20 10:43:19 boss02 kernel: [<fb671a80>]<4>ll_ost_io_47: page allocation failure. order:4, mode:0x50
Nov 20 10:43:26 boss02 kernel: HighMem: 688*4kB <4>ll_ost_io_133: page allocation failure. order:4, mode:0x50
Nov 20 12:02:27 boss02 kernel: ll_ost_io_25: page allocation failure. order:4, mode:0x50
Nov 20 12:02:27 boss02 kernel: ll_ost_io_13: page allocation failure. order:4, mode:0x50
Nov 20 12:02:27 boss02 kernel: [<fb65b009>]<4>ll_ost_io_09: page allocation failure. order:4, mode:0x50
Nov 20 12:02:27 boss02 kernel: [<fb0b715e>]<4>ll_ost_io_111: page allocation failure. order:4, mode:0x50
Nov 20 12:11:08 boss02 kernel: ll_ost_io_85: page allocation failure. order:4, mode:0x50
Nov 20 12:11:08 boss02 kernel: ll_ost_io_23: page allocation failure. order:4, mode:0x50
Nov 20 12:11:10 boss02 kernel: [<fb66fcf5>]<4>ll_ost_io_134: page allocation failure. order:4, mode:0x50
Nov 20 12:11:15 boss02 kernel: [<c01c41ca>]<4>ll_ost_io_109: page allocation failure. order:4, mode:0x50
Nov 20 16:39:12 boss02 kernel: ll_ost_io_122: page allocation failure. order:4, mode:0x50
Nov 20 16:39:12 boss02 kernel: ll_ost_io_32: page allocation failure. order:4, mode:0x50
Nov 20 16:39:12 boss02 kernel: [<c0146301>]<4>ll_ost_io_90: page allocation failure. order:4, mode:0x50
Nov 20 16:39:12 boss02 kernel: ll_ost_io_126: page allocation failure. order:4, mode:0x50
Nov 20 18:13:20 boss02 kernel: ll_ost_io_15: page allocation failure. order:4, mode:0x50
Nov 20 18:13:20 boss02 kernel: ll_ost_io_26: page allocation failure. order:4, mode:0x50
Nov 20 18:13:20 boss02 kernel: [<c01425cc>]<4>ll_ost_io_48: page allocation failure. order:4, mode:0x50
Nov 20 18:13:21 boss02 kernel: [<c01c0fba>]<4>ll_ost_io_105: page allocation failure. order:4, mode:0x50
Nov 20 19:40:37 boss02 kernel: ll_ost_io_21: page allocation failure. order:4, mode:0x50
Nov 20 19:40:37 boss02 kernel: ll_ost_io_121: page allocation failure. order:4, mode:0x50
Nov 20 19:40:37 boss02 kernel: ll_ost_io_128: page allocation failure. order:4, mode:0x50
Nov 20 19:40:38 boss02 kernel: [<c01259d4>]<4>ll_ost_io_81: page allocation failure. order:4, mode:0x50
Nov 21 00:51:51 boss02 kernel: ll_ost_io_46: page allocation failure. order:4, mode:0x50
Nov 21 00:51:51 boss02 kernel: ll_ost_io_73: page allocation failure. order:4, mode:0x50
Nov 21 00:51:51 boss02 kernel: [<c0145f2d>]<4>ll_ost_io_08: page allocation failure. order:4, mode:0x50
Nov 21 00:51:51 boss02 kernel: [<c0145f2d>]<4>ll_ost_io_67: page allocation failure. order:4, mode:0x50
Nov 21 05:48:44 boss02 kernel: ll_ost_io_38: page allocation failure. order:4, mode:0x50
Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50
Nov 21 05:48:44 boss02 kernel: [<c0146301>]<4>ll_ost_io_06: page allocation failure. order:4, mode:0x50
Nov 21 05:48:44 boss02 kernel: [<c01451ed>]<4>ll_ost_io_86: page allocation failure. order:4, mode:0x50
Nov 20 19:40:45 boss02 kernel: Free pages: 7418824kB (7398656kB HighMem)
Nov 20 19:40:45 boss02 kernel: Active:12247 inactive:93310 dirty:2 writeback:0 unstable:0 free:1854706 slab:65523 mapped:5465 pagetables:312
Nov 20 19:40:45 boss02 kernel: DMA free:12784kB min:64kB low:80kB high:96kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Nov 20 19:40:45 boss02 kernel: protections[]: 0 0 0
Nov 20 19:40:45 boss02 kernel: Normal free:7384kB min:3728kB low:4660kB high:5592kB active:21472kB inactive:352196kB present:901120kB pages_scanned:0 all_unreclaimable? no
Nov 20 19:40:45 boss02 kernel: protections[]: 0 0 0
Nov 20 19:40:45 boss02 kernel: HighMem free:7398656kB min:512kB low:640kB high:768kB active:27516kB inactive:21044kB present:8519676kB pages_scanned:0 all_unreclaimable? no
Nov 20 19:40:45 boss02 kernel: protections[]: 0 0 0
Nov 20 19:40:45 boss02 kernel: DMA: 4*4kB 4*8kB 4*16kB 2*32kB 3*64kB 3*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12784kB
Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB tcp_clean_rtx_queue+0x255/0x35f
Nov 20 19:40:46 boss02 kernel: 1786*4096kB = 7398656kB
Nov 20 19:40:46 boss02 kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Nov 20 19:40:46 boss02 kernel: 0 bounce buffer pages
Nov 20 19:40:46 boss02 kernel: Free swap: 16386292kB
Nov 20 19:40:46 boss02 kernel: [<c027dcbe>] __kfree_skb+0xf4/0xf7
Nov 20 19:40:46 boss02 kernel: [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
Nov 20 19:40:46 boss02 kernel: [<fb18fd06>] ost_handle+0xe56/0x5790 [ost]
Nov 20 19:40:46 boss02 kernel: [<c0299ba8>]DMA per-cpu:
Nov 20 19:40:46 boss02 kernel: cpu 0 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 0 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 1 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 1 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 2 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 2 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 3 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 3 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 4 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 4 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 5 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 5 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 6 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 6 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 7 hot: low 2, high 6, batch 1
Nov 20 19:40:46 boss02 kernel: cpu 7 cold: low 0, high 2, batch 1
Nov 20 19:40:46 boss02 kernel: Normal per-cpu:
Nov 20 19:40:46 boss02 kernel: cpu 0 hot: low 32, high 96, batch 16
Nov 20 19:40:46 boss02 kernel: cpu 0 cold: low 0, high 32, batch 16
Nov 20 19:40:46 boss02 kernel: cpu 1 hot: low 32, high 96, batch 16
Nov 20 19:40:46 boss02 kernel: cpu 1 cold: low 0, high 32, batch 16
Nov 20 19:40:46 boss02 kernel: cpu 2 hot: low 32, high 96, batch 16
Nov 20 19:40:46 boss02 kernel: cpu 2 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 3 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 3 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 4 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 4 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 5 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 5 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 6 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 6 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 7 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 7 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: HighMem per-cpu:
Nov 20 19:40:47 boss02 kernel: cpu 0 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 0 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 1 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 1 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 2 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 2 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 3 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 3 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 4 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 4 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 5 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 5 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 6 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 6 cold: low 0, high 32, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 7 hot: low 32, high 96, batch 16
Nov 20 19:40:47 boss02 kernel: cpu 7 cold: low 0, high 32, batch 16
Lu Wang writes:
> Dear list,

Hello,

> I found these errors on OSS, is this a dangerous signal?

Your OST runs out of memory, which is not by itself 'dangerous' as a temporary condition (the client resends the pages in this case). What version of Lustre and kernel are you using?

Nikita.
The %util of memory on the OSS was always around 10%, even when the OSS was about to die.

The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32-bit)
Lustre version is 1.6.5.1

We have 8GB of physical memory and 16GB of swap in total (the swap has never been used).

Is there a problem with memory management?

Nikita Danilov wrote:
> Lu Wang writes:
> > Dear list,
>
> Hello,
>
> > I found these errors on OSS, is this a dangerous signal?
>
> Your OST runs out of memory, which is not by itself 'dangerous' as a
> temporary condition (the client resends the pages in this case). What
> version of Lustre and kernel are you using?
>
> Nikita.
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Nov 26, 2008 19:04 +0800, Wang lu wrote:
> The %util of memory on OSS was always around 10%, even when OSS was going to die.
>
> The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32bit)
>
> Lustre version is 1.6.5.1
>
> We have 8GB physical memory and 16GB (never been used) swap total.
>
> Is there a problem with memory management?

The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit kernel to use more than 900MB of memory on a 32-bit system, no matter how much RAM is installed. 900MB/8192MB ~= 10% of RAM. Swap is not useful for the kernel.

> Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
> Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB

As you can see, all of the memory is available in "highmem" and not in the "normal" memory region that the kernel uses.

> Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50

These are "order 4" allocations (64kB), which the kernel is bad at handling under memory pressure in any case. You can see in the "Normal" zone above that all memory chunks 64kB and larger have no free memory to allocate.

> Nov 20 19:40:46 boss02 kernel: [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
> Nov 20 19:40:46 boss02 kernel: [<fb18fd06>] ost_handle+0xe56/0x5790

It appears that the memory allocation problems are due to the TCP stack. I would suspect that you are using TCP with jumbo packets.

The easiest solution is to run a 64-bit kernel, which I suspect should be possible given that hardly any 32-bit machines allow more than 4GB of RAM. Alternatively, you could use regular ethernet frames, which may help somewhat, but it won't let you use the other 7GB of RAM in the system.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
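[Not part of the original thread — a minimal sketch of how to see the Normal/HighMem split Andreas describes on a live box, assuming a Linux system with procfs. The LowTotal/LowFree/HighTotal/HighFree fields only appear on 32-bit kernels built with highmem support, so the script falls back to a note on 64-bit kernels.]

```shell
#!/bin/sh
# Show how much memory sits in the kernel's directly addressable "low"
# (Normal) zone versus highmem. On a 64-bit kernel there is no split,
# so the fields are simply absent from /proc/meminfo.
if grep -qE '^(LowTotal|HighTotal):' /proc/meminfo; then
    grep -E '^(LowTotal|LowFree|HighTotal|HighFree):' /proc/meminfo
else
    echo "no low/high split in /proc/meminfo (64-bit kernel: all RAM is directly addressable)"
fi

# Free chunks per zone, one column per allocation order; an order-4
# (64kB) request needs a non-zero entry in column 5 or beyond.
cat /proc/buddyinfo
```

On a 32-bit OSS like the one in this thread, the LowFree line dropping toward zero while HighFree stays large is exactly the condition behind the order:4 failures above.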
Dear Andreas,

Thank you very much for your suggestion. I have one more question: we have more than 100 client nodes running 32-bit Linux. When we switch the OSS kernel to 64-bit, is there any special configuration we should do?

Cheers,
Lu

Andreas Dilger wrote:
> On Nov 26, 2008 19:04 +0800, Wang lu wrote:
>> The %util of memory on OSS was always around 10%, even when OSS was going to die.
>>
>> The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32bit)
>>
>> Lustre version is 1.6.5.1
>>
>> We have 8GB physical memory and 16GB (never been used) swap total.
>>
>> Is there a problem with memory management?
>
> The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit kernel
> to use more than 900MB of memory on a 32-bit system, no matter how much RAM
> is installed. 900MB/8192MB ~= 10% of RAM. Swap is not useful for the kernel.
>
>> Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
>> Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB
>
> As you can see, all of the memory is available in "highmem" and not in
> the "normal" memory region that the kernel uses.
>
>> Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50
>
> These are "order 4" allocations (64kB), which the kernel is bad at handling
> under memory pressure in any case. You can see in the "Normal" zone above
> that all memory chunks 64kB and larger have no free memory to allocate.
>
>> Nov 20 19:40:46 boss02 kernel: [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
>> Nov 20 19:40:46 boss02 kernel: [<fb18fd06>] ost_handle+0xe56/0x5790
>
> It appears that the memory allocation problems are due to the TCP stack.
> I would suspect that you are using TCP with jumbo packets.
>
> The easiest solution is to run a 64-bit kernel, which I suspect should
> be possible given that hardly any 32-bit machines allow more than 4GB
> of RAM. Next it would be possible to use regular ethernet frames, which
> may help somewhat but it won't let you use the other 7GB of RAM in the
> system.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
On Nov 28, 2008 10:16 +0800, Wang lu wrote:
> Thank you very much for your suggestion. I have one more question. We
> have more than 100 client nodes running 32-bit Linux; when we switch
> the OSS kernel to 64-bit, is there any special configuration we should do?

No, there is no requirement for the kernels on the OSS and clients to match.

> Andreas Dilger wrote:
>> On Nov 26, 2008 19:04 +0800, Wang lu wrote:
>>> The %util of memory on OSS was always around 10%, even when OSS was
>>> going to die. The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32bit)
>>>
>>> Lustre version is 1.6.5.1
>>>
>>> We have 8GB physical memory and 16GB (never been used) swap total.
>>>
>>> Is there a problem with memory management?
>>
>> The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit
>> kernel to use more than 900MB of memory on a 32-bit system, no matter
>> how much RAM is installed. 900MB/8192MB ~= 10% of RAM. Swap is not
>> useful for the kernel.
>>
>>> Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
>>> Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB
>>
>> As you can see, all of the memory is available in "highmem" and not in
>> the "normal" memory region that the kernel uses.
>>
>>> Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50
>>
>> These are "order 4" allocations (64kB), which the kernel is bad at handling
>> under memory pressure in any case. You can see in the "Normal" zone above
>> that all memory chunks 64kB and larger have no free memory to allocate.
>>
>>> Nov 20 19:40:46 boss02 kernel: [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
>>> Nov 20 19:40:46 boss02 kernel: [<fb18fd06>] ost_handle+0xe56/0x5790
>>
>> It appears that the memory allocation problems are due to the TCP stack.
>> I would suspect that you are using TCP with jumbo packets.
>>
>> The easiest solution is to run a 64-bit kernel, which I suspect should
>> be possible given that hardly any 32-bit machines allow more than 4GB
>> of RAM. Next it would be possible to use regular ethernet frames, which
>> may help somewhat but it won't let you use the other 7GB of RAM in the
>> system.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Dear Andreas,

After I updated the OSS servers to 64-bit, the memory usage became normal (about 3GB of memory is used now). That is big progress. However, there are still a lot of "slow ***" messages in the log when I run a heavy load test.

testjob.sh:

#!/bin/bash
c=0
while true
do
    let c=c+1
    filename="/besfs/wanglu/$HOSTNAME-$$-$RANDOM"
    dd if=/dev/zero of="$filename" bs=1M count=1000
    sleep 1
    dd if="$filename" of="/dev/null" bs=1M count=1000
    sleep 1
    rm "$filename"
    date
    echo "$c GB W+R finished!"
    echo
done

Nov 28 16:12:50 boss01 kernel: LustreError: 2129:0:(filter_io_26.c:779:filter_commitrw_write()) besfs-OST0005: slow direct_io 53s
Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0005: slow journal start 53s
Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(filter_io_26.c:792:filter_commitrw_write()) besfs-OST0005: slow commitrw commit 53s
Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) besfs-OST0002: slow i_mutex 123s
Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) Skipped 84 previous similar messages
Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) besfs-OST0002: slow journal start 123s
Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) Skipped 10 previous similar messages
Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) besfs-OST0002: slow brw_start 123s
Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 11 previous similar messages
Nov 28 16:13:24 boss01 kernel: LustreError: 2093:0:(lustre_fsfilt.h:227:fsfilt_start_log()) besfs-OST0002: slow journal start 122s
Nov 28 16:13:24 boss01 kernel: LustreError: 2175:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0002:

Andreas Dilger wrote:
> On Nov 28, 2008 10:16 +0800, Wang lu wrote:
>> Thank you very much for your suggestion. I have one more question. We
>> have more than 100 client nodes running 32-bit Linux; when we switch
>> the OSS kernel to 64-bit, is there any special configuration we should do?
>
> No, there is no requirement for the kernels on the OSS and clients to match.
>
>> Andreas Dilger wrote:
>>> On Nov 26, 2008 19:04 +0800, Wang lu wrote:
>>>> The %util of memory on OSS was always around 10%, even when OSS was
>>>> going to die. The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32bit)
>>>>
>>>> Lustre version is 1.6.5.1
>>>>
>>>> We have 8GB physical memory and 16GB (never been used) swap total.
>>>>
>>>> Is there a problem with memory management?
>>>
>>> The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit
>>> kernel to use more than 900MB of memory on a 32-bit system, no matter
>>> how much RAM is installed. 900MB/8192MB ~= 10% of RAM. Swap is not
>>> useful for the kernel.
>>>
>>>> Nov 20 19:40:45 boss02 kernel: Normal: 640*4kB 109*8kB 127*16kB 60*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
>>>> Nov 20 19:40:45 boss02 kernel: HighMem: 376*4kB 1162*8kB 815*16kB 299*32kB 160*64kB 61*128kB 34*256kB 25*512kB 8*1024kB 1*2048kB 1786*4096kB = 7398656kB
>>>
>>> As you can see, all of the memory is available in "highmem" and not in
>>> the "normal" memory region that the kernel uses.
>>>
>>>> Nov 21 05:48:44 boss02 kernel: ll_ost_io_114: page allocation failure. order:4, mode:0x50
>>>
>>> These are "order 4" allocations (64kB), which the kernel is bad at handling
>>> under memory pressure in any case. You can see in the "Normal" zone above
>>> that all memory chunks 64kB and larger have no free memory to allocate.
>>>
>>>> Nov 20 19:40:46 boss02 kernel: [<c02b162a>] tcp_v4_do_rcv+0x1b/0xe9
>>>> Nov 20 19:40:46 boss02 kernel: [<fb18fd06>] ost_handle+0xe56/0x5790
>>>
>>> It appears that the memory allocation problems are due to the TCP stack.
>>> I would suspect that you are using TCP with jumbo packets.
>>>
>>> The easiest solution is to run a 64-bit kernel, which I suspect should
>>> be possible given that hardly any 32-bit machines allow more than 4GB
>>> of RAM. Next it would be possible to use regular ethernet frames, which
>>> may help somewhat but it won't let you use the other 7GB of RAM in the
>>> system.
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
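[Not part of the original mail — when a burst of these warnings comes in, it helps to tally them by operation type before tuning anything. A small sketch; the sample input in the here-document is abridged from the log excerpt above, and in practice you would feed it your real syslog (e.g. `grep LustreError /var/log/messages`):]

```shell
#!/bin/sh
# Count Lustre "slow <operation> <N>s" warnings by operation type,
# busiest operation first. Replace the here-document with your real
# OSS log in practice.
grep -oE 'slow [a-z_ ]+ [0-9]+s' <<'EOF' | sed -E 's/ [0-9]+s$//' | sort | uniq -c | sort -rn
Nov 28 16:12:50 boss01 kernel: LustreError: besfs-OST0005: slow direct_io 53s
Nov 28 16:12:50 boss01 kernel: LustreError: besfs-OST0005: slow journal start 53s
Nov 28 16:13:24 boss01 kernel: LustreError: besfs-OST0002: slow i_mutex 123s
Nov 28 16:13:24 boss01 kernel: LustreError: besfs-OST0002: slow journal start 123s
EOF
```

With the four sample lines this prints "2 slow journal start" ahead of the single i_mutex and direct_io entries, which is the kind of summary that tells you whether the journal device or the data disks are the bottleneck.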
By the way, Happy Thanksgiving!
On Thursday 27 November 2008 20:30, Andreas Dilger wrote:
> On Nov 26, 2008 19:04 +0800, Wang lu wrote:
>> The %util of memory on OSS was always around 10%, even when OSS was
>> going to die.
>>
>> The OSS kernel is: 2.6.9-67.0.7.EL_lustre.1.6.5smp (32bit)
>>
>> Lustre version is 1.6.5.1
>>
>> We have 8GB physical memory and 16GB (never been used) swap total.
>>
>> Is there a problem with memory management?
>
> The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit
> kernel to use more than 900MB of memory on a 32-bit system, no matter
> how much RAM is installed. 900MB/8192MB ~= 10% of RAM.

I want to clarify this, because I don't understand why you are saying that at most 900MB of our RAM can be used! AFAIK, on a 32-bit system we have the following limits:
- max 4GiB RAM using a kernel without PAE (Physical Address Extension)
- max 64GiB RAM using a kernel with PAE (extends the physical address size from 32 to 36 bits)

> Swap is not useful for the kernel.

Why?

Regards,
Alx
On Fri, 2008-11-28 at 11:14 +0200, Alex wrote:
> I want to clarify this, because I don't understand why you are saying
> that at most 900MB of our RAM can be used! AFAIK, on a 32-bit system we
> have the following limits:
> - max 4GiB RAM using a kernel without PAE (Physical Address Extension)
> - max 64GiB RAM using a kernel with PAE (extends the physical address
>   size from 32 to 36 bits)

There is a further limit on how much of that RAM the kernel itself can use (i.e. the kernel is limited in how much of that 4 or 6 or whatever GB of memory you have it can use). I tried to find some documentation on it to help you understand, but couldn't really find any. Maybe somebody else has a pointer to a good introduction to why this is.

b.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20081128/59abd678/attachment.bin
On Fri, 2008-11-28 at 16:22 +0800, Wang lu wrote:
> Nov 28 16:12:50 boss01 kernel: LustreError: 2129:0:(filter_io_26.c:779:filter_commitrw_write()) besfs-OST0005: slow direct_io 53s
> Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0005: slow journal start 53s
> Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(filter_io_26.c:792:filter_commitrw_write()) besfs-OST0005: slow commitrw commit 53s
> Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) besfs-OST0002: slow i_mutex 123s
> Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) Skipped 84 previous similar messages
> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) besfs-OST0002: slow journal start 123s
> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) Skipped 10 previous similar messages
> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) besfs-OST0002: slow brw_start 123s
> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 11 previous similar messages
> Nov 28 16:13:24 boss01 kernel: LustreError: 2093:0:(lustre_fsfilt.h:227:fsfilt_start_log()) besfs-OST0002: slow journal start 122s
> Nov 28 16:13:24 boss01 kernel: LustreError: 2175:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0002:

Your storage is too slow for the OSS load you are throwing at it. Try reducing the number of OST threads on your OSSes. I don't recall exactly the name of the module option, but it is an ost module option; modinfo should tell you. I'm sure the operations manual covers this as well.

b.
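[Not part of the original mail — a hedged sketch of how one might locate and persist the option Brian refers to. The `oss_num_threads` name is the one used later in this thread; `modinfo` output varies by Lustre version, the value 128 is purely illustrative, and on a non-OSS machine the first command simply reports that the module is absent.]

```shell
#!/bin/sh
# Look for thread-count tunables on the ost module (OSS only), then
# show the kind of line you would append to /etc/modprobe.conf so the
# setting survives a module reload.
modinfo ost 2>/dev/null | grep -i thread \
    || echo "ost module not present on this machine (run this on the OSS)"

# Illustrative persistent setting; 128 is a placeholder, not a recommendation:
echo "options ost oss_num_threads=128"
```

The reduced thread count takes effect the next time the ost module is loaded, so it needs a Lustre service restart on the OSS.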
On Nov 28, 2008 11:14 +0200, Alex wrote:
> On Thursday 27 November 2008 20:30, Andreas Dilger wrote:
>> The problem is with the 32-bit kernel. Linux doesn't allow a 32-bit
>> kernel to use more than 900MB of memory on a 32-bit system, no matter
>> how much RAM is installed. 900MB/8192MB ~= 10% of RAM.
>
> I want to clarify this, because I don't understand why you are saying
> that at most 900MB of our RAM can be used! AFAIK, on a 32-bit system we
> have the following limits:
> - max 4GiB RAM using a kernel without PAE (Physical Address Extension)
> - max 64GiB RAM using a kernel with PAE (extends the physical address
>   size from 32 to 36 bits)

This is a Linux kernel limitation. The 32-bit address space is split into 1GB for the kernel ("Normal" memory) and 3GB ("High" memory) for user-space applications. As a result, the Lustre OST threads (which run in the kernel) can use at most 1GB of RAM on a 32-bit system. Even filesystems like NFS or ext3 can cache only 1GB of metadata. There is no reason to use a 32-bit OSS node for systems that need good performance these days; even the least expensive x86 CPU is 64-bit.

>> Swap is not useful for the kernel.
>
> Why?

Because that just isn't the way the Linux kernel works: it is not possible to swap memory allocated by the kernel. Even if the Linux kernel allowed swapping kernel memory to disk, this would be a foolish thing to do, because the Lustre IO data that might be going at 1GB/s to a fast storage system would first be swapped out to a slow single disk at 40MB/s (at best!), then read back (< 40MB/s), and only then written to the fast storage.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Dear Brian,

There are two kinds of threads that "may be" OST threads: ll_ost_** and ll_ost_io_**. Which one is the OST thread? I am asking this question to make sure the module option works.

In /etc/modprobe.conf:

options ost oss_num_threads=200

[root@boss03 ~]# ps -ef | grep ll_ost | wc -l
405
[root@boss03 ~]# ps -ef | grep ll_ost_io | wc -l
201

If ll_ost_io is the OST thread, then what is ll_ost?

Brian J. Murrell wrote:
> On Fri, 2008-11-28 at 16:22 +0800, Wang lu wrote:
>> Nov 28 16:12:50 boss01 kernel: LustreError: 2129:0:(filter_io_26.c:779:filter_commitrw_write()) besfs-OST0005: slow direct_io 53s
>> Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0005: slow journal start 53s
>> Nov 28 16:12:50 boss01 kernel: LustreError: 871:0:(filter_io_26.c:792:filter_commitrw_write()) besfs-OST0005: slow commitrw commit 53s
>> Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) besfs-OST0002: slow i_mutex 123s
>> Nov 28 16:13:24 boss01 kernel: LustreError: 838:0:(filter_io_26.c:714:filter_commitrw_write()) Skipped 84 previous similar messages
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) besfs-OST0002: slow journal start 123s
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(lustre_fsfilt.h:262:fsfilt_brw_start_log()) Skipped 10 previous similar messages
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) besfs-OST0002: slow brw_start 123s
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2107:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 11 previous similar messages
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2093:0:(lustre_fsfilt.h:227:fsfilt_start_log()) besfs-OST0002: slow journal start 122s
>> Nov 28 16:13:24 boss01 kernel: LustreError: 2175:0:(lustre_fsfilt.h:318:fsfilt_commit_wait()) besfs-OST0002:
>
> Your storage is too slow for the OSS load you are throwing at it. Try
> reducing the number of OST threads on your OSSes. I don't recall
> exactly the name of the module option, but it is an ost module option;
> modinfo should tell you. I'm sure the operations manual covers this as
> well.
>
> b.
On Mon, 2008-12-01 at 15:52 +0800, Wang lu wrote:
> Dear Brian,
> There are two kinds of threads that "may be" OST threads:
> ll_ost_** and ll_ost_io_**.

You do see that (in terms of name globs) ll_ost_** is a subset of ll_ost_io_**, yes? Ahh. Perhaps you mean ll_ost_?? and ll_ost_io_??.

> Which one is the OST thread? I am asking this question
> to make sure the module option works.
> In /etc/modprobe.conf:
>
> options ost oss_num_threads=200
> [root@boss03 ~]# ps -ef | grep ll_ost | wc -l
> 405
> [root@boss03 ~]# ps -ef | grep ll_ost_io | wc -l
> 201

Given that the count of ll_ost_io is 201 (which will include your grep line), and that your count of ll_ost — which will include ll_ost_io as well as the other threads — is just over double that, I'd say that your module option is working.

b.
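[Not part of the original mail — a small sketch of the counting pitfall Brian points out: `ps -ef | grep ll_ost` counts the grep process itself, and the `ll_ost` pattern also matches every `ll_ost_io` thread. Matching on command names only, with anchored patterns, avoids both; the thread name prefixes are assumed to be as shown earlier in this thread.]

```shell
#!/bin/sh
# Count OST service threads by exact name prefix. `ps -e -o comm=`
# prints only command names, so the grep's own command line never
# appears in its input and cannot inflate the count. On a machine
# without Lustre threads both counts are simply 0.
io_threads=$(ps -e -o comm= | grep -c '^ll_ost_io' || true)
other_threads=$(ps -e -o comm= | grep -c '^ll_ost_[0-9]' || true)
echo "ll_ost_io threads: $io_threads"
echo "other ll_ost threads: $other_threads"
```

With the numbers quoted above, this style of counting would report 200 ll_ost_io threads (not 201) and separate them cleanly from the non-IO ll_ost threads.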