The first thing I would try is to turn off all debug; for speed comparisons
you really need to. Try "echo 0 > /proc/sys/portals/debug" on the machine.
By default Lustre is configured with full debug on when you install it from
source and start it with the test scripts.

Evan
-- 
-------------------------
Evan Felix
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Operated for the U.S. DOE by Battelle

On Thu, 2003-10-09 at 09:05, sophana wrote:
> Hmmm, it appears that it really is Lustre-related.
>
> When I launch my test in /tmp it goes about 10 times faster (not
> precisely measured).
>
> My makefile launches 3 jobs in parallel. Some of these jobs are big
> Python scripts that open about 150 files each.
>
> With top, I can see that they take a very long time to execute.
> Is Lustre doing things in user space?
>
> There are also lots of ost and mdt processes consuming a lot of CPU.
>
> Maybe the local.sh and llmount scripts in /usr/lib/lustre/example/
> are not optimized?
>
> Do you think having ost and mdt on separate machines could make
> things MUCH faster?
>
> Any help would be appreciated.
>
> Regards,
> Sophana
>
> -----Original Message-----
> From: lustre-discuss-admin@lists.clusterfs.com
> [mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of sophana
> Sent: Thursday, 9 October 2003 17:43
> To: lustre-discuss@lists.clusterfs.com
> Subject: RE: [Lustre-discuss] lustre7.3 problems
>
> Sorry, I rebooted and retried. Now everything seems to be OK.
>
> Except one thing: everything seems to be really slow! And this is not
> related to the FS; the CPU itself seems much slower.
>
> I will investigate why... maybe patch another kernel...
>
> -----Original Message-----
> From: lustre-discuss-admin@lists.clusterfs.com
> [mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of sophana
> Sent: Thursday, 9 October 2003 17:15
> To: lustre-discuss@lists.clusterfs.com
> Subject: [Lustre-discuss] lustre7.3 problems
>
> Hi all,
>
> I'm trying to evaluate Lustre 0.7.3 on my Red Hat 7.3.
>
> I installed the RPMs, but one of them (the Lustre Lite RPM) requires
> glibc 2.3, and my Red Hat has glibc 2.2. I cannot use 2.3 because one
> of my apps (which is commercial and cannot be recompiled) is not
> compatible with glibc 2.3.
>
> After several problems, I finally recompiled the kernel from sources
> and also Lustre from sources (v0.7.3).
>
> When I tried llmount.sh as mentioned in the HOWTO, it could not load
> the modules because it had no path. I found out I had to set the
> PORTALS and LUSTRE variables to the correct source paths; then the
> llmount script execution succeeded.
>
> Then I could play with /mnt/lustre.
>
> The first command I ran was to check out my code in /mnt/lustre with
> svn. It fails with a segmentation fault, so I cannot even load my code
> to try compilation in the Lustre FS. Of course, when I check out my
> code in another (non-Lustre) directory, it works.
>
> The same happens when I do:
>
> echo foo > foo
>
> Should I try an earlier version of Lustre?
>
> <1>Unable to handle kernel paging request at virtual address 73727b90
> printing eip: d0947aa2
> *pde = 00000000
> Oops: 0000
> loop llite mdc osc mds obdfilter fsfilt_ext3 ost ldlm ptlrpc obdclass
> ksocknal portals parport_pc lp parport binfmt_misc nfs autofs nfsd
> lockd sunrpc bcm5700
> CPU: 0
> EIP: 0060:[<d0947aa2>] Not tainted
> EFLAGS: 00010207
> EIP is at free_limit [llite] 0x142 (2.4.20-rh-lustre21)
> eax: 00199418 ebx: 7fb93bb0 ecx: c02ebd40 edx: 73727170
> esi: 00000002 edi: 000001ff ebp: cb19bd58 esp: cb19bd50
> ds: 0068 es: 0068 ss: 0068
> Process tcsh (pid: 7694, stackpage=cb19b000)
> Stack: c6081a20 fffff000 cb19bd68 d0946b1a 00000000 00000002 cb19beb8 d0946e81
>        00000080 00000001 d094a365 d094a44f 0000011b 00000148 d094a3d7 5a595857
>        62613900 66656463 6a696867 33323130 37363534 62613938 66656463 6a696867
> Call Trace: [<d0946b1a>] should_writeback [llite] 0x1a (0xcb19bd5c)
> [<d0946e81>] ll_check_dirty [llite] 0x51 (0xcb19bd6c)
> [<d094a365>] .rodata.str1.1 [llite] 0x1ba5 (0xcb19bd78)
> [<d094a44f>] .rodata.str1.1 [llite] 0x1c8f (0xcb19bd7c)
> [<d094a3d7>] .rodata.str1.1 [llite] 0x1c17 (0xcb19bd88)
> [<c0227812>] vsnprintf [kernel] 0x2a2 (0xc511de18)
> [<c0227812>] vsnprintf [kernel] 0x2a2 (0xc511de2c)
> [<d094b92b>] .rodata.str1.32 [llite] 0x140b (0xc511de78)
> [<d094922d>] .rodata.str1.1 [llite] 0xa6d (0xc511de84)
> [<d0923cf5>] ll_file_write [llite] 0xc5 (0xc511debc)
> [<d0948e8d>] .rodata.str1.1 [llite] 0x6cd (0xc511dec8)
> [<d094922d>] .rodata.str1.1 [llite] 0xa6d (0xc511decc)
> [<d094b900>] .rodata.str1.32 [llite] 0x13e0 (0xc511ded8)
> [<d0948e8d>] .rodata.str1.1 [llite] 0x6cd (0xc511df30)
> [<d094925d>] .rodata.str1.1 [llite] 0xa9d (0xc511df34)
> [<d094b9c0>] .rodata.str1.32 [llite] 0x14a0 (0xc511df40)
> [<c0141c46>] sys_write [kernel] 0x96 (0xc511df5c)
> [<c014f623>] dupfd [kernel] 0x23 (0xc511df84)
> [<c014f63f>] dupfd [kernel] 0x3f (0xc511df88)
> [<c0123d45>] sys_rt_sigprocmask [kernel] 0x125 (0xc511df98)
> [<c0108cd3>] system_call [kernel] 0x33 (0xc511dfc0)
>
> Code: 8b 82 20 0a 00 00 31 d2 85 c0 0f 94 c2 4a 21 c2 85 d2 89 d1
>
> Any help appreciated.
>
> Best regards,
> Sophana
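Evan's suggestion at the top of the thread can be scripted. This is a minimal
sketch, assuming the /proc path named in the thread (/proc/sys/portals/debug,
specific to Lustre 0.7.x/Portals); when that file is absent or unwritable the
sketch demonstrates the same write on a stand-in temp file instead:

```shell
#!/bin/sh
# Zero the Portals debug mask before timing anything, as advised above.
# The /proc path is taken from the message and is version-specific.

disable_debug() {
    # $1: path of the debug-mask file
    printf 'old mask: %s\n' "$(cat "$1")"
    echo 0 > "$1"
    printf 'new mask: %s\n' "$(cat "$1")"
}

if [ -w /proc/sys/portals/debug ]; then
    disable_debug /proc/sys/portals/debug
else
    # Not root, or not a Lustre node: demonstrate on a stand-in file.
    tmp=$(mktemp)
    echo 0xffffffff > "$tmp"
    disable_debug "$tmp"
    rm -f "$tmp"
fi
```

Re-run the speed test afterwards; the mask is not persistent, so a reboot (or
reloading the portals module) restores full debug.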
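Sophana's workaround for llmount.sh (the script could not load the modules
until PORTALS and LUSTRE pointed at the source trees) amounts to something
like the following sketch; the checkout paths and the tests/ subdirectory are
assumptions for illustration, so substitute your own locations:

```shell
#!/bin/sh
# Hypothetical checkout locations -- replace with your actual source trees.
export PORTALS="${PORTALS:-/usr/src/portals}"
export LUSTRE="${LUSTRE:-/usr/src/lustre-0.7.3}"

# llmount.sh loads the modules and mounts /mnt/lustre. Only attempt it
# if the script is actually present on this machine.
if [ -x "$LUSTRE/tests/llmount.sh" ]; then
    "$LUSTRE/tests/llmount.sh"
else
    echo "llmount.sh not found under $LUSTRE; adjust LUSTRE first" >&2
fi
```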