The first thing I would try is to turn off all debug; for speed comparisons, you really need to. Try "echo 0 > /proc/sys/portals/debug" on the machine. By default Lustre is configured with full debug on, since you installed it from source and started it with the test scripts.

Evan
-- 
-------------------------
Evan Felix
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Operated for the U.S. DOE by Battelle

On Thu, 2003-10-09 at 09:05, sophana wrote:
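As a quick sketch of a check-then-disable sequence for that tunable (the /proc path is the one given above; run as root — and note this is my assumption that the setting does not persist across a reboot):

```shell
# Show the current Portals debug mask; non-zero means debug logging is active.
cat /proc/sys/portals/debug

# Disable all debug output before running any timing tests.
echo 0 > /proc/sys/portals/debug
```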
Sorry
 
I rebooted and retried. Now everything seems to be OK.
 
Except one thing: everything seems to be really slow!
And this is not related to the FS; the CPU itself seems much slower.
I will investigate why; maybe I'll patch another kernel.
 
 
-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of sophana
Sent: Thursday, 9 October 2003 17:15
To: lustre-discuss@lists.clusterfs.com
Subject: [Lustre-discuss] lustre7.3 problems
 
Hi all
 
I'm trying to evaluate Lustre 0.7.3 on my Red Hat 7.3 system.
 
I installed the RPMs, but one of them (the Lustre Lite RPM) requires glibc
2.3, and my Red Hat has glibc 2.2.
I cannot use 2.3 because one of my apps (which is commercial and cannot be
recompiled) is not compatible with glibc 2.3.
 
After several problems, I finally recompiled both the kernel and Lustre
(v0.7.3) from source.
 
When I tried llmount.sh as mentioned in the HOWTO, it could not load the
modules because it had no path. I found out I had to set the PORTALS and
LUSTRE variables to the correct source paths; then the llmount script ran
successfully.
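For anyone hitting the same module-path problem, the workaround might look like this (both source paths are assumptions — substitute wherever you actually unpacked the trees):

```shell
# Hypothetical source locations -- adjust to your own build paths.
export PORTALS=/usr/src/portals
export LUSTRE=/usr/src/lustre-0.7.3

# With the variables set, the test mount script can find the modules.
sh llmount.sh
```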
 
Then I could play with /mnt/lustre.
 
The first command I ran was to check out my code into /mnt/lustre with svn.
It fails with a segmentation fault, so I cannot even load my code to try
compiling it in the Lustre FS.
Of course, when I check out my code into another (non-Lustre) directory, it
works.
 
The same happens when I do 
echo foo > foo
 
Should I try an earlier version of Lustre?
 
<1>Unable to handle kernel paging request at virtual address 73727b90
printing eip: d0947aa2 *pde = 00000000
Oops: 0000
loop llite mdc osc mds obdfilter fsfilt_ext3 ost ldlm ptlrpc obdclass
ksocknal portals parport_pc lp parport binfmt_misc nfs autofs nfsd lockd
sunrpc bcm5700
CPU:    0
EIP:    0060:[<d0947aa2>]    Not tainted
EFLAGS: 00010207
 
EIP is at free_limit [llite] 0x142 (2.4.20-rh-lustre21)
eax: 00199418   ebx: 7fb93bb0   ecx: c02ebd40   edx: 73727170
esi: 00000002   edi: 000001ff   ebp: cb19bd58   esp: cb19bd50
ds: 0068   es: 0068   ss: 0068
Process tcsh (pid: 7694, stackpage=cb19b000)
Stack: c6081a20 fffff000 cb19bd68 d0946b1a 00000000 00000002 cb19beb8
d0946e81
       00000080 00000001 d094a365 d094a44f 0000011b 00000148 d094a3d7
5a595857
       62613900 66656463 6a696867 33323130 37363534 62613938 66656463
6a696867
Call Trace:   [<d0946b1a>] should_writeback [llite] 0x1a (0xcb19bd5c))
[<d0946e81>] ll_check_dirty [llite] 0x51 (0xcb19bd6c)) [<d094a365>]
.rodata.str1.1 [llite] 0x1ba5 (0xcb19bd78)) [<d094a44f>] .rodata.str1.1
[llite] 0x1c8f (0xcb19bd7c)) [<d094a3d7>] .rodata.str1.1 [llite] 0x1c17
(0xcb19bd88)) [<c0227812>] vsnprintf [kernel] 0x2a2 (0xc511de18))
[<c0227812>] vsnprintf [kernel] 0x2a2 (0xc511de2c)) [<d094b92b>]
.rodata.str1.32 [llite] 0x140b (0xc511de78)) [<d094922d>] .rodata.str1.1
[llite] 0xa6d (0xc511de84)) [<d0923cf5>] ll_file_write [llite] 0xc5
(0xc511debc)) [<d0948e8d>] .rodata.str1.1 [llite] 0x6cd (0xc511dec8))
[<d094922d>] .rodata.str1.1 [llite] 0xa6d (0xc511decc)) [<d094b900>]
.rodata.str1.32 [llite] 0x13e0 (0xc511ded8)) [<d0948e8d>] .rodata.str1.1
[llite] 0x6cd (0xc511df30)) [<d094925d>] .rodata.str1.1 [llite] 0xa9d
(0xc511df34)) [<d094b9c0>] .rodata.str1.32 [llite] 0x14a0 (0xc511df40))
[<c0141c46>] sys_write [kernel] 0x96 (0xc511df5c)) [<c014f623>]
dupfd
[kernel] 0x23 (0xc511df84)) [<c014f63f>] dupfd [kernel] 0x3f (0xc511df88))
[<c0123d45>] sys_rt_sigprocmask [kernel] 0x125 (0xc511df98))
[<c0108cd3>]
system_call [kernel] 0x33 (0xc511dfc0))
 
 
Code: 8b 82 20 0a 00 00 31 d2 85 c0 0f 94 c2 4a 21 c2 85 d2 89 d1
 
 
 
 
Any help appreciated
 
Best regards
Sophana
Hmm, it appears that it really is Lustre-related.
 
When I launch my test in /tmp it goes about 10 times faster (not precisely
measured).
 
My makefile launches 3 jobs in parallel. Some of these jobs are big Python
scripts that open about 150 files each.
With top, I can see that they take a very long time to execute.
Is Lustre doing things in user space?
There are also lots of ost and mdt processes consuming a lot of CPU.
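One way to put a rough number on the slowdown is to time the same small-file churn on both filesystems. A minimal sketch, assuming /tmp and /mnt/lustre as the two directories to compare (pass the directory as an argument; the 150-file count matches the workload described above):

```shell
#!/bin/sh
# Time the creation and deletion of 150 small files in a directory --
# roughly the kind of churn the parallel build jobs generate.
# Usage: sh churn.sh /mnt/lustre   (defaults to /tmp)
dir="${1:-/tmp}"

i=0
start=$(date +%s)
while [ "$i" -lt 150 ]; do
    echo x > "$dir/churn_$i.tmp"   # create and write a tiny file
    rm -f "$dir/churn_$i.tmp"      # delete it again
    i=$((i + 1))
done
end=$(date +%s)

echo "$dir: $((end - start))s for 150 create/delete cycles"
```

Running it once against /tmp and once against the Lustre mount gives a crude but repeatable comparison of metadata/small-I/O overhead.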
 
Maybe the local.sh and llmount scripts in /usr/lib/lustre/example/ are not
optimized?
 
Do you think having the OST and MDT on separate machines could make things
MUCH faster?
 
Any help would be appreciated.
Regards
Sophana
 
-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of sophana
Sent: Thursday, 9 October 2003 17:43
To: lustre-discuss@lists.clusterfs.com
Subject: RE: [Lustre-discuss] lustre7.3 problems
 
Sorry
 
I rebooted and retried. Now everything seems to be OK.
 
Except one thing: everything seems to be really slow!
And this is not related to the FS; the CPU itself seems much slower.
I will investigate why; maybe I'll patch another kernel.
 
 