On Jun 03, 2005 09:34 -0300, Leandro Tavares Carneiro wrote:
> I'm new to Lustre and I'm evaluating it for use in our production
> environment. I have installed Lustre 1.2.4 on 8 nodes, 1 MDS and 7 OSSs, and
> everything works fine.
>
> My next step is to mount this filesystem from the little cluster above on
> another cluster with 128 dual Opteron machines. I installed Lustre on one
> node to test the client functionality, and the local tests ran well,
> confirming the software is working.
>
> Call Trace: [<ffffffff802bcca8>]{sprintf+136}
> [<ffffffffa0133cb9>]{:obdclass:class_process_config+297}
> [<ffffffffa013889c>]{:obdclass:class_config_dump_llog+3660}
> [<ffffffffa00de8e7>]{:portals:kportal_nal_cmd+519}
> [<ffffffffa013574f>]{:obdclass:class_config_llog_handler+1679}
> [<ffffffffa02861c3>]{:ptlrpc:llog_client_next_block+1795}
> [<ffffffffa01033c2>]{:obdclass:llog_process+3122}
>
> I am using RedHat WS 3 update 4 on all of these nodes. The server nodes are
> dual PIII 1.4GHz and are used only as Lustre OSS and MDS nodes. The client
> is a dual Opteron 244.

The 1.2.4 version of Lustre does not support "zeroconf" mounting across
systems with different word sizes (i.e. i686 and x86_64), which appears
to be the problem here. If you instead mount the clients with
"lconf --node client {config}.xml" (assuming your generic client config is
called "client", as it is in most sample configs) then it should work.
This is fixed in the 1.4.2 release.

If you are seriously considering a Lustre evaluation for your company, you
should contact sales@clusterfs.com to get an evaluation of the 1.4.2
Lustre code.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
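For reference, the two client mount styles being contrasted here look roughly
like the following sketch. The hostname, MDS name, client profile and mount
point are taken from the configuration posted later in this thread, and the
exact zeroconf syntax may vary between 1.2.x and 1.4.x releases:

# "zeroconf" mount - the style that oopses on a 64-bit client against
# 32-bit servers in 1.2.4:
#   mount -t lustre <mds-host>:/<mds-name>/<client-profile> <mountpoint>
mount -t lustre bw3n25:/mds1/client /miglustre

# lconf-based mount suggested above, which avoids the word-size problem:
lconf --node client miglustre.xml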
Andreas,

Thank you for your help. To evaluate the commercial version, it first needs to
work with the public version... that is the condition my boss put on me for
evaluating the commercial version.

Now, using lconf to mount the filesystem, it gives me these messages:

Lustre: 2407:0:(socknal.c:1631:ksocknal_module_init()) maximum lustre stack 16283
Lustre: 2407:0:(socknal.c:130:ksocknal_init()) maximum lustre stack 16380
Lustre: 2407:0:(lib-init.c:257:lib_init()) maximum lustre stack 16384
LustreError: 2441:0:(client.c:452:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -16
  req@00000100f872ec00 x15/t0 o38->mds1_UUID@NID_bw3n25_UUID:12 lens 168/64 ref 1 fl Rpc:R/0/50000 rc 0/-16
LustreError: 2441:0:(client.c:452:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -16
  req@000001007f708400 x17/t0 o38->mds1_UUID@NID_bw3n25_UUID:12 lens 168/64 ref 1 fl Rpc:R/0/50000 rc 0/-16
LustreError: 2441:0:(client.c:452:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -16
  req@00000100f872e400 x32/t0 o38->mds1_UUID@NID_bw3n25_UUID:12 lens 168/64 ref 1 fl Rpc:R/0/50000 rc 0/-16
LustreError: 2441:0:(client.c:452:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -16
  req@000001007d0b2c00 x34/t0 o38->mds1_UUID@NID_bw3n25_UUID:12 lens 168/64 ref 1 fl Rpc:R/0/50000 rc 0/-16

And after some time it is mounted. The filesystem server messages are:

Jun 3 15:25:24 bw3n29 acceptor[2260]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n27 acceptor[2194]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n28 acceptor[2263]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n30 acceptor[2260]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n32 acceptor[2260]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n31 acceptor[2260]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n26 acceptor[2263]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:25:24 bw3n25 kernel: Lustre: 2227:0:(ldlm_lib.c:752:target_start_recovery_timer()) mds1: starting recovery timer (250s)
Jun 3 15:25:24 bw3n25 kernel: LustreError: 2227:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new client aa896_MNT_client_391951a146: 1 clients in recovery for 250s
Jun 3 15:25:24 bw3n25 kernel: LustreError: 2227:0:(ldlm_lib.c:1050:target_send_reply_msg()) @@@ processing error (-16) req@f7829800 x15/t0 o38-><?>@<?>:-1 lens 168/64 ref 0 fl ?phase?:/0/50000 rc -16/0
Jun 3 15:25:24 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:27:04 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:27:04 bw3n25 kernel: LustreError: 2228:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new client aa896_MNT_client_391951a146: 1 clients in recovery for 150s
Jun 3 15:27:04 bw3n25 kernel: LustreError: 2228:0:(ldlm_lib.c:1050:target_send_reply_msg()) @@@ processing error (-16) req@f64a4400 x17/t0 o38-><?>@<?>:-1 lens 168/64 ref 0 fl ?phase?:/0/50000 rc -16/0
Jun 3 15:27:50 bw3n25 kernel: Lustre: 2196:0:(socknal_cb.c:1513:ksocknal_process_receive()) [f64b6000] EOF from 0xa027480 ip 10.2.116.128:32793
Jun 3 15:27:50 bw3n25 kernel: Lustre: 2195:0:(socknal_cb.c:1513:ksocknal_process_receive()) [f64b6800] EOF from 0xa027480 ip 10.2.116.128:32792
Jun 3 15:27:50 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:27:50 bw3n25 kernel: LustreError: 2229:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new client eed7a_MNT_client_ad7be1467d: 1 clients in recovery for 103s
Jun 3 15:27:50 bw3n25 kernel: LustreError: 2229:0:(ldlm_lib.c:1050:target_send_reply_msg()) @@@ processing error (-16) req@f6485a00 x32/t0 o38-><?>@<?>:-1 lens 168/64 ref 0 fl ?phase?:/0/50000 rc -16/0
Jun 3 15:27:50 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:29:30 bw3n25 acceptor[2224]: Accepted host: bw6n128.ep.petrobras.com.br snd: 16777216 rcv 16777216 nagle: disabled
Jun 3 15:29:30 bw3n25 kernel: LustreError: 2230:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new client eed7a_MNT_client_ad7be1467d: 1 clients in recovery for 3s
Jun 3 15:29:30 bw3n25 kernel: LustreError: 2230:0:(ldlm_lib.c:1050:target_send_reply_msg()) @@@ processing error (-16) req@f6e7d400 x34/t0 o38-><?>@<?>:-1 lens 168/64 ref 0 fl ?phase?:/0/50000 rc -16/0
Jun 3 15:29:34 bw3n25 kernel: LustreError: 0:0:(ldlm_lib.c:713:target_recovery_expired()) recovery timed out, aborting
Jun 3 15:30:01 bw3n26 last message repeated 2 times
Jun 3 15:31:10 bw3n25 kernel: LustreError: 2231:0:(ldlm_lib.c:701:target_abort_recovery()) mds1: recovery period over; disconnecting unfinished clients.
Jun 3 15:31:10 bw3n25 kernel: LustreError: 2231:0:(genops.c:689:class_disconnect_stale_exports()) mds1: disconnecting 1 stale clients
Jun 3 15:31:10 bw3n25 kernel: Lustre: 2231:0:(ldlm_lib.c:596:target_finish_recovery()) mds1: sending delayed replies to recovered clients
Jun 3 15:31:10 bw3n25 kernel: Lustre: 2231:0:(ldlm_lib.c:605:target_finish_recovery()) mds1: all clients recovered, 0 MDS orphans deleted
Jun 3 15:31:10 bw3n25 kernel: LustreError: 2231:0:(recover.c:68:ptlrpc_run_recovery_over_upcall()) Error invoking recovery upcall DEFAULT RECOVERY_OVER mds1_UUID: -2; check /proc/sys/lustre/upcall

I think I have done something wrong. I wrote the script based on the example
in the Lustre HOWTO, and it works very well when I mount the filesystem on the
MDS node.
Below is the script I have made:

# config.sh
rm -f miglustre.xml

# Create nodes
lmc -m miglustre.xml --add net --node bw3n25 --nid bw3n25 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n26 --nid bw3n26 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n27 --nid bw3n27 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n28 --nid bw3n28 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n29 --nid bw3n29 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n30 --nid bw3n30 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n31 --nid bw3n31 --nettype tcp
lmc -m miglustre.xml --add net --node bw3n32 --nid bw3n32 --nettype tcp
lmc -m miglustre.xml --add net --node client --nid '*' --nettype tcp

# Configure MDS
lmc -m miglustre.xml --add mds --node bw3n25 --mds mds1 --fstype ext3 --dev /dev/md1

# Configure OSTs
lmc -m miglustre.xml --add lov --lov lov1 --mds mds1 --stripe_sz 16777216 --stripe_cnt 0 --stripe_pattern 0
lmc -m miglustre.xml --add ost --node bw3n26 --lov lov1 --ost ost1 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n27 --lov lov1 --ost ost2 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n28 --lov lov1 --ost ost3 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n29 --lov lov1 --ost ost4 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n30 --lov lov1 --ost ost5 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n31 --lov lov1 --ost ost6 --fstype ext3 --dev /dev/md1
lmc -m miglustre.xml --add ost --node bw3n32 --lov lov1 --ost ost7 --fstype ext3 --dev /dev/md1

# Configure client (this is a 'generic' client used for all client mounts)
lmc -m miglustre.xml --add mtpt --node client --path /miglustre --mds mds1 --lov lov1

Once it is working mostly without problems, I will put it under a hard test.
After that, if I get good results, I will ask for a quote and evaluate the
commercial version.

Thanks for your help,

Regards,

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 3224-1427
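For reference, a configuration like the one above is normally brought up with
lconf on each node. A rough sketch, assuming miglustre.xml has been copied to
every node (note that --reformat erases and reformats the backing devices, so
it is only used for the initial setup):

# On the MDS node:
lconf --reformat --node bw3n25 miglustre.xml

# On each OST node (bw3n26 .. bw3n32), substituting its own node name:
lconf --reformat --node bw3n26 miglustre.xml

# On each client, using the generic "client" profile as Andreas suggested:
lconf --node client miglustre.xml

# To stop a node later:
# lconf --cleanup --node <nodename> miglustre.xml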
On Fri, 2005-06-03 at 15:41 -0300, Leandro Tavares Carneiro wrote:
> Andreas,
>
> Thank you for your help. To evaluate the commercial version, it first needs
> to work with the public version... that is the condition my boss put on me
> for evaluating the commercial version.
>
> Now, using lconf to mount the filesystem, it gives me these messages:

> And after some time it is mounted. The filesystem server messages are:

> Jun 3 15:25:24 bw3n25 kernel: Lustre:
> 2227:0:(ldlm_lib.c:752:target_start_recovery_timer()) mds1: starting recovery
> timer (250s)

This means you will be waiting at least 4 minutes, 10 seconds, while
Lustre waits for missing nodes to establish contact.

> Jun 3 15:25:24 bw3n25 kernel: LustreError:
> 2227:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new
> client aa896_MNT_client_391951a146: 1 clients in recovery for 250s

In the meantime, new nodes will be blocked from connecting.

> 2228:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new
> client aa896_MNT_client_391951a146: 1 clients in recovery for 150s

Still waiting ....

> 2229:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new
> client eed7a_MNT_client_ad7be1467d: 1 clients in recovery for 103s

Still waiting ....

> 2230:0:(ldlm_lib.c:470:target_handle_connect()) denying connection for new
> client eed7a_MNT_client_ad7be1467d: 1 clients in recovery for 3s

Almost there....

> Jun 3 15:29:30 bw3n25 kernel: LustreError:
> 2230:0:(ldlm_lib.c:1050:target_send_reply_msg()) @@@ processing error (-16)
> req@f6e7d400 x34/t0 o38-><?>@<?>:-1 lens 168/64 ref 0 fl ?phase?:/0/50000 rc -16/0
> Jun 3 15:29:34 bw3n25 kernel: LustreError:
> 0:0:(ldlm_lib.c:713:target_recovery_expired()) recovery timed out, aborting

Any node that hasn't checked in yet is now forgotten:

> 2231:0:(ldlm_lib.c:701:target_abort_recovery()) mds1: recovery period over;
> disconnecting unfinished clients.
> Jun 3 15:31:10 bw3n25 kernel: LustreError:
> 2231:0:(genops.c:689:class_disconnect_stale_exports()) mds1: disconnecting 1
> stale clients

Including this anonymous node.

> Jun 3 15:31:10 bw3n25 kernel: Lustre:
> 2231:0:(ldlm_lib.c:596:target_finish_recovery()) mds1: sending delayed replies
> to recovered clients

But the other nodes are still fine.

> Jun 3 15:31:10 bw3n25 kernel: Lustre:
> 2231:0:(ldlm_lib.c:605:target_finish_recovery()) mds1: all clients recovered, 0
> MDS orphans deleted

No problem.

> Jun 3 15:31:10 bw3n25 kernel: LustreError:
> 2231:0:(recover.c:68:ptlrpc_run_recovery_over_upcall()) Error invoking recovery
> upcall DEFAULT RECOVERY_OVER mds1_UUID: -2; check /proc/sys/lustre/upcall

This is harmless.

> I think I have done something wrong.

No, I think you did it properly. It's just that Lustre is very verbose,
and it's difficult to tell what's wrong in all that debugging
information.

ERROR: SUCCESS!

-jwb
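As a side note on that last "harmless" error: -2 is ENOENT, i.e. the MDS tried
to run its recovery-over upcall and no script named DEFAULT exists. You can
confirm this on the MDS via the proc file named in the log message itself
(a sketch; exact behaviour may vary between releases):

# On the MDS, show which recovery upcall is configured:
cat /proc/sys/lustre/upcall
# A value of DEFAULT with no upcall script installed just produces the
# -2 (ENOENT) message after recovery completes; the filesystem itself is
# unaffected.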
Well, it is working. Now I can start more serious tests.

Thank you all for your help!

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 3224-1427
Hi,
I'm new to Lustre and I'm evaluating it for use in our production
environment. I have installed Lustre 1.2.4 on 8 nodes, 1 MDS and 7 OSSs, and
everything works fine.

My next step is to mount this filesystem from the little cluster above on
another cluster with 128 dual Opteron machines. I installed Lustre on one
node to test the client functionality, and the local tests ran well,
confirming the software is working.
The problem begins when I mount the filesystem created on the small cluster on
this client node: I get a kernel crash and errors from Lustre. I couldn't find
a log with the Lustre errors at the moment, but the kernel message is below.
Lustre: 9807:0:(module.c:724:init_kportals_module()) maximum lustre stack 16384
Unable to handle kernel paging request at virtual address 0000010103d66960
printing rip:
ffffffffa012e8b8
PML4 8063 PGD 0
Oops: 0000
CPU 1
Pid: 9806, comm: mount.lustre Not tainted
RIP: 0010:[<ffffffffa012e8b8>]{:obdclass:class_attach+440}
RSP: 0000:00000100f3b3f7f8 EFLAGS: 00010202
RAX: 0000000009472d58 RBX: 00000100fa8f3b80 RCX: 0000000000000012
RDX: 0000000000003fee RSI: ffffffffffffffee RDI: 00000100fa8f3b80
RBP: 00000100fa8f3c08 R08: 0000000000000001 R09: 0000000000000000
R10: 000001007957bb00 R11: 0000000000000012 R12: 00000100f3b3fd08
R13: 0000000000000000 R14: 00000000000000a0 R15: 00000100fa8f3be8
FS: 0000002a95c6b4c0(0000) GS:ffffffff805e5c00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000010103d66960 CR3: 0000000007dc1000 CR4: 00000000000006e0
Call Trace: [<ffffffff802bcca8>]{sprintf+136}
[<ffffffffa0133cb9>]{:obdclass:class_process_config+297}
[<ffffffffa013889c>]{:obdclass:class_config_dump_llog+3660}
[<ffffffffa00de8e7>]{:portals:kportal_nal_cmd+519}
[<ffffffffa013574f>]{:obdclass:class_config_llog_handler+1679}
[<ffffffffa02861c3>]{:ptlrpc:llog_client_next_block+1795}
[<ffffffffa01033c2>]{:obdclass:llog_process+3122}
[<ffffffffa0286a35>]{:ptlrpc:llog_client_read_header+1909}
[<ffffffffa0148dc0>]{:obdclass:obd_dev+736}
[<ffffffffa02bad20>]{:ptlrpc:llog_client_ops+0}
[<ffffffffa01350c0>]{:obdclass:class_config_llog_handler+0}
[<ffffffffa02bad20>]{:ptlrpc:llog_client_ops+0}
[<ffffffffa01362e0>]{:obdclass:class_config_parse_llog+1648}
[<ffffffffa0148ae0>]{:obdclass:obd_dev+0}
[<ffffffffa011d798>]{:obdclass:class_conn2export+1256}
[<ffffffffa0148ae0>]{:obdclass:obd_dev+0}
[<ffffffffa03482b1>]{:llite:lustre_process_log+3761}
[<ffffffff802bcca8>]{sprintf+136}
[<ffffffffa036ee8c>]{:llite:.rodata.str1.1+3916}
[<ffffffffa036ee94>]{:llite:.rodata.str1.1+3924}
[<ffffffffa0349795>]{:llite:lustre_fill_super+3781}
[<ffffffffa036ee5a>]{:llite:.rodata.str1.1+3866}
[<ffffffff80127cfb>]{release_task+763}
[<ffffffff80129854>]{wait_task_zombie+372}
[<ffffffff80129d5f>]{sys_wait4+799}
[<ffffffffa036db0e>]{:llite:lustre_read_super+238}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffff801673fe>]{get_sb_nodev+78}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffff801675f4>]{do_kern_mount+164}
[<ffffffff8017f181>]{do_add_mount+161}
[<ffffffff8017f4c3>]{do_mount+371}
[<ffffffff8017f8e5>]{sys_mount+197}
[<ffffffff801102a7>]{system_call+119}
Process mount.lustre (pid: 9806, stackpage=100f3b3d000)
Stack: 00000100f3b3f7f8 0000000000000000 00000000000000a0 0000000000000000
ffffffff802bcca8 0000003000000020 00000100f3b3f8e8 00000100f3b3f828
0000000100000001 0000000000000202 00000100f3a7c080 00000100fa8f3be8
00000100fa8f3b80 0000000000000012 00000100f3b3fd08 0000000000000000
00000000000000a0 00000100fa8f3be8 ffffffffa0133cb9 00000000000000a0
ffffffffa013889c 00000100f3a7c080 ffffffffa00de8e7 0000000000000001
0000000000000202 0000010087d681c0 00000100f3a7c0e0 00000100f3a7c0f0
0000000000000246 0000000000000246 00000100fa8f3b80 0000000000000012
ffffffffa013574f 0000010037f44048 ffffffffa02861c3 0000000000000000
00000100f3b3c000 0000000000000000 0000001200000000 000001007957bb00
Call Trace: [<ffffffff802bcca8>]{sprintf+136}
[<ffffffffa0133cb9>]{:obdclass:class_process_config+297}
[<ffffffffa013889c>]{:obdclass:class_config_dump_llog+3660}
[<ffffffffa00de8e7>]{:portals:kportal_nal_cmd+519}
[<ffffffffa013574f>]{:obdclass:class_config_llog_handler+1679}
[<ffffffffa02861c3>]{:ptlrpc:llog_client_next_block+1795}
[<ffffffffa01033c2>]{:obdclass:llog_process+3122}
[<ffffffffa0286a35>]{:ptlrpc:llog_client_read_header+1909}
[<ffffffffa0148dc0>]{:obdclass:obd_dev+736}
[<ffffffffa02bad20>]{:ptlrpc:llog_client_ops+0}
[<ffffffffa01350c0>]{:obdclass:class_config_llog_handler+0}
[<ffffffffa02bad20>]{:ptlrpc:llog_client_ops+0}
[<ffffffffa01362e0>]{:obdclass:class_config_parse_llog+1648}
[<ffffffffa0148ae0>]{:obdclass:obd_dev+0}
[<ffffffffa011d798>]{:obdclass:class_conn2export+1256}
[<ffffffffa0148ae0>]{:obdclass:obd_dev+0}
[<ffffffffa03482b1>]{:llite:lustre_process_log+3761}
[<ffffffff802bcca8>]{sprintf+136}
[<ffffffffa036ee8c>]{:llite:.rodata.str1.1+3916}
[<ffffffffa036ee94>]{:llite:.rodata.str1.1+3924}
[<ffffffffa0349795>]{:llite:lustre_fill_super+3781}
[<ffffffffa036ee5a>]{:llite:.rodata.str1.1+3866}
[<ffffffff80127cfb>]{release_task+763}
[<ffffffff80129854>]{wait_task_zombie+372}
[<ffffffff80129d5f>]{sys_wait4+799}
[<ffffffffa036db0e>]{:llite:lustre_read_super+238}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffff801673fe>]{get_sb_nodev+78}
[<ffffffffa037aa60>]{:llite:lustre_fs_type+0}
[<ffffffff801675f4>]{do_kern_mount+164}
[<ffffffff8017f181>]{do_add_mount+161}
[<ffffffff8017f4c3>]{do_mount+371}
[<ffffffff8017f8e5>]{sys_mount+197}
[<ffffffff801102a7>]{system_call+119}
Code: 80 3c 28 00 0f 84 7e 01 00 00 8b 0d a0 a5 fc ff 85 c9 0f 84
Kernel panic: Fatal exception
I am using RedHat WS 3 update 4 on all of these nodes. The server nodes are
dual PIII 1.4GHz machines and are used only as Lustre OSS and MDS nodes. The
client is a dual Opteron 244.
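A quick way to confirm the word-size mismatch that Andreas diagnoses earlier in
this thread (32-bit PIII servers versus the x86_64 Opteron client) is simply to
compare kernel architectures on a server and on the client:

# Run on a server node and on the client:
uname -m    # i686 on the 32-bit PIII servers, x86_64 on the Opteron 244 client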
Thanks in advance,
--
Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 3224-1427