Kumaran Rajaram
2006-May-19 07:36 UTC
[Lustre-discuss] kopenibnal 1.4.1.2 do_proc_dointvec bug?
Paul,

I have been using kopenibnal (against OpenIB 1.8.0), and Lustre has been stable on Mellanox memfree cards (since 1.4.3). On Mellanox mem-based cards, Lustre mounts fine but crashes during large I/O. See #7246 on the ClusterFS bugzilla for more info. Not sure if what you see is related to this.

HTH,
-Kums

>>> PAulN <pauln@psc.edu> 10/26/05 7:19 pm >>>
The kopenibnal crashes when lconf tries to set a proc value.
I remember seeing a warning during compilation, I guess it wasn't a joke.
Has anyone else come across this? I'm running an older version of lustre so
this may already be fixed..

is anyone even using the kopenibnal?
paul

during lconf:

Service: network NET_ost1_openib NET_ost1_openib_UUID
NETWORK: NET_ost1_openib NET_ost1_openib_UUID openib ost1 988
+ lustre/utils/lctl
  network openib
  mynid ost1
  quit
+ sysctl /proc/sys/openibnal/port 988
Killed
....

Oct 26 21:11:02 ost1 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Oct 26 21:11:07 ost1 kernel: Lustre: 1511:0:(openibnal.c:792:kibnal_start_ip_listener()) Listener started OK: pid:1512 port:988 backlog:127
Oct 26 21:11:07 ost1 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Oct 26 21:11:07 ost1 kernel: <ffffffff8013c66a>{do_proc_dointvec+106}
Oct 26 21:11:07 ost1 kernel: PML4 74025067 PGD 74024067 PMD 0
Oct 26 21:11:07 ost1 kernel: Oops: 0000 [1] SMP
Oct 26 21:11:07 ost1 kernel: CPU 0
Oct 26 21:11:07 ost1 kernel: Modules linked in: mds lov osc mdc obdfilter fsfilt_ldiskfs ldiskfs ost ptlrpc obdclass lvfs kopenibnal kptlrouter portals libcfs aic79xx ib_useraccess ib_tavor mod_vapi mod_vipkl mod_thh mod_hh mod_mpga mod_vapi_common mosal ib_dm_client ib_sa_client ib_client_query ib_cm ib_poll ib_mad ib_core ib_services md
Oct 26 21:11:07 ost1 kernel: Pid: 1463, comm: python Not tainted 2.6.9-qsnetp2.5.11.3qsnetlustre-1.4.2.1_pauln
Oct 26 21:11:07 ost1 kernel: RIP: 0010:[<ffffffff8013c66a>] <ffffffff8013c66a>{do_proc_dointvec+106}
Oct 26 21:11:07 ost1 kernel: RSP: 0018:0000010074097db8 EFLAGS: 00010206
Oct 26 21:11:07 ost1 kernel: RAX: 00000000000003dc RBX: ffffffffa01e0b2c RCX: ffffffffa01e0b2c
Oct 26 21:11:07 ost1 kernel: RDX: 000001007370d880 RSI: 0000000000000004 RDI: 0000000000000003
Oct 26 21:11:07 ost1 kernel: RBP: 0000002a99c2d000 R08: 0000010074097ed8 R09: 0000000000000000
Oct 26 21:11:07 ost1 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
Oct 26 21:11:07 ost1 kernel: R13: 0000010074097ed8 R14: 0000000000000000 R15: 0000000000000001
Oct 26 21:11:08 ost1 kernel: FS: 0000002a95f59ca0(0000) GS:ffffffff806057c0(0000) knlGS:0000000000000000
Oct 26 21:11:08 ost1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Oct 26 21:11:08 ost1 kernel: Process python (pid: 1463, threadinfo 0000010074096000, task 000001007df3f070)
Oct 26 21:11:08 ost1 kernel: Stack: 0000000100000200 0000000100000000 0000002a99c2d000 000001007e0249c0
Oct 26 21:11:08 ost1 kernel:        00000100739a4058 0000000000000001 0000000000000000 ffffffff80162e30
Oct 26 21:11:08 ost1 kernel:        000001007452d100 000001007e024a28
Oct 26 21:11:08 ost1 kernel: Call Trace:<ffffffff80162e30>{handle_mm_fault+368} <ffffffff8017b5b5>{vfs_fstat+133}
Oct 26 21:11:08 ost1 kernel:        <ffffffff802b865b>{rb_insert_color+107} <ffffffff8013c9da>{proc_dointvec+26}
Oct 26 21:11:08 ost1 kernel:        <ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}
Oct 26 21:11:08 ost1 kernel:        <ffffffff8013c2a9>{do_rw_proc+153} <ffffffff801722d7>{vfs_write+199}
Oct 26 21:11:08 ost1 kernel:        <ffffffff80172413>{sys_write+83} <ffffffff801105d6>{system_call+126}
Oct 26 21:11:08 ost1 kernel:
Oct 26 21:11:08 ost1 kernel: Code: 49 83 39 00 0f 95 c2 31 c0 45 85 ff 0f 94 c0 85 c2 74 10 66
Oct 26 21:11:08 ost1 kernel: RIP <ffffffff8013c66a>{do_proc_dointvec+106} RSP <0000010074097db8>
Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000

Lustre-discuss mailing list
Lustre-discuss@lists.clusterfs.com
https://lists.clusterfs.com/mailman/listinfo/lustre-discuss

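For anyone following the call trace in the oops above, a small script can pull the frames out of the pasted text so the path from sys_write down to proc_dointvec is easier to see. This is only a sketch: the regular expression is written against the `<address>{symbol+offset}` form that appears in this log, `FRAME_RE` and `parse_frames` are illustrative names, and reading the offsets as decimal is an assumption. Traces of this kind can also contain leftover stack contents, so not every entry is necessarily a real caller.

```python
import re

# Matches entries such as <ffffffff8013c9da>{proc_dointvec+26} or
# <ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(text):
    """Return (address, module, symbol, offset) tuples in order of appearance."""
    frames = []
    for addr, module, symbol, offset in FRAME_RE.findall(text):
        # Entries without a :module: prefix are labelled "kernel" here.
        frames.append((int(addr, 16), module or "kernel", symbol, int(offset)))
    return frames


# Call trace copied from the oops in this thread (syslog prefixes removed).
TRACE = """\
Call Trace:<ffffffff80162e30>{handle_mm_fault+368} <ffffffff8017b5b5>{vfs_fstat+133}
       <ffffffff802b865b>{rb_insert_color+107} <ffffffff8013c9da>{proc_dointvec+26}
       <ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}
       <ffffffff8013c2a9>{do_rw_proc+153} <ffffffff801722d7>{vfs_write+199}
       <ffffffff80172413>{sys_write+83} <ffffffff801105d6>{system_call+126}
"""

if __name__ == "__main__":
    for addr, module, symbol, offset in parse_frames(TRACE):
        print(f"{addr:#018x}  {module:<10}  {symbol}+{offset}")
```

Run against the trace above, the only frame tagged with the kopenibnal module is kibnal_listener_procint+191, listed between proc_dointvec and do_rw_proc.
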
hey Kums,
what tests have you been running to demonstrate stability? Also, if you're interested I have an 'enhanced' version of the openibnal which allows routing between quadrics and ip networks. I haven't tested it on 1.4.x yet but it was working under lustre 1.3.3. As soon as I get this proc_intvec bug figured out I'll be able to test the openibnal router on 1.4.2. If it works ok I'll post the patch.
paul

Kumaran Rajaram wrote:

> Paul,
>
> I have been using kopenibnal (against OpenIB 1.8.0), and Lustre has
> been stable on Mellanox memfree cards (since 1.4.3). On Mellanox
> mem-based cards, Lustre mounts fine but crashes during large I/O.
> See #7246 on the ClusterFS bugzilla for more info. Not sure if what you
> see is related to this.
>
> HTH,
> -Kums

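One way to narrow down the proc_intvec bug is to decode the first instruction on the Code: line of the oops and see which register it dereferences. The sketch below hand-decodes only the single instruction form needed here. `REGS`, `decode_cmp_mem_imm8`, `REGISTERS`, `CR2` and `CODE` are illustrative names, the values are copied from the oops posted in this thread, and the sketch assumes the Code: bytes begin at the faulting instruction (the log shows no marker saying otherwise).

```python
# Integer registers in x86-64 encoding order (index = 4-bit register number),
# named the way the oops prints them.
REGS = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
        "R08", "R09", "R10", "R11", "R12", "R13", "R14", "R15"]


def decode_cmp_mem_imm8(code):
    """Decode 'REX.W 83 /7 ib' with a plain [reg] operand.

    Returns (base_register, immediate). Only this one form is handled,
    which is enough for the first instruction in this oops.
    """
    rex, opcode, modrm, imm = code[0], code[1], code[2], code[3]
    if rex & 0xF8 != 0x48 or opcode != 0x83:
        raise ValueError("not a REX.W-prefixed 0x83 group-1 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or reg != 7 or rm in (4, 5):
        raise ValueError("not 'cmp qword [reg], imm8'")
    base = rm | ((rex & 1) << 3)  # REX.B extends the base register number
    return REGS[base], imm


# Values copied from the oops (only the registers used below).
REGISTERS = {"R08": 0x0000010074097ED8, "R09": 0x0, "R15": 0x1}
CR2 = 0x0
CODE = bytes.fromhex("49 83 39 00 0f 95 c2 31 c0 45 85 ff")

if __name__ == "__main__":
    base, imm = decode_cmp_mem_imm8(CODE)
    print(f"cmp qword [{base}], {imm}")
    print(f"{base} = {REGISTERS[base]:#x}, CR2 = {CR2:#x}")
```

The bytes 49 83 39 00 decode to a 64-bit compare of the memory at R09 against zero, and R09 is 0 in the register dump, which matches CR2. Under the x86-64 System V calling convention R09 carries the sixth integer argument, so if it still holds the incoming argument at that point, one hypothesis worth checking is that kibnal_listener_procint passes fewer arguments than this kernel's proc_dointvec expects, which would fit the compile warning mentioned earlier. Nothing in the thread confirms this.
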
Kumaran Rajaram
2006-May-19 07:36 UTC
[Lustre-discuss] kopenibnal 1.4.1.2 do_proc_dointvec bug?
Paul,

I have been running the following tests on large files (minimum 32G: read and write per client):

i) sgp_dd
ii) bonnie
iii) iozone
iv) PIORAW parallel I/O test

For metadata and many-short-file testing: timing a 2.6 kernel untar and build.

-Kums

>>> PAulN <pauln@psc.edu> 10/27/05 9:48 am >>>
hey Kums,
what tests have you been running to demonstrate stability? Also, if you're interested I have an 'enhanced' version of the openibnal which allows routing between quadrics and ip networks. I haven't tested it on 1.4.x yet but it was working under lustre 1.3.3. As soon as I get this proc_intvec bug figured out I'll be able to test the openibnal router on 1.4.2. If it works ok I'll post the patch.
paul

--=__PartC8EAE67A.0__Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Description: HTML <html> <head> <style type=3D"text/css"> <!-- body { line-height: normal; margin-right: 4px; font-variant: normal; margin-bottom: 1px; margin-top: 4px; margin-left: 4px } --> </style> =20 </head> <body style=3D"margin-right: 4px; margin-bottom: 1px; margin-top: 4px; margin-left: 4px"> <DIV> Paul, </DIV> <DIV> </DIV> <DIV>I have been running the following tests on largefiles (minimum 32G: read and write per client) </DIV> <DIV>i) sgp_dd </DIV> <DIV>ii) bonnie </DIV> <DIV>iii) iozone </DIV> <DIV>iv) PIORAW parallel I/O test </DIV> <DIV> </DIV> <DIV>For metadata + many short file testing : Timing 2.6 kernel untar and build process. </DIV> <DIV><br>-Kums </DIV> <DIV><br>>>>PAulN <pauln@psc.edu> 10/27/05 9:48 am >>><br>hey Kums,<br>what tests have you been running to demonstrate stability?   Also, if<br>you're interested I have an<br>'enhanced' version of the openibnal which allows routing between<br>quadrics and ip networks.<br>I haven't tested it on 1.4.x yet but it was working under lustre<br>1.3.3.   As soon as I get this<br>proc_intvec bug figured out I'll be able to test the openibnal router on<br>1.4.2.  If it works ok<br>I'll post the patch.<br>paul<br><br>Kumaran Rajaram wrote:<br><br>>Paul,<br>> <br>>I have been using kopenibnal (against OpenIB 1.8.0) and Lustre has<br>>been stable on Mellanox memfree cards (since 1.4.3). 
On Mellanox<br>>mem-based cards, Lustre mounts fine although crashes during large I/O.<br>>See #7246 on ClusterFS bugzilla for more info. Not sure if what you<br>>see is related to this.<br>> <br>>HTH,<br>>-Kums<br>> <br>>>>>PAulN <pauln@psc.edu> 10/26/05 7:19 pm >>><br>>The kopenibnal crashes when lconf tries to set a proc value.<br>>I remember seeing a warning during compilation, I guess it wasn't a joke.<br>>Has anyone else come across this? I'm running a older version of lustre so<br>>this may already be fixed..<br>><br>>is anyone even using the kopenibnal?<br>>paul<br>><br>>during lconf:<br>><br>>Service: network NET_ost1_openib NET_ost1_openib_UUID<br>>NETWORK: NET_ost1_openib NET_ost1_openib_UUID openib ost1 988<br>>+ lustre/utils/lctl<br>> network openib<br>> mynid ost1<br>> quit<br>>+ sysctl /proc/sys/openibnal/port 988<br>>Killed<br>>....<br>><br>>Oct 26 21:11:02 ost1 kernel: Lustre: Filtering OBD driver;<br>>info@clusterfs.com<br>>Oct 26 21:11:07 ost1 kernel: Lustre:<br>>1511:0:(openibnal.c:792:kibnal_start_ip_listener()) Listener started OK:<br>>pid:1512 port:988 backlog:127<br>>Oct 26 21:11:07 ost1 kernel: Lustre:<br>>1511:0:(openibnal.c:792:kibnal_start_ip_listener()) Listener started OK:<br>>pid:1512 port:988 backlog:127<br>>Oct 26 21:11:07 ost1 kernel: Unable to handle kernel NULL pointer<br>>dereference at 0000000000000000 RIP:<br>>Oct 26 21:11:07 ost1 kernel: Unable to handle kernel NULL pointer<br>>dereference at 0000000000000000 RIP:<br>>Oct 26 21:11:07 ost1 kernel: <ffffffff8013c66a>{do_proc_dointvec+106}<br>>Oct 26 21:11:07 ost1 kernel: <ffffffff8013c66a>{do_proc_dointvec+106}<br>>Oct 26 21:11:07 ost1 kernel: PML4 74025067 PGD 74024067 PMD 0<br>>Oct 26 21:11:07 ost1 kernel: PML4 74025067 PGD 74024067 PMD 0<br>>Oct 26 21:11:07 ost1 kernel: Oops: 0000 [1] SMP<br>>Oct 26 21:11:07 ost1 kernel: Oops: 0000 [1] SMP<br>>Oct 26 21:11:07 ost1 kernel: CPU 0<br>>Oct 26 21:11:07 ost1 kernel: CPU 0<br>>Oct 26 21:11:07 ost1 kernel: Modules linked in: mds lov 
osc mdc<br>>obdfilter fsfilt_ldiskfs ldiskfs ost ptlrpc obdclass lvfs kopenibnal<br>>kptlrou<br>>ter portals libcfs aic79xx ib_useraccess ib_tavor mod_vapi mod_vipkl<br>>mod_thh mod_hh mod_mpga mod_vapi_common mosal ib_dm_client ib_sa_client<br>>ib_client_query ib_cm ib_poll ib_mad ib_core ib_services md<br>>Oct 26 21:11:07 ost1 kernel: Modules linked in: mds lov osc mdc<br>>obdfilter fsfilt_ldiskfs ldiskfs ost ptlrpc obdclass lvfs kopenibnal<br>>kptlrou<br>>ter portals libcfs aic79xx ib_useraccess ib_tavor mod_vapi mod_vipkl<br>>mod_thh mod_hh mod_mpga mod_vapi_common mosal ib_dm_client ib_sa_client<br>>ib_client_query ib_cm ib_poll ib_mad ib_core ib_services md<br>>Oct 26 21:11:07 ost1 kernel: Pid: 1463, comm: python Not tainted<br>>2.6.9-qsnetp2.5.11.3qsnetlustre-1.4.2.1_pauln<br>>Oct 26 21:11:07 ost1 kernel: Pid: 1463, comm: python Not tainted<br>>2.6.9-qsnetp2.5.11.3qsnetlustre-1.4.2.1_pauln<br>>Oct 26 21:11:07 ost1 kernel: RIP: 0010:[<ffffffff8013c66a>]<br>><ffffffff8013c66a>{do_proc_dointvec+106}<br>>Oct 26 21:11:07 ost1 kernel: RIP: 0010:[<ffffffff8013c66a>]<br>><ffffffff8013c66a>{do_proc_dointvec+106}<br>>Oct 26 21:11:07 ost1 kernel: RSP: 0018:0000010074097db8  EFLAGS: 00010206<br>>Oct 26 21:11:07 ost1 kernel: RSP: 0018:0000010074097db8  EFLAGS: 00010206<br>>Oct 26 21:11:07 ost1 kernel: RAX: 00000000000003dc RBX: ffffffffa01e0b2c<br>>RCX: ffffffffa01e0b2c<br>>Oct 26 21:11:07 ost1 kernel: RAX: 00000000000003dc RBX: ffffffffa01e0b2c<br>>RCX: ffffffffa01e0b2c<br>>Oct 26 21:11:07 ost1 kernel: RDX: 000001007370d880 RSI: 0000000000000004<br>>RDI: 0000000000000003<br>>Oct 26 21:11:07 ost1 kernel: RDX: 000001007370d880 RSI: 0000000000000004<br>>RDI: 0000000000000003<br>>Oct 26 21:11:07 ost1 kernel: RBP: 0000002a99c2d000 R08: 0000010074097ed8<br>>R09: 0000000000000000<br>>Oct 26 21:11:07 ost1 kernel: RBP: 0000002a99c2d000 R08: 0000010074097ed8<br>>R09: 0000000000000000<br>>Oct 26 21:11:07 ost1 kernel: R10: 0000000000000000 R11: 0000000000000246<br>>R12: 
0000000000000001<br>>Oct 26 21:11:07 ost1 kernel: R10: 0000000000000000 R11: 0000000000000246<br>>R12: 0000000000000001<br>>Oct 26 21:11:07 ost1 kernel: R13: 0000010074097ed8 R14: 0000000000000000<br>>R15: 0000000000000001<br>>Oct 26 21:11:08 ost1 kernel: R13: 0000010074097ed8 R14: 0000000000000000<br>>R15: 0000000000000001<br>>Oct 26 21:11:08 ost1 kernel: FS:  0000002a95f59ca0(0000)<br>>GS:ffffffff806057c0(0000) knlGS:0000000000000000<br>>Oct 26 21:11:08 ost1 kernel: FS:  0000002a95f59ca0(0000)<br>>GS:ffffffff806057c0(0000) knlGS:0000000000000000<br>>Oct 26 21:11:08 ost1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:<br>>000000008005003b<br>>Oct 26 21:11:08 ost1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:<br>>000000008005003b<br>>Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000 CR3: 0000000000101000<br>>CR4: 00000000000006e0<br>>Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000 CR3: 0000000000101000<br>>CR4: 00000000000006e0<br>>Oct 26 21:11:08 ost1 kernel: Process python (pid: 1463, threadinfo<br>>0000010074096000, task 000001007df3f070)<br>>Oct 26 21:11:08 ost1 kernel: Process python (pid: 1463, threadinfo<br>>0000010074096000, task 000001007df3f070)<br>>Oct 26 21:11:08 ost1 kernel: Stack: 0000000100000200 0000000100000000<br>>0000002a99c2d000 000001007e0249c0<br>>Oct 26 21:11:08 ost1 kernel: Stack: 0000000100000200 0000000100000000<br>>0000002a99c2d000 000001007e0249c0<br>>Oct 26 21:11:08 ost1 kernel:        00000100739a4058 0000000000000001<br>>0000000000000000 ffffffff80162e30<br>>Oct 26 21:11:08 ost1 kernel:        00000100739a4058 0000000000000001<br>>0000000000000000 ffffffff80162e30<br>>Oct 26 21:11:08 ost1 kernel:        000001007452d100 000001007e024a28<br>>Oct 26 21:11:08 ost1 kernel:        000001007452d100 000001007e024a28<br>>Oct 26 21:11:08 ost1 kernel: Call<br>>Trace:<ffffffff80162e30>{handle_mm_fault+368}<br>><ffffffff8017b5b5>{vfs_fstat+133}<br>>Oct 26 21:11:08 ost1 kernel: 
Call<br>>Trace:<ffffffff80162e30>{handle_mm_fault+368}<br>><ffffffff8017b5b5>{vfs_fstat+133}<br>>Oct 26 21:11:08 ost1 kernel:      <br>><ffffffff802b865b>{rb_insert_color+107}<br>><ffffffff8013c9da>{proc_dointvec+26}<br>>Oct 26 21:11:08 ost1 kernel:      <br>><ffffffff802b865b>{rb_insert_color+107}<br>><ffffffff8013c9da>{proc_dointvec+26}<br>>Oct 26 21:11:08 ost1 kernel:      <br>><ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}<br>>Oct 26 21:11:08 ost1 kernel:      <br>><ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}<br>>Oct 26 21:11:08 ost1 kernel:        <ffffffff8013c2a9>{do_rw_proc+153}<br>><ffffffff801722d7>{vfs_write+199}<br>>Oct 26 21:11:08 ost1 kernel:        <ffffffff8013c2a9>{do_rw_proc+153}<br>><ffffffff801722d7>{vfs_write+199}<br>>Oct 26 21:11:08 ost1 kernel:        <ffffffff80172413>{sys_write+83}<br>><ffffffff801105d6>{system_call+126}<br>>Oct 26 21:11:08 ost1 kernel:        <ffffffff80172413>{sys_write+83}<br>><ffffffff801105d6>{system_call+126}<br>>Oct 26 21:11:08 ost1 kernel:      <br>>Oct 26 21:11:08 ost1 kernel:      <br>>Oct 26 21:11:08 ost1 kernel:<br>>Oct 26 21:11:08 ost1 kernel:<br>>Oct 26 21:11:08 ost1 kernel: Code: 49 83 39 00 0f 95 c2 31 c0 45 85 ff<br>>0f 94 c0 85 c2 74 10 66<br>>Oct 26 21:11:08 ost1 kernel: Code: 49 83 39 00 0f 95 c2 31 c0 45 85 ff<br>>0f 94 c0 85 c2 74 10 66<br>>Oct 26 21:11:08 ost1 kernel: RIP<br>><ffffffff8013c66a>{do_proc_dointvec+106} RSP <0000010074097db8><br>>Oct 26 21:11:08 ost1 kernel: RIP<br>><ffffffff8013c66a>{do_proc_dointvec+106} RSP <0000010074097db8><br>>Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000<br>>Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000<br>><br>><br>><br>>Lustre-discuss mailing list<br>>Lustre-discuss@lists.clusterfs.com<br>>https://lists.clusterfs.com/mailman/listinfo/lustre-discuss<br><br><br><br> </DIV> </body> </html> --=__PartC8EAE67A.0__=--
The kopenibnal crashes when lconf tries to set a proc value. I remember seeing a warning during compilation, I guess it wasn't a joke. Has anyone else come across this? I'm running an older version of lustre so this may already be fixed..

is anyone even using the kopenibnal?
paul

during lconf:

Service: network NET_ost1_openib NET_ost1_openib_UUID
NETWORK: NET_ost1_openib NET_ost1_openib_UUID openib ost1 988
+ lustre/utils/lctl
 network openib
 mynid ost1
 quit
+ sysctl /proc/sys/openibnal/port 988
Killed
....

Oct 26 21:11:02 ost1 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Oct 26 21:11:07 ost1 kernel: Lustre: 1511:0:(openibnal.c:792:kibnal_start_ip_listener()) Listener started OK: pid:1512 port:988 backlog:127
Oct 26 21:11:07 ost1 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Oct 26 21:11:07 ost1 kernel: <ffffffff8013c66a>{do_proc_dointvec+106}
Oct 26 21:11:07 ost1 kernel: PML4 74025067 PGD 74024067 PMD 0
Oct 26 21:11:07 ost1 kernel: Oops: 0000 [1] SMP
Oct 26 21:11:07 ost1 kernel: CPU 0
Oct 26 21:11:07 ost1 kernel: Modules linked in: mds lov osc mdc obdfilter fsfilt_ldiskfs ldiskfs ost ptlrpc obdclass lvfs kopenibnal kptlrouter portals libcfs aic79xx ib_useraccess ib_tavor mod_vapi mod_vipkl mod_thh mod_hh mod_mpga mod_vapi_common mosal ib_dm_client ib_sa_client ib_client_query ib_cm ib_poll ib_mad ib_core ib_services md
Oct 26 21:11:07 ost1 kernel: Pid: 1463, comm: python Not tainted 2.6.9-qsnetp2.5.11.3qsnetlustre-1.4.2.1_pauln
Oct 26 21:11:07 ost1 kernel: RIP: 0010:[<ffffffff8013c66a>] <ffffffff8013c66a>{do_proc_dointvec+106}
Oct 26 21:11:07 ost1 kernel: RSP: 0018:0000010074097db8 EFLAGS: 00010206
Oct 26 21:11:07 ost1 kernel: RAX: 00000000000003dc RBX: ffffffffa01e0b2c RCX: ffffffffa01e0b2c
Oct 26 21:11:07 ost1 kernel: RDX: 000001007370d880 RSI: 0000000000000004 RDI: 0000000000000003
Oct 26 21:11:07 ost1 kernel: RBP: 0000002a99c2d000 R08: 0000010074097ed8 R09: 0000000000000000
Oct 26 21:11:07 ost1 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
Oct 26 21:11:07 ost1 kernel: R13: 0000010074097ed8 R14: 0000000000000000 R15: 0000000000000001
Oct 26 21:11:08 ost1 kernel: FS: 0000002a95f59ca0(0000) GS:ffffffff806057c0(0000) knlGS:0000000000000000
Oct 26 21:11:08 ost1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Oct 26 21:11:08 ost1 kernel: Process python (pid: 1463, threadinfo 0000010074096000, task 000001007df3f070)
Oct 26 21:11:08 ost1 kernel: Stack: 0000000100000200 0000000100000000 0000002a99c2d000 000001007e0249c0
Oct 26 21:11:08 ost1 kernel:        00000100739a4058 0000000000000001 0000000000000000 ffffffff80162e30
Oct 26 21:11:08 ost1 kernel:        000001007452d100 000001007e024a28
Oct 26 21:11:08 ost1 kernel: Call Trace:<ffffffff80162e30>{handle_mm_fault+368} <ffffffff8017b5b5>{vfs_fstat+133}
Oct 26 21:11:08 ost1 kernel:        <ffffffff802b865b>{rb_insert_color+107} <ffffffff8013c9da>{proc_dointvec+26}
Oct 26 21:11:08 ost1 kernel:        <ffffffffa01c3a9f>{:kopenibnal:kibnal_listener_procint+191}
Oct 26 21:11:08 ost1 kernel:        <ffffffff8013c2a9>{do_rw_proc+153} <ffffffff801722d7>{vfs_write+199}
Oct 26 21:11:08 ost1 kernel:        <ffffffff80172413>{sys_write+83} <ffffffff801105d6>{system_call+126}
Oct 26 21:11:08 ost1 kernel: Code: 49 83 39 00 0f 95 c2 31 c0 45 85 ff 0f 94 c0 85 c2 74 10 66
Oct 26 21:11:08 ost1 kernel: RIP <ffffffff8013c66a>{do_proc_dointvec+106} RSP <0000010074097db8>
Oct 26 21:11:08 ost1 kernel: CR2: 0000000000000000
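
For anyone reading the oops above, here is a rough way to see which pointer was NULL. This is only a sketch and not something anyone in the thread confirmed: it decodes the first instruction of the "Code:" bytes (49 83 39 00) by hand. The helper name decode_group1_imm8 is made up for illustration and only handles this one instruction form. The bytes decode to a compare against memory addressed by R9, and the register dump shows R09 as 0, which matches the fault address in CR2. Under the x86-64 calling convention R9 carries the sixth integer argument, so if do_proc_dointvec in this kernel takes the file position pointer as its sixth parameter (an assumption, check the 2.6.9 source), the handler in kopenibnal may have been built against an older proc handler prototype, which might also explain the compile warning.

```python
# Minimal hand decoder for one x86-64 instruction form:
#   REX prefix + opcode 0x83 (group 1, r/m, imm8) + ModRM with plain [reg] addressing.
# It is just enough to read the first instruction in the oops "Code:" line.

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_imm8(code: bytes) -> str:
    """Decode 'REX 83 /r ib' with a register-indirect operand (hypothetical helper)."""
    rex, opcode, modrm, imm8 = code[:4]
    if rex & 0xF0 != 0x40 or opcode != 0x83:
        raise ValueError("expected a REX prefix followed by opcode 0x83")
    rex_w = (rex >> 3) & 1      # 1 means 64-bit operand size
    rex_b = rex & 1             # extends the ModRM r/m field to r8..r15
    mod = modrm >> 6
    op = GROUP1[(modrm >> 3) & 7]
    rm_low = modrm & 7
    if mod != 0 or rm_low in (4, 5):
        raise ValueError("only the plain [reg] addressing form is handled")
    base = REGS64[rm_low | (rex_b << 3)]
    suffix = "q" if rex_w else "l"
    # imm8 is printed unsigned; fine here because the value in the oops is 0.
    return f"{op}{suffix} ${imm8:#x}, (%{base})"


if __name__ == "__main__":
    # First four bytes of: Code: 49 83 39 00 0f 95 c2 ...
    print(decode_group1_imm8(bytes.fromhex("49833900")))
```

Running this prints the AT&T-style form of the instruction; compare the base register it names with the R09 value in the register dump.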