uselton2@llnl.gov
2006-Dec-20 09:33 UTC
[Lustre-devel] [Bug 10823] ptlrpc_import_recovery_state_machine null pointer dereference
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=10823

           What            |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis   |3839    |

At a first look this seems to be what I just saw on our test cluster:

2006-12-19 23:31:04 Unable to handle kernel paging request at virtual address 5a5a5a8a
2006-12-19 23:31:04 printing eip:
2006-12-19 23:31:04 fcf7952b
2006-12-19 23:31:04 *pde = 00000000
2006-12-19 23:31:04 Oops: 0000 [#1]
2006-12-19 23:31:04 SMP
2006-12-19 23:31:04 Modules linked in: osc(U) nfs(U) lockd(U) sg(U) llite(U) lov(U) lquota(U) mdc(U) kqswlnd(U) ptlrpc(U) lnet(U) obdclass(U) lvfs(U) libcfs(U) perfctr(U) netdump(U) i2c_dev(U) i2c_core(U) jtag(U) rms(U) dm_mirror(U) dm_mod(U) ohci_hcd(U) tg3(U) eip(U) ep(U) elan3(U) sunrpc(U) elan4(U) elan(U) qsnet(U) floppy(U) rtc(U) ext3(U) jbd(U) mptscsih(U) mptsas(U) mptspi(U) mptfc(U) mptscsi(U) mptbase(U) sd_mod(U) scsi_mod(U)
2006-12-19 23:31:04 CPU: 1
2006-12-19 23:31:04 EIP: 0060:[<fcf7952b>] Not tainted VLI
2006-12-19 23:31:04 EFLAGS: 00010206 (2.6.9-41.4chaos)
2006-12-19 23:31:04 EIP is at ptlrpc_resend_req+0xb2/0x308 [ptlrpc]
2006-12-19 23:31:04 eax: 5a5a5a5a ebx: db8be464 ecx: db8be464 edx: 00000000
2006-12-19 23:31:04 esi: d6ae8e00 edi: fcfc0495 ebp: f77f1e90 esp: f77f1e84
2006-12-19 23:31:04 ds: 007b es: 007b ss: 0068
2006-12-19 23:31:04 Process ptlrpcd-recov (pid: 3535, threadinfo=f77f0000 task=f1abe8b0)
2006-12-19 23:31:04 Stack: d6ae8200 d6ae8e00 dce50400 f77f1ea0 fcf7eaf7 d6ae8e00 dce50500 f77f1ec4
2006-12-19 23:31:04 fcfa3446 00000000 fcfc64ec fcfa803f 00000231 00000000 dce50500 dce50400
2006-12-19 23:31:04 f77f1f00 fcfa195c 00000000 d6ae8e30 d6ae8e00 00000001 d6ae8e00 00000000
2006-12-19 23:31:04 Call Trace:
2006-12-19 23:31:04 [<c01063a8>] show_stack+0x76/0x7e
2006-12-19 23:31:04 [<c01064b7>] show_registers+0xf0/0x15a
2006-12-19 23:31:04 [<c010666b>] die+0xe0/0x170
2006-12-19 23:31:04 [<c01151f2>] do_page_fault+0x41d/0x5fb
2006-12-19 23:31:04 [<c02a93df>] error_code+0x2f/0x38
2006-12-19 23:31:04 [<fcf7eaf7>] ptlrpc_resend+0x201/0x2a4 [ptlrpc]
2006-12-19 23:31:04 [<fcfa3446>] ptlrpc_import_recovery_state_machine+0x7d1/0xacc [ptlrpc]
2006-12-19 23:31:04 [<fcfa195c>] ptlrpc_connect_interpret+0xe42/0x1b16 [ptlrpc]
2006-12-19 23:31:04 [<fcf76071>] ptlrpc_check_set+0xedf/0x1083 [ptlrpc]
2006-12-19 23:31:04 [<fcfa41b5>] ptlrpcd_check+0x185/0x292 [ptlrpc]
2006-12-19 23:31:04 [<fcfa45e4>] ptlrpcd+0x322/0x44f [ptlrpc]
2006-12-19 23:31:04 [<c0104769>] kernel_thread_helper+0x5/0xb
2006-12-19 23:31:04 Code: fc bf 9d 04 fc fc eb 1a bf a1 04 fc fc eb 13 bf a5 04 fc fc eb 0c bf aa 04 fc fc eb 05 bf b4 04 fc fc 8b 43 54 31 d2 85 c0 74 03 <8b> 50 30 52 31 d2 85 c0 ff 73 1c 74 03 8b 50 34 52 31 c0 8b 73
2006-12-19 23:31:08 CPU#0 is frozen.
2006-12-19 23:31:08 CPU#1 is executing netdump.
2006-12-19 23:31:08 < netdump activated - performing handshake with the server. >
2006-12-19 23:31:08 NETDUMP START!
2006-12-19 23:31:08 < handshake completed - listening for dump requests. >

We're running:
Kernel = 2.6.9-41.4chaos
Lustre = 1.4.7.2_pre_8llnl

It looks like we didn't get a dump with it, I'm sorry to say.

This bug is labeled as "new" and "fixed". I'm not sure that either of those is true :)
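
As a rough sanity check on the register dump in the oops, the reported fault address can be reproduced from eax and the faulting instruction bytes. The sketch below only does that arithmetic; the constants are copied from the log, while the names (EAX, FAULT_ADDR, DISPLACEMENT, is_repeated_byte) are made up for illustration. The bytes `<8b> 50 30` decode on 32-bit x86 as `mov 0x30(%eax),%edx`, so the load address is eax + 0x30. An eax made of a single repeated byte (0x5a) would be consistent with reading a pointer out of memory that was filled with a poison pattern rather than a plain NULL pointer, but that interpretation is an assumption and is not confirmed anywhere in this report.

```python
# Values copied from the oops log above; the variable names are illustrative only.
EAX = 0x5A5A5A5A          # eax at the time of the fault
FAULT_ADDR = 0x5A5A5A8A   # "virtual address" reported by the kernel
DISPLACEMENT = 0x30       # disp8 of the faulting bytes <8b> 50 30: mov 0x30(%eax),%edx


def is_repeated_byte(value: int, width: int = 4) -> bool:
    """Return True if every byte of a width-byte value is the same."""
    data = value.to_bytes(width, "little")
    return len(set(data)) == 1


# The faulting load reads from eax + displacement.
computed_addr = EAX + DISPLACEMENT
fault_matches = computed_addr == FAULT_ADDR

if __name__ == "__main__":
    print(f"eax + 0x{DISPLACEMENT:x} = 0x{computed_addr:08x}")
    print("matches reported fault address:", fault_matches)
    print("eax is one repeated byte:", is_repeated_byte(EAX))
```

If the check holds, the pointer loaded just before the fault (by `8b 43 54`, i.e. from offset 0x54 of the structure addressed by ebx) passed the preceding non-zero test but did not point at valid memory, which may be worth keeping in mind when comparing this trace with the one already attached to the bug.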