uselton2@llnl.gov
2006-Dec-20 09:33 UTC
[Lustre-devel] [Bug 10823] ptlrpc_import_recovery_state_machine null pointer dereference
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=10823
What                    |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|3839    |
At a first look this seems to be what I just saw on our test cluster:
2006-12-19 23:31:04 Unable to handle kernel paging request at virtual address 5a5a5a8a
2006-12-19 23:31:04 printing eip:
2006-12-19 23:31:04 fcf7952b
2006-12-19 23:31:04 *pde = 00000000
2006-12-19 23:31:04 Oops: 0000 [#1]
2006-12-19 23:31:04 SMP
2006-12-19 23:31:04 Modules linked in: osc(U) nfs(U) lockd(U) sg(U) llite(U) lov(U) lquota(U) mdc(U) kqswlnd(U) ptlrpc(U) lnet(U) obdclass(U) lvfs(U) libcfs(U) perfctr(U) netdump(U) i2c_dev(U) i2c_core(U) jtag(U) rms(U) dm_mirror(U) dm_mod(U) ohci_hcd(U) tg3(U) eip(U) ep(U) elan3(U) sunrpc(U) elan4(U) elan(U) qsnet(U) floppy(U) rtc(U) ext3(U) jbd(U) mptscsih(U) mptsas(U) mptspi(U) mptfc(U) mptscsi(U) mptbase(U) sd_mod(U) scsi_mod(U)
2006-12-19 23:31:04 CPU: 1
2006-12-19 23:31:04 EIP: 0060:[<fcf7952b>] Not tainted VLI
2006-12-19 23:31:04 EFLAGS: 00010206 (2.6.9-41.4chaos)
2006-12-19 23:31:04 EIP is at ptlrpc_resend_req+0xb2/0x308 [ptlrpc]
2006-12-19 23:31:04 eax: 5a5a5a5a ebx: db8be464 ecx: db8be464 edx: 00000000
2006-12-19 23:31:04 esi: d6ae8e00 edi: fcfc0495 ebp: f77f1e90 esp: f77f1e84
2006-12-19 23:31:04 ds: 007b es: 007b ss: 0068
2006-12-19 23:31:04 Process ptlrpcd-recov (pid: 3535, threadinfo=f77f0000 task=f1abe8b0)
2006-12-19 23:31:04 Stack: d6ae8200 d6ae8e00 dce50400 f77f1ea0 fcf7eaf7 d6ae8e00 dce50500 f77f1ec4
2006-12-19 23:31:04 fcfa3446 00000000 fcfc64ec fcfa803f 00000231 00000000 dce50500 dce50400
2006-12-19 23:31:04 f77f1f00 fcfa195c 00000000 d6ae8e30 d6ae8e00 00000001 d6ae8e00 00000000
2006-12-19 23:31:04 Call Trace:
2006-12-19 23:31:04 [<c01063a8>] show_stack+0x76/0x7e
2006-12-19 23:31:04 [<c01064b7>] show_registers+0xf0/0x15a
2006-12-19 23:31:04 [<c010666b>] die+0xe0/0x170
2006-12-19 23:31:04 [<c01151f2>] do_page_fault+0x41d/0x5fb
2006-12-19 23:31:04 [<c02a93df>] error_code+0x2f/0x38
2006-12-19 23:31:04 [<fcf7eaf7>] ptlrpc_resend+0x201/0x2a4 [ptlrpc]
2006-12-19 23:31:04 [<fcfa3446>] ptlrpc_import_recovery_state_machine+0x7d1/0xacc [ptlrpc]
2006-12-19 23:31:04 [<fcfa195c>] ptlrpc_connect_interpret+0xe42/0x1b16 [ptlrpc]
2006-12-19 23:31:04 [<fcf76071>] ptlrpc_check_set+0xedf/0x1083 [ptlrpc]
2006-12-19 23:31:04 [<fcfa41b5>] ptlrpcd_check+0x185/0x292 [ptlrpc]
2006-12-19 23:31:04 [<fcfa45e4>] ptlrpcd+0x322/0x44f [ptlrpc]
2006-12-19 23:31:04 [<c0104769>] kernel_thread_helper+0x5/0xb
2006-12-19 23:31:04 Code: fc bf 9d 04 fc fc eb 1a bf a1 04 fc fc eb 13 bf a5 04 fc fc eb 0c bf aa 04 fc fc eb 05 bf b4 04 fc fc 8b 43 54 31 d2 85 c0 74 03 <8b> 50 30 52 31 d2 85 c0 ff 73 1c 74 03 8b 50 34 52 31 c0 8b 73
2006-12-19 23:31:08 CPU#0 is frozen.
2006-12-19 23:31:08 CPU#1 is executing netdump.
2006-12-19 23:31:08 < netdump activated - performing handshake with the server. >
2006-12-19 23:31:08 NETDUMP START!
2006-12-19 23:31:08 < handshake completed - listening for dump requests. >
We're running:
Kernel = 2.6.9-41.4chaos
Lustre = 1.4.7.2_pre_8llnl
I'm sorry to say it looks like we didn't get a dump with it.
This bug is labeled as both "new" and "fixed". I'm not sure that either of those is true :)
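
As a rough cross-check of the oops above, the faulting address can be compared against the register values in the dump. The sketch below is only an illustration: the helper name `fault_offsets` is hypothetical, and the embedded text is just the two relevant lines copied from the log. It shows that the fault address sits 0x30 bytes past eax, which matches the `<8b> 50 30` bytes at the faulting instruction (a load from offset 0x30 off eax). The repeated 0x5a byte in eax looks like a fill pattern rather than a real pointer; that it is a freed-memory poison value is an assumption here, not something the log itself confirms.

```python
import re

# Two lines copied from the oops above (timestamps dropped, wrapped lines joined).
OOPS = """\
Unable to handle kernel paging request at virtual address 5a5a5a8a
eax: 5a5a5a5a ebx: db8be464 ecx: db8be464 edx: 00000000
"""

def fault_offsets(oops_text):
    """Return {register: fault_address - register_value} for each register found.

    Hypothetical helper for illustration; not part of any Lustre or kernel tool.
    """
    fault = int(
        re.search(r"virtual address\s+([0-9a-f]{8})", oops_text).group(1), 16
    )
    regs = re.findall(
        r"\b(e(?:ax|bx|cx|dx|si|di|bp|sp)):\s+([0-9a-f]{8})", oops_text
    )
    return {name: fault - int(value, 16) for name, value in regs}

offsets = fault_offsets(OOPS)
print(hex(offsets["eax"]))  # 0x30
```

A small positive offset from a register holding a fill pattern would be consistent with a field being read through a stale pointer, but without the dump that remains a guess.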