Huang Qiulan
2009-Sep-13 06:00 UTC
[Lustre-discuss] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP
Dear list,

In recent days we have been getting an unusual OSS crash. After we restarted the OSS and let recovery run with the default settings, the OSS crashed again a short time later, while still in recovery. We then rebooted it again and aborted recovery with the command:

    lctl --device N abort_recovery

But the OSS crashed again with the following log. We have no idea what is causing it. Please give us some ideas; any help would be much appreciated.

Sep 13 00:44:52 boss15 kernel: LustreError: 23119:0:(ldlm_lib.c:1619: target_send_reply_msg()) @@@ processing error (-19) req at 000001045c2b5600 x1464287/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1252774392 ref 1 fl Interpret:/0/0 rc -19/0
Sep 13 00:44:52 boss15 kernel: LustreError: 23119:0:(ldlm_lib.c:1619: target_send_reply_msg()) Skipped 188 previous similar messages
Sep 13 00:44:59 boss15 kernel: LustreError: 23123:0:(ldlm_lib.c:819: target_handle_connect()) besfs-OST0034: denying connection for new client 202.122.33.82 at tcp (8e8a925f-f4cc-58c1-851a-b22bc2d63f3c): 141 clients in recovery for 1199s
Sep 13 00:44:59 boss15 kernel: LustreError: 23123:0:(ldlm_lib.c:819: target_handle_connect()) Skipped 2 previous similar messages
Sep 13 00:46:10 boss15 kernel: LustreError: 24613:0:(filter.c:3630: filter_iocontrol()) aborting recovery for device besfs-OST0034
Sep 13 00:46:10 boss15 kernel: Lustre: besfs-OST0034: recovery period over; 115 clients never reconnected after 371s (281 clients did)
Sep 13 00:46:10 boss15 kernel: LustreError: 24613:0:(genops.c:1061: class_disconnect_stale_exports()) besfs-OST0034: disconnecting 115 stale clients
Sep 13 00:46:10 boss15 kernel: Lustre: besfs-OST0034: sending delayed replies to recovered clients
Sep 13 00:46:10 boss15 kernel: Lustre: besfs-OST0034: received MDS connection from 192.168.50.32 at tcp
Sep 13 00:46:10 boss15 kernel: Lustre: 22989:0:(filter.c:2830: filter_destroy_precreated()) besfs-OST0034: deleting orphan objects from 5027 to 5207
Sep 13 00:46:13 boss15 kernel: LustreError: 24625:0:(filter.c:3630: filter_iocontrol()) aborting recovery for device besfs-OST0035
Sep 13 00:46:13 boss15 kernel: Lustre: besfs-OST0035: recovery period over; 112 clients never reconnected after 365s (284 clients did)
Sep 13 00:46:13 boss15 kernel: LustreError: 24625:0:(genops.c:1061: class_disconnect_stale_exports()) besfs-OST0035: disconnecting 112 stale clients
Sep 13 00:46:13 boss15 kernel: Lustre: besfs-OST0035: sending delayed replies to recovered clients
Sep 13 00:46:13 boss15 kernel: Lustre: besfs-OST0035: received MDS connection from 192.168.50.32 at tcp
Sep 13 00:46:13 boss15 kernel: Lustre: 23044:0:(filter.c:2830: filter_destroy_precreated()) besfs-OST0035: deleting orphan objects from 5172 to 5612
Sep 13 00:46:14 boss15 kernel: LustreError: 23109:0:(filter.c:1396: filter_destroy_internal()) destroying objid 4902 ino 107326972 nlink 14727 count 1
Sep 13 00:46:14 boss15 kernel: LustreError: 23109:0:(filter.c:1402: filter_destroy_internal()) error unlinking objid 4902: rc -1
Sep 13 00:46:16 boss15 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Sep 13 00:46:16 boss15 kernel: <ffffffff801ee7f2>{__memset+50}

Thanks,
Sarea