Nirmal Seenu
2009-May-21 18:45 UTC
[Lustre-discuss] Call trace generated on clients when rebooting the MDS.
I notice the following call trace on some (10-20%) of the Lustre clients whenever I reboot the MDS. The Lustre clients eventually recover, and everything seems to be working fine at that point.

Does anyone else notice these errors? Is it safe to ignore them?

I am running the 1.6.7.1 Lustre-patched RHEL5 kernel, and the clients run the 1.6.7.1 patchless client on RHEL kernel 2.6.18-128.1.10.

BUG: soft lockup - CPU#1 stuck for 10s! [ptlrpcd-recov:10639]
CPU 1:
Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) libafs(PU) autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink ipt_REJECT iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api vfat fat dm_mirror dm_log dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod joydev sg usb_storage ide_cd e1000e serio_raw i2c_piix4 cdrom pcspkr i2c_core shpchp bnx2 sata_svw libata megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 10639, comm: ptlrpcd-recov Tainted: P 2.6.18-128.1.10.el5 #1
RIP: 0010:[<ffffffff8865dcac>] [<ffffffff8865dcac>] :lnet:lnet_lookup_cookie+0x3c/0x50
RSP: 0018:ffff81022048bc58 EFLAGS: 00000206
RAX: ffff8103aec504d0 RBX: ffff81029678f000 RCX: ffff8103cb3215d0
RDX: ffff81021c1df250 RSI: 0000000000000001 RDI: 00000000052d3325
RBP: 0000000000000282 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000100000000 R11: 0000000000000000 R12: ffff81034e9de640
R13: ffffc2001049d4d0 R14: 0000000c00000000 R15: ffffffff8002df8f
FS: 00002b4d02c93240(0000) GS:ffff81024711ca40(0000) knlGS:00000000f7f346c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b033ff30000 CR3: 00000001d3b13000 CR4: 00000000000006e0

Call Trace:
[<ffffffff886647cb>] :lnet:LNetMDUnlink+0x7b/0xf0
[<ffffffff8878c64c>] :ptlrpc:at_add+0x4c/0x1b0
[<ffffffff8876d9d0>] :ptlrpc:lustre_msg_get_slv+0x30/0xf0
[<ffffffff8875c611>] :ptlrpc:ptlrpc_at_adj_net_latency+0xe1/0x200
[<ffffffff88744ae0>] :ptlrpc:ldlm_cli_update_pool+0x1f0/0x2a0
[<ffffffff8875e76c>] :ptlrpc:ptlrpc_unregister_reply+0x23c/0x9c0
[<ffffffff8875e3df>] :ptlrpc:after_reply+0x7df/0x8d0
[<ffffffff8002df8f>] __wake_up+0x38/0x4f
[<ffffffff88761c75>] :ptlrpc:ptlrpc_check_set+0x15b5/0x18d0
[<ffffffff8879375d>] :ptlrpc:ptlrpcd_check+0xdd/0x1f0
[<ffffffff80094ffc>] process_timeout+0x0/0x5
[<ffffffff8003b730>] remove_wait_queue+0x1c/0x2c
[<ffffffff88793ca8>] :ptlrpc:ptlrpcd+0xb8/0x259
[<ffffffff8008a4b3>] default_wake_function+0x0/0xe
[<ffffffff800b4a92>] audit_syscall_exit+0x31b/0x336
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff88793bf0>] :ptlrpc:ptlrpcd+0x0/0x259
[<ffffffff8005dfa7>] child_rip+0x0/0x11

TIA
Nirmal