We experienced an nfs lockup on the Lustre client exporting the filesystem. nfs is tuned to run 256 daemons in anticipation of heavy load. The Client is connected with bonded 10Gbps ethernet to the NFS export network, and also bonded 10Gbps ethernet to the Lustre network. At the time of the crash, approx 800Mbps of NFS writes to the Lustre filesystem was taking place. Any input on on this issue and most importantly how to prevent it is appreciated! Vitals: 2.6.18-164.11.1.el5_lustre.1.8.2 CentOS release 5.4 96GB RAM Dual intel quad Messages: Jul 21 14:25:36 ID6316-Client1 kernel: LustreError: 5464:0:(lov_merge.c:74:lov_merge_lvb()) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed Jul 21 14:25:36 ID6316-Client1 kernel: LustreError: 5464:0:(lov_merge.c:74:lov_merge_lvb()) LBUG Jul 21 14:25:36 ID6316-Client1 kernel: Lustre: 5464:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 5464 Jul 21 14:25:36 ID6316-Client1 kernel: nfsd R running task 0 5464 1 5465 5463 (L-TLB) Jul 21 14:25:36 ID6316-Client1 kernel: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 Jul 21 14:25:36 ID6316-Client1 kernel: 0000000000000000 0000000000000000 0000000000000086 ffffffff80047152 Jul 21 14:25:36 ID6316-Client1 kernel: 0000000000000001 0000000000000001 0000000000000000 0000000000000001 Jul 21 14:25:36 ID6316-Client1 kernel: Call Trace: Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff80047152>] try_to_wake_up+0x472/0x484 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8016d94c>] vgacon_scroll+0x21e/0x23f Jul 21 14:25:36 ID6316-Client1 last message repeated 2 times Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff80091d2d>] printk+0x52/0xbd Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8008ac95>] __wake_up_common+0x3e/0x68 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8008ac95>] __wake_up_common+0x3e/0x68 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8002e2f3>] __wake_up+0x38/0x4f Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff800a7b2e>] kallsyms_lookup+0xe6/0x1ae Jul 21 14:25:36 ID6316-Client1 last message repeated 3 times Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8006bc3b>] printk_address+0x9f/0xab Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff80091d2d>] printk+0x52/0xbd Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff80091d2d>] printk+0x52/0xbd Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff800a54d2>] module_text_address+0x33/0x3c Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8009e65b>] kernel_text_address+0x1a/0x26 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8006b921>] dump_trace+0x206/0x22f Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88768c8f>] :osc:osc_enqueue_fini+0x1af/0x240 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8006b97e>] show_trace+0x34/0x47 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8006ba83>] _show_stack+0xdb/0xea Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88551b1a>] :libcfs:lbug_with_loc+0x7a/0xd0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88559e90>] :libcfs:tracefile_init+0x0/0x110 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8881ac2e>] :lov:lov_merge_lvb+0x4e/0x220 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff886b86d1>] :ptlrpc:lustre_swab_buf+0x81/0x170 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff888747cd>] :lustre:ll_inode_size_lock+0x5d/0x160 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8885d2ca>] :lustre:ll_extent_lock+0x8ea/0xaf0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff885522b8>] :libcfs:cfs_alloc+0x68/0xc0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8876cd50>] :osc:osc_extent_blocking_cb+0x0/0x2b0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88692e50>] :ptlrpc:ldlm_completion_ast+0x0/0x880 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88868510>] :lustre:ll_glimpse_callback+0x0/0x440 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8889ef0c>] :lustre:ll_tree_lock_iov+0x14c/0x310 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8889e987>] :lustre:ll_node_from_inode+0xc7/0x210 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8885da6a>] :lustre:ll_file_get_tree_lock_iov+0x59a/0x740 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff886754f4>] :ptlrpc:ldlm_lock_add_to_lru+0x74/0xe0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8867219a>] :ptlrpc:lock_res_and_lock+0xba/0xd0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8885e6f4>] :lustre:ll_file_writev+0xae4/0x1750 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8867736a>] :ptlrpc:ldlm_lock_decref+0x9a/0xc0 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff887e0d9a>] :mdc:mdc_set_lock_data+0x1da/0x250 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88692e50>] :ptlrpc:ldlm_completion_ast+0x0/0x880 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88884ebd>] :lustre:ras_reset+0x2d/0x110 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8885f360>] :lustre:ll_file_write+0x0/0x20 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff800df4cd>] do_readv_writev+0x172/0x291 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8885f360>] :lustre:ll_file_write+0x0/0x20 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff888698fb>] :lustre:ll_file_open+0xbab/0xd10 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8008ac95>] __wake_up_common+0x3e/0x68 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8850b753>] :nfsd:nfsd_acceptable+0x0/0xd8 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8850c711>] :nfsd:nfsd_vfs_write+0xf2/0x2e1 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88868d50>] :lustre:ll_file_open+0x0/0xd10 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8001e8b7>] __dentry_open+0x101/0x1dc Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8850d027>] :nfsd:nfsd_write+0xb5/0xd5 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88513ae2>] :nfsd:nfsd3_proc_write+0xea/0x109 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff885091db>] :nfsd:nfsd_dispatch+0xd8/0x1d6 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8843d529>] :sunrpc:svc_process+0x454/0x71b Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff80064644>] __down_read+0x12/0x92 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff88509746>] :nfsd:nfsd+0x1a5/0x2cb Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11 Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 14:25:36 ID6316-Client1 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 Jul 21 14:25:36 ID6316-Client1 kernel: Jul 21 14:25:36 ID6316-Client1 kernel: LustreError: dumping log to /tmp/lustre-log.1279736736.5464 And subsequently: Jul 21 15:40:10 ID6316-Client1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [nfsd:5324] Jul 21 15:40:10 ID6316-Client1 kernel: CPU 6: Jul 21 15:40:10 ID6316-Client1 kernel: Modules linked in: iptable_filter(U) ip_tables(U) mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfsd(U) exportfs(U) nfs_acl(U) auth_rpcgss(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) bonding(U) ipt_REJECT(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) igb(U) ixgbe(U) 8021q(U) joydev(U) sg(U) i2c_i801(U) serio_raw(U) i2c_core(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ata_piix(U) libata(U) shpchp(U) 3w_9xxx(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) Jul 21 15:40:10 ID6316-Client1 kernel: Pid: 5324, comm: nfsd Tainted: G 2.6.18-164.11.1.el5_lustre.1.8.2 #1 Jul 21 15:40:10 ID6316-Client1 kernel: RIP: 0010:[<ffffffff80064bfc>] [<ffffffff80064bfc>] .text.lock.spinlock+0x2/0x30 Jul 21 15:40:10 ID6316-Client1 kernel: RSP: 0018:ffff810827163478 EFLAGS: 00000286 Jul 21 15:40:10 ID6316-Client1 kernel: RAX: ffff81083ceea860 RBX: ffff8113b7703cc0 RCX: ffff8112e855dd10 Jul 21 15:40:10 ID6316-Client1 kernel: RDX: ffff810826a28040 RSI: ffff810827163760 RDI: ffff810f59ad4a40 Jul 21 15:40:10 ID6316-Client1 kernel: RBP: ffffffff88677168 R08: 000000000000012d R09: ffff810c8e7e4ae0 Jul 21 15:40:10 ID6316-Client1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8112e855dd18 Jul 21 15:40:10 ID6316-Client1 kernel: R13: 0000000000000000 R14: ffff81131fc73c00 R15: ffffffff8867219a Jul 21 15:40:10 ID6316-Client1 kernel: FS: 00002b0c882886e0(0000) GS:ffff81086a2e1140(0000) knlGS:0000000000000000 Jul 21 15:40:10 ID6316-Client1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Jul 21 15:40:10 ID6316-Client1 kernel: CR2: 00002aaaae52c000 CR3: 0000000000201000 CR4: 00000000000006e0 Jul 21 15:40:10 ID6316-Client1 kernel: Jul 21 15:40:10 ID6316-Client1 kernel: Call Trace: Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88805e0d>] :lov:lov_stripe_lock+0x3d/0x80 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8882260e>] :lov:lov_update_enqueue_set+0x27e/0x4a0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8860a1b0>] :obdclass:class_handle2object+0xe0/0x170 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88768c8f>] :osc:osc_enqueue_fini+0x1af/0x240 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88768dd8>] :osc:osc_enqueue_interpret+0xb8/0x180 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff886bb240>] :ptlrpc:lustre_swab_ost_lvb+0x0/0x40 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff886ab181>] :ptlrpc:ptlrpc_check_set+0x11a1/0x1450 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8004b344>] try_to_del_timer_sync+0x51/0x5a Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff886ae34a>] :ptlrpc:ptlrpc_set_wait+0x36a/0x660 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88808442>] :lov:lov_enqueue+0x612/0x8b0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8008c86b>] default_wake_function+0x0/0xe Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff886ab9b9>] :ptlrpc:ptlrpc_prep_set+0x1e9/0x290 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88860eda>] :lustre:ll_glimpse_size+0x63a/0xc30 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8889b440>] :lustre:ll_mdc_blocking_ast+0x0/0x520 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8889adc6>] :lustre:ll_lookup_it+0x776/0x7c0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8889b440>] :lustre:ll_mdc_blocking_ast+0x0/0x520 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8876cd50>] :osc:osc_extent_blocking_cb+0x0/0x2b0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88692e50>] :ptlrpc:ldlm_completion_ast+0x0/0x880 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88868510>] :lustre:ll_glimpse_callback+0x0/0x440 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8886160a>] :lustre:ll_inode_revalidate_it+0x13a/0x1d0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff888616c4>] :lustre:ll_getattr_it+0x24/0x110 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff888617e4>] :lustre:ll_getattr+0x34/0x40 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885147fb>] :nfsd:encode_post_op_attr+0x3f/0x213 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8000d3a4>] dput+0x2c/0x113 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88514d50>] :nfsd:compose_entry_fh+0x113/0x121 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88514f8d>] :nfsd:encode_entry+0x22f/0x53c Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff886754f4>] :ptlrpc:ldlm_lock_add_to_lru+0x74/0xe0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8867219a>] :ptlrpc:lock_res_and_lock+0xba/0xd0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88676bb8>] :ptlrpc:ldlm_lock_decref_internal+0x538/0x7f0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8867219a>] :ptlrpc:lock_res_and_lock+0xba/0xd0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88677168>] :ptlrpc:__ldlm_handle2lock+0x2f8/0x360 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8867736a>] :ptlrpc:ldlm_lock_decref+0x9a/0xc0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8885443d>] :lustre:ll_get_dir_page+0x63d/0x680 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff887e0d9a>] :mdc:mdc_set_lock_data+0x1da/0x250 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88692e50>] :ptlrpc:ldlm_completion_ast+0x0/0x880 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885152a5>] :nfsd:nfs3svc_encode_entry_plus+0xb/0x10 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88855cdb>] :lustre:ll_readdir+0x87b/0x9e0 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8850b753>] :nfsd:nfsd_acceptable+0x0/0xd8 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8851529a>] :nfsd:nfs3svc_encode_entry_plus+0x0/0x10 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8001e8b7>] __dentry_open+0x101/0x1dc Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8851529a>] :nfsd:nfs3svc_encode_entry_plus+0x0/0x10 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff80035265>] vfs_readdir+0x77/0xa9 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8851529a>] :nfsd:nfs3svc_encode_entry_plus+0x0/0x10 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8850cea0>] :nfsd:nfsd_readdir+0x6d/0xc5 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88514122>] :nfsd:nfsd3_proc_readdirplus+0xf8/0x220 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885091db>] :nfsd:nfsd_dispatch+0xd8/0x1d6 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8843d529>] :sunrpc:svc_process+0x454/0x71b Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff80064644>] __down_read+0x12/0x92 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff88509746>] :nfsd:nfsd+0x1a5/0x2cb Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11 Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff885095a1>] :nfsd:nfsd+0x0/0x2cb Jul 21 15:40:10 ID6316-Client1 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 Jul 21 15:40:10 ID6316-Client1 kernel: -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100721/bc50bda5/attachment-0001.html