Hello,

On one of our Lustre clusters, we are seeing many soft lockups that seem to come from contention on the Lustre spinlock LNET_LOCK (the_lnet.ln_lock). Several Lustre daemons are waiting for this spinlock while another Lustre daemon holds it and is executing lnet_match_md(). lnet_match_md() seems to have trouble handling a list of packets that contains 90000 elements.

Is there a known limitation on handling such a large list? Do you have any ideas or information that could help me make progress on this problem?

Lustre : 14.8.1
Kernel : 2.6.18

More traces below:

0xe0000004619e0000 0 30567 1 0 0x400040 - kiblnd_sd_02
#1 [BSP:e0000004619e12d8] lnet_match_md at a0000002052c9000
#2 [BSP:e0000004619e1150] lnet_parse at a0000002052dbc10
#3 [BSP:e0000004619e10f8] kiblnd_handle_rx at a000000205825ba0
#4 [BSP:e0000004619e10a0] kiblnd_rx_complete at a000000205827890
#5 [BSP:e0000004619e1080] kiblnd_complete at a0000002058348c0
#6 [BSP:e0000004619e0fc8] kiblnd_scheduler at a000000205835b00
#7 [BSP:e0000004619e0fa0] kernel_thread_helper at a000000100014810
#8 [BSP:e0000004619e0fa0] start_kernel_thread at a0000001000090c0

crash>
PID: 30572 TASK: e000000461a30000 CPU: 1 COMMAND: "kiblnd_sd_07"
#1 [BSP:e000000461a318d8] serial_in at a000000100380cf0
#2 [BSP:e000000461a31890] serial8250_console_putchar at a000000100386ac0
#3 [BSP:e000000461a31850] uart_console_write at a00000010037f1f0
#4 [BSP:e000000461a317e0] serial8250_console_write at a000000100386fa0
#5 [BSP:e000000461a31798] __call_console_drivers at a00000010007c5a0
#6 [BSP:e000000461a31768] _call_console_drivers at a00000010007c700
#7 [BSP:e000000461a316f8] release_console_sem at a00000010007ce10
#8 [BSP:e000000461a31628] vprintk at a00000010007d5c0
#9 [BSP:e000000461a315c0] printk at a00000010007d8a0
#10 [BSP:e000000461a31558] ia64_dump_bs at a000000100012220
#11 [BSP:e000000461a31508] ia64_do_show_stack at a000000100012330
#12 [BSP:e000000461a314e0] unw_init_running at a00000010000cb90
#13 [BSP:e000000461a314c0] show_stack at a000000100012400
#14 [BSP:e000000461a314a8] dump_stack at a000000100012450
#15 [BSP:e000000461a31460] softlockup_tick at a0000001000dbbc0
#16 [BSP:e000000461a31448] run_local_timers at a000000100097470
#17 [BSP:e000000461a31418] update_process_times at a000000100097590
#18 [BSP:e000000461a313b8] timer_interrupt at a000000100039720
#19 [BSP:e000000461a31378] handle_IRQ_event at a0000001000dcfb0
#20 [BSP:e000000461a31318] __do_IRQ at a0000001000dd200
#21 [BSP:e000000461a312e0] ia64_handle_irq at a000000100011580
#22 [BSP:e000000461a312e0] ia64_leave_kernel at a00000010000c4e0
#23 [BSP:e000000461a312e0] ia64_spinlock_contention at a000000100009150
#24 [BSP:e000000461a312d8] _spin_lock at a00000010053fd00
#25 [BSP:e000000461a31150] lnet_parse at a0000002052dbb00
#26 [BSP:e000000461a310f8] kiblnd_handle_rx at a000000205825ba0
#27 [BSP:e000000461a310a0] kiblnd_rx_complete at a000000205827890
#28 [BSP:e000000461a31080] kiblnd_complete at a0000002058348c0
#29 [BSP:e000000461a30fc8] kiblnd_scheduler at a000000205835b00
#30 [BSP:e000000461a30fa0] kernel_thread_helper at a000000100014810
#31 [BSP:e000000461a30fa0] start_kernel_thread at a0000001000090c0

Thanks for any information,
Cédric Lambert