Hello

I've been struggling this week with one quite simple Lustre setup: one MDT and one SUN X4540 OSS box. The OSS hung at some point due to a broken disk, and after reboot it usually either hits an LBUG/assert or panics. e2fsck on the MDT/OSTs didn't find anything alarming, and a full lfsck found only some orphaned objects. Next I'm going to roll the kernel version back from 1.8.7-wc to the old one (1.8.3), but if that doesn't work I'm out of ideas what to do :-(

Feb 24 17:22:28 sahara01 kernel: LustreError: 4883:0:(filter_io_26.c:178:dio_complete_routine()) ASSERTION(PageLocked(bvl->bv_page)) failed
Feb 24 17:22:30 sahara01 kernel: LustreError: 4883:0:(filter_io_26.c:178:dio_complete_routine()) LBUG
Feb 24 17:22:30 sahara01 kernel: LustreError: 4891:0:(filter_io_26.c:178:dio_complete_routine()) ASSERTION(PageLocked(bvl->bv_page)) failed
Feb 24 17:22:30 sahara01 kernel: LustreError: 4891:0:(filter_io_26.c:178:dio_complete_routine()) LBUG
Feb 24 17:22:30 sahara01 kernel: Pid: 4891, comm: md12_raid5
Feb 24 17:22:30 sahara01 kernel:
Feb 24 17:22:30 sahara01 kernel: Call Trace:
Feb 24 17:22:30 sahara01 kernel: [<ffffffff888286a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff88828bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff88830fc0>] tracefile_init+0x0/0x110 [libcfs]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff88ce2248>] dio_complete_routine+0x1b8/0x2a0 [obdfilter]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff883bc277>] copy_data+0x169/0x17f [raid456]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff883c0519>] handle_stripe+0x223f/0x2567 [raid456]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff80062ff2>] thread_return+0x62/0xfe
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8021d333>] md_super_wait+0xb5/0xbc
Feb 24 17:22:30 sahara01 kernel: [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel: [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel: [<ffffffff883c0999>] raid5d+0x158/0x18b [raid456]
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8003ac3b>] prepare_to_wait+0x34/0x61
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8022075b>] md_thread+0xf8/0x10e
Feb 24 17:22:30 sahara01 kernel: [<ffffffff800a2dff>] autoremove_wake_function+0x0/0x2e
Feb 24 17:22:30 sahara01 kernel: [<ffffffff80220663>] md_thread+0x0/0x10e
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8003276f>] kthread+0xfe/0x132
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Feb 24 17:22:30 sahara01 kernel: [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel: [<ffffffff80032671>] kthread+0x0/0x132
Feb 24 17:22:30 sahara01 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

or:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/bio.c:222
invalid opcode: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 10
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) lockd(U) sunrpc(U) ip_conntrack_netbios_ns(U) iptable_nat(U) ip_nat(U) ipt_REJECT(U) ipt_LOG(U) xt_limit(U) xt_state(U) ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) be2iscsi(U) ib_iser(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) cxgb3i(U) libcxgbi(U) cxgb3(U) libiscsi_tcp(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) mlx4_ib(U) ib_mad(U) ib_core(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) raid456(U) xor(U) video(U) backlight(U) sbs(U) power_meter(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U)
parport_pc(U) lp(U) parport(U) joydev(U) mlx4_core(U) amd64_edac_mod(U) k10temp(U) sg(U) edac_mc(U) forcedeth(U) i2c_nforce2(U) hwmon(U) 8021q(U) tpm_tis(U) tpm(U) tpm_bios(U) i2c_core(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) mptfc(U) scsi_transport_fc(U) mptspi(U) scsi_transport_spi(U) shpchp(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 4891, comm: md11_raid5 Tainted: G     ---- 2.6.18-274.3.1.el5_lustre.g9500ebf #1
RIP: 0010:[<ffffffff8002de97>]  [<ffffffff8002de97>] bio_put+0xa/0x31
RSP: 0018:ffff81082d2bdc78  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: ffff8103ea3baac0 RSI: 0000000000000001 RDI: ffff8103ea3baac0
RBP: ffff8108203edc08 R08: 0000000000000000 R09: 0000000000000036
R10: ffff81042e078000 R11: ffff810001000000 R12: ffff8103ea3baac0
R13: ffff81042e078000 R14: ffff8104017e28c0 R15: 0000000000000000
FS:  00002aca8f44e6e0(0000) GS:ffff81043e38fa40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000036be69a830 CR3: 0000000000201000 CR4: 00000000000006e0
Process md11_raid5 (pid: 4891, threadinfo ffff81082d2bc000, task ffff810426c687a0)
Stack:
 ffffffff88cd32f8 ffff8103e9ea4000 ffff81082d2bdde0 ffff81082d2bde0c
 0000000200000040 ffff81082d2bde00 0000000a2f091140 ffff81043e0c42a0
 0000000000000000 ffff81042fd4c1e0 0000000000000000 0000000100000140
Call Trace:
 [<ffffffff88cd32f8>] :obdfilter:dio_complete_routine+0x268/0x2a0
 [<ffffffff883c2519>] :raid456:handle_stripe+0x223f/0x2567
 [<ffffffff80062ff2>] thread_return+0x62/0xfe
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff883c2999>] :raid456:raid5d+0x158/0x18b
 [<ffffffff8003ac3b>] prepare_to_wait+0x34/0x61
 [<ffffffff8022075b>] md_thread+0xf8/0x10e
 [<ffffffff800a2dff>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80220663>] md_thread+0x0/0x10e
 [<ffffffff8003276f>] kthread+0xfe/0x132
 [<ffffffff80015f80>] do_exit+0x949/0x955
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032671>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: 0f 0b 68 b1 df 2b 80 c2 de 00 eb fe f0 ff 4f 50 0f 94 c0 84
RIP  [<ffffffff8002de97>] bio_put+0xa/0x31
 RSP <ffff81082d2bdc78>
<0>Kernel panic - not syncing: Fatal exception

BR,
Tommi