Aaron Knister
2009-Sep-27 22:46 UTC
[Lustre-discuss] [RESOLVED] Strange MDS Problem + Resolution
I wanted to post this here so that if anybody else stumbles across this problem they don't spend hours banging their head against a brick wall. I was helping with a Lustre setup that kept crashing. The Lustre filesystem would hang and one thread (ll_mdt_[0-9]*) would be pegged at 100% CPU. It turned out there were some on-disk inconsistencies, the result of the MDS crashing after it ran out of memory. A simple fsck of the MDT fixed the issue, after many hours of attempted debugging. We didn't think the problem could be fixed by a simple fsck... but it makes sense.

Here's the call trace:

BUG: soft lockup - CPU#0 stuck for 10s! [ll_mdt_26:12829]
CPU 0:
Modules linked in: mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) crc16(U) ipmi_devintf(U) mptctl(U) mptbase(U) ipmi_si(U) ipmi_msghandler(U) dell_rbu(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) pata_acpi(U) ata_piix(U) libata(U) sr_mod(U) sg(U) shpchp(U) ide_cd(U) i5000_edac(U) bnx2(U) serio_raw(U) edac_mc(U) cdrom(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U) megaraid_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 12829, comm: ll_mdt_26 Tainted: G 2.6.18-92.1.10.el5_lustre.1.6.6smp #1
RIP: 0010:[<ffffffff887ff8bc>]  [<ffffffff887ff8bc>] :ldiskfs:do_split+0x3ec/0x560
RSP: 0018:ffff8103f4fab470  EFLAGS: 00000206
RAX: 0000000000000000 RBX: 0000000000000024 RCX: 0000000000000000
RDX: 0000000000000024 RSI: ffff8103aa719bb0 RDI: ffff8103aa719800
RBP: ffff8103fdd50d30 R08: 383030322e786e39 R09: 0000000031323730
R10: 000000006a3ef844 R11: ffff8103aa719cf8 R12: ffff81018bb81f70
R13: ffff81017ad46f70 R14: ffff810093d3cc10 R15: 0000000000000000
FS:  00002b55d25bc220(0000) GS:ffffffff803eb000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000f1bff000 CR3: 00000004159d4000 CR4: 00000000000006e0
Call Trace:
 [<ffffffff88800395>] :ldiskfs:ldiskfs_add_entry+0x4f5/0x980
 [<ffffffff8006d8f0>] do_gettimeofday+0x50/0x92
 [<ffffffff88800e36>] :ldiskfs:ldiskfs_add_nondir+0x26/0x90
 [<ffffffff88801756>] :ldiskfs:ldiskfs_create+0xf6/0x140
 [<ffffffff888802ff>] :fsfilt_ldiskfs:fsfilt_ldiskfs_start+0x55f/0x630
 [<ffffffff8003a049>] vfs_create+0xe6/0x158
 [<ffffffff88b10453>] :mds:mds_open+0x15a3/0x332e
 [<ffffffff884c30e8>] :lvfs:entry_set_group_info+0xd8/0x2c0
 [<ffffffff884c33fb>] :lvfs:alloc_entry+0x12b/0x140
 [<ffffffff88666434>] :ko2iblnd:kiblnd_check_sends+0x644/0x7f0
 [<ffffffff88546031>] :obdclass:class_handle2object+0xd1/0x160
 [<ffffffff885a619e>] :ptlrpc:lock_res_and_lock+0xbe/0xe0
 [<ffffffff88aed889>] :mds:mds_reint_rec+0x1d9/0x2b0
 [<ffffffff88b14143>] :mds:mds_open_unpack+0x2f3/0x410
 [<ffffffff88ae08da>] :mds:mds_reint+0x35a/0x420
 [<ffffffff88adef62>] :mds:fixup_handle_for_resent_req+0x52/0x200
 [<ffffffff88ae492c>] :mds:mds_intent_policy+0x48c/0xc40
 [<ffffffff885db765>] :ptlrpc:ptlrpc_prep_set+0x1f5/0x2a0
 [<ffffffff885ab926>] :ptlrpc:ldlm_lock_enqueue+0x186/0x990
 [<ffffffff885a7a24>] :ptlrpc:ldlm_lock_remove_from_lru+0x74/0xe0
 [<ffffffff885cd5c0>] :ptlrpc:ldlm_server_completion_ast+0x0/0x5c0
 [<ffffffff885cae85>] :ptlrpc:ldlm_handle_enqueue+0xca5/0x12a0
 [<ffffffff885cdb80>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x6b2
 [<ffffffff88ae9115>] :mds:mds_handle+0x4035/0x4cf0
 [<ffffffff80143a09>] __next_cpu+0x19/0x28
 [<ffffffff80089ab6>] find_busiest_group+0x20d/0x621
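
For reference, the repair itself was nothing exotic: just an e2fsck run against the MDT block device while it was unmounted, roughly along these lines (the device path and mount point below are placeholders, not our actual setup):

    # on the MDS, with the MDT filesystem stopped/unmounted
    umount /mnt/mdt
    # force a full check and let it fix what it safely can
    # (use the e2fsck from the Lustre e2fsprogs package for ldiskfs)
    e2fsck -f -p /dev/mdt_device
    # bring the MDT back online
    mount -t lustre /dev/mdt_device /mnt/mdt

If -p (preen) refuses to fix something, a second pass interactively or with -y may be needed.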
Johann Lombardi
2009-Sep-29 07:17 UTC
[Lustre-discuss] [RESOLVED] Strange MDS Problem + Resolution
On Sep 28, 2009, at 12:46 AM, Aaron Knister wrote:

> I wanted to post this here so that if anybody else stumbles across
> this problem they don't spend hours banging their head against a
> brick wall. I was helping with a Lustre setup that kept crashing.
> The Lustre filesystem would hang and one thread (ll_mdt_[0-9]*)
> would be pegged at 100% CPU. It turned out there were some on-disk
> inconsistencies, the result of the MDS crashing after it ran out of
> memory. A simple fsck of the MDT fixed the issue, after many hours
> of attempted debugging. We didn't think the problem could be fixed
> by a simple fsck... but it makes sense.

Recent kernels have additional checks (in do_split(), but in other places as well) to prevent this kind of problem (a crash or infinite loop when the directory layout is corrupted). I wonder whether those checks would have caught this case and returned an error instead. Do you know where in do_split() the process was stuck?
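If you still have the ldiskfs module from that kernel around (built with debug info), something along these lines should map the RIP offset from your trace (do_split+0x3ec) back to a source line; the module path below is just a placeholder:

    # locate the ldiskfs module file on the MDS
    modinfo -n ldiskfs
    # resolve the offset from the trace to a source line
    gdb /path/to/ldiskfs.ko -batch -ex 'list *(do_split+0x3ec)'
    # or disassemble with interleaved source/line info and search for do_split
    objdump -d -l /path/to/ldiskfs.ko | less

Cheers,
Johann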