Adrian Ulrich
2010-Aug-13 17:49 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi,

For the past few hours we have had a problem with one of our OSTs:

One (and only one) ll_ost_create_ process on one of the OSSes seems to go crazy and uses 100% CPU.

Rebooting the OSS + MDS didn't help, and there isn't much going on on the filesystem itself:

 - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
 - iostat shows almost no usage
 - IB traffic is < 100 kb/s

The MDS logs this every ~3 minutes:

 Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62.23@o2ib. The ost_connect operation failed with -16

..and later:

 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST idx 5/32: rc = -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at mds_lov_clear_orphans: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync failed -110, deactivating
 Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)

On the affected OSS we get:

 Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_UUID@10.201.62.11@o2ib to 0xffff8102164d0200; still busy with 2 active RPCs

$ llog_reader lustre-log.1281718692.11833 shows:

 Bit 0 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32511 of 284875 not set
 Bit 0 of 284875 not set
 Bit -1 of 284875 not set
 Bit 0 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -1 of 284875 not set
 Bit 0 of 284875 not set
 Segmentation fault   <-- *ouch*

And we get tons of soft CPU lockups :-/

Any ideas?

Regards,
Adrian
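A minimal sketch of the checks Adrian describes (the thread name and service path come from the report above; hosts, intervals, and output paths are illustrative, and the lctl parameter path may differ slightly between Lustre versions):

 # which service thread is spinning, and how hard
 oss$ top -b -n 1 | grep ll_ost_creat

 # is the create service actually handling requests, or just spinning?
 oss$ watch -n 30 'lctl get_param ost.OSS.ost_create.stats'

 # confirm the disks behind the OST are close to idle
 oss$ iostat -x 5 3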
Alexey Lyashkov
2010-Aug-13 18:15 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On Aug 13, 2010, at 20:49, Adrian Ulrich wrote:

> For the past few hours we have had a problem with one of our OSTs:
>
> One (and only one) ll_ost_create_ process on one of the OSSes seems to go crazy and uses 100% CPU.
> [...]
> The MDS logs this every ~3 minutes:
> Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62.23@o2ib. The ost_connect operation failed with -16
> ..and later:
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
> [...]
> Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s

-110 = -ETIMEDOUT: the operation did not finish before its deadline, or there is a network problem.

> oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)

ll_ost_create works on destroying old created objects, I think.

> On the affected OSS we get:
> Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_UUID@10.201.62.11@o2ib to 0xffff8102164d0200; still busy with 2 active RPCs
>
> $ llog_reader lustre-log.1281718692.11833 shows:
> [...]
> Segmentation fault   <-- *ouch*

llog_reader is a tool for reading configuration llogs; to decode a debug log you should use:

 lctl df $file > $output

> And we get tons of soft CPU lockups :-/
>
> Any ideas?

Please post the soft-lockup report. One possibility is that the MDS asked that OST to create too many objects, or that the OST sees too many reconnects.
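A minimal sketch of the log gathering Alexey asks for, using hypothetical output paths: lctl df decodes the binary dumps Lustre writes as lustre-log.<timestamp>.<pid>, lctl dk dumps and decodes the current in-memory debug buffer, and the soft-lockup backtrace itself ends up in the kernel log:

 # decode an automatically dumped binary debug log into readable text
 oss$ lctl df /tmp/lustre-log.1281718692.11833 > /tmp/lustre-log.txt

 # or dump and decode whatever is currently in the debug buffer
 oss$ lctl dk /tmp/lustre-debug.txt

 # the soft-lockup report lives in the kernel ring buffer / syslog
 oss$ dmesg | grep -A 40 'soft lockup'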
Adrian Ulrich
2010-Aug-13 18:29 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi Alexey,

> llog_reader is a tool for reading configuration llogs; to decode a debug log you should use lctl df $file > $output

Oh, sorry for mixing this up.

'lctl df' doesn't show much new stuff:

 00010000:00000400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
 00010000:00000400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f466e@10.201.48.12@o2ib to 0xffff8101c93b6000; still busy with 1 active RPCs
 00010000:00020000:
 00010000:00000400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: lustre1-mdtlov_UUID reconnecting

> Please post the soft-lockup report. One possibility is that the MDS asked that OST to create too many objects, or that the OST sees too many reconnects.

 LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 previous similar messages
 BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
 CPU 4:
 Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U) netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U) tpm_infineon(U) tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U) i2c_i801(U) pcspkr(U) e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ahci(U) ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U) mptfc(U) scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
 Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
 RIP: 0010:[<ffffffff88b49af4>]  [<ffffffff88b49af4>] :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
 RSP: 0018:ffff8101e715d500  EFLAGS: 00000202
 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 000000000c91a9f1
 RDX: ffff8101e8893800 RSI: ffff8101e715d4e8 RDI: ffff81010773e838
 RBP: 0000000000000002 R08: ffff81017bd9cff8 R09: ffff81017bd9c000
 R10: ffff810216dfb000 R11: 000000004c6578dc R12: ffff81017d41e6d0
 R13: ffffffff80063b4c R14: ffff8101e715d5b8 R15: ffffffff80014fae
 FS:  00002ab1d8b97220(0000) GS:ffff81021fc74bc0(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000009c9c178 CR3: 0000000000201000 CR4: 00000000000006e0

 Call Trace:
  [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
  [<ffffffff8005ba4d>] cache_alloc_refill+0x106/0x186
  [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
  [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
  [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
  [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
  [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
  [<ffffffff8026f08b>] __down_trylock+0x44/0x4e
  [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
  [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
  [<ffffffff887e0923>] :lnet:lnet_ni_send+0x93/0xd0
  [<ffffffff8000d0bd>] dput+0x23/0x10a
  [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
  [<ffffffff887e6ad2>] :lnet:LNetPut+0x702/0x800
  [<ffffffff888e9a13>] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
  [<ffffffff888ef394>] :ptlrpc:lustre_msg_add_version+0x34/0x110
  [<ffffffff888ea198>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
  [<ffffffff888f1f69>] :ptlrpc:lustre_pack_reply+0x29/0xb0
  [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70
  [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
  [<ffffffff88797978>] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990
  [<ffffffff80148e8c>] __next_cpu+0x19/0x28
  [<ffffffff80088f36>] find_busiest_group+0x20d/0x621
  [<ffffffff888f3f05>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
  [<ffffffff80089d8d>] enqueue_task+0x41/0x56
  [<ffffffff888f8c1d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
  [<ffffffff888fb357>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170
  [<ffffffff8003d382>] lock_timer_base+0x1b/0x3c
  [<ffffffff8008881d>] __wake_up_common+0x3e/0x68
  [<ffffffff888fee08>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
  [<ffffffff8008a3f3>] default_wake_function+0x0/0xe
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff888fdbf0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
  [<ffffffff8005dfa7>] child_rip+0x0/0x11

Btw: We are running 1.8.1.1 (with RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1).

Regards,
Adrian
Alexey Lyashkov
2010-Aug-13 18:54 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Adrian,

In general a soft lockup isn't an error; it is just a notice that some operation needed too much time (more than 10s, I think). The attached soft-lockup report says the OST is busy creating objects after an MDS<->OST reconnect, so I think you have very busy disks or an overloaded node. To avoid that soft lockup you can reduce the maximum number of precreated objects via procfs (if my memory is correct).

Why the clients reconnect is a different question, but it can be related to the soft lockup: if you have slow disks, a client can be disconnected before its request is processed, and that request then blocks the reconnect from that client.

On Aug 13, 2010, at 21:29, Adrian Ulrich wrote:

> Oh, sorry for mixing this up.
>
> 'lctl df' doesn't show much new stuff:
>
> 00010000:00000400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f466e@10.201.48.12@o2ib to 0xffff8101c93b6000; still busy with 1 active RPCs
> [...]
>
> BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
> [...]
> Call Trace:
>  [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
>  [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
>  [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70
> [...]
>
> Btw: We are running 1.8.1.1 (with RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1).
>
> Regards,
> Adrian
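For the precreation tunable Alexey refers to, a hedged sketch of what that can look like on a 1.8 MDS; the create_count name and the device path are assumptions from memory and may differ in other releases, so check what actually exists under /proc/fs/lustre/osc/ on the MDS first:

 # on the MDS: how many objects are requested from this OST per precreate RPC
 mds$ cat /proc/fs/lustre/osc/lustre1-OST0005-osc/create_count

 # lower it if it has grown large (the value 32 is illustrative, not a recommendation)
 mds$ echo 32 > /proc/fs/lustre/osc/lustre1-OST0005-osc/create_count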
Adrian Ulrich
2010-Aug-13 19:01 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi Alexey,

> In general a soft lockup isn't an error; it is just a notice that some operation needed too much time (more than 10s, I think).
> The attached soft-lockup report says the OST is busy creating objects after an MDS<->OST reconnect,

Yes, I know that a soft lockup doesn't mean that I hit a bug, but having ll_ost_creat_* wasting 100% CPU doesn't seem to be normal.

> so I think you have very busy disks or an overloaded node.

Disk %busy is < 5% for all attached disks. The OST is doing almost nothing (there are a few read()'s, that's all).

> if you have slow disks, a client can be disconnected before its request is processed, and that request then blocks the reconnect from that client.

The recovery of the clients seems to be ok: all clients can write/read data from the OST, but there is something wrong between the MDS <-> OST0005. This might just be a side effect of the ll_ost_creat_* issue :-/

Regards,
Adrian

--
RFC 1925: (11) Every old idea will be proposed again with a different
name and a different presentation, regardless of whether it works.
Andreas Dilger
2010-Aug-13 21:54 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On 2010-08-13, at 12:29, Adrian Ulrich wrote:

> Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
> :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
> [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
> [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
> [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
> [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
> [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
> [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
> [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
> [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
> [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70

Two possibilities I can see:

- The MDS sent a very large create request. Compare the values from:

   mds> lctl get_param osc.*.prealloc_*
   oss> lctl get_param obdfilter.*.last_id

  and see if they match. If last_id is growing quickly, the thread is busy precreating many objects for some reason. If this OST has a much higher prealloc_last_id on the MDS, something is bad in the MDS lov_objids file.

- The on-disk structure of the object directory for this OST is corrupted.
  Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
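A sketch of the comparison Andreas describes, with the affected target filled in (device names are placeholders; exact parameter names can vary between 1.8 releases):

 # on the MDS: where the MDS believes preallocation stands for this OST
 mds$ lctl get_param osc.lustre1-OST0005-osc.prealloc_last_id osc.lustre1-OST0005-osc.prealloc_next_id

 # on the OSS: the last object id the OST has actually created
 oss$ lctl get_param obdfilter.lustre1-OST0005.last_id

 # if last_id keeps climbing, the create thread is busy filling a large gap
 oss$ watch -n 10 'lctl get_param obdfilter.lustre1-OST0005.last_id'

If the MDS-side value is far ahead of the OST's last_id, the lov_objids scenario above applies; otherwise the e2fsck route in the second suggestion is the more likely fix.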
Adrian Ulrich
2010-Aug-14 08:28 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> - The on-disk structure of the object directory for this OST is corrupted.
>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

e2fsck fixed it: the OST has now been running for 40 minutes without problems:

 e2fsck 1.41.6.sun1 (30-May-2009)
 lustre1-OST0005: recovering journal
 lustre1-OST0005 has been mounted 72 times without being checked, check forced.
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure
 Directory inode 440696867, block 493, offset 0: directory corrupted
 Salvage<y>? yes
 Directory inode 440696853, block 517, offset 0: directory corrupted
 Salvage<y>? yes
 Directory inode 440696842, block 560, offset 0: directory corrupted
 Salvage<y>? yes
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Unattached inode 17769156
 Connect to /lost+found<y>? yes
 Inode 17769156 ref count is 2, should be 1.  Fix<y>? yes
 Unattached zero-length inode 17883901.  Clear<y>? yes
 Pass 5: Checking group summary information
 lustre1-OST0005: ***** FILE SYSTEM WAS MODIFIED *****
 lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks

But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

Have a nice weekend, and thanks a lot for the fast reply!

Regards,
Adrian
Andreas Dilger
2010-Aug-14 15:29 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On 2010-08-14, at 2:28, Adrian Ulrich <adrian at blinkenlights.ch> wrote:

>> - The on-disk structure of the object directory for this OST is corrupted.
>>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.
>
> e2fsck fixed it: the OST has now been running for 40 minutes without problems.
>
> But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

The journal will prevent inconsistencies in the filesystem in case of a crash. It cannot prevent corruption of the on-disk data, inconsistencies caused by cache enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc. That is why it is still a good idea to run e2fsck periodically on a filesystem.

If you are using LVM, there is an lvcheck script I wrote that can check a filesystem snapshot on a running system; otherwise you should do it whenever the opportunity arises.

Cheers, Andreas
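A rough sketch of the snapshot-based check Andreas mentions, assuming the OST sits on an LVM logical volume (volume group, LV, and snapshot names are placeholders; the lvcheck script automates roughly these steps):

 # take a temporary snapshot of the OST volume while it stays in service
 oss$ lvcreate -s -L 10G -n ost0005_check /dev/vg_ost/ost0005

 # read-only check of the snapshot: -f forces a full pass, -n changes nothing
 oss$ e2fsck -fn /dev/vg_ost/ost0005_check

 # drop the snapshot afterwards
 oss$ lvremove -f /dev/vg_ost/ost0005_check

Since the snapshot is taken of a live, journaled filesystem, a read-only check may report journal-related noise; complaints about corrupted directories or wrong counts are the ones worth a real offline e2fsck.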
Adrian Ulrich
2010-Aug-14 16:05 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> The journal will prevent inconsistencies in the filesystem in case of a crash.
> It cannot prevent corruption of the on-disk data, inconsistencies caused by cache
> enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc.

The OSS is part of a 'Snowbird' installation, so the RAID/disk part should be fine. I hope that we 'just' hit a small software bug :-/

> That is why it is still a good idea to run e2fsck periodically on a filesystem.

Ok, we will keep this in mind (e2fsck was surprisingly fast anyway!).

Regards,
Adrian

--
RFC 1925: (11) Every old idea will be proposed again with a different
name and a different presentation, regardless of whether it works.