Adrian Ulrich
2010-Aug-13 17:49 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi,

For the past few hours we have had a problem with one of our OSTs:

One (and only one) ll_ost_create_ process on one of the OSSes seems to go crazy and uses 100% CPU.

Rebooting the OSS + MDS didn't help, and there isn't much going on on the filesystem itself:

 - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
 - iostat shows almost no usage
 - IB traffic is < 100 kb/s

The MDS logs this every ~3 minutes:

 Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62.23@o2ib. The ost_connect operation failed with -16

..and later:

 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST idx 5/32: rc = -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at mds_lov_clear_orphans: -110
 Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync failed -110, deactivating
 Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s

oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)

On the affected OSS we get:

 Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_UUID@10.201.62.11@o2ib to 0xffff8102164d0200; still busy with 2 active RPCs

$ llog_reader lustre-log.1281718692.11833 shows:

 Bit 0 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32511 of 284875 not set
 Bit 0 of 284875 not set
 Bit -1 of 284875 not set
 Bit 0 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -32510 of 284875 not set
 Bit -1 of 284875 not set
 Bit 0 of 284875 not set
 Segmentation fault   <-- *ouch*

And we get tons of soft CPU lockups :-/

Any ideas?

Regards,
Adrian
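A minimal sketch of the checks Adrian describes (the thread name and service path come from the report above; hosts, intervals, and output paths are illustrative, and the lctl parameter path may differ slightly between Lustre versions):

 # which service thread is spinning, and how hard
 oss$ top -b -n 1 | grep ll_ost_creat

 # is the create service actually handling requests, or just spinning?
 oss$ watch -n 30 'lctl get_param ost.OSS.ost_create.stats'

 # confirm the disks behind the OST are close to idle
 oss$ iostat -x 5 3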
Alexey Lyashkov
2010-Aug-13 18:15 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On Aug 13, 2010, at 20:49, Adrian Ulrich wrote:

> For the past few hours we have had a problem with one of our OSTs:
>
> One (and only one) ll_ost_create_ process on one of the OSSes seems to go crazy and uses 100% CPU.
> [...]
> The MDS logs this every ~3 minutes:
> Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62.23@o2ib. The ost_connect operation failed with -16
> ..and later:
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
> [...]
> Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s

-110 = -ETIMEDOUT: the operation did not finish before its deadline, or there is a network problem.

> oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)

ll_ost_create works on destroying old created objects, I think.

> On the affected OSS we get:
> Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_UUID@10.201.62.11@o2ib to 0xffff8102164d0200; still busy with 2 active RPCs
>
> $ llog_reader lustre-log.1281718692.11833 shows:
> [...]
> Segmentation fault   <-- *ouch*

llog_reader is a tool for reading configuration llogs; to decode a debug log you should use:

 lctl df $file > $output

> And we get tons of soft CPU lockups :-/
>
> Any ideas?

Please post the soft-lockup report. One possibility is that the MDS asked that OST to create too many objects, or that the OST sees too many reconnects.
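A minimal sketch of the log gathering Alexey asks for, using hypothetical output paths: lctl df decodes the binary dumps Lustre writes as lustre-log.<timestamp>.<pid>, lctl dk dumps and decodes the current in-memory debug buffer, and the soft-lockup backtrace itself ends up in the kernel log:

 # decode an automatically dumped binary debug log into readable text
 oss$ lctl df /tmp/lustre-log.1281718692.11833 > /tmp/lustre-log.txt

 # or dump and decode whatever is currently in the debug buffer
 oss$ lctl dk /tmp/lustre-debug.txt

 # the soft-lockup report lives in the kernel ring buffer / syslog
 oss$ dmesg | grep -A 40 'soft lockup'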
Adrian Ulrich
2010-Aug-13 18:29 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi Alexey,

> llog_reader is a tool for reading configuration llogs; to decode a debug log you should use lctl df $file > $output

Oh, sorry for mixing this up.

'lctl df' doesn't show much new stuff:

 00010000:00000400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
 00010000:00000400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f466e@10.201.48.12@o2ib to 0xffff8101c93b6000; still busy with 1 active RPCs
 00010000:00020000:
 00010000:00000400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005: lustre1-mdtlov_UUID reconnecting

> Please post the soft-lockup report. One possibility is that the MDS asked that OST to create too many objects, or that the OST sees too many reconnects.

 LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 previous similar messages
 BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
 CPU 4:
 Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U) netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U) tpm_infineon(U) tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U) i2c_i801(U) pcspkr(U) e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ahci(U) ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U) mptfc(U) scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
 Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
 RIP: 0010:[<ffffffff88b49af4>]  [<ffffffff88b49af4>] :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
 RSP: 0018:ffff8101e715d500  EFLAGS: 00000202
 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 000000000c91a9f1
 RDX: ffff8101e8893800 RSI: ffff8101e715d4e8 RDI: ffff81010773e838
 RBP: 0000000000000002 R08: ffff81017bd9cff8 R09: ffff81017bd9c000
 R10: ffff810216dfb000 R11: 000000004c6578dc R12: ffff81017d41e6d0
 R13: ffffffff80063b4c R14: ffff8101e715d5b8 R15: ffffffff80014fae
 FS:  00002ab1d8b97220(0000) GS:ffff81021fc74bc0(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000009c9c178 CR3: 0000000000201000 CR4: 00000000000006e0

 Call Trace:
  [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
  [<ffffffff8005ba4d>] cache_alloc_refill+0x106/0x186
  [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
  [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
  [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
  [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
  [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
  [<ffffffff8026f08b>] __down_trylock+0x44/0x4e
  [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
  [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
  [<ffffffff887e0923>] :lnet:lnet_ni_send+0x93/0xd0
  [<ffffffff8000d0bd>] dput+0x23/0x10a
  [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
  [<ffffffff887e6ad2>] :lnet:LNetPut+0x702/0x800
  [<ffffffff888e9a13>] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
  [<ffffffff888ef394>] :ptlrpc:lustre_msg_add_version+0x34/0x110
  [<ffffffff888ea198>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
  [<ffffffff888f1f69>] :ptlrpc:lustre_pack_reply+0x29/0xb0
  [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70
  [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
  [<ffffffff88797978>] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990
  [<ffffffff80148e8c>] __next_cpu+0x19/0x28
  [<ffffffff80088f36>] find_busiest_group+0x20d/0x621
  [<ffffffff888f3f05>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
  [<ffffffff80089d8d>] enqueue_task+0x41/0x56
  [<ffffffff888f8c1d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
  [<ffffffff888fb357>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170
  [<ffffffff8003d382>] lock_timer_base+0x1b/0x3c
  [<ffffffff8008881d>] __wake_up_common+0x3e/0x68
  [<ffffffff888fee08>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
  [<ffffffff8008a3f3>] default_wake_function+0x0/0xe
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff888fdbf0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
  [<ffffffff8005dfa7>] child_rip+0x0/0x11

Btw: We are running 1.8.1.1 (with RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1).

Regards,
Adrian
Alexey Lyashkov
2010-Aug-13 18:54 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Adrian,

In general a soft lockup isn't an error; it is just a notice that some operation needed too much time (more than 10s, I think). The attached soft-lockup report says the OST is busy creating objects after an MDS<->OST reconnect, so I think you have very busy disks or an overloaded node. To avoid that soft lockup you can reduce the maximum number of precreated objects via procfs (if my memory is correct).

Why the clients reconnect is a different question, but it can be related to the soft lockup: if you have slow disks, a client can be disconnected before its request is processed, and that request then blocks the reconnect from that client.

On Aug 13, 2010, at 21:29, Adrian Ulrich wrote:

> Oh, sorry for mixing this up.
>
> 'lctl df' doesn't show much new stuff:
>
> 00010000:00000400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f466e@10.201.48.12@o2ib to 0xffff8101c93b6000; still busy with 1 active RPCs
> [...]
>
> BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
> [...]
> Call Trace:
>  [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
>  [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
>  [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70
> [...]
>
> Btw: We are running 1.8.1.1 (with RHEL kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1).
>
> Regards,
> Adrian
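For the precreation tunable Alexey refers to, a hedged sketch of what that can look like on a 1.8 MDS; the create_count name and the device path are assumptions from memory and may differ in other releases, so check what actually exists under /proc/fs/lustre/osc/ on the MDS first:

 # on the MDS: how many objects are requested from this OST per precreate RPC
 mds$ cat /proc/fs/lustre/osc/lustre1-OST0005-osc/create_count

 # lower it if it has grown large (the value 32 is illustrative, not a recommendation)
 mds$ echo 32 > /proc/fs/lustre/osc/lustre1-OST0005-osc/create_count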
Adrian Ulrich
2010-Aug-13 19:01 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
Hi Alexey,

> In general a soft lockup isn't an error; it is just a notice that some operation needed too much time (more than 10s, I think).
> The attached soft-lockup report says the OST is busy creating objects after an MDS<->OST reconnect,

Yes, I know that a soft lockup doesn't mean that I hit a bug, but having ll_ost_creat_* wasting 100% CPU doesn't seem to be normal.

> so I think you have very busy disks or an overloaded node.

Disk %busy is < 5% for all attached disks. The OST is doing almost nothing (there are a few read()'s, that's all).

> if you have slow disks, a client can be disconnected before its request is processed, and that request then blocks the reconnect from that client.

The recovery of the clients seems to be ok: all clients can write/read data from the OST, but there is something wrong between the MDS <-> OST0005. This might just be a side effect of the ll_ost_creat_* issue :-/

Regards,
Adrian

--
RFC 1925: (11) Every old idea will be proposed again with a different
name and a different presentation, regardless of whether it works.
Andreas Dilger
2010-Aug-13 21:54 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On 2010-08-13, at 12:29, Adrian Ulrich wrote:

> Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
> :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
> [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
> [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
> [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
> [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
> [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
> [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
> [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
> [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
> [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70

Two possibilities I can see:

- The MDS sent a very large create request. Compare the values from:

   mds> lctl get_param osc.*.prealloc_*
   oss> lctl get_param obdfilter.*.last_id

  and see if they match. If last_id is growing quickly, the thread is busy precreating many objects for some reason. If this OST has a much higher prealloc_last_id on the MDS, something is bad in the MDS lov_objids file.

- The on-disk structure of the object directory for this OST is corrupted.
  Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
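A sketch of the comparison Andreas describes, with the affected target filled in (device names are placeholders; exact parameter names can vary between 1.8 releases):

 # on the MDS: where the MDS believes preallocation stands for this OST
 mds$ lctl get_param osc.lustre1-OST0005-osc.prealloc_last_id osc.lustre1-OST0005-osc.prealloc_next_id

 # on the OSS: the last object id the OST has actually created
 oss$ lctl get_param obdfilter.lustre1-OST0005.last_id

 # if last_id keeps climbing, the create thread is busy filling a large gap
 oss$ watch -n 10 'lctl get_param obdfilter.lustre1-OST0005.last_id'

If the MDS-side value is far ahead of the OST's last_id, the lov_objids scenario above applies; otherwise the e2fsck route in the second suggestion is the more likely fix.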
Adrian Ulrich
2010-Aug-14 08:28 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> - The on-disk structure of the object directory for this OST is corrupted.
>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.

e2fsck fixed it: the OST has now been running for 40 minutes without problems:

 e2fsck 1.41.6.sun1 (30-May-2009)
 lustre1-OST0005: recovering journal
 lustre1-OST0005 has been mounted 72 times without being checked, check forced.
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure
 Directory inode 440696867, block 493, offset 0: directory corrupted
 Salvage<y>? yes
 Directory inode 440696853, block 517, offset 0: directory corrupted
 Salvage<y>? yes
 Directory inode 440696842, block 560, offset 0: directory corrupted
 Salvage<y>? yes
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Unattached inode 17769156
 Connect to /lost+found<y>? yes
 Inode 17769156 ref count is 2, should be 1.  Fix<y>? yes
 Unattached zero-length inode 17883901.  Clear<y>? yes
 Pass 5: Checking group summary information
 lustre1-OST0005: ***** FILE SYSTEM WAS MODIFIED *****
 lustre1-OST0005: 44279/488382464 files (15.4% non-contiguous), 280329314/1953524992 blocks

But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

Have a nice weekend, and thanks a lot for the fast reply!

Regards,
Adrian
Andreas Dilger
2010-Aug-14 15:29 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
On 2010-08-14, at 2:28, Adrian Ulrich <adrian at blinkenlights.ch> wrote:

>> - The on-disk structure of the object directory for this OST is corrupted.
>>   Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.
>
> e2fsck fixed it: the OST has now been running for 40 minutes without problems.
>
> But shouldn't the journal of ext3/ldiskfs make running e2fsck unnecessary?

The journal will prevent inconsistencies in the filesystem in case of a crash. It cannot prevent corruption of the on-disk data, inconsistencies caused by cache enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc. That is why it is still a good idea to run e2fsck periodically on a filesystem.

If you are using LVM, there is an lvcheck script I wrote that can check a filesystem snapshot on a running system; otherwise you should do it whenever the opportunity arises.

Cheers, Andreas
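A rough sketch of the snapshot-based check Andreas mentions, assuming the OST sits on an LVM logical volume (volume group, LV, and snapshot names are placeholders; the lvcheck script automates roughly these steps):

 # take a temporary snapshot of the OST volume while it stays in service
 oss$ lvcreate -s -L 10G -n ost0005_check /dev/vg_ost/ost0005

 # read-only check of the snapshot: -f forces a full pass, -n changes nothing
 oss$ e2fsck -fn /dev/vg_ost/ost0005_check

 # drop the snapshot afterwards
 oss$ lvremove -f /dev/vg_ost/ost0005_check

Since the snapshot is taken of a live, journaled filesystem, a read-only check may report journal-related noise; complaints about corrupted directories or wrong counts are the ones worth a real offline e2fsck.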
Adrian Ulrich
2010-Aug-14 16:05 UTC
[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)
> The journal will prevent inconsistencies in the filesystem in case of a crash.
> It cannot prevent corruption of the on-disk data, inconsistencies caused by cache
> enabled on the disks or in a RAID controller, software bugs, memory corruption, bad cables, etc.

The OSS is part of a 'Snowbird' installation, so the RAID/disk part should be fine. I hope that we 'just' hit a small software bug :-/

> That is why it is still a good idea to run e2fsck periodically on a filesystem.

Ok, we will keep this in mind (e2fsck was surprisingly fast anyway!).

Regards,
Adrian

--
RFC 1925: (11) Every old idea will be proposed again with a different
name and a different presentation, regardless of whether it works.