High,
after installing Lustre 1.6.4.2 we observed bad performance on our
clients. It takes a long time until commands are started. We do not
see any Lustre or IO related messages while waiting for clients to
respond.
Our Cluster consists of 40 compute nodes, 1 mds, 4 ost nodes. We are
running Scientific Linux 5 (Red Hat EL 5 (Boron)) with Kernel
2.6.18-8.1.14.el5_lustre.1.6.4.2smp on x86_64 systems. MDS/OSTs are
quadcore systems with 8GB Memory. As storage we use an iSCSI system
with 4 1Gb Eth connections to the Lustre server. One OST and the MDS
share one ethernet connection. Each OST uses a RAID 6 partition of
1.6GB size. The size of the MDS Partiotion is about 240GB.
We did the upgrade because 1.4 and 1.6.2 clients were often evicted
when using 1.4 servers. It was our assumption that this behaviour
killed some of our applications. (Error on open). At the moment we do
not observe evictions but a lot of other Lustre errrors so that we
assume that these errors are responsible for I/O-errors in some of our
applications.
The following holds independent of the load on the OSTs:
We get every few seconds messages from all OSTs like:
Feb 17 17:44:49 cn43 kernel: Lustre:
7590:0:(filter_io_26.c:776:filter_commitrw_write()) lustre-OST0000:
slow <xxx> <xy>s
where xxx is one of the following: commitrw,journal start,parent
lock,preprw_write,start_page_write,journal start,parent
lock,preprw_read setup,start_page_read,setattr
and xy is of order 10-40s. The frequency of these messages depends on
the load of the OSTs. On all OSTs we observe a very high load between
20 and 120 alternating between all 4 OSTs. We also see up too 900
ll_ost_io threads. There is no correlation between the high load
measured by uptime and the number of threads. A typical trace and a
lustre-log is attached. The IO-Load as observed by vmstat shows that
we have a high read load which is always smaller than the bandwidth to
the iSCSI Raid 6 partitions. (With 1.6.2 we observed a sustained
throughput of ~350MB/s).
With the change to 1.6.4.2 we also changed from direct connections
between the MDS/OSTs and the iSCSI system to a connection using a HP
Pro Curve switch. Flow control is enabled on the Lustre systems.
Any idea how we can overcome the slow invocation of processes?
thanks
w.dilling
--------------------------------------------------------------------------------
W.Dilling Tel.: (49) 7071/29-70206
Universitaet Tuebingen Fax.: (49) 7071/29-5912
Zentrum fuer Datenverarbeitung mail: dilling at zdv.uni-tuebingen.de
Waechterstrasse 76
72074 Tuebingen
-------------- next part --------------
Feb 18 07:34:00 cn44 kernel: Lustre:
18377:0:(filter_io.c:561:filter_preprw_write()) lustre-OST0001: s
low preprw_write setup 33s
Feb 18 07:34:00 cn44 kernel: Lustre:
18377:0:(filter_io.c:561:filter_preprw_write()) Skipped 3 previou
s similar messages
Feb 18 07:34:00 cn44 kernel: Lustre:
18377:0:(filter_io.c:662:filter_preprw_write()) lustre-OST0001: s
low start_page_write 33s
Feb 18 07:34:00 cn44 kernel: Lustre:
18377:0:(filter_io.c:662:filter_preprw_write()) Skipped 4 previou
s similar messages
Feb 18 07:35:07 cn44 kernel: ll_ost_io_212 D ffff8101a4e8d760 0 18377 1
18378 18376 (
L-TLB)
Feb 18 07:35:07 cn44 kernel: ffff8101a4e8d760 000000aa00000400 0000000000000023
000000000000000all_os
t_io_210 D ffff8101a4c69760 0 18375 1 18376 18374 (L-TLB)
Feb 18 07:35:07 cn44 kernel: ffff8101a4c69760 000000aa00000400 0000000000000023
000000000000000a
Feb 18 07:35:07 cn44 kernel: ffff8101a534a080 ffff810130aeb860 0002fee19d3f9b41
0000000000001179
Feb 18 07:35:07 cn44 kernel: ffff8101a534a268 000047c700000002 ffff8101a2144860
ffffffffffffffff
Feb 18 07:35:07 cn44 kernel: Call Trace:
Feb 18 07:35:07 cn44 kernel: [<ffffffff8803818b>]
:jbd:log_wait_commit+0xa3/0xf5
Feb 18 07:35:07 cn44 kernel: [<ffffffff8009b1e1>]
autoremove_wake_function+0x0/0x2e
Feb 18 07:35:07 cn44 kernel: [<ffffffff887cce7b>]
:fsfilt_ldiskfs:fsfilt_ldiskfs_commit_wait+0xab/0xd
0
Feb 18 07:35:07 cn44 kernel: [<ffffffff8880392f>]
:obdfilter:filter_commitrw_write+0x1c9f/0x2570
Feb 18 07:35:07 cn44 kernel: [<ffffffff8003cd0e>]
lock_timer_base+0x1b/0x3c
Feb 18 07:35:07 cn44 kernel: [<ffffffff887b0b46>]
:ost:ost_brw_write+0x21b6/0x28c0
Feb 18 07:35:07 cn44 kernel: [<ffffffff884bb133>]
:lnet:LNetPut+0x693/0x720
Feb 18 07:35:07 cn44 kernel: [<ffffffff884b44fc>]
:lnet:LNetMDBind+0x2ac/0x400
Feb 18 07:35:07 cn44 kernel: [<ffffffff885a5db0>]
:ptlrpc:ptl_send_buf+0x4a0/0x650
Feb 18 07:35:07 cn44 kernel: [<ffffffff800867b0>]
default_wake_function+0x0/0xe
Feb 18 07:35:07 cn44 kernel: [<ffffffff887b483e>]
:ost:ost_handle+0x2a8e/0x58d8
Feb 18 07:35:07 cn44 kernel: [<ffffffff88512c82>]
:obdclass:class_handle2object+0xd2/0x160
Feb 18 07:35:07 cn44 kernel: [<ffffffff885ae230>]
:ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90
Feb 18 07:35:07 cn44 kernel: [<ffffffff885abde5>]
:ptlrpc:lustre_swab_buf+0xc5/0xf0
Feb 18 07:35:07 cn44 kernel: [<ffffffff885b3a3b>]
:ptlrpc:ptlrpc_server_handle_request+0xb0b/0x1270
Feb 18 07:35:07 cn44 kernel: [<ffffffff800606e8>] thread_return+0x0/0xea
Feb 18 07:35:07 cn44 kernel: [<ffffffff8006aec3>]
do_gettimeofday+0x50/0x92
Feb 18 07:35:07 cn44 kernel: [<ffffffff8846c056>]
:libcfs:lcw_update_time+0x16/0x100
Feb 18 07:35:07 cn44 kernel: [<ffffffff8003cd0e>]
lock_timer_base+0x1b/0x3c
Feb 18 07:35:07 cn44 kernel: [<ffffffff885b647c>]
:ptlrpc:ptlrpc_main+0x7dc/0x950
Feb 18 07:35:07 cn44 kernel: [<ffffffff800867b0>]
default_wake_function+0x0/0xe
Feb 18 07:35:07 cn44 kernel: [<ffffffff8005bc25>] child_rip+0xa/0x11
Feb 18 07:35:07 cn44 kernel: [<ffffffff885b5ca0>]
:ptlrpc:ptlrpc_main+0x0/0x950
Feb 18 07:35:07 cn44 kernel: [<ffffffff8005bc1b>] child_rip+0x0/0x11
Feb 18 07:35:07 cn44 kernel:
Feb 18 07:35:07 cn44 kernel: LustreError: dumping log to
/tmp/lustre-log.1203316507.18375
-------------- next part --------------
x??? :%?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
65s
x??? :%?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
65s
x??? :%?G??H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
40s
~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
~??? ?%?GL??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
~??? ?%?Ga?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 44s
z??? ?%?G???G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal
start 42s
~??? ?%?G???Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 69s
z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G??D(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??Dfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
~???? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
~??? ?%?G;??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
~??? &?G??-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
~???? &?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 35s
~???? &?G?Jfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 32s
~???? &?G/?ifilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
x???
&?G;???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s
~???
&?GC??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 36s
~??? ?&?G?hvfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 43s
~??? ?&?G??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 40s
~??? ?&?Gi?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 40s
~??? ?&?Gtio-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow
commitrw commit 33s
~???? ?&?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 46s
x??? ??GZ???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s
~??? ??Gc??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 35s
q??? ??G????filter.cfilter_parent_locklustre-OST0001: slow parent lock 33s
| ??G??1filter_io.cfilter_preprw_writelustre-OST0001: slow preprw_write setup
33s
z??? ??G????filter_io.cfilter_preprw_writelustre-OST0001: slow start_page_write
33s
p ??G_?G2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
x??? ''?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?}?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?B?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?h?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?%?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G?H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G?8?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 69s
x??? ''?G??"?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 42s
x??? ''?G?
y-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?[?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?@?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G
?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s
x??? ''?G
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s
z??? ;%?G+?(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
65s
??? ;%?G"+?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 132s
?;%?GO+?8watchdog.clcw_update_timeExpired watchdog for pid 8898 disabled after
132.1261s
z??? ;%?G$2?(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
65s
??? ;%?G&2?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 159s
?;%?GH2?8watchdog.clcw_update_timeExpired watchdog for pid 18369 disabled after
159.3035s
x??? ;%?G??w?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
40s
~??? ?%?G??wfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 45s
z??? ?%?GK??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?GM??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?Ge??G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal
start 42s
~??? ?%?Gg??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G}?t(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G~?tfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 39s
~??? ?%?G?? filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 39s
~??? ?%?G? Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 38s
~??? ?%?G?7filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 35s
~??? ?%?G?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
~???? ?%?GD??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
x??? &?GX%?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
~??? &?Gl?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
x??? ?&?G?L
?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s
x??? ?&?G?L
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s
x??? ?&?G?L
n-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s
x??? ?&?G?L
H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s
~???? ?&?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 41s
~??? ?&?GXh?Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
commitrw commit 40s
~??? ?&?G|h?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 43s
~??? ?&?G?hn-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow
commitrw commit 43s
~??? ?&?G?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 43s
p ??G???2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p ??GJ?K2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p ??G??~2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p ??G???"2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p ??G??H2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p ??G??2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
p ??G???2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
p ??G?S2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
p ??G0-2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
p ??GUH2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 31s
p ??G??2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
p??? ??G?*2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G?s?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G?H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 69s
x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 68s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 62s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 62s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 50s
x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 49s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 48s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 73s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 48s
x??? ''?G?T?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G?U?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 56s
x??? ''?G
??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 43s
x??? ''?G ?a?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 43s
x??? ''?G?
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s
x??? ''?G?
?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s
x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 77s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 77s
x??? ''?G?z-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 48s
z??? ;%?G,H(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
65s
??? ;%?G$,Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 158s
?;%?Gh,H8watchdog.clcw_update_timeExpired watchdog for pid 18444 disabled after
158.3020s
z??? ;%?Go,?G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal
start 65s
??? ;%?Gp,?Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 158s
?;%?G?,?G8watchdog.clcw_update_timeExpired watchdog for pid 18340 disabled after
158.3020s
z??? ;%?G?,z(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
65s
??? ;%?G?,zfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 149s
?;%?G?,z8watchdog.clcw_update_timeExpired watchdog for pid 6266 disabled after
153.3022s
z??? ;%?G?,
H(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s
??? ;%?G?,
Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 168s
?;%?G?
H8watchdog.clcw_update_timeExpired watchdog for pid 18442 disabled after
168.6575s
~??? ?%?G??-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?GL??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?GN??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 48s
z??? ?%?Gx??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?Gz??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 49s
z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 49s
z??? ?%?G??q(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??qfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G?x(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G?xfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G?-(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G?-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start
42s
~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 70s
~???? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
~???? ?%?G1??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
~???? ?%?GD??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 31s
x??? &?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
33s
x??? &?G''?O?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 33s
x??? &?G,?v?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
33s
x??? &?G7???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
x??? &?GY%H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
~??? &?G?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
~??? &?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 35s
~??? &?G
?vfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s
~??? &?G)?Ofilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
~??? &?G8??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 34s
x???
&?G;???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s
~???
&?GC??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 36s
x??? ?&?G
???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 41s
~??? ?&?G?h?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 46s
p ??G?.2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s
q??? ??G?H?filter.cfilter_parent_locklustre-OST0001: slow parent lock 31s
|??? ??G?H1filter_io.cfilter_preprw_writelustre-OST0001: slow preprw_write setup
31s
z??? ??G?H?filter_io.cfilter_preprw_writelustre-OST0001: slow start_page_write
31s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 64s
x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 64s
x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 64s
x??? ''?G
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 64s
x??? ''?G
?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s
x??? ''?G
H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s
x??? ''?G
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s
x??? ''?G
H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 51s
x??? ''?G
?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 48s
x??? ''?G!
?"?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 47s
x??? ''?G?6?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 52s
x??? ''?G/S?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 52s
x??? ''?G/?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 54s
x??? ''?G/?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 52s
x??? ''?G/?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 54s
x??? ''?G/<?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 54s
x??? ''?G/3?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 54s
x??? ''?G
/??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 52s
x??? ''?G''/H?ilter_io_26.cfilter_commitrw_writelustre-OST0001:
slow direct_io 47s
x??? :%?G?-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 65s
x??? ;%?GEP??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
x??? ;%?GMP ?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
x??? ;%?GPP H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
x??? ;%?GTP7?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
31s
x??? ;%?GWP?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s
x??? {%?G)?(?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
33s
~??? {%?G/?(filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 33s
x??? |%?Gdj
?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s
~??? |%?Gkj
filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 30s
x??? ??G?NW?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s
x??? ??G?N??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s
x??? ??G?N??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 32s
~???? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 33s
~??? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 33s
~??? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 33s
~??? ??G7?Wfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 33s
x??? &?Gc??-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 33s
x??? &?Gl???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
x??? &?G?J?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
31s
x??? &?Gq%i?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
34s
p &?GwA?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 30s
x???
&?G??n?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s
~???
&?G??nfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 36s
x???
&?G? H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s
~???
&?G? Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 30s
x??? ?&?GM
v?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s
x??? ?&?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 35s
x??? ?&?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
35s
x??? ?&?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io
42s
x??? ??G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s
~??? ??G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 36s
x??? ??Gl|H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s
~??? ??Gs|Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw
commit 36s
p ??G ?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s
p ??GZ?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 32s
x??? ''?G ???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 76s
x??? ''?G?P?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G?
E?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 69s
x??? ''?GX?q-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 75s
x??? ''?G?M?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 74s
x??? ''?G
%?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 63s
x??? ''?G?6
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 63s
x??? ''?Gu
?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s
x??? ''?G?
)?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s
x??? ''?G?
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s
x??? ''?G?
&?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 45s
x??? ''?G?
??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 45s
x??? ''?G? H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow
direct_io 55s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2052 bytes
Desc: S/MIME krytographische Unterschrift
Url :
http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080218/3da134ce/attachment-0002.bin
Johann Lombardi
2008-Feb-18 08:44 UTC
[Lustre-discuss] Lots of Lustre Errors with Lustre 1.6.4.2
On Mon, Feb 18, 2008 at 08:57:32AM +0100, Dilling wrote:> where xxx is one of the following: commitrw,journal start,parent > lock,preprw_write,start_page_write,journal start,parent lock,preprw_read > setup,start_page_read,setattr > and xy is of order 10-40s. The frequency of these messages depends on > the load of the OSTs. On all OSTs we observe a very high load between 20 > and 120 alternating between all 4 OSTs. We also see up too 900 ll_ost_io > threads.Have you seen 900 ll_ost_io threads on a single OSS? Theoretically, the number of ll_ost_io is limited to 512 (i.e. OSS_THREADS_MAX). Could you please try to enforce the number of ost threads to 128 or 256? To do so: stop the OSTs, unload the lustre modules and add the following line in /etc/modprobe.conf: options ost oss_num_threads=128 Johann