High, after installing Lustre 1.6.4.2 we observed bad performance on our clients. It takes a long time until commands are started. We do not see any Lustre or IO related messages while waiting for clients to respond. Our Cluster consists of 40 compute nodes, 1 mds, 4 ost nodes. We are running Scientific Linux 5 (Red Hat EL 5 (Boron)) with Kernel 2.6.18-8.1.14.el5_lustre.1.6.4.2smp on x86_64 systems. MDS/OSTs are quadcore systems with 8GB Memory. As storage we use an iSCSI system with 4 1Gb Eth connections to the Lustre server. One OST and the MDS share one ethernet connection. Each OST uses a RAID 6 partition of 1.6GB size. The size of the MDS Partiotion is about 240GB. We did the upgrade because 1.4 and 1.6.2 clients were often evicted when using 1.4 servers. It was our assumption that this behaviour killed some of our applications. (Error on open). At the moment we do not observe evictions but a lot of other Lustre errrors so that we assume that these errors are responsible for I/O-errors in some of our applications. The following holds independent of the load on the OSTs: We get every few seconds messages from all OSTs like: Feb 17 17:44:49 cn43 kernel: Lustre: 7590:0:(filter_io_26.c:776:filter_commitrw_write()) lustre-OST0000: slow <xxx> <xy>s where xxx is one of the following: commitrw,journal start,parent lock,preprw_write,start_page_write,journal start,parent lock,preprw_read setup,start_page_read,setattr and xy is of order 10-40s. The frequency of these messages depends on the load of the OSTs. On all OSTs we observe a very high load between 20 and 120 alternating between all 4 OSTs. We also see up too 900 ll_ost_io threads. There is no correlation between the high load measured by uptime and the number of threads. A typical trace and a lustre-log is attached. The IO-Load as observed by vmstat shows that we have a high read load which is always smaller than the bandwidth to the iSCSI Raid 6 partitions. (With 1.6.2 we observed a sustained throughput of ~350MB/s). With the change to 1.6.4.2 we also changed from direct connections between the MDS/OSTs and the iSCSI system to a connection using a HP Pro Curve switch. Flow control is enabled on the Lustre systems. Any idea how we can overcome the slow invocation of processes? thanks w.dilling -------------------------------------------------------------------------------- W.Dilling Tel.: (49) 7071/29-70206 Universitaet Tuebingen Fax.: (49) 7071/29-5912 Zentrum fuer Datenverarbeitung mail: dilling at zdv.uni-tuebingen.de Waechterstrasse 76 72074 Tuebingen -------------- next part -------------- Feb 18 07:34:00 cn44 kernel: Lustre: 18377:0:(filter_io.c:561:filter_preprw_write()) lustre-OST0001: s low preprw_write setup 33s Feb 18 07:34:00 cn44 kernel: Lustre: 18377:0:(filter_io.c:561:filter_preprw_write()) Skipped 3 previou s similar messages Feb 18 07:34:00 cn44 kernel: Lustre: 18377:0:(filter_io.c:662:filter_preprw_write()) lustre-OST0001: s low start_page_write 33s Feb 18 07:34:00 cn44 kernel: Lustre: 18377:0:(filter_io.c:662:filter_preprw_write()) Skipped 4 previou s similar messages Feb 18 07:35:07 cn44 kernel: ll_ost_io_212 D ffff8101a4e8d760 0 18377 1 18378 18376 ( L-TLB) Feb 18 07:35:07 cn44 kernel: ffff8101a4e8d760 000000aa00000400 0000000000000023 000000000000000all_os t_io_210 D ffff8101a4c69760 0 18375 1 18376 18374 (L-TLB) Feb 18 07:35:07 cn44 kernel: ffff8101a4c69760 000000aa00000400 0000000000000023 000000000000000a Feb 18 07:35:07 cn44 kernel: ffff8101a534a080 ffff810130aeb860 0002fee19d3f9b41 0000000000001179 Feb 18 07:35:07 cn44 kernel: ffff8101a534a268 000047c700000002 ffff8101a2144860 ffffffffffffffff Feb 18 07:35:07 cn44 kernel: Call Trace: Feb 18 07:35:07 cn44 kernel: [<ffffffff8803818b>] :jbd:log_wait_commit+0xa3/0xf5 Feb 18 07:35:07 cn44 kernel: [<ffffffff8009b1e1>] autoremove_wake_function+0x0/0x2e Feb 18 07:35:07 cn44 kernel: [<ffffffff887cce7b>] :fsfilt_ldiskfs:fsfilt_ldiskfs_commit_wait+0xab/0xd 0 Feb 18 07:35:07 cn44 kernel: [<ffffffff8880392f>] :obdfilter:filter_commitrw_write+0x1c9f/0x2570 Feb 18 07:35:07 cn44 kernel: [<ffffffff8003cd0e>] lock_timer_base+0x1b/0x3c Feb 18 07:35:07 cn44 kernel: [<ffffffff887b0b46>] :ost:ost_brw_write+0x21b6/0x28c0 Feb 18 07:35:07 cn44 kernel: [<ffffffff884bb133>] :lnet:LNetPut+0x693/0x720 Feb 18 07:35:07 cn44 kernel: [<ffffffff884b44fc>] :lnet:LNetMDBind+0x2ac/0x400 Feb 18 07:35:07 cn44 kernel: [<ffffffff885a5db0>] :ptlrpc:ptl_send_buf+0x4a0/0x650 Feb 18 07:35:07 cn44 kernel: [<ffffffff800867b0>] default_wake_function+0x0/0xe Feb 18 07:35:07 cn44 kernel: [<ffffffff887b483e>] :ost:ost_handle+0x2a8e/0x58d8 Feb 18 07:35:07 cn44 kernel: [<ffffffff88512c82>] :obdclass:class_handle2object+0xd2/0x160 Feb 18 07:35:07 cn44 kernel: [<ffffffff885ae230>] :ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90 Feb 18 07:35:07 cn44 kernel: [<ffffffff885abde5>] :ptlrpc:lustre_swab_buf+0xc5/0xf0 Feb 18 07:35:07 cn44 kernel: [<ffffffff885b3a3b>] :ptlrpc:ptlrpc_server_handle_request+0xb0b/0x1270 Feb 18 07:35:07 cn44 kernel: [<ffffffff800606e8>] thread_return+0x0/0xea Feb 18 07:35:07 cn44 kernel: [<ffffffff8006aec3>] do_gettimeofday+0x50/0x92 Feb 18 07:35:07 cn44 kernel: [<ffffffff8846c056>] :libcfs:lcw_update_time+0x16/0x100 Feb 18 07:35:07 cn44 kernel: [<ffffffff8003cd0e>] lock_timer_base+0x1b/0x3c Feb 18 07:35:07 cn44 kernel: [<ffffffff885b647c>] :ptlrpc:ptlrpc_main+0x7dc/0x950 Feb 18 07:35:07 cn44 kernel: [<ffffffff800867b0>] default_wake_function+0x0/0xe Feb 18 07:35:07 cn44 kernel: [<ffffffff8005bc25>] child_rip+0xa/0x11 Feb 18 07:35:07 cn44 kernel: [<ffffffff885b5ca0>] :ptlrpc:ptlrpc_main+0x0/0x950 Feb 18 07:35:07 cn44 kernel: [<ffffffff8005bc1b>] child_rip+0x0/0x11 Feb 18 07:35:07 cn44 kernel: Feb 18 07:35:07 cn44 kernel: LustreError: dumping log to /tmp/lustre-log.1203316507.18375 -------------- next part -------------- x??? :%?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 65s x??? :%?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 65s x??? :%?G??H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 40s ~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s ~??? ?%?GL??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s ~??? ?%?Ga?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 44s z??? ?%?G???G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G???Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 69s z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G??D(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??Dfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s ~???? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s ~??? ?%?G;??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s ~??? &?G??-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s ~???? &?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 35s ~???? &?G?Jfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 32s ~???? &?G/?ifilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s x??? &?G;???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s ~??? &?GC??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 36s ~??? ?&?G?hvfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 43s ~??? ?&?G??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 40s ~??? ?&?Gi?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 40s ~??? ?&?Gtio-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s ~???? ?&?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 46s x??? ??GZ???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s ~??? ??Gc??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 35s q??? ??G????filter.cfilter_parent_locklustre-OST0001: slow parent lock 33s | ??G??1filter_io.cfilter_preprw_writelustre-OST0001: slow preprw_write setup 33s z??? ??G????filter_io.cfilter_preprw_writelustre-OST0001: slow start_page_write 33s p ??G_?G2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s x??? ''?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?}?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?B?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?h?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?%?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G?H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G?8?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 69s x??? ''?G??"?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 42s x??? ''?G? y-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?[?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?@?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G ?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s z??? ;%?G+?(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?G"+?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 132s ?;%?GO+?8watchdog.clcw_update_timeExpired watchdog for pid 8898 disabled after 132.1261s z??? ;%?G$2?(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?G&2?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 159s ?;%?GH2?8watchdog.clcw_update_timeExpired watchdog for pid 18369 disabled after 159.3035s x??? ;%?G??w?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 40s ~??? ?%?G??wfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 45s z??? ?%?GK??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?GM??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?Ge??G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?Gg??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G}?t(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G~?tfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s ~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 39s ~??? ?%?G?? filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 39s ~??? ?%?G? Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 38s ~??? ?%?G?7filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 35s ~??? ?%?G?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s ~???? ?%?GD??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s x??? &?GX%?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s ~??? &?Gl?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s x??? ?&?G?L ?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s x??? ?&?G?L ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s x??? ?&?G?L n-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s x??? ?&?G?L H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s ~???? ?&?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 41s ~??? ?&?GXh?Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 40s ~??? ?&?G|h?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 43s ~??? ?&?G?hn-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 43s ~??? ?&?G?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 43s p ??G???2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p ??GJ?K2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p ??G??~2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p ??G???"2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p ??G??H2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p ??G??2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s p ??G???2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s p ??G?S2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s p ??G0-2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s p ??GUH2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 31s p ??G??2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s p??? ??G?*2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G?s?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G?H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 69s x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 68s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 62s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 62s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 50s x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 49s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 48s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 73s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 48s x??? ''?G?T?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G?U?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 56s x??? ''?G ??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 43s x??? ''?G ?a?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 43s x??? ''?G? ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s x??? ''?G? ?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s x??? ''?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s x??? ''?G?z-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 48s z??? ;%?G,H(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?G$,Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 158s ?;%?Gh,H8watchdog.clcw_update_timeExpired watchdog for pid 18444 disabled after 158.3020s z??? ;%?Go,?G(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?Gp,?Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 158s ?;%?G?,?G8watchdog.clcw_update_timeExpired watchdog for pid 18340 disabled after 158.3020s z??? ;%?G?,z(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?G?,zfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 149s ?;%?G?,z8watchdog.clcw_update_timeExpired watchdog for pid 6266 disabled after 153.3022s z??? ;%?G?, H(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 65s ??? ;%?G?, Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 168s ?;%?G? H8watchdog.clcw_update_timeExpired watchdog for pid 18442 disabled after 168.6575s ~??? ?%?G??-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?GL??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?GN??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 48s z??? ?%?Gx??(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?Gz??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 49s z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 49s z??? ?%?G??q(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??qfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G?x(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G?xfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G?-(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G?-filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s z??? ?%?G???(lustre_fsfilt.hfsfilt_commit_waitlustre-OST0001: slow journal start 42s ~??? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 70s ~???? ?%?G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s ~???? ?%?G1??Gfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s ~???? ?%?GD??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 31s x??? &?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? &?G''?O?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? &?G,?v?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? &?G7???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s x??? &?GY%H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s ~??? &?G?Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s ~??? &?G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 35s ~??? &?G ?vfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s ~??? &?G)?Ofilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s ~??? &?G8??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 34s x??? &?G;???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s ~??? &?GC??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 36s x??? ?&?G ???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 41s ~??? ?&?G?h?filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 46s p ??G?.2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 33s q??? ??G?H?filter.cfilter_parent_locklustre-OST0001: slow parent lock 31s |??? ??G?H1filter_io.cfilter_preprw_writelustre-OST0001: slow preprw_write setup 31s z??? ??G?H?filter_io.cfilter_preprw_writelustre-OST0001: slow start_page_write 31s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 64s x??? ''?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 64s x??? ''?G??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 64s x??? ''?G ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 64s x??? ''?G ?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s x??? ''?G H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s x??? ''?G ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s x??? ''?G H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 51s x??? ''?G ?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 48s x??? ''?G! ?"?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 47s x??? ''?G?6?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 52s x??? ''?G/S?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 52s x??? ''?G/?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 54s x??? ''?G/?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 52s x??? ''?G/?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 54s x??? ''?G/<?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 54s x??? ''?G/3?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 54s x??? ''?G /??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 52s x??? ''?G''/H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 47s x??? :%?G?-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 65s x??? ;%?GEP??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s x??? ;%?GMP ?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s x??? ;%?GPP H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s x??? ;%?GTP7?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 31s x??? ;%?GWP?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s x??? {%?G)?(?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s ~??? {%?G/?(filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s x??? |%?Gdj ?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s ~??? |%?Gkj filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 30s x??? ??G?NW?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? ??G?N??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? ??G?N??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 32s ~???? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s ~??? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s ~??? ??G??filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s ~??? ??G7?Wfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 33s x??? &?Gc??-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 33s x??? &?Gl???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s x??? &?G?J?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 31s x??? &?Gq%i?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 34s p &?GwA?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 30s x??? &?G??n?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s ~??? &?G??nfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 36s x??? &?G? H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 30s ~??? &?G? Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 30s x??? ?&?GM v?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 38s x??? ?&?G??G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s x??? ?&?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 35s x??? ?&?G???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 42s x??? ??G????ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s ~??? ??G???filter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 36s x??? ??Gl|H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 36s ~??? ??Gs|Hfilter_io_26.cfilter_commitrw_writelustre-OST0001: slow commitrw commit 36s p ??G ?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 34s p ??GZ?2lustre_fsfilt.hfsfilt_setattrlustre-OST0001: slow setattr 32s x??? ''?G ???ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G?P?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G? E?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 69s x??? ''?GX?q-?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 75s x??? ''?G?M?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G %?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 63s x??? ''?G?6 ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 63s x??? ''?Gu ?G?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 77s x??? ''?G? )?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 76s x??? ''?G? ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 74s x??? ''?G? &?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 45s x??? ''?G? ??ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 45s x??? ''?G? H?ilter_io_26.cfilter_commitrw_writelustre-OST0001: slow direct_io 55s -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2052 bytes Desc: S/MIME krytographische Unterschrift Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080218/3da134ce/attachment-0002.bin
Johann Lombardi
2008-Feb-18 08:44 UTC
[Lustre-discuss] Lots of Lustre Errors with Lustre 1.6.4.2
On Mon, Feb 18, 2008 at 08:57:32AM +0100, Dilling wrote:> where xxx is one of the following: commitrw,journal start,parent > lock,preprw_write,start_page_write,journal start,parent lock,preprw_read > setup,start_page_read,setattr > and xy is of order 10-40s. The frequency of these messages depends on > the load of the OSTs. On all OSTs we observe a very high load between 20 > and 120 alternating between all 4 OSTs. We also see up too 900 ll_ost_io > threads.Have you seen 900 ll_ost_io threads on a single OSS? Theoretically, the number of ll_ost_io is limited to 512 (i.e. OSS_THREADS_MAX). Could you please try to enforce the number of ost threads to 128 or 256? To do so: stop the OSTs, unload the lustre modules and add the following line in /etc/modprobe.conf: options ost oss_num_threads=128 Johann