Has anyone seen this kind of error while running IOR or other benchmarks?

I'm running Lustre 1.8.1 on CentOS 5.3 with the following configuration: 4 J4400 JBODs connected to 4 OSSs. Each OSS has 3 OSTs (RAID5, 8 disks each), connected using multipathd, with mdadm on the /dev/dm* devices and the mptfusion driver (for the J4400 JBODs).

Every time I run:

mpirun -hostfile ./lustre.hosts -np 20 /hpc/IOR -w -r -C -i 2 -b 1000M -t 128k -F -o /work/stripe12/teste

(especially with -b 1000M), one of my OSSs crashes, sometimes one, sometimes another, with the following error:

Sep 9 07:43:40 a01n00 kernel: ll_ost_io_64 D ffff81037fea80c0 0 20381 1 20382 20380 (L-TLB)
Sep 9 07:43:40 a01n00 kernel: ffff81036316b510 0000000000000046 0000000000000003 0000040000000282
Sep 9 07:43:40 a01n00 kernel: 0000000000000100 0000000000000009 ffff81037ac09100 ffff81037fea80c0
Sep 9 07:43:40 a01n00 kernel: 0000088160738e93 0000000000313ec1 ffff81037ac092e8 0000000328b65740
Sep 9 07:43:40 a01n00 kernel: Call Trace:
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80033608>] submit_bio+0xcd/0xd4
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b14aac>] :obdfilter:filter_do_bio+0x95c/0xb60
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ae0f24>] :fsfilt_ldiskfs:fsfilt_ldiskfs_write_record+0x464/0x4b0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b014f0>] :obdfilter:filter_commit_cb+0x0/0x2d0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88031749>] :jbd:journal_callback_set+0x2d/0x47
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8009daef>] autoremove_wake_function+0x0/0x2e
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b15974>] :obdfilter:filter_direct_io+0xcc4/0xd50
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8892ad70>] :lquota:filter_quota_acquire+0x0/0x120
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b17c08>] :obdfilter:filter_commitrw_write+0x1558/0x25b0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88730d23>] :lnet:lnet_send+0x973/0x9a0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88790c11>] :obdclass:class_handle2object+0xd1/0x160
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88abc048>] :ost:ost_checksum_bulk+0x358/0x590
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ac2b1e>] :ost:ost_brw_write+0x1b8e/0x2310
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88837c88>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88803320>] :ptlrpc:target_committed_to_req+0x40/0x120
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88abe67c>] :ost:ost_brw_read+0x182c/0x19e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8883c025>] :ptlrpc:lustre_msg_get_version+0x35/0xf0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8883c0e8>] :ptlrpc:lustre_msg_check_version_v2+0x8/0x20
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ac60fb>] :ost:ost_handle+0x2e5b/0x5a70
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88735305>] :lnet:lnet_match_blocked_msg+0x375/0x390
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88811aea>] :ptlrpc:ldlm_resource_foreach+0x25a/0x390
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80148d4f>] __next_cpu+0x19/0x28
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80148d4f>] __next_cpu+0x19/0x28
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80088f32>] find_busiest_group+0x20d/0x621
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88841a15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884672d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88848e67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80063098>] thread_return+0x62/0xfe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8003dc3f>] lock_timer_base+0x1b/0x3c
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8001ceb8>] __mod_timer+0xb0/0xbe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884c908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884b6f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 9 07:43:40 a01n00 kernel:
Sep 9 07:43:40 a01n00 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog triggered for pid 27733: it was inactive for 200.00s
Sep 9 07:43:40 a01n00 kernel: Lustre: 0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 27733
Sep 9 07:43:40 a01n00 kernel: ll_ost_io_159 D 0000000000000000 0 27733 1 27734 27732 (L-TLB)
Sep 9 07:43:40 a01n00 kernel: ffff810521239510 0000000000000046 0000000000000003 0000040000000282
Sep 9 07:43:40 a01n00 kernel: 0000000000000100 000000000000000a ffff81067e810860 ffff81033115a040
Sep 9 07:43:40 a01n00 kernel: 00000881604f2d64 00000000000d2465 ffff81067e810a48 000000061ced4140
Sep 9 07:43:40 a01n00 kernel: Call Trace:
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80033608>] submit_bio+0xcd/0xd4
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b14aac>] :obdfilter:filter_do_bio+0x95c/0xb60
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ae0f24>] :fsfilt_ldiskfs:fsfilt_ldiskfs_write_record+0x464/0x4b0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b014f0>] :obdfilter:filter_commit_cb+0x0/0x2d0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88031749>] :jbd:journal_callback_set+0x2d/0x47
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8009daef>] autoremove_wake_function+0x0/0x2e
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b15974>] :obdfilter:filter_direct_io+0xcc4/0xd50
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8892ad70>] :lquota:filter_quota_acquire+0x0/0x120
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b17c08>] :obdfilter:filter_commitrw_write+0x1558/0x25b0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ac2b1e>] :ost:ost_brw_write+0x1b8e/0x2310
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88837c88>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88803320>] :ptlrpc:target_committed_to_req+0x40/0x120
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88abe67c>] :ost:ost_brw_read+0x182c/0x19e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8883c025>] :ptlrpc:lustre_msg_get_version+0x35/0xf0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8883c0e8>] :ptlrpc:lustre_msg_check_version_v2+0x8/0x20
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ac60fb>] :ost:ost_handle+0x2e5b/0x5a70
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88735305>] :lnet:lnet_match_blocked_msg+0x375/0x390
Sep 9 07:43:40 a01n00 kernel: [<ffffffff800d74d2>] __drain_alien_cache+0x51/0x66
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80148d4f>] __next_cpu+0x19/0x28
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88841a15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80089d89>] enqueue_task+0x41/0x56
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884672d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
Sep 9 07:43:40 a01n00 kernel: [<ffffffff88848e67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Sep 9 07:43:40 a01n00 kernel: [<ffffffff80088819>] __wake_up_common+0x3e/0x68
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884c908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8884b6f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 9 07:43:40 a01n00 kernel: ll_ost_io_195 D ffff81038ab8c860 0 27769 1 27770 27768 (L-TLB)
Sep 9 07:43:40 a01n00 kernel: ffff81028a541190 0000000000000046 ffff81028a541120 ffffffff8009daf8
Sep 9 07:43:40 a01n00 kernel: ffff810369dc3b18 000000000000000a ffff81028a524820 ffff81038ab8c860
Sep 9 07:43:40 a01n00 kernel: 00000881659b85ee 0000000000000429 ffff81028a524a08 0000000000000003
Sep 9 07:43:40 a01n00 kernel: Call Trace:
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8009daf8>] autoremove_wake_function+0x9/0x2e
Sep 9 07:43:40 a01n00 kernel: [<ffffffff8002e6ba>] __wake_up+0x38/0x4f
Sep 9 07:43:41 a01n00 kernel: [<ffffffff881b8b39>] :dm_mod:dm_table_unplug_all+0x33/0x42
Sep 9 07:43:41 a01n00 kernel: [<ffffffff886b5e62>] :raid456:get_active_stripe+0x247/0x4f0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:41 a01n00 kernel: [<ffffffff886bb4dd>] :raid456:make_request+0x472/0x9af
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8009daef>] autoremove_wake_function+0x0/0x2e
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8009daef>] autoremove_wake_function+0x0/0x2e
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8001c49b>] generic_make_request+0x1e7/0x1fe
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80023342>] mempool_alloc+0x24/0xda
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80033608>] submit_bio+0xcd/0xd4
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88788656>] :obdclass:lprocfs_oh_tally+0x26/0x50
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88adf7bc>] :fsfilt_ldiskfs:fsfilt_ldiskfs_send_bio+0xc/0x20
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88b14711>] :obdfilter:filter_do_bio+0x5c1/0xb60
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88ae0f24>] :fsfilt_ldiskfs:fsfilt_ldiskfs_write_record+0x464/0x4b0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88b014f0>] :obdfilter:filter_commit_cb+0x0/0x2d0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88031749>] :jbd:journal_callback_set+0x2d/0x47
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88adfad0>] :fsfilt_ldiskfs:fsfilt_ldiskfs_commit_async+0xd0/0x150
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88b15974>] :obdfilter:filter_direct_io+0xcc4/0xd50
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8892ad70>] :lquota:filter_quota_acquire+0x0/0x120
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88b17c08>] :obdfilter:filter_commitrw_write+0x1558/0x25b0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88730d23>] :lnet:lnet_send+0x973/0x9a0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88790c11>] :obdclass:class_handle2object+0xd1/0x160
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88abc02c>] :ost:ost_checksum_bulk+0x33c/0x590
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88ac2b1e>] :ost:ost_brw_write+0x1b8e/0x2310
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88837c88>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88803320>] :ptlrpc:target_committed_to_req+0x40/0x120
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88abe67c>] :ost:ost_brw_read+0x182c/0x19e0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8883c025>] :ptlrpc:lustre_msg_get_version+0x35/0xf0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8883c0e8>] :ptlrpc:lustre_msg_check_version_v2+0x8/0x20
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88ac60fb>] :ost:ost_handle+0x2e5b/0x5a70
Sep 9 07:43:41 a01n00 kernel: [<ffffffff800d7290>] free_block+0x126/0x143
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88735305>] :lnet:lnet_match_blocked_msg+0x375/0x390
Sep 9 07:43:41 a01n00 kernel: [<ffffffff800d74d2>] __drain_alien_cache+0x51/0x66
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88790c11>] :obdclass:class_handle2object+0xd1/0x160
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80148d4f>] __next_cpu+0x19/0x28
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80088f32>] find_busiest_group+0x20d/0x621
Sep 9 07:43:41 a01n00 kernel: [<ffffffff887f719a>] :ptlrpc:lock_res_and_lock+0xba/0xd0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88841a15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80089d89>] enqueue_task+0x41/0x56
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8884672d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
Sep 9 07:43:41 a01n00 kernel: [<ffffffff88848e67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Sep 9 07:43:41 a01n00 kernel: [<ffffffff80088819>] __wake_up_common+0x3e/0x68
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8884c908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8884b6f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Sep 9 07:43:41 a01n00 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 9 07:43:41 a01n00 kernel:
Sep 9 07:43:41 a01n00 kernel: ll_ost_io_68 D 0000000000000000 0 20385 1 20386 20384 (L-TLB)
Sep 9 07:43:41 a01n00 kernel: ffff810375ce5510 0000000000000046 0000000000000003 0000040000000282
Sep 9 07:43:41 a01n00 kernel: 0000000000000100 000000000000000a ffff81037a2ca080 ffff810365f9e860
Sep 9 07:43:41 a01n00 kernel: 000008815549e040 00000000000df2e2 ffff81037a2ca268 0000000730b8cd40
...

Any ideas? Tks,

Rafael Tinoco

Rafael David Tinoco - Sun Microsystems
Systems Engineer - High Performance Computing
Rafael.Tinoco at Sun.COM - 55.11.5187.2194
On Wed, 2009-09-09 at 14:31 -0300, Rafael David Tinoco wrote:
> Has anyone seen this kind of error while running IOR or other benchmarks?

On a note of e-mail formatting, so much vertical whitespace is not really needed and makes reading a bit more difficult. Also, personally, I don't wrap log file excerpts at ~80 columns. I think most people have a wide enough display to read that, and it makes reading things like stack dumps much, much easier. Not that MTAs make it all that easy to avoid wrapping, though.

> One of my OSSs crashes,

What do you mean by "crash"? Does it oops, need a reboot, etc.? You have not really provided enough log for me to determine what context the following is in:

> sometimes one, sometimes another, with the following error:
>
> Sep 9 07:43:40 a01n00 kernel: ll_ost_io_64 D ffff81037fea80c0 0 20381 1 20382 20380 (L-TLB)
> Sep 9 07:43:40 a01n00 kernel: ffff81036316b510 0000000000000046 0000000000000003 0000040000000282
> Sep 9 07:43:40 a01n00 kernel: 0000000000000100 0000000000000009 ffff81037ac09100 ffff81037fea80c0
> Sep 9 07:43:40 a01n00 kernel: 0000088160738e93 0000000000313ec1 ffff81037ac092e8 0000000328b65740
> Sep 9 07:43:40 a01n00 kernel: Call Trace:
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff80033608>] submit_bio+0xcd/0xd4
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b14aac>] :obdfilter:filter_do_bio+0x95c/0xb60
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff88ae0f24>] :fsfilt_ldiskfs:fsfilt_ldiskfs_write_record+0x464/0x4b0
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff88b014f0>] :obdfilter:filter_commit_cb+0x0/0x2d0
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff88031749>] :jbd:journal_callback_set+0x2d/0x47
> Sep 9 07:43:40 a01n00 kernel: [<ffffffff8009daef>] autoremove_wake_function+0x0/0x2e
...

Can you provide a bit more of the log before the above so we can see what the stack trace is in reference to? Also, try to eliminate the whitespace between lines. Are you getting any other errors or messages from Lustre prior to that? Perhaps you are getting some messages saying that various operations are "slow"?

Have you tuned these OSSes with respect to the number of OST threads needed to drive (and not over-drive) your disks? The lustre iokit is useful for that tuning.

b.
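As a concrete starting point, the iokit tool usually used for this is obdfilter-survey, run directly on an OSS against a single OST while sweeping the thread count. The invocation below is only a sketch based on the iokit documentation: the variable names and the target name "work-OST0000" are assumptions, so check the script shipped with your 1.8.1 iokit first, and run it only on a test filesystem since case=disk writes directly to the OST.

# Sweep 4..256 service threads against one OST on this OSS.
# "size" is the amount of data (in MB) written per object; thrlo/thrhi
# bound the thread-count sweep; nobjlo/nobjhi bound the object count.
size=8192 nobjlo=1 nobjhi=2 thrlo=4 thrhi=256 \
    case=disk targets="work-OST0000" sh obdfilter-survey

The sweep output shows throughput per thread count, which is what tells you where the disks stop scaling and start thrashing.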
Hello!

On Sep 9, 2009, at 1:31 PM, Rafael David Tinoco wrote:
> One of my OSSs crashes, sometimes one, sometimes another, with the following error:

That's not a crash. That's a watchdog timeout, indicative of Lustre spending too much time waiting on I/O. As such you need to somehow decrease the load on the system (e.g. by reducing the number of I/O threads, which was discussed on this list recently), increase obd_timeout, or get a faster disk subsystem.

Bye,
Oleg
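For reference, the obd_timeout side of that advice looks roughly like the sketch below on 1.8. The value 300 is only an example (the default is 100 seconds), the fsname "work" is taken from later in this thread, and the exact procedure should be checked against the manual for your release.

# Raise the Lustre RPC timeout on a running node:
echo 300 > /proc/sys/lustre/timeout
# ...or set it persistently for the whole filesystem, once, on the MGS:
lctl conf_param work.sys.timeout=300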
I'm attaching the messages file (only the error part) so we don't have these mail formatting problems.

------

Can you provide a bit more of the log before the above so we can see what the stack trace is in reference to? Also, try to eliminate the whitespace between lines. Are you getting any other errors or messages from Lustre prior to that? Perhaps you are getting some messages saying that various operations are "slow"?

>> Even being slow, the OST should still respond, right? It "hangs".

Have you tuned these OSSes with respect to the number of OST threads needed to drive (and not over-drive) your disks? The lustre iokit is useful for that tuning.

>> OK, tuning for performance is fine, but hanging with 20 nodes (IOR over MPI) is strange, right?

b.

-----

I'm using 3 RAID5 arrays with 8 disks each and 256 OST threads on each OSS.

root at a02n00:~# cat /etc/mdadm.conf
ARRAY /dev/md10 level=raid5 num-devices=8 devices=/dev/dm-0,/dev/dm-1,/dev/dm-2,/dev/dm-3,/dev/dm-4,/dev/dm-5,/dev/dm-6,/dev/dm-7
ARRAY /dev/md11 level=raid5 num-devices=8 devices=/dev/dm-8,/dev/dm-9,/dev/dm-10,/dev/dm-11,/dev/dm-12,/dev/dm-13,/dev/dm-14,/dev/dm-15
ARRAY /dev/md12 level=raid5 num-devices=8 devices=/dev/dm-16,/dev/dm-17,/dev/dm-18,/dev/dm-19,/dev/dm-20,/dev/dm-21,/dev/dm-22,/dev/dm-23

All my OSTs were created with an internal journal (for test purposes):

mkfs.lustre --r --ost --fsname=work --mkfsoptions="-b 4096 -E stride=32,stripe-width=224 -m 0" --mgsnid=a03n00@o2ib --mgsnid=b03n00@o2ib /dev/md[10|11|12]

I'm using a separate MDT and MGS:

# MGS
mkfs.lustre --fsname=work --r --mgs --mkfsoptions="-b 4096 -E stride=4,stripe-width=4 -m 0" --mountfsoptions=acl --failnode=b03n00@o2ib /dev/sdb1

# MDT
mkfs.lustre --fsname=work --r --mgsnid=a03n00@o2ib --mgsnid=b03n00@o2ib --mdt --mkfsoptions="-b 4096 -E stride=4,stripe-width=40 -m 0" --mountfsoptions=acl --failnode=b03n00@o2ib /dev/sdc1

I'm using these packages on the servers:
----------
root at a03n00:~# rpm -aq | grep -i lustre
lustre-modules-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
lustre-client-modules-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
lustre-ldiskfs-3.0.9-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-lustre-headers-2.6.18-128.1.14.el5_lustre.1.8.1
kernel-lustre-2.6.18-128.1.14.el5_lustre.1.8.1
lustre-client-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-lustre-devel-2.6.18-128.1.14.el5_lustre.1.8.1
lustre-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-ib-1.4.1-2.6.18_128.1.14.el5_lustre.1.8.1
----------
On the clients I compiled kernel 2.6.18-128.el5 without InfiniBand support, then compiled OFED 1.4.1, and after that built the patchless client. The patchless client was configured with:
--ofa-kernel=/usr/src/ofa_kernel
----------

* THE ERROR

Running, for example:

root at b00n00:~# mpirun -hostfile ./lustre.hosts -np 20 /hpc/IOR -w -r -C -i 2 -b 1G -t 512k -F -o /work/stripe12/teste

starts "hanging" the OSTs, and the filesystem "hangs". Any attempt to rm or read a file (or run df -kh) hangs and stays there forever (not even kill -9 helps).

With that, I cannot umount my OSTs on the OSSs, and I have to reboot the server, after which my RAID arrays start resyncing.

Tinoco
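As a side note, the ldiskfs extended options listed above are internally consistent if the md chunk size is 128 KB; that chunk size is an assumption here, so it is worth confirming against the arrays themselves:

# With 4 KB blocks and an 8-disk RAID5 (7 data disks):
#   stride       = chunk / block = 128 KB / 4 KB = 32 blocks
#   stripe-width = stride * 7    = 224 blocks
# Confirm the chunk size mdadm actually used:
mdadm --detail /dev/md10 | grep -i 'chunk size'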
Forget the file.. sorry
Tinoco
Rafael,

Can you tell me what RAID stripe_cache_size you are using? Have you tuned your OSS nodes with something like:

echo 4096 > /sys/block/md0/md/stripe_cache_size

If you haven't, can you try using the command above to tune all your RAID arrays? (A loop covering all three arrays is sketched just below this message.)

Cheers,
-Atul

Rafael David Tinoco wrote:
> I'm using 3 RAID5 arrays with 8 disks each and 256 OST threads on each OSS.
>
> root at a02n00:~# cat /etc/mdadm.conf
> ARRAY /dev/md10 level=raid5 num-devices=8 devices=/dev/dm-0,/dev/dm-1,/dev/dm-2,/dev/dm-3,/dev/dm-4,/dev/dm-5,/dev/dm-6,/dev/dm-7
> ARRAY /dev/md11 level=raid5 num-devices=8 devices=/dev/dm-8,/dev/dm-9,/dev/dm-10,/dev/dm-11,/dev/dm-12,/dev/dm-13,/dev/dm-14,/dev/dm-15
> ARRAY /dev/md12 level=raid5 num-devices=8 devices=/dev/dm-16,/dev/dm-17,/dev/dm-18,/dev/dm-19,/dev/dm-20,/dev/dm-21,/dev/dm-22,/dev/dm-23
> [...]
> Running, for example:
>
> root at b00n00:~# mpirun -hostfile ./lustre.hosts -np 20 /hpc/IOR -w -r -C -i 2 -b 1G -t 512k -F -o /work/stripe12/teste
>
> starts "hanging" the OSTs, and the filesystem "hangs". Any attempt to rm or read a file (or run df -kh) hangs and stays there forever (not even kill -9 helps).
>
> With that, I cannot umount my OSTs on the OSSs, and I have to reboot the server, after which my RAID arrays start resyncing.
>
> Tinoco
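Following up on that suggestion, a minimal way to apply it to all three OST arrays on one OSS (device names taken from the mdadm.conf quoted above; the value does not survive a reboot, so it would also need to go into rc.local or an init script):

# Enlarge the raid456 stripe cache for each Lustre OST array.
for md in md10 md11 md12; do
    echo 4096 > /sys/block/$md/md/stripe_cache_size
    cat /sys/block/$md/md/stripe_cache_size    # verify the new value
done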
It seeeeeeeems that using 64 OST threads solved the problem. :D

But it's too early to celebrate; I'm still running all the block size and stripe width combinations.

Regards,
Tinoco

-----Original Message-----
From: Oleg.Drokin at Sun.COM [mailto:Oleg.Drokin at Sun.COM]
Sent: Wednesday, September 09, 2009 7:26 PM
To: Rafael David Tinoco
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] OSTs hanging while running IOR

Hello!

On Sep 9, 2009, at 1:31 PM, Rafael David Tinoco wrote:
> One of my OSSs crashes, sometimes one, sometimes another, with the following error:

That's not a crash. That's a watchdog timeout, indicative of Lustre spending too much time waiting on I/O. As such you need to somehow decrease the load on the system (e.g. by reducing the number of I/O threads, which was discussed on this list recently), increase obd_timeout, or get a faster disk subsystem.

Bye,
Oleg
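For the archives: on 1.8 the OST service thread count is normally capped with a module option on each OSS, along the lines of the sketch below. The parameter name is the one documented for 1.8.x; verify it against your release before rolling it out.

# /etc/modprobe.conf on every OSS: start at most 64 OST I/O threads
options ost oss_num_threads=64
# The OSTs have to be unmounted and the ost module reloaded (or the
# node rebooted) before the new thread count takes effect.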
On Wed, 2009-09-09 at 21:38 -0300, Rafael David Tinoco wrote:
> It seeeeeeeems that using 64 OST threads solved the problem.

Ahhh. As I suspected, then. Good.

Thanx for updating the thread. The archives, at least, will like that. :-)

b.
On Sep 09, 2009 19:32 -0300, Rafael David Tinoco wrote:
> Forget the file.. sorry
>
> Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog triggered for pid 16372: it was inactive for 200.00s
>
> [stack trace]

Since lots of users are confused by this message and think there is a crash, I think we should make the message more useful here. I've filed bug 20722 on this; it would be a trivial bug for someone to fix if they have some time.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.