behlendorf1@llnl.gov
2007-Jan-09 11:32 UTC
[Lustre-devel] [Bug 11512] Reading health_check can be exceedingly slow
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11512

# pigs52 /root > time cat /proc/fs/lustre/health_check
healthy

real    8m18.093s
user    0m0.000s
sys     0m0.001s

While an OSS is under a heavy ost_destroy load, reading the proc entry /proc/fs/lustre/health_check can take an unreasonably long time. This disrupts our ability to effectively monitor the health of the filesystem.

/proc/fs/lustre/ost/OSS/ost_io/stats @ 1168365762.925608
Name         Cur.Rate  #Events  last         min  avg         max
ost_destroy  18        1048933  15277920.89  5    5944339.71  388464831

# pigs52 /root > uptime
10:18:00 up 17 days, 18:38, 2 users, load average: 242.51, 228.74, 227

Notice the excessively long average destroy time of 5.9 seconds, resulting in virtually all of the threads being busy and a load average of 242+. The slow destroys look like they're the result of swamping the journal. This is likely only this bad now because we've disabled the writeback caches on our DDNs, which substantially slows down the journal I/O. It seems likely to me that any OST activity which overwhelms the journal, not just destroys, will result in this same sluggish behavior.

PIGS: ddn-7b: host taskinfo
DMT301,C op2A L:19 S:0000E8:42E0 a:    79 l: 2 01/01 00,00 W:WRU WT T:3
DMT311,5 op2A L:19 S:0000E8:4310 a:    80 l: 1 01/01 00,00 W:WRU WT T:3
DMT312,6 op2A L:21 S:0000E8:4340 a: 274AE l: 1 01/01 00,00 W:WRU WT T:3
DMT305,4 op2A L:21 S:0000E8:4358 a: 274BE l: 1 01/01 01,00 W:WRU WT T:3
DMT308,9 op2A L:21 S:0000E8:4370 a: 274CD l: 1 00/00 01+00 W:1/1 WT T:3
DMT310,B op2A L:19 S:0000E8:4388 a:    82 l: 2 01/01 00,00 W:WRU WT T:3
DMT303,3 op2A L:19 S:0000E8:43A0 a:    8A l: 1 01/01 01,00 W:WRU WT T:3
DMT304,2 op2A L:19 S:0000E8:43B8 a:    8C l: 3 00/00 01+00 W:1/1 WT T:2
DMT300,0 op2A L:21 S:0000E8:43E8 a: 274D8 l: 1 00/00 01+00 W:1/1 WT T:2
DMT306,7 op2A L:21 S:0000E8:4400 a: 274DD l: 1 00/00 01+00 W:1/1 WT T:2
DMT315,1 op2A L:21 S:0000E8:4490 a: 274F1 l: 1 00/00 01+00 W:1/1 WT T:1
DMT313,A op2A L:21 S:0000E8:44A8 a: 274FE l: 1 00/00 01+00 W:1/1 WT T:1
DMT316,8 op2A L:21 S:0000E8:44C0 a: 2750F l: 1 00/00 01+00 W:1/1 WT T:1
DMT406,2 op2A L:24 S:0000E8:2C18 a: 324A6 l: 1 01/01 00,00 W:WRU WT T:5
DMT405,C op2A L:24 S:0000E8:2C30 a: 324B6 l: 1 01/01 00,00 W:WRU WT T:4
DMT408,0 op2A L:24 S:0000E8:2C48 a: 3256E l: 1 01/01 01,00 W:WRU WT T:4
DMT407,A op2A L:23 S:0000E8:2D20 a:    38 l: 2 01/01 00,00 W:WRU WT T:3
DMT403,7 op2A L:23 S:0000E8:2D38 a:    3C l: 1 01/01 00,00 W:WRU WT T:3
DMT416,4 op2A L:23 S:0000E8:2D50 a:    3E l: 1 01/01 00,00 W:WRU WT T:2
DMT409,B op2A L:23 S:0000E8:2D68 a:    41 l: 1 01/01 01,00 W:WRU WT T:2
DMT400,9 op2A L:23 S:0000E8:2D98 a:    49 l: 2 00/00 01+00 W:1/1 WT T:1
DMT411,6 op2A L:23 S:0000E8:2DB0 a:    4C l: 3 00/00 01+00 W:1/1 WT T:1
DMT414,5 op2A L:24 S:0000E8:2DC8 a: 328B9 l: 1 01/01 00,00 W:WRU WT T:1
DMT413,8 op2A L:24 S:0000E8:2DE0 a: 328EA l: 1 01/01 00,00 W:WRU WT T:1
DMT404,3 op2A L:24 S:0000E8:2DF8 a: 32903 l: 1 01/01 01,00 W:WRU WT T:1
DMT415,1 op2A L:22 S:0000E8:2E10 a: 32666 l: 1 01/01 01,00 W:WRU WT T:1

From the DDN we can see that all the active commands are localized around low-numbered addresses, exactly where we would expect the journal to be located. A backtrace from the timed cat above shows the proc read is waiting in log_wait_commit(). Perhaps it's not the best of ideas to have the health check rely on speedy access to the disk. It's quite often the case on our servers that disk access will be slow.
crash> bt 25674
PID: 25674  TASK: 10009024980  CPU: 2  COMMAND: "cat"
 #0 [10005085bf8] schedule at ffffffff802ec76f
 #1 [10005085cd0] log_wait_commit at ffffffffa0392fc8
 #2 [10005085d60] journal_stop at ffffffffa038db34
 #3 [10005085da0] fsfilt_ldiskfs_write_record at ffffffffa03d82db
 #4 [10005085e20] lvfs_check_io_health at ffffffffa02467fa
 #5 [10005085e50] filter_health_check at ffffffffa03f1e61
 #6 [10005085e70] obd_proc_read_health at ffffffffa025ca81
 #7 [10005085eb0] proc_file_read at ffffffff801a5baf
 #8 [10005085f10] vfs_read at ffffffff80175f17
 #9 [10005085f40] sys_read at ffffffff801761b2
#10 [10005085f80] system_call at ffffffff8010f262
    RIP: 0000002a95724a82  RSP: 0000007fbffff6d8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffffffff8010f262  RCX: 0000002a957244e2
    RDX: 0000000000000400  RSI: 0000000000505070  RDI: 0000000000000003
    RBP: 0000000000000003   R8: 0000002a9589cb18   R9: 0000000000000200
    R10: 0000007fbffff601  R11: 0000000000000246  R12: 0000000000505070
    R13: 0000000000000400  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
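To make the failure mode concrete: the backtrace shows the proc read descending through fsfilt_ldiskfs_write_record() into journal_stop()/log_wait_commit(), i.e. the health check performs a small synchronous write and then waits for the journal commit. Below is a minimal userspace sketch of that pattern, under the assumption that the probe is essentially "write a record and flush it to stable storage"; check_io_health() and probe_path are hypothetical names, not the actual Lustre code. The point is that such a probe inherits the full commit latency of a saturated device, which is exactly the 8m18s cat above.

    /*
     * Userspace analogue of the write-based health check (illustrative
     * only, not Lustre source): write a small record and force it to
     * stable storage before declaring the target healthy.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static int check_io_health(const char *probe_path)
    {
            char buf[64];
            int fd, rc = 0;

            fd = open(probe_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
            if (fd < 0)
                    return -1;

            snprintf(buf, sizeof(buf), "probe @ %ld\n", (long)time(NULL));
            if (write(fd, buf, strlen(buf)) < 0)
                    rc = -1;

            /* The synchronous flush is the problem: like journal_stop()
             * followed by log_wait_commit() in the backtrace, it blocks
             * until the journal/device commits, however long that takes. */
            if (fsync(fd) < 0)
                    rc = -1;

            close(fd);
            return rc;
    }

    int main(void)
    {
            int rc = check_io_health("/tmp/health_probe");

            puts(rc == 0 ? "healthy" : "NOT HEALTHY");
            return rc == 0 ? 0 : 1;
    }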
behlendorf1@llnl.gov
2007-Jan-29 17:50 UTC
[Lustre-devel] [Bug 11512] Reading health_check can be exceedingly slow
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11512

Created an attachment (id=9442)
--> (https://bugzilla.lustre.org/attachment.cgi?id=9442&action=view)
Remove write during health check

Remove the write from the health check. Under heavy load this can take many minutes per obdfilter, which can lead to unnecessary failover.
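For clarity, a hedged sketch of what removing the write amounts to; the names here (obd_health, filter_health_check_nowrite) are illustrative and this is an assumption about the approach, not the contents of attachment 9442. Instead of issuing a synchronous test write, the proc read only inspects state the server already tracks in memory, so it returns immediately even when the journal is saturated.

    /* Illustrative sketch: health from cached flags, no disk I/O. */
    #include <stdio.h>

    struct obd_health {
            int dev_failed;    /* set asynchronously when I/O errors occur */
            int dev_readonly;  /* set if the backing device went read-only */
    };

    static int filter_health_check_nowrite(const struct obd_health *h)
    {
            /* Purely a memory inspection: cannot block on the journal. */
            if (h->dev_failed || h->dev_readonly)
                    return -1;      /* NOT HEALTHY */
            return 0;               /* healthy */
    }

    int main(void)
    {
            struct obd_health h = { .dev_failed = 0, .dev_readonly = 0 };

            puts(filter_health_check_nowrite(&h) == 0 ?
                 "healthy" : "NOT HEALTHY");
            return 0;
    }

The trade-off is that a flag-based check no longer proves the disk path end to end, but it keeps the health probe responsive, which matters when a slow (not failed) device would otherwise trigger failover.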