Dear list,

Our Lustre system crashes frequently these days with heavy average load.

1) # top

top - 14:32:57 up 18:15, 1 user, load average: 25.05, 24.27, 24.47
Mem: 8307364k total, 859724k used, 7447640k free, 234288k buffers
Swap: 16386292k total, 0k used, 16386292k free, 37932k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
26695 root 15  0    0   0   0 S  7.6  0.0 51:57.40 socknal_sd04
26694 root 15  0    0   0   0 S  6.6  0.0 53:44.42 socknal_sd03
26691 root 15  0    0   0   0 S  5.6  0.0 51:11.76 socknal_sd00
26697 root 15  0    0   0   0 S  5.3  0.0 42:12.23 socknal_sd06
26696 root 15  0    0   0   0 S  3.3  0.0 52:47.42 socknal_sd05
26692 root 15  0    0   0   0 S  2.3  0.0 26:19.46 socknal_sd01
26693 root 15  0    0   0   0 S  2.3  0.0 32:38.21 socknal_sd02
26952 root 15  0    0   0   0 S  1.0  0.0  2:06.69 ll_ost_io_09
....

2) iostat -x 5

Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp (boss01.ihep.ac.cn)  11/10/2008

avg-cpu: %user %nice %sys %iowait %idle
          0.00  0.00 11.33    4.56 84.10

Device:    rrqm/s wrqm/s     r/s  w/s    rsec/s wsec/s    rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
cciss/c0d0   1.05   0.43    0.27 0.41      9.78   6.65     4.89   3.32    24.31     0.01 17.15  5.78   0.39
sda          3.46   0.64 1297.05 0.60   2588.12 118.12  1294.06  59.06     2.09    22.81 15.69  0.77  99.57
sdb          3.09   0.28 1274.46 0.18   1541.21  23.54   770.60  11.77     1.23    16.75 12.16  0.78  99.56

avg-cpu: %user %nice %sys %iowait %idle
          0.00  0.00 11.53    0.10 88.38

Device:    rrqm/s wrqm/s     r/s  w/s    rsec/s wsec/s    rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
cciss/c0d0   0.00   1.80    0.00 0.00      0.00  16.00     0.00   8.00     0.00     0.00  0.00  0.00   0.00
sda          3.20   0.00 1436.60 0.00 130524.80   0.00 65262.40   0.00    90.86    16.29 10.73  0.70 100.00
sdb          3.40   0.00 1142.20 0.00 124113.60   0.00 62056.80   0.00   108.66    10.44  8.24  0.87  99.80

Before each crash, there are LustreErrors like:

Nov 9 17:25:41 boss01 kernel: LustreError: 27327:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e3df8e00 x133017/t0 o3->73c15254-a884-578e-9634-859b44619a4f at NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226222741 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:25:41 boss01 kernel: Lustre: 27327:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 73c15254-a884-578e-9634-859b44619a4f at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:27:47 boss01 kernel: Lustre: besfs-OST0006: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 17:27:48 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 09:28:05 boss01 sshd[29314]: Connection closed by 192.168.51.130
Nov 9 17:29:17 boss01 ntpd[27872]: kernel time sync enabled 0001
Nov 9 17:56:48 boss01 kernel: Lustre: besfs-OST0005: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 17:56:48 boss01 kernel: Lustre: Skipped 2 previous similar messages
Nov 9 17:59:15 boss01 kernel: Lustre: besfs-OST0002: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 374 seconds. I think it's dead, and I am evicting it.
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e2ccee00 x36870/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) Skipped 2 previous similar messages
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0007: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) Skipped 2 previous similar messages
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e01bce00 x36866/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID:0/0 lens 432/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) Skipped 4 previous similar messages
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) Skipped 4 previous similar messages
Nov 9 18:01:33 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 512 seconds. I think it's dead, and I am evicting it.
Nov 9 18:04:14 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 7df31bbf-54a5-ada8-abd7-f0920f648d0a (at 192.168.52.70 at tcp) in 396 seconds. I think it's dead, and I am evicting it.

The configuration of our system:
OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
MDS: 1
OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
Client: 50 nodes( 8 core server), each has 1Gbit/s NIC and [root at boss02 ~]# sysctl -q lnet lnet.nis = nid refs peer max tx min lnet.nis = 0 at lo 2 0 0 0 0 lnet.nis = 192.168.50.34 at tcp 136 8 256 250 88 lnet.buffers = pages count credits min lnet.buffers = 0 0 0 0 lnet.buffers = 1 0 0 0 lnet.buffers = 256 0 0 0 lnet.peers = nid refs state max rtr min tx min queue lnet.peers = 192.168.50.14 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.11 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.13 at tcp 1 ~rtr 8 8 8 8 -71 0 lnet.peers = 192.168.52.14 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.15 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.16 at tcp 1 ~rtr 8 8 8 8 -30 0 lnet.peers = 192.168.52.17 at tcp 1 ~rtr 8 8 8 8 -38 0 lnet.peers = 192.168.52.18 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.19 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.20 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.21 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.22 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.50.32 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.23 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.24 at tcp 1 ~rtr 8 8 8 8 -50 0 lnet.peers = 192.168.52.25 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.26 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.27 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.28 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.29 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.30 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.31 at tcp 7 ~rtr 8 8 8 2 -10 3318192 lnet.peers = 192.168.52.32 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.33 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.34 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.35 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.36 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.37 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.38 at tcp 1 ~rtr 8 8 8 8 -62 0 lnet.peers = 192.168.52.39 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.40 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.41 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.42 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.43 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.44 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.45 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.46 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.47 at tcp 1 ~rtr 8 8 8 8 -10 0 lnet.peers = 192.168.52.48 at tcp 1 ~rtr 8 8 8 8 -23 0 lnet.peers = 192.168.52.49 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.50 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.51 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.52 at tcp 1 ~rtr 8 8 8 8 -23 0 lnet.peers = 192.168.52.53 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.54 at tcp 1 ~rtr 8 8 8 8 -20 0 lnet.peers = 192.168.52.55 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.56 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.57 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.58 at tcp 1 ~rtr 8 8 8 8 -11 0 lnet.peers = 192.168.52.59 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.60 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.61 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.62 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.63 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.64 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.65 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.66 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.67 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.68 at tcp 1 ~rtr 8 8 8 
8 3 0 lnet.peers = 192.168.52.69 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.70 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.71 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.72 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.73 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.74 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.75 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 202.122.33.56 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.76 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.77 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.78 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.79 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.80 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.81 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.82 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.83 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.84 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.86 at tcp 1 ~rtr 8 8 8 8 -12 0 lnet.peers = 192.168.52.87 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.88 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.89 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.90 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.91 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.92 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.93 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.94 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.95 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.96 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.97 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.98 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.99 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.100 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.101 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 202.122.33.82 at tcp 1 ~rtr 8 8 8 8 -6383 0 lnet.peers = 192.168.52.102 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 202.122.33.83 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.103 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 202.122.33.84 at tcp 1 ~rtr 8 8 8 8 -649 0 lnet.peers = 192.168.52.104 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.105 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.106 at tcp 1 ~rtr 8 8 8 8 -15 0 lnet.peers = 192.168.52.107 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.108 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.109 at tcp 1 ~rtr 8 8 8 8 -79 0 lnet.peers = 192.168.52.110 at tcp 1 ~rtr 8 8 8 8 -24 0 lnet.peers = 192.168.52.111 at tcp 1 ~rtr 8 8 8 8 -102 0 lnet.peers = 192.168.52.112 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 202.122.33.92 at tcp 1 ~rtr 8 8 8 8 -1148 0 lnet.peers = 202.122.33.93 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.113 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.114 at tcp 1 ~rtr 8 8 8 8 -73 0 lnet.peers = 192.168.52.115 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 202.122.33.95 at tcp 1 ~rtr 8 8 8 8 -1914 0 lnet.peers = 192.168.52.116 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.117 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.118 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.119 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.120 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.121 at tcp 1 ~rtr 8 8 8 8 -54 0 lnet.peers = 192.168.52.122 at tcp 1 ~rtr 8 8 8 8 -65 0 lnet.peers = 192.168.52.123 at tcp 1 ~rtr 8 8 8 8 -16 0 lnet.peers = 192.168.52.124 at tcp 1 ~rtr 8 8 8 8 -32 0 lnet.peers = 192.168.52.125 at tcp 1 ~rtr 8 8 8 8 -158 0 lnet.peers = 192.168.52.126 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.127 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 
192.168.52.128 at tcp 1 ~rtr 8 8 8 8 -36 0 lnet.peers = 192.168.52.129 at tcp 1 ~rtr 8 8 8 8 -120 0 lnet.peers = 192.168.52.130 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.131 at tcp 1 ~rtr 8 8 8 8 -82 0 lnet.peers = 192.168.55.11 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.12 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.13 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.14 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.15 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.16 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.51.134 at tcp 1 ~rtr 8 8 8 8 -631 0
lnet.routers = ref rtr_ref alive_cnt state last_ping router
lnet.routes = Routing disabled
lnet.routes = net hops state router
lnet.stats = 7 6513 0 349123954 349123978 0 25 9871897726514 80688968391 0 7600
lnet.debug_mb = 41
lnet.panic_on_lbug = 0
lnet.catastrophe = 0
lnet.memused = 4166984
lnet.upcall = /usr/lib/lustre/lnet_upcall
lnet.debug_path = /tmp/lustre-log
lnet.console_backoff = 2
lnet.console_min_delay_centisecs = 50
lnet.console_max_delay_centisecs = 60000
lnet.console_ratelimit = 1
lnet.printk = warning error emerg console
lnet.subsystem_debug = undefined mdc mds osc ost class log llite rpc lnet lnd pinger filter echo ldlm lov lmv sec gss mgc mgs fid fld
lnet.debug = ioctl neterror warning error emerg ha config console

My questions are:
1. What is the signal of Lustre overload?
2. Can Lustre reject too many connections before it is going to crash?
Brian J. Murrell
2008-Nov-10 14:20 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
> Dear list,
>
> Our Lustre system crashes

I don't see any evidence of a "crash" in your posting here. Can you define what you mean by "crash"?

> The configuration of our system
> OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
> MDS: 1
> OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
> Client: 50 nodes (8-core servers), each has a 1Gbit/s NIC

So your entire Lustre server infrastructure is a single node with all of the MDS, MGS and OSS (2x OSTs) on it? If yes, can I ask why? Lustre is likely not going to perform very well in such a configuration.

Is your storage oversubscribed? Did you benchmark your storage system with our iokit to find out the optimum number of OST threads you should be running?

> My questions are:
> 1. What is the signal of Lustre overload?

I'm not sure I'm understanding this question.

> 2. Can Lustre reject too many connections before it is going to crash?

Properly tuned, Lustre will not "crash" due to load, but will manage it. As long as your OSS is properly tuned for your storage capabilities, you can throw as many client loads at it as you want. Each load will just get its appropriate share of the backend resources. As you continue to add more client loads, each load will just get a smaller portion of the total resources.

b.
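The "iokit" referred to above is the lustre-iokit package, whose survey scripts drive the backend storage with an increasing number of concurrent threads so you can see where throughput stops scaling. The sketch below is a minimal, hedged example of an obdfilter-survey run on an OSS; the environment-variable names (size, nobjlo/nobjhi, thrlo/thrhi) are recalled from the 1.6-era script and should be checked against the script's own header before running.

    # Run on the OSS against its local OSTs (the "disk" case).
    # WARNING: generates heavy I/O on the OSTs for the duration of the run.
    # Variable names below are assumptions; verify them in the obdfilter-survey header.
    size=8192 nobjlo=1 nobjhi=16 thrlo=16 thrhi=512 ./obdfilter-survey
    # The report lists MB/s per thread count; the point where throughput
    # stops increasing is the "optimum number of OST threads" mentioned above.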
Thanks, Brian.

During a "crash", I can neither SSH to the OSS server nor start a new console on the machine directly, and a "df" takes over 10 seconds.

Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs, and each OSS has 2 disk arrays attached. The total space is 57TB.

The problem may be caused by oversubscription, since the %util and average load are both high. However, I do not know:
1. How to estimate the optimum number of OST threads? Do you have any suggestion?
2. What is the relationship between the OST thread number and the number of Lustre client nodes? If the max OST thread number is X, is the max Lustre client number X/8? (The default number of connections for a peer is 8.)

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
>> Dear list,
>>
>> Our Lustre system crashes
>
> I don't see any evidence of a "crash" in your posting here. Can you
> define what you mean by "crash"?
>
>> The configuration of our system
>> OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
>> MDS: 1
>> OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
>> Client: 50 nodes (8-core servers), each has a 1Gbit/s NIC
>
> So your entire Lustre server infrastructure is a single node with all of
> the MDS, MGS and OSS (2x OSTs) on it? If yes, can I ask why? Lustre is
> likely not going to perform very well in such a configuration.
>
> Is your storage oversubscribed? Did you benchmark your storage system
> with our iokit to find out the optimum number of OST threads you should
> be running?
>
>> My questions are:
>> 1. What is the signal of Lustre overload?
>
> I'm not sure I'm understanding this question.
>
>> 2. Can Lustre reject too many connections before it is going to crash?
>
> Properly tuned, Lustre will not "crash" due to load, but will manage it.
> As long as your OSS is properly tuned for your storage capabilities, you
> can throw as many client loads at it as you want. Each load will just
> get its appropriate share of the backend resources. As you continue to
> add more client loads, each load will just get a smaller portion of the
> total resources.
>
> b.
Brian J. Murrell
2008-Nov-10 14:58 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
> Thanks, Brian.
> During a "crash", I can neither SSH to the OSS server nor start a new
> console on the machine directly, and a "df" takes over 10 seconds.

Yeah, sounds like the OSS is quite "backed up".

> Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs,
> and each OSS has 2 disk arrays attached. The total space is 57TB.

Ahhh. OK. Your description made it sound like you were running all of those on a single node, and the reality is that Lustre doesn't do anything magic. If you only use a single node, you likely won't see any better performance than, say, just NFS. But I digress.

> The problem may be caused by oversubscription, since the %util and average
> load are both high. However, I do not know:
> 1. How to estimate the optimum number of OST threads? Do you have any suggestion?

Use our iokit.

> 2. What is the relationship between the OST thread number and the number of
> Lustre client nodes?

Nothing. The relationship is the point of diminishing returns on driving your storage as you add more threads. Most storage can benefit from having multiple threads from a single machine driving it -- to a point of saturation. There is no point in driving the storage beyond that point of saturation. The iokit will test your storage, throwing more and more threads at it. When you look at the output you will find a maximum number of threads beyond which you get no more increase in performance. That number is your optimum OST threads number.

Certainly you can achieve this without the iokit by just playing with the number of OST threads, adjusting up and down (think about doing a binary search, for example) until you find your sweet spot. This method is of course more time consuming.

b.
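As a concrete illustration of the "playing with the number of OST threads" approach, a hedged sketch follows. It assumes the OST I/O thread count in Lustre 1.6 is set through an option on the ost module named oss_num_threads (check the manual for the exact option name in your release), and that the OSTs can be unmounted briefly so the modules reload with the new value; the device and mount-point names are the ones that appear later in this thread.

    # On the OSS: try half the current thread count, e.g. 256 instead of 512.
    # The option name is an assumption based on the 1.6 manual; verify before use.
    echo "options ost oss_num_threads=256" >> /etc/modprobe.conf

    # Stop the OSTs, reload the Lustre modules so the option takes effect,
    # then remount and rerun the same fixed benchmark used for the previous value.
    umount /lustre/besfs/ost0                       # repeat for every OST on this OSS
    lustre_rmmod                                    # helper script that unloads the Lustre modules
    mount -t lustre /dev/sda1 /lustre/besfs/ost0    # remount each OST

Comparing the aggregate throughput of the same workload at each candidate thread count is what makes the binary search meaningful; without a repeatable measurement the comparison tells you nothing.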
Thanks, but I am still unclear about:
1. How to limit the OST thread number after I find an optimum number?
2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis? For example:

[root at boss01 ~]# cat /proc/sys/lnet/peers
nid refs state max rtr min tx min queue
192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536

[root at boss01 ~]# cat /proc/sys/lnet/nis
nid refs peer max tx min
0 at lo 2 0 0 0 0
192.168.50.33 at tcp 137 8 256 256 -424

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
>> Thanks, Brian.
>> During a "crash", I can neither SSH to the OSS server nor start a new
>> console on the machine directly, and a "df" takes over 10 seconds.
>
> Yeah, sounds like the OSS is quite "backed up".
>
>> Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs,
>> and each OSS has 2 disk arrays attached. The total space is 57TB.
>
> Ahhh. OK. Your description made it sound like you were running all of
> those on a single node, and the reality is that Lustre doesn't do
> anything magic. If you only use a single node, you likely won't see any
> better performance than, say, just NFS. But I digress.
>
>> The problem may be caused by oversubscription, since the %util and average
>> load are both high. However, I do not know:
>> 1. How to estimate the optimum number of OST threads? Do you have any suggestion?
>
> Use our iokit.
>
>> 2. What is the relationship between the OST thread number and the number of
>> Lustre client nodes?
>
> Nothing. The relationship is the point of diminishing returns on driving
> your storage as you add more threads. Most storage can benefit from having
> multiple threads from a single machine driving it -- to a point of
> saturation. There is no point in driving the storage beyond that point of
> saturation. The iokit will test your storage, throwing more and more
> threads at it. When you look at the output you will find a maximum number
> of threads beyond which you get no more increase in performance. That
> number is your optimum OST threads number.
>
> Certainly you can achieve this without the iokit by just playing with the
> number of OST threads, adjusting up and down (think about doing a binary
> search, for example) until you find your sweet spot. This method is of
> course more time consuming.
>
> b.
Brian J. Murrell
2008-Nov-10 16:01 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 15:58 +0000, Wang lu wrote:
> Thanks, but I am still unclear about:
>
> 1. How to limit the OST thread number after I find an optimum number?

It's a module option to the oss module. It should be documented in the manual.

> 2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?

The meaning of many of the variables in /proc is also documented in the manual. If you find any that are not, you can file a ticket in our bz requesting they be added.

> For example:
> [root at boss01 ~]# cat /proc/sys/lnet/peers
> nid refs state max rtr min tx min queue
> 192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536
>
> [root at boss01 ~]# cat /proc/sys/lnet/nis
> nid refs peer max tx min
> 0 at lo 2 0 0 0 0
> 192.168.50.33 at tcp 137 8 256 256 -424

I don't know the details of either of these off-hand. Probably one of our LNET experts might be able to provide more information.

b.
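One thing that can be read from the /proc/sys/lnet/peers output without knowing all of the internals: the last columns are send-credit counters, and a strongly negative "min" value (such as the -6383 visible in the dump earlier in this thread) is commonly taken to mean that messages to that peer had to queue because its credits were exhausted, i.e. the peer or the path to it was not keeping up. Treat that reading as a rough, unofficial indicator; a hedged one-liner for spotting such peers, with the column positions taken from the header line "nid refs state max rtr min tx min queue", might be:

    # List peers whose lowest-seen tx credit count went negative, with their queue size.
    # Assumes the nid prints as a single field (e.g. 192.168.52.39@tcp); the column
    # interpretation is a rough reading, not official documentation.
    awk 'NR > 1 && $8 < 0 { print $1, $8, $9 }' /proc/sys/lnet/peers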
I am also unclear about the top result:

top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
Swap: 16386292k total, 0k used, 16386292k free, 78108k cached

The CPU and memory are both largely idle, while the load average is quite high. Is it possible for Lustre to cache more data?

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 15:58 +0000, Wang lu wrote:
>> Thanks, but I am still unclear about:
>>
>> 1. How to limit the OST thread number after I find an optimum number?
>
> It's a module option to the oss module. It should be documented in the manual.
>
>> 2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?
>
> The meaning of many of the variables in /proc is also documented in the manual.
> If you find any that are not, you can file a ticket in our bz requesting they
> be added.
>
>> For example:
>> [root at boss01 ~]# cat /proc/sys/lnet/peers
>> nid refs state max rtr min tx min queue
>> 192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536
>>
>> [root at boss01 ~]# cat /proc/sys/lnet/nis
>> nid refs peer max tx min
>> 0 at lo 2 0 0 0 0
>> 192.168.50.33 at tcp 137 8 256 256 -424
>
> I don't know the details of either of these off-hand. Probably one of our
> LNET experts might be able to provide more information.
>
> b.
Brian J. Murrell
2008-Nov-10 16:21 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 16:18 +0000, Wang lu wrote:
> I am also unclear about the top result:
> top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
> Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
> Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
> Swap: 16386292k total, 0k used, 16386292k free, 78108k cached
>
> The CPU and memory are both largely idle, while the load average is quite high.
> Is it possible for Lustre to cache more data?

Caching on the OSS is a coming feature, but that doesn't alleviate the OST's need to read data that is not in cache and to flush dirty data to disk. IOW, a cache will not alleviate a problem of oversubscribed storage.

b.
I already have 512 (the max number) I/O threads running. Some of them are in "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:18 +0000, Wang lu wrote:
>> I am also unclear about the top result:
>> top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
>> Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
>> Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
>> Swap: 16386292k total, 0k used, 16386292k free, 78108k cached
>>
>> The CPU and memory are both largely idle, while the load average is quite high.
>> Is it possible for Lustre to cache more data?
>
> Caching on the OSS is a coming feature, but that doesn't alleviate the OST's
> need to read data that is not in cache and to flush dirty data to disk. IOW,
> a cache will not alleviate a problem of oversubscribed storage.
>
> b.
Brian J. Murrell
2008-Nov-10 16:55 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
> I already have 512 (the max number) I/O threads running. Some of them are in
> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?

Until you do some analysis of your storage with the iokit, one cannot really draw any conclusions; however, if you are already at the maximum value of OST threads, it would not be difficult to believe that perhaps this is a possibility.

Try a simple experiment: halve the number to 256 and see if you have any drop-off in throughput to the storage devices. If not, then you can easily assume that 512 was either too much or not necessary. You can try doing this again if you wish. If you get to a value of OST threads where your throughput is lower than it should be, you've gone too low.

But really, the iokit is the more efficient and accurate way to determine this.

b.
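When running such an experiment it is worth confirming how many OST I/O service threads are actually alive after the change. A simple check, using the thread names visible in the top output earlier in this thread, is:

    # Count the ll_ost_io kernel threads currently running on this OSS.
    ps -e | grep -c ll_ost_io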
Thanks a lot. I will continue tomorrow.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Hi all,

Since there are jobs running on the cluster, I can't do the PIOS test now. I am afraid this situation may happen again later. Does Lustre have some solution to deal with oversubscription other than a kernel crash? Users can accept that their jobs slow down, but they cannot accept that their jobs die because an OSS crashes. Or is there any other reason that may cause the OSSs to crash?

Thank you very much!

------------------
wanglu
2008-11-11
-------------------------------------------------------------
From: Wang lu
Sent: 2008-11-11 01:01:12
To: Brian J. Murrell
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load

Thanks a lot. I will continue tomorrow.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Andreas Dilger
2008-Nov-11 09:50 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Nov 11, 2008 15:52 +0800, wanglu wrote:
> Since there are jobs running on the cluster, I can't do the PIOS test now.
> I am afraid this situation may happen again later. Does Lustre have some
> solution to deal with oversubscription other than a kernel crash? Users can
> accept that their jobs slow down, but they cannot accept that their jobs die
> because an OSS crashes.
> Or is there any other reason that may cause the OSSs to crash?

You can increase the lustre timeout, temporarily on all clients & servers:

	lctl set_param timeout=200

or permanently in the filesystem configuration (on the MGS only):

	lctl conf_param {fsname}.sys.timeout=200

> wanglu
> 2008-11-11
>
> -------------------------------------------------------------
> From: Wang lu
> Sent: 2008-11-11 01:01:12
> To: Brian J. Murrell
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load
>
> Thanks a lot. I will continue tomorrow.
>
> Brian J. Murrell wrote:
>
> > On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
> >> I already have 512 (the max number) I/O threads running. Some of them are in
> >> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
> >
> > Until you do some analysis of your storage with the iokit, one cannot really
> > draw any conclusions; however, if you are already at the maximum value of OST
> > threads, it would not be difficult to believe that perhaps this is a possibility.
> >
> > Try a simple experiment: halve the number to 256 and see if you have any
> > drop-off in throughput to the storage devices. If not, then you can easily
> > assume that 512 was either too much or not necessary. You can try doing this
> > again if you wish. If you get to a value of OST threads where your throughput
> > is lower than it should be, you've gone too low.
> >
> > But really, the iokit is the more efficient and accurate way to determine this.
> >
> > b.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
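To confirm that the new timeout is in effect on a given node, the value can be read back; the /proc path below has existed throughout the 1.6 series, and lctl get_param should be available wherever the lctl set_param form above works:

    lctl get_param timeout          # on any client or server
    cat /proc/sys/lustre/timeout    # equivalent /proc interface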
May I ask where I can run the PIOS command? I think that to determine the max thread number of an OSS, it should be run on the OSS; however, the OST directories are unwritable. Can I write to /dev/sdaX? I am confused.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Andreas Dilger
2008-Nov-12 17:36 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Nov 12, 2008 13:48 +0000, Wang lu wrote:
> May I ask where I can run the PIOS command? I think that to determine the max
> thread number of an OSS, it should be run on the OSS; however, the OST
> directories are unwritable. Can I write to /dev/sdaX? I am confused.

Running PIOS directly against /dev/sdX will overwrite all data there. It should only be run on the disk devices before the filesystem is formatted. You can run PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular files in the filesystem.

> Brian J. Murrell wrote:
>> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>>> I already have 512 (the max number) I/O threads running. Some of them are in
>>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>>
>> Until you do some analysis of your storage with the iokit, one cannot really
>> draw any conclusions; however, if you are already at the maximum value of OST
>> threads, it would not be difficult to believe that perhaps this is a possibility.
>>
>> Try a simple experiment: halve the number to 256 and see if you have any
>> drop-off in throughput to the storage devices. If not, then you can easily
>> assume that 512 was either too much or not necessary. You can try doing this
>> again if you wish. If you get to a value of OST threads where your throughput
>> is lower than it should be, you've gone too low.
>>
>> But really, the iokit is the more efficient and accurate way to determine this.
>>
>> b.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
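If PIOS itself is not convenient, a crude stand-in for "regular files in the filesystem" is simply writing and reading back a few large files in parallel through a client mount; this is not PIOS and gives much coarser numbers, but it exercises the same path. A hedged sketch, using the /besfs mount point shown in this thread and hypothetical file names:

    # Run from one or, better, several clients at once, since a single
    # 1Gbit client NIC cannot saturate a 10Gbit OSS.
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/besfs/iokit-test.$i bs=1M count=4096 &
    done
    wait
    # Read back with: dd if=/besfs/iokit-test.$i of=/dev/null bs=1M
    # and remove the test files afterwards.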
Do you mean mount Lustre on an OSS server, and then do the PIOS test? One client node has only a 1Gbit network; it cannot saturate the OSS server.

[root at boss01 /]# mount -t lustre mds01 at tcp0:/besfs /besfs
[root at boss01 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/cciss/c0d0p1 30233896 3784000 24914084 14% /
none 4153680 0 4153680 0% /dev/shm
/dev/cciss/c0d0p5 92702372 90176 87903148 1% /scrach
/dev/cciss/c0d0p3 2016044 35836 1877796 2% /usr/vice/cache
/dev/sda1 6728210844 1657103924 4729333456 26% /lustre/besfs/ost0
/dev/sda2 6728210844 1659522080 4726915300 26% /lustre/besfs/ost1
/dev/sdb1 6728210844 1644823840 4741613540 26% /lustre/besfs/ost2
/dev/sdb2 6728210844 1653193084 4733244296 26% /lustre/besfs/ost3
mds01 at tcp0:/besfs 53825686752 13247980384 37843027072 26% /besfs

Andreas Dilger wrote:
> On Nov 12, 2008 13:48 +0000, Wang lu wrote:
>> May I ask where I can run the PIOS command? I think that to determine the max
>> thread number of an OSS, it should be run on the OSS; however, the OST
>> directories are unwritable. Can I write to /dev/sdaX? I am confused.
>
> Running PIOS directly against /dev/sdX will overwrite all data there. It should
> only be run on the disk devices before the filesystem is formatted. You can run
> PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular
> files in the filesystem.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Dear all,

This is a piece of the error log:

Nov 13 18:25:26 boss02 kernel: Lustre: 27228:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 56 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) besfs-OST0004: slow journal start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) besfs-OST0004: slow brw_start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) Skipped 8 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) Skipped 10 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27278:0:(filter_io_26.c:765:filter_commitrw_write()) besfs-OST0004: slow direct_io 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(lustre_fsfilt.h:302:fsfilt_commit_wait()) besfs-OST0004: slow journal start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(filter_io_26.c:778:filter_commitrw_write()) besfs-OST0004: slow commitrw commit 47s
Nov 13 18:25:47 boss02 sshd[18062]: Accepted password for root from 192.168.50.33 port 32796
Nov 13 10:25:47 boss02 sshd[18063]: Accepted password for root from 192.168.50.33 port 32796

<----I could not log in from SSH here and went to the console-->
<---What I saw--->

Nov 13 18:25:47 boss02 sshd(pam_unix)[18064]: session opened for user root by root(uid=0)
Nov 13 18:29:00 boss02 kernel: Lustre: 27501:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0004: f8e1ba7f-1faf-9b85-b04b-cbf89fe80640 reconnecting
Nov 13 18:29:00 boss02 kernel: Lustre: 27501:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1 previous similar message
Nov 13 18:29:00 boss02 kernel: Lustre: 27359:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0001: f819d104-ee19-f011-d6d6-bde44a19a8df reconnecting
Nov 13 18:29:00 boss02 kernel: Lustre: 27359:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 4 previous similar messages
Nov 13 18:29:51 boss02 kernel: Lustre: 27074:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0001: 1cd07c52-94c0-3a1c-dcd4-390daf0f0d10 reconnecting
Nov 13 18:29:51 boss02 kernel: Lustre: 27074:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2 previous similar messages
Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with error
Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc e1c24000
Nov 13 18:35:02 boss02 kernel: LustreError: 17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ea8cd200 x10376088/t0 o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:02 boss02 kernel: Lustre: 17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: ignoring bulk IO comm error with b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id 12345-192.168.52.79 at tcp - client will retry
Nov 13 18:35:04 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.94 at tcp, ip 192.168.52.94:1021, with error
Nov 13 18:35:04 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc f6ce6000
Nov 13 18:35:04 boss02 kernel: LustreError: 18214:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at e3d07a00 x12379068/t0 o4->7dfc3e78-2411-0625-f276-26756a033f22 at NET_0x20000c0a8345e_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573468 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:04 boss02 kernel: Lustre: 18214:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 7dfc3e78-2411-0625-f276-26756a033f22 at NET_0x20000c0a8345e_UUID id 12345-192.168.52.94 at tcp - client will retry
Nov 13 18:35:13 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.70 at tcp, ip 192.168.52.70:1021, with error
Nov 13 18:35:13 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc d9f6b000
Nov 13 18:35:13 boss02 kernel: LustreError: 27177:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at f5164800 x600925/t0 o4->53e9e602-8258-51f4-c7f9-4b9ded4efc27 at NET_0x20000c0a83446_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:13 boss02 kernel: Lustre: 27177:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 53e9e602-8258-51f4-c7f9-4b9ded4efc27 at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 13 18:35:15 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.81 at tcp, ip 192.168.52.81:1021, with error
Nov 13 18:35:15 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc d34d2000
Nov 13 18:35:15 boss02 kernel: LustreError: 27237:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at c5a8da2c x12883457/t0 o4->dce502fc-79fb-9a4e-5e97-90a58a814569 at NET_0x20000c0a83451_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:15 boss02 kernel: Lustre: 27237:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0003: ignoring bulk IO comm error with dce502fc-79fb-9a4e-5e97-90a58a814569 at NET_0x20000c0a83451_UUID id 12345-192.168.52.81 at tcp - client will retry
Nov 13 18:35:17 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc da5d7000
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.108 at tcp, ip 192.168.52.108:1021, with error
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Skipped 1 previous similar message
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc c71c0000
Nov 13 18:35:18 boss02 kernel: LustreError: 27215:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at e3767800 x7236800/t0 o4->66fc5c30-3666-f0e6-d005-a39f58eb4be2 at NET_0x20000c0a8346c_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573468 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:18 boss02 kernel: LustreError: 27215:0:(ost_handler.c:1139:ost_brw_write()) Skipped 1 previous similar message
Nov 13 18:35:18 boss02 kernel: Lustre: 27215:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 66fc5c30-3666-f0e6-d005-a39f58eb4be2 at NET_0x20000c0a8346c_UUID id 12345-192.168.52.108 at tcp - client will retry
Nov 13 18:35:18 boss02 kernel: Lustre: 27215:0:(ost_handler.c:1270:ost_brw_write()) Skipped 1 previous similar message

<---At that time, the network
was down, couldn't ping the gateway--> <--I tried restarting the network service, but after the restart the gateway was still unreachable---> ------------------ wanglu 2008-11-13 ------------------ From: Andreas Dilger Sent: 2008-11-13 01:36:57 To: Wang lu Cc: Brian J. Murrell; lustre-discuss at lists.lustre.org Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load On Nov 12, 2008 13:48 +0000, Wang lu wrote:> May I ask where I can run the PIOS command? I think that to determine the max thread > number of the OSS it should be run on the OSS, however the OST directories are > unwritable. Can I write to /dev/sdaX? I am confused. Running PIOS directly against /dev/sdX will overwrite all data there. It should only be run on the disk devices before the filesystem is formatted. You can run PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular files in the filesystem.> Brian J. Murrell wrote: > > > On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote: > >> I already have 512 (the maximum number of) IO threads running. Some of them are in "Dead" > >> status. Is it safe to draw the conclusion that the OSS is oversubscribed? > > > > Until you do some analysis of your storage with the iokit, one cannot > > really draw any conclusions, however if you are already at the maximum > > value of OST threads, it would not be difficult to believe that perhaps > > this is a possibility. > > > > Try a simple experiment and halve the number to 256 and see if you have > > any drop-off in throughput to the storage devices. If not, then you can > > easily assume that 512 was either too much or not necessary. You can > > try doing this again if you wish. If you get to a value of OST threads > > where your throughput is lower than it should be, you've gone too low. > > > > But really, the iokit is the more efficient and accurate way to > > determine this. > > > > b. > > > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
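A minimal sketch of the experiment Brian describes, assuming the oss_num_threads module parameter documented for Lustre 1.6 (the exact parameter name and whether 1.6.5 honours it should be verified against the manual for your release):

  # /etc/modprobe.conf on each OSS -- halve the OST service thread count from 512
  options ost oss_num_threads=256

  # after unmounting and remounting the OSTs, confirm how many I/O threads started
  ps -e | grep -c ll_ost_io

If throughput under load is unchanged at 256 threads, the extra 256 were not helping; if it drops noticeably, the count has been cut too far, as Brian notes.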
Brian J. Murrell
2008-Nov-13 13:58 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
There is really no need to put both Andreas and myself into your new message recipient addresses. We are both on the lustre-discuss list. On Thu, 2008-11-13 at 19:32 +0800, wanglu wrote:> <----I could not log in from SSH here and went to the console--> > <---What I saw---> > ... > Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with error > Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc e1c24000 > Nov 13 18:35:02 boss02 kernel: LustreError: 17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ea8cd200 x10376088/t0 o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0 ^^^^^^^^^^^^^^^^^^ > Nov 13 18:35:02 boss02 kernel: Lustre: 17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: ignoring bulk IO comm error with b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id 12345-192.168.52.79 at tcp - client will retry [ Many more ] > > <---At that time, the network was down, couldn't ping the gateway--> > <--I tried restarting the network service, but after the restart the gateway was still unreachable---> You have networking problems, not Lustre problems. Lustre only utilizes whatever network you provide it. It does not control it. It does not bring it up, take it down or reconfigure it in any way. Your operating system does this. b.
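A few quick checks that separate a raw network fault from an LNET problem (a sketch; eth0 is assumed here to be the 10Gbit interface, and the client address is taken from the log above):

  # IP layer: is the interface up and the path to gateway/clients alive?
  ping -c 3 192.168.52.79
  ethtool eth0              # link detected, negotiated speed
  dmesg | tail -n 50        # NIC driver resets or link flaps

  # LNET layer: does Lustre's own network still answer?
  lctl list_nids
  lctl ping 192.168.52.79@tcp

If the plain ping already fails, as reported above, the problem is below Lustre: in the NIC, driver, switch or cabling.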
We have experienced all these errors when we have a big job that is writing many small chunks. When the writes are, say, 80 bytes and the block size is 4k bytes, the back-end storage can slow down with read block, modify block, write block, to such an extent as to cause slow commitrw and slow journal messages very similar to yours. From your email: Dear all, This is a piece of error log: Nov 13 18:25:26 boss02 kernel: Lustre: 27228:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 56 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) besfs-OST0004: slow journal start 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) besfs-OST0004: slow brw_start 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) Skipped 8 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) Skipped 10 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27278:0:(filter_io_26.c:765:filter_commitrw_write()) besfs-OST0004: slow direct_io 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(lustre_fsfilt.h:302:fsfilt_commit_wait()) besfs-OST0004: slow journal start 47s You may check for a job that is committing small writes instead of caching and writing MBytes (one way to check is sketched after this message). We have even seen this phenomenon back up the server to the extent that it appears to the client that it is time to try the failover server, which then fails. Just something to check. At 8:58 AM -0500 11/13/08, Brian J. Murrell wrote: >There is really no need to put both Andreas and myself into your new >message recipient addresses. We are both on the lustre-discuss list. > >On Thu, 2008-11-13 at 19:32 +0800, wanglu wrote: >> <----I could not log in from SSH here and went to the console--> >> <---What I saw---> >> ... >> Nov 13 18:35:02 boss02 kernel: LustreError: >>26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial >>receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with >>error >> Nov 13 18:35:02 boss02 kernel: LustreError: >>26928:0:(events.c:361:server_bulk_callback()) event type 2, status >>-5, desc e1c24000 >> Nov 13 18:35:02 boss02 kernel: LustreError: >>17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on >>bulk GET 0(1048576) req at ea8cd200 x10376088/t0 >>o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 >>lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0 > >^^^^^^^^^^^^^^^^^^ >> Nov 13 18:35:02 boss02 kernel: Lustre: >>17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: >>ignoring bulk IO comm error with >>b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id >>12345-192.168.52.79 at tcp - client will retry >[ Many more ] >> >> <---At that time, the network was down, couldn't ping the gateway--> >> <--I tried restarting the network service, but after the restart the >>gateway was still unreachable---> > >You have networking problems, not Lustre problems. Lustre only utilizes >whatever network you provide it. It does not control it. It does not >bring it up, take it down or reconfigure it in any way. Your operating >system does this. > >b.
-- }}}===============>> LLNL James E. Harm (Jim); jharm at llnl.gov System Administrator, ICCD Clusters (925) 422-4018 Page: 423-7705x57152
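One way to check for the small-write pattern Jim describes is the per-OST I/O size histogram on the OSS (a sketch; brw_stats is the obdfilter /proc file in Lustre 1.6, and besfs-OST0004 is just the OST named in the log above):

  # Histogram of disk I/O sizes for one OST: a pile-up in the 4K bucket
  # with few 1M I/Os points at many small, uncached writes
  cat /proc/fs/lustre/obdfilter/besfs-OST0004/brw_stats

A client-side counterpart is the osc rpc_stats file, which shows how many pages each write RPC carried.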