Dear list,

Our Lustre system crashes frequently these days with heavy average load.

1) # top

top - 14:32:57 up 18:15, 1 user, load average: 25.05, 24.27, 24.47
Mem: 8307364k total, 859724k used, 7447640k free, 234288k buffers
Swap: 16386292k total, 0k used, 16386292k free, 37932k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
26695 root 15  0    0   0   0 S  7.6  0.0 51:57.40 socknal_sd04
26694 root 15  0    0   0   0 S  6.6  0.0 53:44.42 socknal_sd03
26691 root 15  0    0   0   0 S  5.6  0.0 51:11.76 socknal_sd00
26697 root 15  0    0   0   0 S  5.3  0.0 42:12.23 socknal_sd06
26696 root 15  0    0   0   0 S  3.3  0.0 52:47.42 socknal_sd05
26692 root 15  0    0   0   0 S  2.3  0.0 26:19.46 socknal_sd01
26693 root 15  0    0   0   0 S  2.3  0.0 32:38.21 socknal_sd02
26952 root 15  0    0   0   0 S  1.0  0.0  2:06.69 ll_ost_io_09
....

2) iostat -x 5

Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp (boss01.ihep.ac.cn)  11/10/2008

avg-cpu: %user %nice %sys %iowait %idle
          0.00  0.00 11.33    4.56 84.10

Device:    rrqm/s wrqm/s     r/s  w/s    rsec/s wsec/s    rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
cciss/c0d0   1.05   0.43    0.27 0.41      9.78   6.65     4.89   3.32    24.31     0.01 17.15  5.78   0.39
sda          3.46   0.64 1297.05 0.60   2588.12 118.12  1294.06  59.06     2.09    22.81 15.69  0.77  99.57
sdb          3.09   0.28 1274.46 0.18   1541.21  23.54   770.60  11.77     1.23    16.75 12.16  0.78  99.56

avg-cpu: %user %nice %sys %iowait %idle
          0.00  0.00 11.53    0.10 88.38

Device:    rrqm/s wrqm/s     r/s  w/s    rsec/s wsec/s    rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
cciss/c0d0   0.00   1.80    0.00 0.00      0.00  16.00     0.00   8.00     0.00     0.00  0.00  0.00   0.00
sda          3.20   0.00 1436.60 0.00 130524.80   0.00 65262.40   0.00    90.86    16.29 10.73  0.70 100.00
sdb          3.40   0.00 1142.20 0.00 124113.60   0.00 62056.80   0.00   108.66    10.44  8.24  0.87  99.80

Before each crash, there are LustreErrors like:

Nov 9 17:25:41 boss01 kernel: LustreError: 27327:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e3df8e00 x133017/t0 o3->73c15254-a884-578e-9634-859b44619a4f at NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226222741 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:25:41 boss01 kernel: Lustre: 27327:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 73c15254-a884-578e-9634-859b44619a4f at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:27:47 boss01 kernel: Lustre: besfs-OST0006: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 17:27:48 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 09:28:05 boss01 sshd[29314]: Connection closed by 192.168.51.130
Nov 9 17:29:17 boss01 ntpd[27872]: kernel time sync enabled 0001
Nov 9 17:56:48 boss01 kernel: Lustre: besfs-OST0005: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Nov 9 17:56:48 boss01 kernel: Lustre: Skipped 2 previous similar messages
Nov 9 17:59:15 boss01 kernel: Lustre: besfs-OST0002: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 374 seconds. I think it's dead, and I am evicting it.
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e2ccee00 x36870/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) Skipped 2 previous similar messages
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0007: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) Skipped 2 previous similar messages
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req at e01bce00 x36866/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID:0/0 lens 432/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) Skipped 4 previous similar messages
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) Skipped 4 previous similar messages
Nov 9 18:01:33 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83 at tcp) in 512 seconds. I think it's dead, and I am evicting it.
Nov 9 18:04:14 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 7df31bbf-54a5-ada8-abd7-f0920f648d0a (at 192.168.52.70 at tcp) in 396 seconds. I think it's dead, and I am evicting it.

The configuration of our system:
OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
MDS: 1
OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
Client: 50 nodes( 8 core server), each has 1Gbit/s NIC and [root at boss02 ~]# sysctl -q lnet lnet.nis = nid refs peer max tx min lnet.nis = 0 at lo 2 0 0 0 0 lnet.nis = 192.168.50.34 at tcp 136 8 256 250 88 lnet.buffers = pages count credits min lnet.buffers = 0 0 0 0 lnet.buffers = 1 0 0 0 lnet.buffers = 256 0 0 0 lnet.peers = nid refs state max rtr min tx min queue lnet.peers = 192.168.50.14 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.11 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.13 at tcp 1 ~rtr 8 8 8 8 -71 0 lnet.peers = 192.168.52.14 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.15 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.16 at tcp 1 ~rtr 8 8 8 8 -30 0 lnet.peers = 192.168.52.17 at tcp 1 ~rtr 8 8 8 8 -38 0 lnet.peers = 192.168.52.18 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.19 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.20 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.21 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.22 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.50.32 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.23 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.24 at tcp 1 ~rtr 8 8 8 8 -50 0 lnet.peers = 192.168.52.25 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.26 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.27 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.28 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.29 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.30 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.31 at tcp 7 ~rtr 8 8 8 2 -10 3318192 lnet.peers = 192.168.52.32 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.33 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.34 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.35 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.36 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.37 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.38 at tcp 1 ~rtr 8 8 8 8 -62 0 lnet.peers = 192.168.52.39 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.40 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.41 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.42 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.43 at tcp 1 ~rtr 8 8 8 8 -31 0 lnet.peers = 192.168.52.44 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.45 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.46 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.47 at tcp 1 ~rtr 8 8 8 8 -10 0 lnet.peers = 192.168.52.48 at tcp 1 ~rtr 8 8 8 8 -23 0 lnet.peers = 192.168.52.49 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.50 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.51 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.52 at tcp 1 ~rtr 8 8 8 8 -23 0 lnet.peers = 192.168.52.53 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.54 at tcp 1 ~rtr 8 8 8 8 -20 0 lnet.peers = 192.168.52.55 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.56 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.57 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.58 at tcp 1 ~rtr 8 8 8 8 -11 0 lnet.peers = 192.168.52.59 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.60 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.61 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.62 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.63 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.64 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.65 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.66 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.67 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.68 at tcp 1 ~rtr 8 8 8 
8 3 0 lnet.peers = 192.168.52.69 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.70 at tcp 1 ~rtr 8 8 8 8 -8 0 lnet.peers = 192.168.52.71 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 192.168.52.72 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.73 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.74 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.75 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 202.122.33.56 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.76 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.77 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.78 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.79 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.80 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.81 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.82 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.83 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.84 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.86 at tcp 1 ~rtr 8 8 8 8 -12 0 lnet.peers = 192.168.52.87 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.88 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.89 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.90 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.91 at tcp 1 ~rtr 8 8 8 8 3 0 lnet.peers = 192.168.52.92 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.93 at tcp 1 ~rtr 8 8 8 8 -14 0 lnet.peers = 192.168.52.94 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.95 at tcp 1 ~rtr 8 8 8 8 -19 0 lnet.peers = 192.168.52.96 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.97 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.98 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.52.99 at tcp 1 ~rtr 8 8 8 8 -3 0 lnet.peers = 192.168.52.100 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.101 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 202.122.33.82 at tcp 1 ~rtr 8 8 8 8 -6383 0 lnet.peers = 192.168.52.102 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 202.122.33.83 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.103 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 202.122.33.84 at tcp 1 ~rtr 8 8 8 8 -649 0 lnet.peers = 192.168.52.104 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 192.168.52.105 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.106 at tcp 1 ~rtr 8 8 8 8 -15 0 lnet.peers = 192.168.52.107 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.108 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.109 at tcp 1 ~rtr 8 8 8 8 -79 0 lnet.peers = 192.168.52.110 at tcp 1 ~rtr 8 8 8 8 -24 0 lnet.peers = 192.168.52.111 at tcp 1 ~rtr 8 8 8 8 -102 0 lnet.peers = 192.168.52.112 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 202.122.33.92 at tcp 1 ~rtr 8 8 8 8 -1148 0 lnet.peers = 202.122.33.93 at tcp 1 ~rtr 8 8 8 8 -5 0 lnet.peers = 192.168.52.113 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.114 at tcp 1 ~rtr 8 8 8 8 -73 0 lnet.peers = 192.168.52.115 at tcp 1 ~rtr 8 8 8 8 -6 0 lnet.peers = 202.122.33.95 at tcp 1 ~rtr 8 8 8 8 -1914 0 lnet.peers = 192.168.52.116 at tcp 1 ~rtr 8 8 8 8 -4 0 lnet.peers = 192.168.52.117 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.118 at tcp 1 ~rtr 8 8 8 8 -55 0 lnet.peers = 192.168.52.119 at tcp 1 ~rtr 8 8 8 8 -1 0 lnet.peers = 192.168.52.120 at tcp 1 ~rtr 8 8 8 8 1 0 lnet.peers = 192.168.52.121 at tcp 1 ~rtr 8 8 8 8 -54 0 lnet.peers = 192.168.52.122 at tcp 1 ~rtr 8 8 8 8 -65 0 lnet.peers = 192.168.52.123 at tcp 1 ~rtr 8 8 8 8 -16 0 lnet.peers = 192.168.52.124 at tcp 1 ~rtr 8 8 8 8 -32 0 lnet.peers = 192.168.52.125 at tcp 1 ~rtr 8 8 8 8 -158 0 lnet.peers = 192.168.52.126 at tcp 1 ~rtr 8 8 8 8 0 0 lnet.peers = 192.168.52.127 at tcp 1 ~rtr 8 8 8 8 -2 0 lnet.peers = 
192.168.52.128 at tcp 1 ~rtr 8 8 8 8 -36 0 lnet.peers = 192.168.52.129 at tcp 1 ~rtr 8 8 8 8 -120 0 lnet.peers = 192.168.52.130 at tcp 1 ~rtr 8 8 8 8 2 0 lnet.peers = 192.168.52.131 at tcp 1 ~rtr 8 8 8 8 -82 0 lnet.peers = 192.168.55.11 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.12 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.13 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.14 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.15 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.55.16 at tcp 1 ~rtr 8 8 8 8 4 0 lnet.peers = 192.168.51.134 at tcp 1 ~rtr 8 8 8 8 -631 0
lnet.routers = ref rtr_ref alive_cnt state last_ping router
lnet.routes = Routing disabled
lnet.routes = net hops state router
lnet.stats = 7 6513 0 349123954 349123978 0 25 9871897726514 80688968391 0 7600
lnet.debug_mb = 41
lnet.panic_on_lbug = 0
lnet.catastrophe = 0
lnet.memused = 4166984
lnet.upcall = /usr/lib/lustre/lnet_upcall
lnet.debug_path = /tmp/lustre-log
lnet.console_backoff = 2
lnet.console_min_delay_centisecs = 50
lnet.console_max_delay_centisecs = 60000
lnet.console_ratelimit = 1
lnet.printk = warning error emerg console
lnet.subsystem_debug = undefined mdc mds osc ost class log llite rpc lnet lnd pinger filter echo ldlm lov lmv sec gss mgc mgs fid fld
lnet.debug = ioctl neterror warning error emerg ha config console

My questions are:
1. What is the signal of Lustre overload?
2. Can Lustre reject too many connections before it is going to crash?
Brian J. Murrell
2008-Nov-10 14:20 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
> Dear list,
>
> Our Lustre system crashes

I don't see any evidence of a "crash" in your posting here. Can you define what you mean by "crash"?

> The configuration of our system
> OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
> MDS: 1
> OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
> Client: 50 nodes (8-core servers), each has a 1Gbit/s NIC

So your entire Lustre server infrastructure is a single node with all of the MDS, MGS and OSS (2x OSTs) on it? If yes, can I ask why? Lustre is likely not going to perform very well in such a configuration.

Is your storage oversubscribed? Did you benchmark your storage system with our iokit to find out the optimum number of OST threads you should be running?

> My questions are:
> 1. What is the signal of Lustre overload?

I'm not sure I'm understanding this question.

> 2. Can Lustre reject too many connections before it is going to crash?

Properly tuned, Lustre will not "crash" due to load, but will manage it. As long as your OSS is properly tuned for your storage capabilities, you can throw as many client loads at it as you want. Each load will just get its appropriate share of the backend resources. As you continue to add more client loads, each load will just get a smaller portion of the total resources.

b.
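The "iokit" referred to above is the lustre-iokit package, whose survey scripts drive the backend storage with an increasing number of concurrent threads so you can see where throughput stops scaling. The sketch below is a minimal, hedged example of an obdfilter-survey run on an OSS; the environment-variable names (size, nobjlo/nobjhi, thrlo/thrhi) are recalled from the 1.6-era script and should be checked against the script's own header before running.

    # Run on the OSS against its local OSTs (the "disk" case).
    # WARNING: generates heavy I/O on the OSTs for the duration of the run.
    # Variable names below are assumptions; verify them in the obdfilter-survey header.
    size=8192 nobjlo=1 nobjhi=16 thrlo=16 thrhi=512 ./obdfilter-survey
    # The report lists MB/s per thread count; the point where throughput
    # stops increasing is the "optimum number of OST threads" mentioned above.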
Thanks, Brian.

During a "crash", I can neither SSH to the OSS server nor start a new console on the machine directly, and a "df" takes over 10 seconds.

Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs, and each OSS has 2 disk arrays attached. The total space is 57TB.

The problem may be caused by oversubscription, since the %util and average load are both high. However, I do not know:
1. How to estimate the optimum number of OST threads? Do you have any suggestion?
2. What is the relationship between the OST thread number and the number of Lustre client nodes? If the max OST thread number is X, is the max Lustre client number X/8? (The default number of connections for a peer is 8.)

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 14:58 +0800, wanglu wrote:
>> Dear list,
>>
>> Our Lustre system crashes
>
> I don't see any evidence of a "crash" in your posting here. Can you
> define what you mean by "crash"?
>
>> The configuration of our system
>> OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
>> MDS: 1
>> OSS: 2 with 10Gbit/s NIC, each attached with 2 disk arrays directly.
>> Client: 50 nodes (8-core servers), each has a 1Gbit/s NIC
>
> So your entire Lustre server infrastructure is a single node with all of
> the MDS, MGS and OSS (2x OSTs) on it? If yes, can I ask why? Lustre is
> likely not going to perform very well in such a configuration.
>
> Is your storage oversubscribed? Did you benchmark your storage system
> with our iokit to find out the optimum number of OST threads you should
> be running?
>
>> My questions are:
>> 1. What is the signal of Lustre overload?
>
> I'm not sure I'm understanding this question.
>
>> 2. Can Lustre reject too many connections before it is going to crash?
>
> Properly tuned, Lustre will not "crash" due to load, but will manage it.
> As long as your OSS is properly tuned for your storage capabilities, you
> can throw as many client loads at it as you want. Each load will just
> get its appropriate share of the backend resources. As you continue to
> add more client loads, each load will just get a smaller portion of the
> total resources.
>
> b.
Brian J. Murrell
2008-Nov-10 14:58 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
> Thanks, Brian.
> During a "crash", I can neither SSH to the OSS server nor start a new
> console on the machine directly, and a "df" takes over 10 seconds.

Yeah, sounds like the OSS is quite "backed up".

> Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs,
> and each OSS has 2 disk arrays attached. The total space is 57TB.

Ahhh. OK. Your description made it sound like you were running all of those on a single node, and the reality is that Lustre doesn't do anything magic. If you only use a single node, you likely won't see any better performance than, say, just NFS. But I digress.

> The problem may be caused by oversubscription, since the %util and average
> load are both high. However, I do not know:
> 1. How to estimate the optimum number of OST threads? Do you have any suggestion?

Use our iokit.

> 2. What is the relationship between the OST thread number and the number of
> Lustre client nodes?

Nothing. The relationship is the point of diminishing returns on driving your storage as you add more threads. Most storage can benefit from having multiple threads from a single machine driving it -- to a point of saturation. There is no point in driving the storage beyond that point of saturation. The iokit will test your storage, throwing more and more threads at it. When you look at the output you will find a maximum number of threads beyond which you get no more increase in performance. That number is your optimum OST threads number.

Certainly you can achieve this without the iokit by just playing with the number of OST threads, adjusting up and down (think about doing a binary search, for example) until you find your sweet spot. This method is of course more time consuming.

b.
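As a concrete illustration of the "playing with the number of OST threads" approach, a hedged sketch follows. It assumes the OST I/O thread count in Lustre 1.6 is set through an option on the ost module named oss_num_threads (check the manual for the exact option name in your release), and that the OSTs can be unmounted briefly so the modules reload with the new value; the device and mount-point names are the ones that appear later in this thread.

    # On the OSS: try half the current thread count, e.g. 256 instead of 512.
    # The option name is an assumption based on the 1.6 manual; verify before use.
    echo "options ost oss_num_threads=256" >> /etc/modprobe.conf

    # Stop the OSTs, reload the Lustre modules so the option takes effect,
    # then remount and rerun the same fixed benchmark used for the previous value.
    umount /lustre/besfs/ost0                       # repeat for every OST on this OSS
    lustre_rmmod                                    # helper script that unloads the Lustre modules
    mount -t lustre /dev/sda1 /lustre/besfs/ost0    # remount each OST

Comparing the aggregate throughput of the same workload at each candidate thread count is what makes the binary search meaningful; without a repeatable measurement the comparison tells you nothing.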
Thanks, but I am still unclear about:
1. How to limit the OST thread number after I find an optimum number?
2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis? For example:

[root at boss01 ~]# cat /proc/sys/lnet/peers
nid refs state max rtr min tx min queue
192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536

[root at boss01 ~]# cat /proc/sys/lnet/nis
nid refs peer max tx min
0 at lo 2 0 0 0 0
192.168.50.33 at tcp 137 8 256 256 -424

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 14:49 +0000, Wang lu wrote:
>> Thanks, Brian.
>> During a "crash", I can neither SSH to the OSS server nor start a new
>> console on the machine directly, and a "df" takes over 10 seconds.
>
> Yeah, sounds like the OSS is quite "backed up".
>
>> Our system has 3 server nodes: 1 server for the MDS and 2 servers for 2 OSSs,
>> and each OSS has 2 disk arrays attached. The total space is 57TB.
>
> Ahhh. OK. Your description made it sound like you were running all of
> those on a single node, and the reality is that Lustre doesn't do
> anything magic. If you only use a single node, you likely won't see any
> better performance than, say, just NFS. But I digress.
>
>> The problem may be caused by oversubscription, since the %util and average
>> load are both high. However, I do not know:
>> 1. How to estimate the optimum number of OST threads? Do you have any suggestion?
>
> Use our iokit.
>
>> 2. What is the relationship between the OST thread number and the number of
>> Lustre client nodes?
>
> Nothing. The relationship is the point of diminishing returns on driving
> your storage as you add more threads. Most storage can benefit from having
> multiple threads from a single machine driving it -- to a point of
> saturation. There is no point in driving the storage beyond that point of
> saturation. The iokit will test your storage, throwing more and more
> threads at it. When you look at the output you will find a maximum number
> of threads beyond which you get no more increase in performance. That
> number is your optimum OST threads number.
>
> Certainly you can achieve this without the iokit by just playing with the
> number of OST threads, adjusting up and down (think about doing a binary
> search, for example) until you find your sweet spot. This method is of
> course more time consuming.
>
> b.
Brian J. Murrell
2008-Nov-10 16:01 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 15:58 +0000, Wang lu wrote:
> Thanks, but I am still unclear about:
>
> 1. How to limit the OST thread number after I find an optimum number?

It's a module option to the oss module. It should be documented in the manual.

> 2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?

The meaning of many of the variables in /proc is also documented in the manual. If you find any that are not, you can file a ticket in our bz requesting they be added.

> For example:
> [root at boss01 ~]# cat /proc/sys/lnet/peers
> nid refs state max rtr min tx min queue
> 192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536
>
> [root at boss01 ~]# cat /proc/sys/lnet/nis
> nid refs peer max tx min
> 0 at lo 2 0 0 0 0
> 192.168.50.33 at tcp 137 8 256 256 -424

I don't know the details of either of these off-hand. Probably one of our LNET experts might be able to provide more information.

b.
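One thing that can be read from the /proc/sys/lnet/peers output without knowing all of the internals: the last columns are send-credit counters, and a strongly negative "min" value (such as the -6383 visible in the dump earlier in this thread) is commonly taken to mean that messages to that peer had to queue because its credits were exhausted, i.e. the peer or the path to it was not keeping up. Treat that reading as a rough, unofficial indicator; a hedged one-liner for spotting such peers, with the column positions taken from the header line "nid refs state max rtr min tx min queue", might be:

    # List peers whose lowest-seen tx credit count went negative, with their queue size.
    # Assumes the nid prints as a single field (e.g. 192.168.52.39@tcp); the column
    # interpretation is a rough reading, not official documentation.
    awk 'NR > 1 && $8 < 0 { print $1, $8, $9 }' /proc/sys/lnet/peers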
I am also unclear about the top result:

top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
Swap: 16386292k total, 0k used, 16386292k free, 78108k cached

The CPU and memory are both largely idle, while the load average is quite high. Is it possible for Lustre to cache more data?

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 15:58 +0000, Wang lu wrote:
>> Thanks, but I am still unclear about:
>>
>> 1. How to limit the OST thread number after I find an optimum number?
>
> It's a module option to the oss module. It should be documented in the manual.
>
>> 2. The meaning of /proc/sys/lnet/peers and /proc/sys/lnet/nis?
>
> The meaning of many of the variables in /proc is also documented in the manual.
> If you find any that are not, you can file a ticket in our bz requesting they
> be added.
>
>> For example:
>> [root at boss01 ~]# cat /proc/sys/lnet/peers
>> nid refs state max rtr min tx min queue
>> 192.168.52.39 at tcp 6 ~rtr 8 8 8 3 -19 1458536
>>
>> [root at boss01 ~]# cat /proc/sys/lnet/nis
>> nid refs peer max tx min
>> 0 at lo 2 0 0 0 0
>> 192.168.50.33 at tcp 137 8 256 256 -424
>
> I don't know the details of either of these off-hand. Probably one of our
> LNET experts might be able to provide more information.
>
> b.
Brian J. Murrell
2008-Nov-10 16:21 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 16:18 +0000, Wang lu wrote:
> I am also unclear about the top result:
> top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
> Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
> Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
> Swap: 16386292k total, 0k used, 16386292k free, 78108k cached
>
> The CPU and memory are both largely idle, while the load average is quite high.
> Is it possible for Lustre to cache more data?

Caching on the OSS is a coming feature, but that doesn't alleviate the OST's need to read data that is not in cache and to flush dirty data to disk. IOW, a cache will not alleviate a problem of oversubscribed storage.

b.
I already have 512 (the max number) I/O threads running. Some of them are in "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:18 +0000, Wang lu wrote:
>> I am also unclear about the top result:
>> top - 00:16:19 up 1 day, 3:58, 1 user, load average: 22.71, 23.27, 23.74
>> Tasks: 851 total, 2 running, 849 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 0.0% us, 7.0% sy, 0.0% ni, 86.7% id, 0.2% wa, 0.2% hi, 5.9% si
>> Mem: 8307364k total, 894940k used, 7412424k free, 240912k buffers
>> Swap: 16386292k total, 0k used, 16386292k free, 78108k cached
>>
>> The CPU and memory are both largely idle, while the load average is quite high.
>> Is it possible for Lustre to cache more data?
>
> Caching on the OSS is a coming feature, but that doesn't alleviate the OST's
> need to read data that is not in cache and to flush dirty data to disk. IOW,
> a cache will not alleviate a problem of oversubscribed storage.
>
> b.
Brian J. Murrell
2008-Nov-10 16:55 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
> I already have 512 (the max number) I/O threads running. Some of them are in
> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?

Until you do some analysis of your storage with the iokit, one cannot really draw any conclusions; however, if you are already at the maximum value of OST threads, it would not be difficult to believe that perhaps this is a possibility.

Try a simple experiment: halve the number to 256 and see if you have any drop-off in throughput to the storage devices. If not, then you can easily assume that 512 was either too much or not necessary. You can try doing this again if you wish. If you get to a value of OST threads where your throughput is lower than it should be, you've gone too low.

But really, the iokit is the more efficient and accurate way to determine this.

b.
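When running such an experiment it is worth confirming how many OST I/O service threads are actually alive after the change. A simple check, using the thread names visible in the top output earlier in this thread, is:

    # Count the ll_ost_io kernel threads currently running on this OSS.
    ps -e | grep -c ll_ost_io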
Thanks a lot. I will continue tomorrow.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Hi all,

Since there are jobs running on the cluster, I can't do the PIOS test now. I am afraid this situation may happen again later. Does Lustre have some solution to deal with oversubscription other than a kernel crash? Users can accept that their jobs slow down, but they cannot accept that their jobs die because an OSS crashes. Or is there any other reason that may cause the OSSs to crash?

Thank you very much!

------------------
wanglu
2008-11-11
-------------------------------------------------------------
From: Wang lu
Sent: 2008-11-11 01:01:12
To: Brian J. Murrell
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load

Thanks a lot. I will continue tomorrow.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Andreas Dilger
2008-Nov-11 09:50 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Nov 11, 2008 15:52 +0800, wanglu wrote:
> Since there are jobs running on the cluster, I can't do the PIOS test now.
> I am afraid this situation may happen again later. Does Lustre have some
> solution to deal with oversubscription other than a kernel crash? Users can
> accept that their jobs slow down, but they cannot accept that their jobs die
> because an OSS crashes.
> Or is there any other reason that may cause the OSSs to crash?

You can increase the lustre timeout, temporarily on all clients & servers:

	lctl set_param timeout=200

or permanently in the filesystem configuration (on the MGS only):

	lctl conf_param {fsname}.sys.timeout=200

> wanglu
> 2008-11-11
>
> -------------------------------------------------------------
> From: Wang lu
> Sent: 2008-11-11 01:01:12
> To: Brian J. Murrell
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load
>
> Thanks a lot. I will continue tomorrow.
>
> Brian J. Murrell wrote:
>
> > On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
> >> I already have 512 (the max number) I/O threads running. Some of them are in
> >> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
> >
> > Until you do some analysis of your storage with the iokit, one cannot really
> > draw any conclusions; however, if you are already at the maximum value of OST
> > threads, it would not be difficult to believe that perhaps this is a possibility.
> >
> > Try a simple experiment: halve the number to 256 and see if you have any
> > drop-off in throughput to the storage devices. If not, then you can easily
> > assume that 512 was either too much or not necessary. You can try doing this
> > again if you wish. If you get to a value of OST threads where your throughput
> > is lower than it should be, you've gone too low.
> >
> > But really, the iokit is the more efficient and accurate way to determine this.
> >
> > b.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
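To confirm that the new timeout is in effect on a given node, the value can be read back; the /proc path below has existed throughout the 1.6 series, and lctl get_param should be available wherever the lctl set_param form above works:

    lctl get_param timeout          # on any client or server
    cat /proc/sys/lustre/timeout    # equivalent /proc interface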
May I ask where I can run the PIOS command? I think that to determine the max thread number of an OSS, it should be run on the OSS; however, the OST directories are unwritable. Can I write to /dev/sdaX? I am confused.

Brian J. Murrell wrote:
> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> I already have 512 (the max number) I/O threads running. Some of them are in
>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>
> Until you do some analysis of your storage with the iokit, one cannot really
> draw any conclusions; however, if you are already at the maximum value of OST
> threads, it would not be difficult to believe that perhaps this is a possibility.
>
> Try a simple experiment: halve the number to 256 and see if you have any
> drop-off in throughput to the storage devices. If not, then you can easily
> assume that 512 was either too much or not necessary. You can try doing this
> again if you wish. If you get to a value of OST threads where your throughput
> is lower than it should be, you've gone too low.
>
> But really, the iokit is the more efficient and accurate way to determine this.
>
> b.
Andreas Dilger
2008-Nov-12 17:36 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
On Nov 12, 2008 13:48 +0000, Wang lu wrote:
> May I ask where I can run the PIOS command? I think that to determine the max
> thread number of an OSS, it should be run on the OSS; however, the OST
> directories are unwritable. Can I write to /dev/sdaX? I am confused.

Running PIOS directly against /dev/sdX will overwrite all data there. It should only be run on the disk devices before the filesystem is formatted. You can run PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular files in the filesystem.

> Brian J. Murrell wrote:
>> On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>>> I already have 512 (the max number) I/O threads running. Some of them are in
>>> "Dead" status. Is it safe to draw the conclusion that the OSS is oversubscribed?
>>
>> Until you do some analysis of your storage with the iokit, one cannot really
>> draw any conclusions; however, if you are already at the maximum value of OST
>> threads, it would not be difficult to believe that perhaps this is a possibility.
>>
>> Try a simple experiment: halve the number to 256 and see if you have any
>> drop-off in throughput to the storage devices. If not, then you can easily
>> assume that 512 was either too much or not necessary. You can try doing this
>> again if you wish. If you get to a value of OST threads where your throughput
>> is lower than it should be, you've gone too low.
>>
>> But really, the iokit is the more efficient and accurate way to determine this.
>>
>> b.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
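If PIOS itself is not convenient, a crude stand-in for "regular files in the filesystem" is simply writing and reading back a few large files in parallel through a client mount; this is not PIOS and gives much coarser numbers, but it exercises the same path. A hedged sketch, using the /besfs mount point shown in this thread and hypothetical file names:

    # Run from one or, better, several clients at once, since a single
    # 1Gbit client NIC cannot saturate a 10Gbit OSS.
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/besfs/iokit-test.$i bs=1M count=4096 &
    done
    wait
    # Read back with: dd if=/besfs/iokit-test.$i of=/dev/null bs=1M
    # and remove the test files afterwards.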
Do you mean mount Lustre on an OSS server, and then do the PIOS test? One client node has only a 1Gbit network; it cannot saturate the OSS server.

[root at boss01 /]# mount -t lustre mds01 at tcp0:/besfs /besfs
[root at boss01 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/cciss/c0d0p1 30233896 3784000 24914084 14% /
none 4153680 0 4153680 0% /dev/shm
/dev/cciss/c0d0p5 92702372 90176 87903148 1% /scrach
/dev/cciss/c0d0p3 2016044 35836 1877796 2% /usr/vice/cache
/dev/sda1 6728210844 1657103924 4729333456 26% /lustre/besfs/ost0
/dev/sda2 6728210844 1659522080 4726915300 26% /lustre/besfs/ost1
/dev/sdb1 6728210844 1644823840 4741613540 26% /lustre/besfs/ost2
/dev/sdb2 6728210844 1653193084 4733244296 26% /lustre/besfs/ost3
mds01 at tcp0:/besfs 53825686752 13247980384 37843027072 26% /besfs

Andreas Dilger wrote:
> On Nov 12, 2008 13:48 +0000, Wang lu wrote:
>> May I ask where I can run the PIOS command? I think that to determine the max
>> thread number of an OSS, it should be run on the OSS; however, the OST
>> directories are unwritable. Can I write to /dev/sdaX? I am confused.
>
> Running PIOS directly against /dev/sdX will overwrite all data there. It should
> only be run on the disk devices before the filesystem is formatted. You can run
> PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular
> files in the filesystem.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Dear all,

This is a piece of the error log:

Nov 13 18:25:26 boss02 kernel: Lustre: 27228:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 56 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) besfs-OST0004: slow journal start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) besfs-OST0004: slow brw_start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) Skipped 8 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) Skipped 10 previous similar messages
Nov 13 18:25:26 boss02 kernel: Lustre: 27278:0:(filter_io_26.c:765:filter_commitrw_write()) besfs-OST0004: slow direct_io 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(lustre_fsfilt.h:302:fsfilt_commit_wait()) besfs-OST0004: slow journal start 47s
Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(filter_io_26.c:778:filter_commitrw_write()) besfs-OST0004: slow commitrw commit 47s
Nov 13 18:25:47 boss02 sshd[18062]: Accepted password for root from 192.168.50.33 port 32796
Nov 13 10:25:47 boss02 sshd[18063]: Accepted password for root from 192.168.50.33 port 32796

<----I could not log in from SSH here and went to the console-->
<---What I saw--->

Nov 13 18:25:47 boss02 sshd(pam_unix)[18064]: session opened for user root by root(uid=0)
Nov 13 18:29:00 boss02 kernel: Lustre: 27501:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0004: f8e1ba7f-1faf-9b85-b04b-cbf89fe80640 reconnecting
Nov 13 18:29:00 boss02 kernel: Lustre: 27501:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1 previous similar message
Nov 13 18:29:00 boss02 kernel: Lustre: 27359:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0001: f819d104-ee19-f011-d6d6-bde44a19a8df reconnecting
Nov 13 18:29:00 boss02 kernel: Lustre: 27359:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 4 previous similar messages
Nov 13 18:29:51 boss02 kernel: Lustre: 27074:0:(ldlm_lib.c:525:target_handle_reconnect()) besfs-OST0001: 1cd07c52-94c0-3a1c-dcd4-390daf0f0d10 reconnecting
Nov 13 18:29:51 boss02 kernel: Lustre: 27074:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2 previous similar messages
Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with error
Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc e1c24000
Nov 13 18:35:02 boss02 kernel: LustreError: 17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ea8cd200 x10376088/t0 o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:02 boss02 kernel: Lustre: 17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: ignoring bulk IO comm error with b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id 12345-192.168.52.79 at tcp - client will retry
Nov 13 18:35:04 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.94 at tcp, ip 192.168.52.94:1021, with error
Nov 13 18:35:04 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc f6ce6000
Nov 13 18:35:04 boss02 kernel: LustreError: 18214:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at e3d07a00 x12379068/t0 o4->7dfc3e78-2411-0625-f276-26756a033f22 at NET_0x20000c0a8345e_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573468 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:04 boss02 kernel: Lustre: 18214:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 7dfc3e78-2411-0625-f276-26756a033f22 at NET_0x20000c0a8345e_UUID id 12345-192.168.52.94 at tcp - client will retry
Nov 13 18:35:13 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.70 at tcp, ip 192.168.52.70:1021, with error
Nov 13 18:35:13 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc d9f6b000
Nov 13 18:35:13 boss02 kernel: LustreError: 27177:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at f5164800 x600925/t0 o4->53e9e602-8258-51f4-c7f9-4b9ded4efc27 at NET_0x20000c0a83446_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:13 boss02 kernel: Lustre: 27177:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 53e9e602-8258-51f4-c7f9-4b9ded4efc27 at NET_0x20000c0a83446_UUID id 12345-192.168.52.70 at tcp - client will retry
Nov 13 18:35:15 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.81 at tcp, ip 192.168.52.81:1021, with error
Nov 13 18:35:15 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc d34d2000
Nov 13 18:35:15 boss02 kernel: LustreError: 27237:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at c5a8da2c x12883457/t0 o4->dce502fc-79fb-9a4e-5e97-90a58a814569 at NET_0x20000c0a83451_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:15 boss02 kernel: Lustre: 27237:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0003: ignoring bulk IO comm error with dce502fc-79fb-9a4e-5e97-90a58a814569 at NET_0x20000c0a83451_UUID id 12345-192.168.52.81 at tcp - client will retry
Nov 13 18:35:17 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc da5d7000
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.108 at tcp, ip 192.168.52.108:1021, with error
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Skipped 1 previous similar message
Nov 13 18:35:18 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc c71c0000
Nov 13 18:35:18 boss02 kernel: LustreError: 27215:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at e3767800 x7236800/t0 o4->66fc5c30-3666-f0e6-d005-a39f58eb4be2 at NET_0x20000c0a8346c_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573468 ref 1 fl Interpret:/0/0 rc 0/0
Nov 13 18:35:18 boss02 kernel: LustreError: 27215:0:(ost_handler.c:1139:ost_brw_write()) Skipped 1 previous similar message
Nov 13 18:35:18 boss02 kernel: Lustre: 27215:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0000: ignoring bulk IO comm error with 66fc5c30-3666-f0e6-d005-a39f58eb4be2 at NET_0x20000c0a8346c_UUID id 12345-192.168.52.108 at tcp - client will retry
Nov 13 18:35:18 boss02 kernel: Lustre: 27215:0:(ost_handler.c:1270:ost_brw_write()) Skipped 1 previous similar message

<---At that time, the network
was down, couldn't ping the gateway--> <--I tried restarting the network service, but after the restart the gateway was still unreachable---> ------------------ wanglu 2008-11-13 ------------------ From: Andreas Dilger Sent: 2008-11-13 01:36:57 To: Wang lu Cc: Brian J. Murrell; lustre-discuss at lists.lustre.org Subject: Re: [Lustre-discuss] Frequent OSS Crashes with heavy load On Nov 12, 2008 13:48 +0000, Wang lu wrote:> May I ask where I can run the PIOS command? I think that to determine the max thread > number of the OSS it should be run on the OSS, however the OST directories are > unwritable. Can I write to /dev/sdaX? I am confused. Running PIOS directly against /dev/sdX will overwrite all data there. It should only be run on the disk devices before the filesystem is formatted. You can run PIOS against the filesystem itself (e.g. /mnt/lustre) to just create regular files in the filesystem.> Brian J. Murrell wrote: > > > On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote: > >> I already have 512 (the maximum number of) IO threads running. Some of them are in "Dead" > >> status. Is it safe to draw the conclusion that the OSS is oversubscribed? > > > > Until you do some analysis of your storage with the iokit, one cannot > > really draw any conclusions, however if you are already at the maximum > > value of OST threads, it would not be difficult to believe that perhaps > > this is a possibility. > > > > Try a simple experiment and halve the number to 256 and see if you have > > any drop-off in throughput to the storage devices. If not, then you can > > easily assume that 512 was either too much or not necessary. You can > > try doing this again if you wish. If you get to a value of OST threads > > where your throughput is lower than it should be, you've gone too low. > > > > But really, the iokit is the more efficient and accurate way to > > determine this. > > > > b. > > > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
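A minimal sketch of the experiment Brian describes, assuming the oss_num_threads module parameter documented for Lustre 1.6 (the exact parameter name and whether 1.6.5 honours it should be verified against the manual for your release):

  # /etc/modprobe.conf on each OSS -- halve the OST service thread count from 512
  options ost oss_num_threads=256

  # after unmounting and remounting the OSTs, confirm how many I/O threads started
  ps -e | grep -c ll_ost_io

If throughput under load is unchanged at 256 threads, the extra 256 were not helping; if it drops noticeably, the count has been cut too far, as Brian notes.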
Brian J. Murrell
2008-Nov-13 13:58 UTC
[Lustre-discuss] Frequent OSS Crashes with heavy load
There is really no need to put both Andreas and myself into your new message recipient addresses. We are both on the lustre-discuss list. On Thu, 2008-11-13 at 19:32 +0800, wanglu wrote:> <----I could not log in from SSH here and went to the console--> > <---What I saw---> > ... > Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with error > Nov 13 18:35:02 boss02 kernel: LustreError: 26928:0:(events.c:361:server_bulk_callback()) event type 2, status -5, desc e1c24000 > Nov 13 18:35:02 boss02 kernel: LustreError: 17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ea8cd200 x10376088/t0 o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0 ^^^^^^^^^^^^^^^^^^ > Nov 13 18:35:02 boss02 kernel: Lustre: 17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: ignoring bulk IO comm error with b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id 12345-192.168.52.79 at tcp - client will retry [ Many more ] > > <---At that time, the network was down, couldn't ping the gateway--> > <--I tried restarting the network service, but after the restart the gateway was still unreachable---> You have networking problems, not Lustre problems. Lustre only utilizes whatever network you provide it. It does not control it. It does not bring it up, take it down or reconfigure it in any way. Your operating system does this. b.
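A few quick checks that separate a raw network fault from an LNET problem (a sketch; eth0 is assumed here to be the 10Gbit interface, and the client address is taken from the log above):

  # IP layer: is the interface up and the path to gateway/clients alive?
  ping -c 3 192.168.52.79
  ethtool eth0              # link detected, negotiated speed
  dmesg | tail -n 50        # NIC driver resets or link flaps

  # LNET layer: does Lustre's own network still answer?
  lctl list_nids
  lctl ping 192.168.52.79@tcp

If the plain ping already fails, as reported above, the problem is below Lustre: in the NIC, driver, switch or cabling.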
We have experienced all these errors when we have a big job that is writing many small chunks. When the writes are, say, 80 bytes and the block size is 4k bytes, the back-end storage can slow down with read block, modify block, write block, to such an extent as to cause slow commitrw and slow journal messages very similar to yours. From your email: Dear all, This is a piece of error log: Nov 13 18:25:26 boss02 kernel: Lustre: 27228:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 56 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) besfs-OST0004: slow journal start 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) besfs-OST0004: slow brw_start 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27231:0:(filter_io_26.c:713:filter_commitrw_write()) Skipped 8 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27176:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) Skipped 10 previous similar messages Nov 13 18:25:26 boss02 kernel: Lustre: 27278:0:(filter_io_26.c:765:filter_commitrw_write()) besfs-OST0004: slow direct_io 47s Nov 13 18:25:26 boss02 kernel: Lustre: 27235:0:(lustre_fsfilt.h:302:fsfilt_commit_wait()) besfs-OST0004: slow journal start 47s You may check for a job that is committing small writes instead of caching and writing MBytes (one way to check is sketched after this message). We have even seen this phenomenon back up the server to the extent that it appears to the client that it is time to try the failover server, which then fails. Just something to check. At 8:58 AM -0500 11/13/08, Brian J. Murrell wrote: >There is really no need to put both Andreas and myself into your new >message recipient addresses. We are both on the lustre-discuss list. > >On Thu, 2008-11-13 at 19:32 +0800, wanglu wrote: >> <----I could not log in from SSH here and went to the console--> >> <---What I saw---> >> ... >> Nov 13 18:35:02 boss02 kernel: LustreError: >>26928:0:(socklnd.c:1613:ksocknal_destroy_conn()) Completing partial >>receive from 12345-192.168.52.79 at tcp, ip 192.168.52.79:1021, with >>error >> Nov 13 18:35:02 boss02 kernel: LustreError: >>26928:0:(events.c:361:server_bulk_callback()) event type 2, status >>-5, desc e1c24000 >> Nov 13 18:35:02 boss02 kernel: LustreError: >>17941:0:(ost_handler.c:1139:ost_brw_write()) @@@ network error on >>bulk GET 0(1048576) req at ea8cd200 x10376088/t0 >>o4->b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID:0/0 >>lens 384/352 e 0 to 0 dl 1226573467 ref 1 fl Interpret:/0/0 rc 0/0 > >^^^^^^^^^^^^^^^^^^ >> Nov 13 18:35:02 boss02 kernel: Lustre: >>17941:0:(ost_handler.c:1270:ost_brw_write()) besfs-OST0001: >>ignoring bulk IO comm error with >>b99b0138-d1de-93db-0418-c08eeb8c4b57 at NET_0x20000c0a8344f_UUID id >>12345-192.168.52.79 at tcp - client will retry >[ Many more ] >> >> <---At that time, the network was down, couldn't ping the gateway--> >> <--I tried restarting the network service, but after the restart the >>gateway was still unreachable---> > >You have networking problems, not Lustre problems. Lustre only utilizes >whatever network you provide it. It does not control it. It does not >bring it up, take it down or reconfigure it in any way. Your operating >system does this. > >b.
-- }}}===============>> LLNL James E. Harm (Jim); jharm at llnl.gov System Administrator, ICCD Clusters (925) 422-4018 Page: 423-7705x57152
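One way to check for the small-write pattern Jim describes is the per-OST I/O size histogram on the OSS (a sketch; brw_stats is the obdfilter /proc file in Lustre 1.6, and besfs-OST0004 is just the OST named in the log above):

  # Histogram of disk I/O sizes for one OST: a pile-up in the 4K bucket
  # with few 1M I/Os points at many small, uncached writes
  cat /proc/fs/lustre/obdfilter/besfs-OST0004/brw_stats

A client-side counterpart is the osc rpc_stats file, which shows how many pages each write RPC carried.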