Hi all,
I'm still successful in bringing my OSSs to a standstill, if not
crashing them outright.
Having reduced the number of stress jobs writing to Lustre (stress -d 2
--hdd-noclean --hdd-bytes 5M) to four, and having reduced the number of
OSS threads (options ost oss_num_threads=256 in /etc/modprobe.d/lustre;
both settings are recapped after the list below), the OSSs no longer
freeze entirely. Instead, after ~15 hours:
- all stress jobs have terminated with Input/output error
- the MDT has marked the affected OSTs as Inactive
- the already open connections to the OSS remain active
- interactive collectl, "watch df" and top sessions are still working
- the number of ll_ost threads is 256 (the number of ll_ost_io threads is 257?)
- log file writing had obviously stopped after only 10 hours
- already open shells still accept commands like "ps", and I can kill
some processes
- a new ssh login doesn't work
- any access to disk, as with "ls", brings the system to a total freeze
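To recap the two settings from above in one place (the modprobe file and
the stress options are exactly as quoted; how I launch the four jobs is
only sketched here):

    # /etc/modprobe.d/lustre -- cap the number of OSS service threads
    options ost oss_num_threads=256

    # one of the four stress jobs writing to Lustre: two hdd workers,
    # 5 MB per worker, files left in place afterwards
    stress -d 2 --hdd-noclean --hdd-bytes 5M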
The process table shows six ll_ost_io threads, all using 38.9% CPU and
all running for 419:21m. All the rest are sleeping.
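If it helps with the diagnosis, next time it hangs I can try to capture
where those threads are stuck from one of the still-open shells, roughly
along these lines (assuming SysRq is enabled on the box; the dump file
name is just an example):

    # dump kernel stacks of all tasks to the kernel log
    echo t > /proc/sysrq-trigger

    # save the Lustre kernel debug log
    lctl dk /tmp/lustre-debug.log

    # show what the ll_ost_io threads are waiting on
    ps -eLo pid,stat,wchan:30,comm | grep ll_ost_io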
The cause can't be system overload or simply faulty hardware. To give
an impression of what is going on, I'm quoting the last collectl record:
##########################################################################################
### RECORD 139 (1217475195.342) (Thu Jul 31 05:33:15 2008) ###
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
#USER NICE  SYS WAIT  IRQ SOFT STEAL IDLE  INTR CTXSW PROC RUNQ RUN  AVG1  AVG5 AVG15
    0    0   14   20    0    5     0   58  4255   53K    1  736   6 22.06 31.28 31.13
# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB KBWrite WMerged Writes SizeKB
      0       0      0      0   83740     314    861     97
# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#Ost KBRead Reads KBWrite Writes
OST0004 0 0 40674 63
OST0005 0 0 40858 66
##########################################################################################
That's not too much for the machine, I'd reckon. And as mentioned in an
earlier post, I have run the very same 'stress' test, also with CPU load
or I/O load only, locally on machines that had crashed earlier. The test
runs that wrote to disk finished only when the disks were 100% full (the
disks were formatted as plain ext3 at the time); the tests with I/O load
= 500 and CPU load = 1k have been running for three days now. Of course
I don't know how reliable these tests are.
It looks to me as if a few Lustre threads for some reason can't process
their I/O any more, kind of building up pressure and finally blocking
all (disk) I/O.
Knowing the reason and how to avoid it would not only relieve these
servers of some pressure... ;-)
Hm, hardware: the cluster is running Debian Etch, kernel 2.6.22, Lustre
1.6.5. The OSSs are Supermicro X7DB8 file servers, Xeon E5320, 8 GB RAM,
with 16 internal disks on two 3ware 9650 RAID controllers, forming two
OSTs each.
Many thanks for any further hints,
Thomas