Hi, I am having problems with ext3 file deletes blocking filesystem
access or slowing down write speeds.
My system works as follows:
* A process reads real-time data (with a few seconds of buffering)
and, after processing, writes it at up to 2x10 MB/s (two streams
to different disks).
* Two further processes read the data back from the same disks,
process it further, and copy the results to another pair of disks.
* Yet another process deletes older files to keep disk usage
below 85%.
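The cleanup step in the last bullet could look roughly like the following minimal sketch. It assumes one flat directory per disk and deletes by oldest modification time. The function names, the directory layout and the injectable `usage`/`delete` hooks are hypothetical, and `shutil.disk_usage` only approximates the Use% that df reports, because it ignores reserved blocks.

```python
import os
import shutil


def files_oldest_first(directory):
    """Return regular files in `directory`, oldest modification time first."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    return sorted(files, key=os.path.getmtime)


def usage_fraction(path):
    """Approximate fraction of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def cleanup(directory, limit=0.85, usage=usage_fraction, delete=os.remove):
    """Delete oldest files until usage drops below `limit`.

    `usage` and `delete` are injectable so the loop can be tested or
    wrapped (e.g. with throttling). Returns the list of deleted paths.
    """
    deleted = []
    for path in files_oldest_first(directory):
        if usage(directory) < limit:
            break
        delete(path)
        deleted.append(path)
    return deleted
```

Re-checking usage after every single delete keeps the script from removing far more than necessary when its input is a large backlog, as in the malfunction described below.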
The reason for this two-stage design is that the second step is too
slow to run in real time: the incoming data is bursty, and at peak load
the processors are not fast enough to process it. On average, however
(given a 2x900 GB disk buffer), the system is fast enough to
post-process the data.
However, my delete script malfunctioned, and at one point it had
2x100 GB of files to delete, so it ran 'rm file' one after another for
those 400 files of about 500 MB each. As a result, the real-time data
processing became too slow and its buffers overflowed.
Of course, I could force the delete script to sleep a few seconds
between file deletes to allow the write process to recover, but this
still feels like an unreliable patch.
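A minimal sketch of the sleep-between-deletes idea follows. It also shrinks each file in steps with `os.truncate` before unlinking it. That stepwise truncation is a swapped-in workaround that emulates "smaller files"; it does not come from this post, and whether it actually shortens ext3 stalls on 2.6.26 would need to be measured. The `chunk_bytes` and `pause` values are illustrative, not tuned.

```python
import os
import time


def throttled_delete(paths, chunk_bytes=64 * 1024 * 1024, pause=0.5,
                     sleep=time.sleep):
    """Delete files one at a time, pausing so other I/O can proceed.

    Each file is first truncated from the end in `chunk_bytes` steps, so
    each step frees a bounded number of blocks instead of the whole file
    at once. `chunk_bytes` and `pause` are illustrative defaults.
    """
    for path in paths:
        size = os.path.getsize(path)
        while size > 0:
            size = max(0, size - chunk_bytes)
            os.truncate(path, size)  # free the tail of the file
            sleep(pause)             # give the real-time writer a turn
        os.remove(path)              # unlink the now-empty file
        sleep(pause)
```

With 500 MB files and the defaults above, each file is freed in eight steps instead of one, at the cost of a longer total delete time.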
I looked at IO schedulers, but while I'm quite familiar with networking
queues, the IO scheduler is largely unknown to me. I assume that you
cannot assign per-process priorities with IO schedulers? If that were
possible, I would give maximum priority to the real-time process and
put the delete process at the lowest one.
Any ideas how I could make sure that the system does its best to
provide good service to the real-time processing? The secondary
processing is niced, but if I recall correctly, the delete was running
with nice 0.
I had a few ideas to improve things, but have not yet had time to
implement them:
* I could use a tee-like program for post-processing. At first it
would try to process the data in real time (reading the raw stream
after it has been written to disk, so the data could still be in
the page cache if caching is set up well), but if it could not keep
up, it would just queue the post-processing and continue later, when
load allows.
* Smaller files would of course make blocking time shorter.
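The tee-like idea from the first bullet could be structured roughly as follows. This is a minimal sketch, and the class and hook names are hypothetical. `process(chunk_id)` stands for the post-processing step. `is_overloaded()` stands for whatever signal shows that the real-time writer is falling behind, for example its buffer fill level.

```python
from collections import deque


class OpportunisticProcessor:
    """Process chunks immediately when possible, otherwise queue them.

    `process` and `is_overloaded` are hypothetical hooks supplied by the
    caller; nothing here measures load by itself.
    """

    def __init__(self, process, is_overloaded):
        self.process = process
        self.is_overloaded = is_overloaded
        self.backlog = deque()

    def on_chunk_written(self, chunk_id):
        # Fast path: the chunk was just written, so it is likely still
        # in the page cache and cheap to read back.
        if not self.backlog and not self.is_overloaded():
            self.process(chunk_id)
        else:
            # Once anything is queued, new chunks go behind it so that
            # chunks are always processed in arrival order.
            self.backlog.append(chunk_id)

    def drain(self):
        """Work through the backlog while there is spare capacity."""
        while self.backlog and not self.is_overloaded():
            self.process(self.backlog.popleft())
```

Keeping strict arrival order is a design choice. Letting fresh chunks jump the queue would exploit the cache better but could starve old backlog during long bursts.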
If it matters, the systems use SATA disks (both native and SCSI RAID)
and run kernel 2.6.26 (Debian Lenny).
. Markus