[NOTE: I apologize in advance if this shows up as a duplicate. I sent it once by accident from the wrong account, so that message has been awaiting moderation. If the moderator reads the list, hopefully they'll notice I already posted :)]

I have a number of machines which are used for mail storage. We've had issues with sporadic slow connections to the machines, seemingly blocking on I/O. After running some tests it seemed as if we might be filling the journal. In some synthetic testing we did, with a large amount of simultaneous reading and writing, increasing the journal size eliminated the spikes we saw at regular intervals.

This weekend we increased the journal size from 32MB to 256MB on a group of machines. There was one with the following configuration:

  2 x P3/667
  256MB
  ServeRAID 4L (16MB cache, writethrough)
  5 x 18GB 10k Ultra160 (RAID5, 8KB stripe)
  Red Hat 7.2 w/ kernel 2.4.20-18.7

The rest are:

  2 x Xeon/2.4
  1024MB
  ServeRAID 6i (128MB cache, writethrough)
  5 x 36GB 15k Ultra320 (RAID5EE, 8KB stripe)
  Red Hat 7.2 w/ kernel 2.4.20-24.7 (addl patch: ips 6.10 driver)

This actually seemed to fix the problem on the older machine. The slow connections are pretty much eliminated.

The newer machines, on the other hand, are getting *more* slow connections, and load average, which previously never exceeded 6, has been seeing occasional quick spikes as high as 30 (but by the time anyone is actively looking at the machine, the run queue is pretty much empty).

It's worth noting that the newer systems didn't exhibit any problematic behaviour with the arrays in write-back mode. However, a power outage on one of those machines had previously resulted in an exceedingly large amount of data corruption, in files that hadn't been modified in as long as an hour, despite the daemon calling fsync after each write.
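For reference, the write-then-fsync pattern mentioned above looks roughly like the sketch below. This is only an illustration, not the actual daemon's code: the function name `write_durably` and the file path are hypothetical. Note that fsync only asks the kernel to push the data out to the storage device; it does not by itself guarantee that a controller-level cache has committed it to the disks.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then fsync before returning.

    Hypothetical helper for illustration; the real daemon's write
    path is not shown in this post.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop
        # until the whole buffer has been handed to the kernel.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A mail delivery agent would typically call something like this before acknowledging a message, which is why corruption in files that were fsynced long before the outage points below the filesystem layer.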
Since the drives were already in writethrough, and the files had both been fsynced and were old enough that they should have been flushed anyway, we assumed the contents of the controller's cache were the likely culprit. (Since the controller has battery-backed cache, we're asking IBM why that cache wasn't flushed when the array was brought back online.)

Anyways, any ideas on why an increased journal size would cause decreased performance? Nothing I've read in the archives suggests that should happen.

-- 
Matthew Berg <galt@gothpoodle.com>