Mike Miller wrote:
> All,
> We are encountering spurious errors with ext3. After some period of heavy IO
> we may see messages similar to:
>
> EXT3-fs error (device cciss/c0d0p5) in start_transaction: Journal has aborted
You probably had relevant messages before that... what were they?
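If it helps to dig them out of a saved log, here is a rough sketch of one way to do it. It is only an illustration: the function name, the keyword list, and the 50-line window are my own assumptions, not anything ext3 or the cciss driver defines, so adjust them to whatever your logs actually contain.

```python
import re

# Assumed keywords: messages from the filesystem, the journal layer, or the
# block driver are the likely candidates. This list is a guess, not exhaustive.
PATTERN = re.compile(r"EXT3-fs|JBD|journal|cciss|I/O error", re.IGNORECASE)

# Text taken from the error quoted above.
ABORT_MARKER = "Journal has aborted"


def messages_before_abort(log_lines, context=50):
    """Return keyword-matching lines from the `context` lines that precede
    the first line containing the abort message."""
    lines = [line.rstrip("\n") for line in log_lines]
    for i, line in enumerate(lines):
        if ABORT_MARKER in line:
            window = lines[max(0, i - context):i]
            return [entry for entry in window if PATTERN.search(entry)]
    return []  # no abort message found in this log


# Example use (the path is an assumption; use wherever your syslog lands):
# with open("/var/log/messages") as f:
#     for entry in messages_before_abort(f):
#         print(entry)
```

Anything it turns up from just before the abort (I/O errors, driver timeouts, earlier EXT3-fs complaints) is what I'd like to see.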
> When this happens the filesystem is remounted read-only. If it's the root
> filesystem the system becomes unresponsive and must be rebooted. An fsck on
> the affected filesystem shows lots of corruption.
> Any ideas on what we can do to help isolate this problem? We have 64 nodes
> and the problem is random.
Crazy question, but I have to ask - you don't have the same filesystem
mounted on all those nodes, do you?
What kernel is this?
-Eric