On Apr 17, 2006 15:22 -0400, Sev Binello wrote:
> We have had a raid failure, we have somewhat recovered
> but we continue to see the following ext3 message...
>
> Apr 17 14:59:14 acnlin84 kernel: EXT3-fs unexpected failure:
> (((jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0);
> Apr 17 14:59:14 acnlin84 kernel: Possible IO failure.
>
>
> Since we have experienced several instances of ext3 file system corruption
> when we lose total communication with our raid,
> we were wondering if there was any concrete advice out there
> on what to do in this situation.
You really, really, really need to mount your filesystem with
"-o errors=remount-ro", so that ext3 goes read-only as soon as it
detects an error instead of continuing to write to a damaged
filesystem. I'm not sure this is enough to prevent corruption when
your RAID disconnects (if the RAID discards writes without reporting
errors up to the filesystem, ext3 never sees a failure), but it is
a minimum requirement.
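
As a rough way to audit this, here is a minimal sketch that scans
text in /etc/fstab layout and lists ext3 entries whose options lack
errors=remount-ro. The function name, device names and mount points
are placeholders I made up for illustration, not taken from your
system. It only looks at the options column, so a default stored in
the superblock (for example with "tune2fs -e remount-ro") would not
be noticed.

```python
def ext3_without_remount_ro(table_text):
    """Return (device, mount_point) pairs for ext3 entries lacking errors=remount-ro.

    table_text uses the /etc/fstab layout: device, mount point, type, options.
    """
    missing = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Ignore malformed entries that lack an options column.
        if len(fields) < 4:
            continue
        device, mount_point, fstype, options = fields[:4]
        if fstype != "ext3":
            continue
        # Options are comma-separated; require an exact match on the option.
        if "errors=remount-ro" not in options.split(","):
            missing.append((device, mount_point))
    return missing


# Hypothetical fstab contents; device names and mount points are placeholders.
SAMPLE = """\
# device     mount point  type  options                    dump pass
/dev/sdc1    /data1       ext3  defaults                   0    2
/dev/sde1    /data2       ext3  defaults,errors=remount-ro 0    2
/dev/sda1    /boot        ext2  defaults                   0    2
"""

if __name__ == "__main__":
    for device, mount_point in ext3_without_remount_ro(SAMPLE):
        print(f"{device} on {mount_point}: no errors=remount-ro")
```

To check a real machine you would pass the contents of /etc/fstab
instead of SAMPLE. Any entry it reports needs the option added
before the next mount.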
> Other messages we got before the ones above...
> Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)):
> ext3_free_blocks: bit already cleared for block 14943160
> Apr 17 13:40:42 acnlin84 kernel: EXT3-fs error (device sd(8,33)):
> ext3_free_blocks: bit already cleared for block 3703794
>
> Apr 17 13:40:43 acnlin84 kernel: EXT3-fs error (device sd(8,65)):
> ext3_get_inode_loc: unable to read inode block - inode=50931914,
> block=101843272
> --
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.