thr3ads.net - Ext3 users - Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3 [May 2005]

David Clunie

2005-May-15 13:56 UTC

Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3

Hi

I have a Firewire connected Micronet 1.5TB RAID with a single
large ext3 filesystem on one partition on a dual Xeon system.

I am checking out from an extremely large cvs repository
(don't ask) to this drive over the course of many days, and
intermittently I get bad blocks and the filesystem goes
read-only. This is not related to any power failure or
anything similar. The RAID is currently about 40% full;
this started to happen around the 15% mark as I recall.

I checked the RAID firmware setup, found that caching was
set to write-back, and changed it to write-through to
see if that would help (since I gather the Linux kernel
presumes write-through, though why it should make a
difference in the absence of a reboot or power failure
I don't understand).

This reduced the frequency of the error from once a night
to once every couple of nights; interestingly mostly at
about 04:03 AM or so. Looking at cron.daily, only mrtg
and sa seem to be starting up at about that time.

I suspect the timing is related to a change in the pattern
of disk activity rather than anything else.

I have no reason to suspect that there is anything actually
wrong with the RAID itself, which just appears as a really
big firewire external disk. It is new however, so this
can't be ruled out.

My next step is to just turn off journaling and see if
doing this with just ext2 works OK. Journaling doesn't
seem to be doing much good, as I am stuck regularly running
ordinary fscks with all these errors anyway!

I just thought I would ask if anyone else has had a similar
experience, and whether such issues are known to be with ext3,
or the firewire interface, or both together.

PS. I did actually create the partition and did the mkfs on
an AMD64 FC3 system at a different site, though that is not the
system to which the RAID is currently connected. I just mention
that in case this makes a difference, but I presume an fsck
would have noticed and fixed anything fundamentally wrong in
this regard.

David

May 15 04:03:30 localhost kernel: Aborting journal on device sdd1.
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343526: bad block 165510584
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: inode_doinit_with_dentry:  getxattr returned 5 for dev=sdd1 ino=63343526
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343381: bad block 141623810
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63947123: bad block 203323361

Linux localhost.localdomain 2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52 EST 2004 i686 i686 i386 GNU/Linux

Joseph D. Wagner

2005-May-15 22:48 UTC

Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3

> May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343526: bad block 165510584
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343381: bad block 141623810
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63947123: bad block 203323361

These errors cannot be caused by a bug in the file system.  It is possible,
although highly unlikely, that a bug in the device driver could generate these
errors.

The most likely cause is that there actually are bad blocks on your new 1.5TB
file system.

Do us all a favor and run:

badblocks -v -b block_size /dev/device

And let us know about the results.

Joseph D. Wagner

anandtiwari at softhome.net

2005-May-16 23:39 UTC

Ext3 journal corruption

Hi all,

I had an ext3 filesystem with writeback. Yesterday my system crashed,
and now when I try to mount it, it gives me "Invalid argument".
The command line was:
# mount -t ext3 /dev/hda1 /mnt/home

I tried debugging it and later found out that it was complaining about
the journal inode. Is there any way to recover my files? I cloned the disk
and, after a few tries, mounted the clone as ext2, but there was nothing in it.
Any help or pointers will be appreciated.

Thanks
anand

Andreas Dilger

2005-May-17 06:04 UTC

Intermittent ext3 corruption on external firewire Micronet 1.5Tb RAID on FC3

On May 15, 2005  09:56 -0400, David Clunie wrote:
> I have a Firewire connected Micronet 1.5TB RAID with a single
> large ext3 filesystem on one partition on a dual Xeon system.

For some kernels (maybe even current ones) it is possible that
there is a problem with IO beyond 1 TB.
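
As a rough illustration of why a boundary near 1 TB is plausible: assuming
512-byte sectors (an assumption, not something stated in this thread), a
sector number held in a signed 32-bit integer can address exactly 2^40
bytes. The helper name below is hypothetical and only does the arithmetic.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def max_addressable_bytes(bits, signed, sector_size=SECTOR_SIZE):
    """Bytes reachable when the sector number is stored in `bits` bits."""
    usable_bits = bits - 1 if signed else bits
    return (2 ** usable_bits) * sector_size


# Signed 32-bit sector numbers run out at 2**40 bytes (1 TiB);
# unsigned 32-bit sector numbers run out at 2**41 bytes (2 TiB).
print(max_addressable_bytes(32, signed=True))
print(max_addressable_bytes(32, signed=False))
```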

What I would do (if you don't mind overwriting the disk, presumably
not if it is just new and doesn't contain important data) is to
write a small test program to write the byte offset at the start of
every 4kB block on the disk, then read them all back and verify it
is correct.

This will tell you if there is aliasing in the block device (possibly
e.g. an int used instead of __u32 or sector_t).
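
A minimal sketch of such a test program, in Python for brevity. The
function names and the 64-bit little-endian stamp format are my own
choices for illustration. It overwrites the start of every block on the
target, so it should only be pointed at a device whose contents are
disposable.

```python
import os
import struct

BLOCK_SIZE = 4096  # 4 kB blocks, as suggested above


def write_offsets(path, total_bytes, block_size=BLOCK_SIZE):
    """Stamp the start of each block with that block's own byte offset."""
    with open(path, "r+b") as dev:
        for offset in range(0, total_bytes, block_size):
            dev.seek(offset)
            dev.write(struct.pack("<Q", offset))  # 64-bit little-endian
        dev.flush()
        os.fsync(dev.fileno())


def verify_offsets(path, total_bytes, block_size=BLOCK_SIZE):
    """Return (expected_offset, found_value) for every block that mismatches."""
    mismatches = []
    with open(path, "rb") as dev:
        for offset in range(0, total_bytes, block_size):
            dev.seek(offset)
            raw = dev.read(8)
            found = struct.unpack("<Q", raw)[0] if len(raw) == 8 else None
            if found != offset:
                mismatches.append((offset, found))
    return mismatches
```

If writes beyond some boundary wrap around, a low block would read back
carrying a high offset, and the difference between the two values hints
at where the wrap happens. Reading back straight after writing may be
served from the page cache rather than the disk, so it is safer to run
the verify pass after a remount or reboot.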
 
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
