thr3ads.net - Ext3 users - rebooting more often to stop fsck problems and total disk loss [Mar 2007]

ahlist

2007-Mar-19 21:15 UTC

rebooting more often to stop fsck problems and total disk loss

Hi,

I run several hundred servers that are used heavily (webhosting, etc.)
all day long.

Quite often we'll have a server that either needs a really long fsck
(10 hours - 200 gig drive) or an fsck that eventually results in
everything going to lost+found (pretty much a total loss).

Would rebooting these servers monthly (or some other frequency) stop this?

Is it correct to visualize this as small errors compounding over time
thus more frequent reboots would allow quick fsck's to fix the errors
before they become huge?

(OS is redhat 7.3 and el3)

Thanks for any input!

Andreas Dilger

2007-Mar-19 21:27 UTC

rebooting more often to stop fsck problems and total disk loss

On Mar 19, 2007  17:15 -0400, ahlist wrote:
> Quite often we'll have a server that either needs a really long fsck
> (10 hours - 200 gig drive) or an fsck that eventually results in
> everything going to lost+found (pretty much a total loss).
Strange.  We get 1TB/hr fscks these days unless the filesystem is
completely corrupted and has a lot of duplicate blocks.
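To put the two figures side by side, here is a small back-of-the-envelope calculation. It only uses the numbers quoted in this thread, and it assumes 1 TB = 1000 GB for simplicity.

```python
# Figures from the thread; 1 TB is taken as 1000 GB (an assumption).
reported_gb = 200.0           # drive size in the original post
reported_hours = 10.0         # fsck duration in the original post
typical_gb_per_hour = 1000.0  # "1TB/hr" from the reply

# Effective rate of the slow fsck, and how it compares to the typical rate.
reported_gb_per_hour = reported_gb / reported_hours
slowdown = typical_gb_per_hour / reported_gb_per_hour

print(f"reported rate: {reported_gb_per_hour:.0f} GB/hr")
print(f"slowdown vs. 1 TB/hr: {slowdown:.0f}x")
```

That works out to about 20 GB/hr, roughly 50 times slower than the 1TB/hr figure.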
> Would rebooting these servers monthly (or some other frequency) stop this?
What is also important is that when you do run an fsck, you run it with
"-f" to actually check the filesystem instead of just the superblock.
e2fsck will only do a full check if the kernel detected disk corruption,
OR if the "last checked" time is > 6 months ago, OR if {20 < X < 40}
mounts have happened since the last check time.  See tune2fs(8) for
details.
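The rule described above can be sketched roughly as follows. This is a simplified model for illustration, not e2fsck's actual code: the FsState class, its field names, and the needs_full_check function are hypothetical, and treating a zero interval or a non-positive mount limit as "disabled" is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FsState:
    """Hypothetical snapshot of the filesystem fields that matter here."""
    has_errors: bool           # kernel flagged the filesystem as having errors
    mount_count: int           # mounts since the last full check
    max_mount_count: int       # mount threshold; <= 0 is treated as disabled
    last_checked: datetime     # time of the last full check
    check_interval: timedelta  # time threshold; zero is treated as disabled


def needs_full_check(fs: FsState, now: datetime, forced: bool = False) -> bool:
    """Return True if a full check would run under the simplified rule."""
    if forced:                 # corresponds to passing "-f"
        return True
    if fs.has_errors:          # corruption was detected earlier
        return True
    if fs.max_mount_count > 0 and fs.mount_count >= fs.max_mount_count:
        return True            # too many mounts since the last check
    if fs.check_interval > timedelta(0) and now - fs.last_checked >= fs.check_interval:
        return True            # too much time since the last check
    return False


# Example: a clean filesystem checked 77 days ago, mounted 5 times since.
fs = FsState(
    has_errors=False,
    mount_count=5,
    max_mount_count=30,
    last_checked=datetime(2007, 1, 1),
    check_interval=timedelta(days=180),
)
now = datetime(2007, 3, 19)
skipped = not needs_full_check(fs, now)
forced = needs_full_check(fs, now, forced=True)
```

In this example the unforced run skips the full check because none of the three conditions holds, which is why a plain reboot on a "clean" filesystem may not check anything beyond the superblock.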
> Is it correct to visualize this as small errors compounding over time
> thus more frequent reboots would allow quick fsck's to fix the errors
> before they become huge?
That is definitely true.  If the bitmaps get corrupted, then this will
spread corruption throughout the filesystem.
> (OS is redhat 7.3 and el3)
I would instead suggest updating to a newer kernel (e.g. RHEL4 2.6.9) as
this has fixed a LOT of bugs in ext3.  Also, make sure you are using the
newest e2fsck available, as some bugs have been fixed there also.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
