thr3ads.net - Ext3 users - [SUMMARY] 2 Linux boxes, failover, & 1 EXT3 RAID [Apr 2002]

Bill Antoniadis

2002-Apr-02 13:24 UTC

[SUMMARY] 2 Linux boxes, failover, & 1 EXT3 RAID

Hello,

Many warm thanks to Bill Rugolsky Jr. and Stephen Tweedie for their help on
this one.  Both pointed out that since the file system is journaled, if the
primary box (nas1) were to crash, the secondary box should mount the ext3 file
system without any problems.  Depending on the nature of the journal (metadata
journaling and/or data journaling), we may have little or no data loss.

Bill Rugolsky also pointed out that I may get some performance benefit from the
data=journal option, since I am exporting the EXT3 filesystem with the
"rw,sync,no_wdelay" options, thus forcing NFS to do synchronous commits.  His
reasoning is that with the "data=ordered" and "sync" options for EXT3 and NFS,
the system will have to work harder to write out data blocks and may need to
"seek all over the disk to do so."  This will decrease throughput.  However,
with "data=journal", the NFS forced syncs will write the data to a (likely)
contiguous journal (less disk seeking, less latency, increased throughput) and
allow the kernel to do its actual disk commits at its own pace.
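
One rough way to check this theory is to time a run of small synchronous
writes on the filesystem, once mounted with data=ordered and once with
data=journal.  The following is only a sketch: the function name, file name,
block size, and write count are hypothetical choices, and it measures local
fsync() latency rather than real NFS traffic, so treat any numbers as
indicative at best.

```python
import os
import tempfile
import time


def timed_sync_writes(directory, count=200, size=4096):
    """Append `count` blocks of `size` bytes, calling fsync() after each.

    Returns the elapsed time in seconds. The scratch file is removed
    afterwards. (Hypothetical helper, not part of any ext3 tooling.)
    """
    path = os.path.join(directory, "syncbench.tmp")
    block = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)   # small write to a regular file
            os.fsync(fd)          # force it to stable storage, like a sync NFS commit
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed


if __name__ == "__main__":
    count = 200
    # Assumption: replace the temporary directory with one on the
    # filesystem under test (e.g. somewhere below /nas).
    with tempfile.TemporaryDirectory() as d:
        secs = timed_sync_writes(d, count)
        print(f"{count} synced writes in {secs:.3f} s")
```

Running it against the same directory under each mount option, with nothing
else loading the disk, gives a first impression of whether the journal
absorbs the forced syncs as suggested.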

Best Regards,
Bill Antoniadis

-------------------------------------------------------------------------------

Following is my original email:
-------------------------------
Setup:
------
I have two RedHat 7.2 (2.4.9-31) boxes that are attached to one external RAID
unit.  Both boxes are able to see the RAID unit as /dev/sdb1, but only
one box mounts (cat /proc/mounts yields: /dev/sdb1 /nas ext3 rw 0 0) the unit
at any given time.  The other box listens, via heartbeat (linux-ha), waiting to
mount the RAID unit should its sibling crash (actually, when the heartbeat is
no longer heard via serial and ethernet).  The /nas directory is NFS exported with the
rw,sync,no_wdelay options to several Linux and Tru64 boxes.

Questions:
----------
What will I encounter should the primary (i.e. box currently mounting /dev/sdb1)
crash, and the backup take over?  From my simulations, I see the backup mount
/dev/sdb1 but I get the following in its /var/log/messages:

nas2 kernel: kjournald starting.  Commit interval 5 seconds
nas2 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
nas2 kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on sd(8,17), internal journal
nas2 kernel: EXT3-fs: recovery complete.
nas2 kernel: EXT3-fs mounted filesystem with ordered data mode.

My limited understanding is that since both the primary box (named "nas1") and
the secondary (named "nas2") keep a metadata-only journal, the data updates
were flushed to disk (on nas1) but the metadata changes were not committed,
so nas2 sees an inconsistent filesystem when mounting.  Am I correct?

If we run with the nas2 box for a while, and then decide to switch back to nas1,
how will nas1 and its journal playback react to the changes committed by nas2
since the crash?

Would it be safer to always run e2fsck on nas2 takeover, prior to mounting
/dev/sdb1?

Am I wrong in choosing EXT3 over EXT2 in this setup?

Any help is greatly appreciated.

Thanks in advance,
Bill Antoniadis

Stephen Tweedie

2002-Apr-05 07:33 UTC

Re: [SUMMARY] 2 Linux boxes, failover, & 1 EXT3 RAID

Hi,

On Tue, Apr 02, 2002 at 04:24:16PM -0500, Bill Antoniadis wrote:
> Many warm thank yous to Bill Rugolsky Jr. and Stephen Tweedie for their help on
> this one.  Both pointed out that since the file system is journaled, if the
> primary box (nas1) were to crash, the secondary box should mount the ext3 file
> system without any problems.  Depending on the nature of the journal (metadata
> journaling and/or data journaling), we may have little or no data loss.

More than that --- think of the failover as a simple system crash.
The only difference is that the "reboot" involves bringing up the
filesystem on a different node, rather than the original node.

Thinking about it this way makes data integrity much easier to
visualise.  Any time you want to make data persistent over a reboot at
a certain point in your application, it's up to your application to
ensure that it tells the filesystem so by calling fsync() or by using
synchronised IO.  The result of the fsync is *exactly* the same
regardless of whether you are doing a single-node reboot or a two-node
failover.  Unix performs universal write-behind data caching for local
disk writes, so any application which assumes data integrity on disk
without asking for that explicitly is simply broken.
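
As an illustration of the point about fsync(), here is a minimal sketch of an
application-side write that does not return until the data has been handed to
stable storage.  The helper name durable_write and the file name are
hypothetical; the sketch also fsyncs the containing directory so that a newly
created directory entry is covered, which is a common precaution rather than
something stated above.

```python
import os
import tempfile


def durable_write(path, data):
    """Write `data` (bytes) to `path`, returning only after fsync() succeeds.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)   # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)                       # flush file data and metadata to disk
    finally:
        os.close(fd)

    # Also sync the parent directory so the directory entry itself is persistent.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "record.dat")
        durable_write(target, b"committed\n")
        with open(target, "rb") as f:
            print(f.read())
```

Anything written before durable_write() returns should survive a crash of the
node, whether the filesystem is then recovered on the same box or on its
failover partner; data written without the fsync() may still be sitting in
the page cache when the node goes down.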

Cheers,
 Stephen
