Everyone,

We just had a pretty bad crash on one of our production boxes and the
ext2 filesystem on the data partition of our box had some major
filesystem corruption. Needless to say, I am now looking into converting
the filesystem to ext3 and I have some questions regarding ext3 and
Linux software RAID.

I have read that previously there were some issues running ext3 on a
software raid device (/dev/mdN), but that most of those issues are
resolved by running kernel 2.4.x. Currently we are running 2.4.16 on our
production system and we have a rather complicated hardware/software
RAID configuration on the box.

Now for the details of my system. See
http://w3.one.net/~djflux/graphics/raiddiag.png for a graphic of our
RAID configuration.

We have two Dell PowerVault 220S units filled with 15K 18GB SCSI drives.
Each drive in PowerVault 1 is hardware mirrored to the corresponding
drive in PowerVault 2. I then use Linux software RAID0 to create a
stripe across these 12 drives (/dev/md0). This setup is kind of
convoluted due to hardware constraints (a Dell PERC3QC RAID card can
only span [RAID10] 8 drives and we wanted 12).

Internal to the box I have 7 (10K 36GB) SCSI disks in a hardware stripe
(RAID0, /dev/sdb1). I then use Linux software RAID1 to mirror this drive
with the software RAID0, creating /dev/md1. I know I'm only using a
portion of the full space on /dev/sdb1, but we hope to use it all at
some later date. There is an ext2 filesystem on /dev/md1 that is used
for the Informix/IBM database called UniVerse.

The reason for this RAID configuration is to have a static copy of the
data to be used for backups. I suspend database operations long enough
to use mdctl to fail and remove /dev/sdb1 from /dev/md1. I can then back
up the static database data knowing that it is a valid point-in-time
snapshot of my database. Tar is used to archive this drive to tape, and
the drive is then hot-added back in to /dev/md1 for resync after the tar
archive completes.

The box is a Dell PowerEdge 6400 with four 700MHz Xeons and 8GB of RAM.
The box hosts approximately 450 users during the average business day.
The database is currently about 70GB and the partition on /dev/md1 is
about 200GB. The database has a few large files that the majority of
system users access very frequently, mostly for reads, but also for
updates.

We want the highest level of integrity for our data, but do not want to
impact the interactivity of the machine very much. Current system load
averages range from 0.33 to 3.50, occasionally spiking higher.

Now that you know the basics of my system and our ideal requirements, I
have a few questions:

- Is it wise to convert the filesystem on /dev/md1 to ext3?
- Have the issues with ext3 on Linux RAID been resolved?
- Will the failing and resyncing of /dev/md1 happening on a daily basis
  cause problems with the journalling?
- Do you think the filesystem would be stable enough for 18x7
  availability?
- What kind of overhead is involved after the filesystem is ext3?
- What journalling mode is suggested for this type of application/system
  configuration?
- What size journal would be appropriate given data=ordered vs.
  data=journal?
- And any other suggestions/insights/comments.

Below is our /etc/raidtab. Let me know if you need any more information.
Thank you in advance for all your assistance.

Regards,
Andrew Rechenberg
Network Team, Sherman Financial Group
arechenberg@shermanfinancialgroup.com

raiddev /dev/md0
    raid-level              0
    persistent-superblock   1
    chunk-size              64
    nr-raid-disks           12
    nr-spare-disks          0
    device                  /dev/sdc1
    raid-disk               0
    device                  /dev/sdd1
    raid-disk               1
    device                  /dev/sde1
    raid-disk               2
    device                  /dev/sdf1
    raid-disk               3
    device                  /dev/sdg1
    raid-disk               4
    device                  /dev/sdh1
    raid-disk               5
    device                  /dev/sdi1
    raid-disk               6
    device                  /dev/sdj1
    raid-disk               7
    device                  /dev/sdk1
    raid-disk               8
    device                  /dev/sdl1
    raid-disk               9
    device                  /dev/sdm1
    raid-disk               10
    device                  /dev/sdn1
    raid-disk               11

raiddev /dev/md1
    raid-level              1
    persistent-superblock   1
    chunk-size              64
    nr-raid-disks           2
    nr-spare-disks          0
    device                  /dev/md0
    raid-disk               0
    device                  /dev/sdb1
    raid-disk               1
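
As a rough cross-check of the sizes described in this message, the sketch below works out the nominal capacity of each layer of the array. It assumes nominal decimal drive sizes (18GB and 36GB), equal-sized members in each stripe, and ignores RAID superblock and partitioning overhead; the function names are illustrative only and are not part of any RAID tool.

```python
# Nominal capacity of the layered RAID setup described above.
# Assumptions: decimal "marketing" GB, equal-sized members per stripe,
# no allowance for superblocks or partition tables.

def raid0_capacity_gb(member_sizes_gb):
    """A stripe over equal-sized members exposes the sum of their sizes."""
    return sum(member_sizes_gb)

def raid1_capacity_gb(member_sizes_gb):
    """A mirror can only be as large as its smallest member."""
    return min(member_sizes_gb)

# /dev/md0: software RAID0 over 12 hardware-mirrored pairs of 18GB drives.
md0_gb = raid0_capacity_gb([18] * 12)

# /dev/sdb1: hardware RAID0 over the 7 internal 36GB drives.
sdb1_gb = raid0_capacity_gb([36] * 7)

# /dev/md1: software RAID1 of /dev/md0 and /dev/sdb1.
md1_gb = raid1_capacity_gb([md0_gb, sdb1_gb])

# Space on /dev/sdb1 that the mirror cannot use.
unused_on_sdb1_gb = sdb1_gb - md1_gb

print(f"/dev/md0  : {md0_gb} GB")
print(f"/dev/sdb1 : {sdb1_gb} GB")
print(f"/dev/md1  : {md1_gb} GB usable, {unused_on_sdb1_gb} GB of /dev/sdb1 unused")
```

Under these assumptions the mirror is limited by the external stripe, which is consistent with the "about 200GB" partition on /dev/md1 and with only a portion of /dev/sdb1 being in use.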

On Sat, 2002-03-02 at 09:18, Rechenberg, Andrew wrote:
> Everyone,
>
> We just had a pretty bad crash on one of our production boxes and the ext2
> filesystem on the data partition of our box had some major filesystem
> corruption. Needless to say, I am now looking into converting the
> filesystem to ext3 and I have some questions regarding ext3 and Linux
> software RAID.

I'm sure you already know this, but ext2 sucks. :-)

> I have read that previously there were some issues running ext3 on a
> software raid device (/dev/mdN), but that most of those issues are resolved
> by running kernel 2.4.x. Currently we are running 2.4.16 on our production
> system and we have a rather complicated hardware/software RAID configuration
> on the box.

Correct, there aren't any bugs that I know of in running ext3 on RAID
under the 2.4.x kernel series.

[snip]

> Now that you know the basics of my system and our ideal requirements, I have
> a few questions:
>
> - Is it wise to convert the filesystem on /dev/md1 to ext3?

I see no reason that converting to ext3 would be a poor choice. It will
allow you to recover from a far higher percentage of crashes. fscks are
still necessary from time to time, but unlike ext2, they actually work.

> - Have the issues with ext3 on Linux RAID been resolved?

To the best of my knowledge, yes.

> - Will the failing and resyncing of /dev/md1 happening on a daily basis
> cause problems with the journalling?

The filesystem sees a Linux software RAID device as just another block
device, so it shouldn't even know or care about the "failures".

> - Do you think the filesystem would be stable enough for 18x7 availability?

Yes, but if you're using unpatched 2.4.16, I think you've got some
virtual memory issues that need to be resolved. There are a few people
who have patches to get the VM in the kernel up to "enterprise" levels,
but they're not necessarily in the main kernels yet.

Red Hat 7.2 shipped with ext3 as the default filesystem, which gets it
onto a huge number of machines, some under much heavier demand than this
one sounds like.

[snip filesystem questions]

I'm not sure; I'm not that much of a filesystem expert. I expect that
someone else will know much better.

Greg
-- 
Portland, Oregon, USA.

On Sat, Mar 02, 2002 at 12:18:44PM -0500, Rechenberg, Andrew wrote:
...
> We just had a pretty bad crash on one of our production boxes and the ext2
> filesystem on the data partition of our box had some major filesystem
> corruption. Needless to say, I am now looking into converting the
> filesystem to ext3 and I have some questions regarding ext3 and Linux
> software RAID.
>
> I have read that previously there were some issues running ext3 on a
> software raid device (/dev/mdN), but that most of those issues are resolved
> by running kernel 2.4.x. Currently we are running 2.4.16 on our production
> system and we have a rather complicated hardware/software RAID configuration
> on the box.

To a large extent what is called "ext3" should probably be called
"ext2+j", or some such, but "ext3" makes sense for many things.

You have probably seen my comments. I did "solve" the problem by running
my dual-CPU box with a uniprocessor kernel. Even an SMP kernel with the
"nosmp" boot option (to run it with only the boot processor) didn't
work. My troublesome machine has been co-located at a place to which I
don't have daily access.

How much of the problem lies in the gcc version (RH 7.1 2.96-97.1), and
how much in the kernel (2.4.17 release), I don't know.

It seems to happen when there is:

- lots of memory
- at least 2 _fast_ processors
- a process writing a very large file at full tilt (1.3+ times the
  memory size)

I never saw the problem on a 128 MB dual PPro200 machine, even with the
same disks that later, with another motherboard and CPUs, did cause
problems.

I haven't seen the problem on my home machine, which is, if possible, an
even heftier box than the one which I do get to hang -- except it has
dual IDE disks on the same IDE cable, instead of a separate dual-channel
IDE controller or SCSI controller. (Both of which have hung up on me on
the other box.)

...
> - Is it wise to convert the filesystem on /dev/md1 to ext3?
>
> - Have the issues with ext3 on Linux RAID been resolved?

Things I have seen in 2.4.18 test releases don't convince me. However, I
haven't been able to test them either. Maybe in a couple of weeks' time,
but not yet.

> - Will the failing and resyncing of /dev/md1 happening on a daily basis
> cause problems with the journalling?

It should be invisible. Doing it at a quiet moment would be advisable,
of course.

Running the resync will take heaps of time, though. Consider 252 GB
being synced at 15 MB/sec (even that might be a bit over-optimistic with
some disks): it takes a "mere" 16800 seconds, or 4h 40m.

A much smarter internal design supporting a real RAID10 (instead of
RAID1 on top of a pair of RAID0) might allow syncing all disks in
parallel, which of course reduces the max speed, but might achieve the
sync in about an hour or two.

> - Do you think the filesystem would be stable enough for 18x7 availability?

EXT3, yes. But you should probably ask Red Hat for a supported
Enterprise kernel.

There are other reasons why ReiserFS might make sense in your system,
but unfortunately they don't help with backup... .. or maybe... See the
site:

    http://devlinux.com/namesys

If reiserfs can do a snapshot at the filesystem level, and you take the
backup at the filesystem level.. then you can have heaps of small RAID1
pairs striped together with RAID0 -- e.g. "RAID01" (or whatever those
hybrids are called..)

... but still, those don't help in case your system experiences the
RAID1+EXT3 hangup which I kept seeing.

When you have time, try it: "tune2fs -j /dev/md1" (have a suitable
toolset online also), e2fsck, and mount with "-t ext3". Then try writing
a large file there, e.g.:

    dd if=/dev/zero bs=1024k of=test.file count=12000

If it does not hang within a few runs, you are probably safe.

> - What kind of overhead is involved after the filesystem is ext3?

Aside from the journal file, it is exactly the same as EXT2. Indeed, you
can tune2fs an EXT2 filesystem to be EXT3. I have done that.
It takes a bit of effort to get the root filesystem to switch itself to
ext3 at boot, for some reason. But you are not doing this for your boot
system.. The difficulty appears only with an ext2 filesystem that has
been in use for a longer time with older kernels.

> - What journalling mode is suggested for this type of application/system
> configuration?
> - What size journal would be appropriate given data=ordered vs. data=journal?
> - And any other suggestions/insights/comments.

For these I don't have opinions. I have been using the "safe and slow"
mode (data=ordered), but I am not in a very great hurry most of the
time.

Indeed, presently one of my remotely located machines has been running
RAID1 in degraded mode because one of its IDE disks failed 2 weeks ago,
and I noticed it only yesterday (lacking automated monitoring). It seems
I can get to it in a week's time, but whether it will need replacing or
just a power cycle, no idea...

> Below is our /etc/raidtab. Let me know if you need any more information.
> Thank you in advance for all your assistance.
>
> Regards,
> Andrew Rechenberg
> Network Team, Sherman Financial Group
> arechenberg@shermanfinancialgroup.com

/Matti Aarnio
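
Two figures from the message above can be checked with a little arithmetic: the resync time for 252 GB at 15 MB/sec, and how the suggested dd test file compares with the "1.3+ times the memory size" condition. The sketch below is only a back-of-envelope check. It assumes decimal units for the resync figure (1 GB = 1000 MB), and it assumes that the 8GB of RAM from the original post means 8192 MiB and that dd's bs=1024k means 1 MiB blocks; the function and variable names are illustrative.

```python
# Back-of-envelope check of the resync time and dd test size quoted above.

def resync_seconds(size_gb, rate_mb_per_s):
    """Time to copy size_gb at a steady rate, using 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s

def hours_minutes(seconds):
    """Split a duration in seconds into whole hours and minutes."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)

# Resync estimate: 252 GB at 15 MB/sec.
secs = resync_seconds(252, 15)
hours, minutes = hours_minutes(secs)
print(f"resync: {secs:.0f} s = {hours}h {minutes}m")

# dd test: count=12000 blocks of bs=1024k (assumed to be 1 MiB each).
dd_file_mib = 12000 * 1
ram_mib = 8 * 1024  # assumption: "8GB of RAM" is 8192 MiB
ratio = dd_file_mib / ram_mib
print(f"dd test file is {ratio:.2f}x RAM")
```

With these assumptions the resync figure matches the 16800 seconds (4h 40m) given in the message, and the test file comes out larger than the 1.3x-memory threshold mentioned there.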

Anyone have any comments as to the proper journal size for a 200GB
partition with data=ordered?

Thanks,
Andy.

-----Original Message-----
From: Andrew Morton [mailto:akpm@zip.com.au]
Sent: Saturday, March 02, 2002 9:32 PM
To: Matti Aarnio
Cc: Rechenberg, Andrew; 'ext3-users@redhat.com'; 'linux-raid@vger.kernel.org'
Subject: Re: ext3 on Linux software RAID1

Matti Aarnio wrote:
>
> How much the problem lies in the gcc version (RH 7.1 2.96-97.1), and

A dog of a compiler. Burn it.

I had a machine which oopsed mysteriously once a day somewhere down
playing with swapspace ptes. Compiling that kernel with 2.91.66 fixed
the problem.

Similarly, I was in correspondence with a person with a labful of
sporadically oopsing machines a while back. Same deal.

Use 2.91.66 or 2.95 -

On Wed, Mar 06, 2002 at 04:14:05PM -0500, Rechenberg, Andrew wrote:
> Anyone have any comments as to the proper journal size for a 200GB partition with data=ordered?

The journal size is computed automatically by "tune2fs -j /dev/???"

-- 
Ralf Hildebrandt (Im Auftrag des Referat V A)   Ralf.Hildebrandt@charite.de
Charite Campus Virchow-Klinikum                 Tel.  +49 (0)30-450 570-155
Referat V A - Kommunikationsnetze -             Fax.  +49 (0)30-450 570-916
If Bill Gates had a dime for every time a Windows box crashed...
...Oh, wait a minute, he already does.