On Wednesday 30 October 2002 07:44 am, Vinnie wrote:
> Hello everyone,
> The RAID unit is a Promise UltraTrak100-TX8, with 8 Western Digital
> WD1200JB 120GB ATA100 7200rpm hard drives installed. 7 of the 8 drives
> are joined to a RAID5 array, the 8th is an unassigned hot spare. The
> UltraTrak's SCSI interface is an Ultra2-LVD (80MB/sec) interface,
> connected via its external 68-pin MicroD cable, to a custom Granite
> Digital internal-to-external "Gold TPO" ribbon cable - which leads to
> the "B" channel of the onboard AIC7899W Ultra160 SCSI interface. The
> RAID unit is the only SCSI device attached to this channel at this time,
> and is terminated with a Granite Digital SCSI-Vue active diagnostic
> terminator. I have no indication or suspicion whatsoever of any SCSI
> bus problems. (I have also run the same UltraTrak unit with the same diag
> terminator to an AHA2940U2W in the "old" file server, with the same write
> performance issues, to be described below).
>
Break the drives up into RAID1 sets; the more spindles you throw at it, the
better off you'll be.
> Currently, the array is partitioned with a /boot partition, and a /
> partition, each as ext3 with the default data=ordered journaling mode.
> I have begun to realize gradually why it is a decent idea to break up
> the filesystem into separate mount points and partitions, and may yet
> end up doing that. But that's a rabbit to hunt another day, unless
> taking care of this is also required to solve this problem.
This is _very_ advisable.
> This file server performs 5 key fileserver-related roles, due to its
> having the large RAID5 file storage for the network:
>
> 1. Serves the mailboxes for our domain to the two frontend mail/web
> servers via NFS mount
>
> 2. Runs the master SQL server - the two mail/web servers run local slave
> copies of the mail account databases
>
> 3. Stores the master copy of web documents served by the web servers
> (and will replicate them to web servers when documents change, still
> working on this though)
>
> 4. Samba file server for storage needs on the network
>
> 5. Limited/restricted-access FTP server for web clients
Do any of these require more than 120GB of storage (meaning are they too large
to fit on a single 120GB RAID1 set)?
>
>
> For the most part, the file server runs great and does its job quite
> well. However, there are two main circumstances in which things run
> quite poorly and "go downhill":
>
> 1. Daily maintenance-type cron events (like updatedb)
>
> 2. Other heavy file WRITE activity, such as when Samba clients are
> backing up their files to this server from the network. We regularly
> have some very large files being copied over to the file server via
> Samba (1 GB drive image files, for example)
>
> In both cases, or other cases of heavy file I/O (mainly writes), this
> server pretty much grinds to a halt. It starts grabbing up all of the
> available RAM to use as dcache, presumably because the RAID unit cannot
> write it to disk that fast. The inevitable is stalled as long as
> possible, but eventually the backlog uses up all available system RAM
> (we have 2GB in this puppy now), until it is forced to write
> synchronously to free up some dcache for fresh data coming in. While
> this is going on, might as well forget delivering/retrieving an email
> to/from mailboxes, or getting much anything else out of the server. We
> have seen "NFS Server Not Responding" errors, and MySQL errors
too (from
> the vpopmail libs trying to look up the username/pw and mailbox location).
>
> Once the "emergency/panic" sync writing to disk is complete, the
server
> reverts back to running great (although linux never seems to de-allocate
> RAM it has grabbed for dcache -- that is until it absolutely HAS to give
> it up).
>
> From what I've been reading this seems to be normal for 2.4-series
> kernels (I'm running a modded 2.4.18 on this server, patched with the
> various NFS suite of patches, plus recent iptables), it seems to really
> like to use RAM for cache. And I suppose that RAM works better doing
> SOMETHING, than just sitting there looking pretty under the available
> column. ;)
>
Ok then, hmmmm, looks like you should be dividing things up, perhaps like:
That would give you 4 RAID1 sets at 120GB each:
Set 1> Operating system, boot, etc. Should also store the web roots, and be
the FTP store (I'm assuming FTP is for updating the web roots). Regardless,
other than /var/log I would imagine this set is more READ oriented.
Set 2> Your samba file store. Heavy, heavy writes here. Keep these disks
isolated so heavy write activity only affects samba performance (aka
workstation backups). Keeping these heavy sustained writes isolated will help
your NFS timeout issues.
Set 3> MySQL. If your databases are fairly large and performance oriented,
they should always get their own dedicated spindles.
Set 4> NFS store for mail. I normally wouldn't advise writing mail to a
central NFS store, but if you've been having good luck with it then ok.
120GB per set is probably a ton of overkill here; I doubt your MySQL databases
are even close to 120GB. This is where the new ultra-modern high-capacity
drives hurt: you need more spindles, not capacity, so the outcome is mostly
underutilized disk space.
> THOUGHTS ABOUT USING AN EXTERNAL DATA=JOURNAL SETUP
> After reading many posts in the archives here and other things I could
> find, I have considered setting up a separate pair of quick drives in a
> RAID1 array as an external journal, and setting DATA=JOURNAL mode on the
> root filesystem mount.
>
You are correct, an external journal does improve write performance pretty
significantly. I've done some testing on a postfix mail server running in
data=journal mode, and moving to an external journal increased performance
very significantly.
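For reference, this is roughly how data=journal gets selected at mount time;
the device name and mount point below are just placeholders, not your actual
layout:

    mount -t ext3 -o data=journal /dev/sdb1 /var/spool/mail

The same option can also go in the options field of the matching /etc/fstab
entry.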
Remember though, you can move the journal to an external device at any time. I
would heavily recommend that you break up your spindles and start out with a
large journal allocated with the filesystem. Then, if performance still
demands it, grab some small(er) disks and move the journals off to them.
When I say large journal, I usually mean somewhere around the 250MB range. I
personally wouldn't recommend allocating a super-large one (greater than 1GB),
but I'll step aside and let the FS experts advise on that issue.
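As a rough sketch (device name made up, exact size to taste), the journal size
can be requested when the filesystem is created, or when an existing ext2
filesystem is converted:

    # new ext3 filesystem with a ~256MB internal journal
    mke2fs -j -J size=256 /dev/sdb1

    # or add a 256MB journal to an existing ext2 filesystem
    tune2fs -j -J size=256 /dev/sdb1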
> CAN WE CHANGE JOURNAL LOCATION ON EXISTING EXT3 PARTITIONS?
> One other snag it seems we may run into is the fact that the / partition
> already has a journal (/.journal, I presume), since it's already an ext3
> partition. Is it possible to tell the system we want the journal
> somewhere else instead? Strikes me that when we're ready to move to the
> external journal, we may have to mount the / partition ext2, then remove
> the journal, and create the new one and point the / partition to it with
> the e2fs tools?
Yes, except I would _not_ advise moving the / partition journal to an external
device. The / partition should have very little activity (assuming /var or
/var/log is a separate file system). This is the prime reason you should not
be allocating one huge / filesystem. Break it up into something like:
/
/var
/tmp
/usr
/usr/local
and create dedicated mounts for your samba, mysql, webroot (NFS), and mail
(NFS) data:
/usr/local/mysql
/usr/local/webs
/usr/local/filestore
etc. This setup would allow you to tweak the journals as you see fit. Almost
all your disk activity (using the hypothetical example above) would be on the
filesystems:
/usr/local/mysql
/usr/local/webs
/usr/local/filestore
Meaning this is where you would focus your tuning. /, /var, /tmp, /usr, and
/usr/local may be OK with the standard settings, but the flexibility allows
those to be tuned as well.
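To make that concrete, a purely hypothetical /etc/fstab for a layout like this
could look something like the following (device names, and which filesystems
get data=journal, are placeholders for whatever you actually end up with):

    /dev/sda1  /                     ext3  defaults               1 1
    /dev/sda5  /var                  ext3  defaults               1 2
    /dev/sda6  /tmp                  ext3  defaults               1 2
    /dev/sda7  /usr                  ext3  defaults               1 2
    /dev/sda8  /usr/local            ext3  defaults               1 2
    /dev/sdb1  /usr/local/mysql      ext3  defaults               1 2
    /dev/sdc1  /usr/local/webs       ext3  defaults               1 2
    /dev/sdd1  /usr/local/filestore  ext3  defaults,data=journal  1 2

How the RAID1 sets actually show up (sdb, sdc, sdd here) depends on how the
UltraTrak presents them on the SCSI bus.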
For instance, /usr/local/filestore is a heavily write-based (samba) filesystem.
Create this filesystem with a large journal (250MB). Later, if you notice
you're still at I/O capacity, try moving the journal off to another set of
disks (so you don't have to buy the disks unless you really need them), and so
on, only tuning the filesystems that need it.
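If it does come to that, the move itself is just a few e2fsprogs steps; in the
sketch below /dev/sdd1 holds the filestore filesystem and /dev/sde1 is the new
small/fast journal disk, both made-up names:

    umount /usr/local/filestore
    e2fsck -f /dev/sdd1                       # make sure the old journal is replayed
    tune2fs -O ^has_journal /dev/sdd1         # drop the internal journal
    mke2fs -b 4096 -O journal_dev /dev/sde1   # block size must match the filesystem's
    tune2fs -j -J device=/dev/sde1 /dev/sdd1  # attach the external journal
    mount /usr/local/filestore

No need to recreate the filesystem or shuffle data around.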
>
> Thanks in advance for all thoughts, opinions, and suggestions. I'll
> provide whatever other details necessary.
>
> Thanks in advance,
> vinnie
>
Anyhow, hope this helps.
Jeremy