On Thu, Feb 01, 2007 at 07:50:25PM -0800, Brandon Lamb wrote:
> The default is 256 megabytes I believe? Would it be better to increase
> this? Does it matter on the total partition size, ie should i have
> more journal for the more disk space i have?
That's the default with -Tmail... It's lower (I _think_ 128 megs or so) by
default on a sufficiently large block device.
Basically, I arrived at those numbers by running some metadata-heavy
workloads and comparing performance numbers. 128 megs seemed to have the
best tradeoff of journal size versus performance, while 256 squeezed out a
few more percent. After 256, I saw negligible improvement.
Now, the problem with these measurements is that they're just based on some
"average" machines in our lab. I'd venture to guess though that the vast
majority of people will be fine with the -Tmail default of 256 megs. If it
really concerns you though, repeating my tests is trivial. Basically, take
the workload you're likely to run and reproduce it on your test machines
with various journal sizes.
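
For illustration only, here is a rough Python sketch of the kind of
metadata-heavy timing loop I mean. It is not the test I actually ran; the
function name, file naming scheme and file count are made up for the
example. The idea is to format the test volume with a given journal size
(mkfs.ocfs2 takes it via -J size=...), mount it, run the loop against a
directory on that volume, and repeat for each size you want to compare.

```python
import os
import tempfile
import time


def time_metadata_workload(directory, n_files=1000):
    """Create, rename and unlink n_files empty files; return elapsed seconds.

    Hypothetical helper: every create/rename/unlink is a metadata change,
    so the loop exercises the journal far more than the data path.
    """
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"f{i:06d}")
        # O_EXCL makes sure we really create a new inode each time.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o644)
        os.close(fd)
        os.rename(path, path + ".r")
        os.unlink(path + ".r")
    return time.perf_counter() - start


if __name__ == "__main__":
    # A temporary directory stands in for a directory on the volume under
    # test; replace it with a path on your mounted test file system.
    with tempfile.TemporaryDirectory() as d:
        print(f"{time_metadata_workload(d, 200):.3f} s")
```

A synthetic loop like this only gives a baseline. The numbers that matter
come from running your real workload the same way.
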
> How about number of nodes, does this matter? Do I need more or less
> journal space per node depending on how many nodes I have?
Number of nodes doesn't matter to journal performance - the journal is a
node-local resource, that is, until it gets recovered by another node.
What will affect performance, however, is the cross section of node activity.
Basically, every time one node makes a change to a resource which other
nodes have locks on, it has the potential to cause a journal flush on those
nodes (if they too have outstanding changes).
You'll see the best performance when you let each node run on separate parts
of the file system. So, if you have a workload which requires lots of file
creates, try to set it up so that each node does the creates in its own
directory. If you have something which needs to do many writes to a file,
try to separate the files by node.
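
As a minimal sketch of the per-node layout (Python; the helper names and
the choice of the host name as the directory name are assumptions for the
example, not anything ocfs2 requires), each node could do its creates
under a subdirectory of the shared mount that only it writes to:

```python
import os
import socket


def node_private_dir(shared_root, node_name=None):
    """Return (creating it if needed) a directory only this node writes to.

    Hypothetical helper: defaults to the local host name so every node in
    the cluster ends up with a different subdirectory of shared_root.
    """
    node_name = node_name or socket.gethostname()
    path = os.path.join(shared_root, node_name)
    os.makedirs(path, exist_ok=True)
    return path


def create_files(shared_root, names, node_name=None):
    """Create empty files inside this node's own directory."""
    target = node_private_dir(shared_root, node_name)
    created = []
    for name in names:
        path = os.path.join(target, name)
        # Mode "x" fails if the file already exists instead of truncating it.
        with open(path, "x"):
            pass
        created.append(path)
    return created
```

With that layout, creates from different nodes land in different
directories, so they aren't all modifying the same directory at once.
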
The database gets around this via direct I/O, but the majority of apps don't
do that sort of I/O (nor should they - you lose some coherency guarantees in
ocfs2 by using O_DIRECT).
> I am not real knowledgeable on what the journal does exactly, I think
> I get the gist but I dont know the effects of changing the size.
The only downside I can think of off the top of my head other than the disk
space consumption and possible increased latency before releasing locks
(discussed above) is that there's a potential for additional memory usage if
the file system is allowed to pin that many buffers in memory before they're
written out to the file system and reclaimed. So far I haven't seen anyone
having real problems with that on the default journal sizes, but it's
definitely something worth mentioning.
-Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com