Hi,

I'm using OCFS2 from 2.6.26 with some patches I made that allow for the
creation of a volume greater than 16TB:

http://oss.oracle.com/pipermail/ocfs2-devel/2008-July/002568.html
http://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-July/000857.html

The ocfs2-tools-devel post has info regarding the block/cluster size (from
the mkfs command) used, which pertains to the following question: in
general, what sort of performance numbers are people seeing for something
like "time dd if=/dev/zero of=testFile bs=4k count=500000"? I'm getting
anywhere from 120MB/s to 165MB/s. The same command on XFS using the same
hardware/LVM setup gives me 300MB/s, and with GFS2 gives 100MB/s. Currently
there's only one node in the cluster, but if other nodes are added with
similar 4GB FC HBA hardware, will these also achieve ~120-165MB/s write
speeds as long as the RAID hardware isn't being "maxed" out?

Here are some bonnie++ benchmarks:

http://structbio.vanderbilt.edu/~pattans/bonnie-porpoise.html

Also, if any devs could look at the patches to see if I missed anything
that might cause OCFS2 to blow up if it reaches for a block offset greater
than 2^32 - 1, I would greatly appreciate it (please post in reply to the
posts on the -devel lists). As far as the write testing goes, it's only at
1.1T of 18T written, i.e. it'll take a day or two, and then I'll try some
fseek and read calls at large offsets.

Thanks,

Sabuj Pattanayek
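For reference, the kind of mkfs.ocfs2 invocation being discussed would look
roughly like the sketch below. The actual block/cluster sizes used here are
in the linked ocfs2-tools-devel post; the sizes, label, slot count, and
device below are placeholders, not the real values:

  # Placeholder sketch only -- real block/cluster sizes are in the linked
  # -tools-devel post; device, label and node-slot count are made up.
  mkfs.ocfs2 -b 4K -C 4K -N 4 -L ocfs2_0 /dev/mapper/vg-ocfs2_0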
Sabuj Pattanayek wrote:

> The ocfs2-tools-devel post has info regarding the block/cluster size
> (from the mkfs command) used which will pertain to the following
> question: in general, what sort of performance numbers are people
> seeing for something like "time dd if=/dev/zero of=testFile bs=4k
> count=500000"? I'm getting anywhere from 120MB/s to 165MB/s. The same
> command on XFS using the same hardware/LVM setup gives me 300MB/s and
> with GFS2 gives 100MB/s. Currently there's only one node in the
> cluster but if other nodes are added with similar 4GB FC HBA hardware
> will these also achieve ~120-165MB/s write speeds as long as the RAID
> hardware isn't being "maxed" out?

Try it out. If not, then we have a bottleneck somewhere.

One obvious bottleneck is the global bitmap. The fs works around this by
using a node-local bitmap cache called localalloc. By default it is 8MB.
So if you are using 4K/4K (block/cluster), you will hit the global bitmap
(and thus the cluster lock) every 2048 clusters of allocation. If that is
a bottleneck, you can mount with a larger localalloc. To mount with a 16MB
localalloc, do:

mount -o localalloc=16 ...

XFS has delayed allocation, which allows it to write data in fewer extents
and thus gives better I/O throughput for buffered access.

> Here are some bonnie++ benchmarks:
>
> http://structbio.vanderbilt.edu/~pattans/bonnie-porpoise.html
>
> Also if any devs could look at the patches to see if I missed anything
> that might cause OCFS2 to blow up if it reaches for a block offset
> greater than 2^32 - 1, would greatly appreciate it (please post in
> reply to the posts on the -devel lists). As far as the write testing
> is going, it's only at 1.1T of 18T written, i.e. it'll take a day or
> two and then I'll have to try some fseek and read calls for large
> offsets.

So JBD2 will allow one to go beyond 4 billion blocks. But to make ocfs2
access beyond 16T, you will for the time being need to use a clustersize
greater than 4K. Making ocfs2 with a 4K clustersize access beyond 16T will
need more changes. See the task titled "Support more than 32-bits worth of
clusters" at:

http://oss.oracle.com/osswiki/OCFS2/LargeTasksList

A quick way to fill up space could be using unwritten extents. That will
just allocate the space and not bother writing to it. Check out
reserve_space/reserve_space.c in the ocfs2-test project.

As far as the kernel patches go, we would like backward compatibility. As
in, not getting rid of jbd just yet. Maybe an incompat flag. But this has
not been decided. Let us know how it goes.

Sunil
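To illustrate the localalloc arithmetic above, a rough sketch of bumping
the window and checking it took effect; the 32MB value, device, and
mountpoint are placeholders (localalloc is given in MB):

  # With 4K clusters, the default 8MB local alloc window covers 8M/4K = 2048
  # clusters before the global bitmap (and its cluster lock) is hit again;
  # a 32MB window stretches that to 8192 clusters.
  mount -t ocfs2 -o localalloc=32 /dev/mapper/vg-ocfs2_0 /export/ocfs2_0
  grep ocfs2 /proc/mounts    # the options should now include localalloc=32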
On Thu, Jul 17, 2008 at 03:54:19PM -0500, Sabuj Pattanayek wrote:

> The ocfs2-tools-devel post has info regarding the block/cluster size
> (from the mkfs command) used which will pertain to the following
> question: in general, what sort of performance numbers are people
> seeing for something like "time dd if=/dev/zero of=testFile bs=4k
> count=500000"? I'm getting anywhere from 120MB/s to 165MB/s. The same
> command on XFS using the same hardware/LVM setup gives me 300MB/s and
> with GFS2 gives 100MB/s.

Try mounting Ocfs2 in writeback journaling mode:

mount -t ocfs2 -o data=writeback /dev/XXX /mountpoint

That should increase your performance for streaming writes.

By the way, are you only timing the dd, or are you doing dd; sync and
timing the entire operation? The latter is a better measurement of how
long it takes to get data written to disk.

--Mark

--
Mark Fasheh
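If the writeback mode helps, one way it could be made persistent across
remounts is via fstab; a sketch only, with the device, mountpoint, and
option values taken as placeholders matching the setup described later in
the thread:

  # /etc/fstab sketch -- device/mountpoint are placeholders; _netdev delays
  # the mount until the network (and cluster stack) is up.
  /dev/mapper/vg-ocfs2_0  /export/ocfs2_0  ocfs2  _netdev,data=writeback,localalloc=16  0 0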
> Try mounting Ocfs2 in writeback journaling mode:
>
> mount -t ocfs2 -odata=writeback /dev/XXX /mountpoint

Yup:

/dev/mapper/vg-ocfs2_0 on /export/ocfs2_0 type ocfs2
(rw,_netdev,localalloc=16,data=writeback,heartbeat=local)

> By the way, are you only timing the dd, or are you doing dd;sync and
> timing the entire operation? The latter is a better measurement of how
> long it takes to get data written to disk.
> --Mark

I have a pl script that does start timer, dd, sync, end timer using
gettimeofday(), but this seems to give slightly slower results than doing
it like this:

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=4k count=500000 ; sync; }
2048000000 bytes (2.0 GB) copied, 15.047 s, 136 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=8k count=250000 ; sync; }
2048000000 bytes (2.0 GB) copied, 12.5018 s, 164 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=16k count=125000 ; sync; }
2048000000 bytes (2.0 GB) copied, 13.7218 s, 149 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=32k count=62500 ; sync; }
2048000000 bytes (2.0 GB) copied, 13.647 s, 150 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=64k count=31250 ; sync; }
2048000000 bytes (2.0 GB) copied, 11.9441 s, 171 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=64k count=31250 ; sync; }
2048000000 bytes (2.0 GB) copied, 12.083 s, 169 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=128k count=15625 ; sync; }
2048000000 bytes (2.0 GB) copied, 11.9659 s, 171 MB/s

pattans at orca ~/san1/tmp $ time { dd if=/dev/zero of=testFile.porpoise bs=256k count=7812 ; sync; }
2047868928 bytes (2.0 GB) copied, 14.3243 s, 143 MB/s

Any way to change the default block write size, similar to the wsize and
rsize NFS mount options? Still working on the stripe; need to get some
other RAID hardware stabilized.
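The same block-size sweep can be scripted so each run is timed identically;
a small sketch, with the file name and the ~2GB total taken from the runs
above (adjust the path for your own mountpoint):

  # Sweep dd block sizes over the same ~2GB write, timing dd + sync together.
  for bs_kb in 4 8 16 32 64 128 256; do
      count=$(( 2000000 / bs_kb ))    # keep the total at ~2,000,000 KiB
      echo "== bs=${bs_kb}k count=${count} =="
      time { dd if=/dev/zero of=testFile.porpoise bs=${bs_kb}k count=${count}; sync; }
  done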