Barry,
Just to clarify: the application that would cache files on GlusterFS would do so
through regular mount points, and not copy directly from the backend servers, right?
If that is the case, then that is fine.
Since you mentioned such a small partition, my guess is that you are using SSDs
on the 128 cache nodes. Is that correct?
Since you can re-generate or retrieve files from the upstream file server
seamlessly, I would recommend not using replication and instead configuring the
volume as pure distribute, which gives you twice the usable cache space. If there
are enough files and the application is caching files that are in demand, they
will spread out nicely over the 128 nodes and give you a good load-balancing
effect. A rough sketch of what that client volume would look like is below.
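For illustration only, here is a minimal sketch of a pure-distribute client
volume reusing the protocol/client bricks already defined in your config
(c001b17-1 through c004b48-1); the distribute translator simply takes the bricks
directly as subvolumes instead of the replicate pairs, and your existing
performance translators (write-behind, read-ahead, io-cache, quick-read,
stat-prefetch) can stack on top of it unchanged:

volume distribute
  type cluster/distribute
  # list all 128 protocol/client bricks here as subvolumes,
  # with no replicate pairs in between, e.g.:
  subvolumes c001b17-1 c002b17-1 c003b17-1 c004b17-1
end-volume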
With replication (two replicas, as you mentioned), a write goes to both replica
servers, while a read of a given file goes to one preferred server. There is no
load balancing per file per se. What I mean is: suppose 100 clients mount a
volume that is replicated across 2 servers; if all of them read the same file,
it will be served from the same server and will not be balanced across the 2
servers. This can be worked around by setting a preferred read server on each
client - but it has to be set on every client, and it only really works for a
replication count of 2. There is no way to give each client its own preference
list of servers - for example, with a replica count of 3, you cannot have one
client prefer s1, s2, s3, another prefer s2, s3, s1, the next prefer s3, s1, s2,
and so on. A sketch of how the per-client preference is set is below.
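If you do want to try it, the preference goes into each client's replicate
volume definition, something like the sketch below. I am assuming the
read-subvolume option of the replicate (AFR) translator here; check the
translator documentation for the exact option name in your release:

volume replicate001-17
  type cluster/replicate
  # this particular client prefers to read from c002b17-1
  option read-subvolume c002b17-1
  subvolumes c001b17-1 c002b17-1
end-volume

Each client's volfile would carry a different read-subvolume value, which is
exactly the per-client bookkeeping I mentioned above.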
At some point we intend to automate some of that, but since most users use a
replication count of only 2, it is manageable - except for the work required to
set the preference on each client. Again, if lots of different files are being
accessed, the load evens out anyway, so this becomes less of a concern and you
still get a load-balanced effect.
So, in summary: reads of the same file do not get balanced unless each client
sets a preference; however, when many different files are being accessed, the
load evens out and you get a load-balanced effect.
Since you are only going to write each file once, replication does not hurt
performance much either (a replicated write returns only after the write has
completed at both replica locations).
Since you are still in the testing phase, here is what you can do: create one
backend FS on each node. In it, create two directories - one called distribute
and the other called something like replica<volume><replica#>, so you can use
the name to group it with a matching directory on another node for replication.
The backend subvolumes exported from the servers can be directories, so you can
set up a distribute GlusterFS volume as well as the replicated GlusterFS
volumes, mount both on the clients, and test both. Once you decide to use one of
them, just unmount the other, delete its directory from the backend FS, and
that's it. A sketch of what the server-side export could look like is below.
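Roughly, each server's volfile could export the two directories as separate
bricks as in the sketch below. This is only a sketch - the /data/gluster/...
paths and the brick names are placeholders for whatever layout you choose, and
you may want a features/locks layer on each export as the generated volfiles
usually have:

volume posix-distribute
  type storage/posix
  option directory /data/gluster/distribute
end-volume

volume posix-replica
  type storage/posix
  option directory /data/gluster/replica001-1
end-volume

volume brick-distribute
  type features/locks
  subvolumes posix-distribute
end-volume

volume brick-replica
  type features/locks
  subvolumes posix-replica
end-volume

volume server
  type protocol/server
  option transport-type tcp
  # allow your render clients; tighten as needed
  option auth.addr.brick-distribute.allow *
  option auth.addr.brick-replica.allow *
  subvolumes brick-distribute brick-replica
end-volume

The clients then point their protocol/client volumes at brick-distribute or
brick-replica via option remote-subvolume, one client volfile for each of the
two test volumes.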
If you do have SSDs, as I assumed, you would actually be decreasing the wear per
unit of cached data (if there were such a term :-) ) by not using replication,
since each cached file gets written only once instead of twice.
Let me know if you have any questions on this.
Regards,
Tejas.
----- Original Message -----
From: "Barry Robison" <barry.robison at drdstudios.com>
To: gluster-users at gluster.org
Sent: Wednesday, March 10, 2010 5:28:24 AM GMT +05:30 Chennai, Kolkata, Mumbai,
New Delhi
Subject: [Gluster-users] advice on optimal configuration
Hello,
I have 128 physically identical blades, with 1GbE uplink per blade,
and 10GbE between chassis (32 blades per chassis). Each node will
have an 80GB gluster partition. Dual quad-core Intel Xeons, 24GB RAM.
The goal is to use gluster as a cache for files used by render
applications. All files in gluster could be re-generated or retrieved
from the upstream file server.
My first volume config attempt is 64 replicated volumes with partner
pairs on different chassis.
Is replicating a performance hit? Do reads balance between replication nodes?
Would NUFA make more sense for this set-up?
Here is my config, any advice appreciated.
Thank you,
-Barry
>>>>
volume c001b17-1
type protocol/client
option transport-type tcp
option remote-host c001b17
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
option ping-timeout 5
end-volume
.
<snip>
.
volume c004b48-1
type protocol/client
option transport-type tcp
option remote-host c004b48
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
option ping-timeout 5
end-volume
volume replicate001-17
type cluster/replicate
subvolumes c001b17-1 c002b17-1
end-volume
.
<snip>
.
volume replicate001-48
type cluster/replicate
subvolumes c001b48-1 c002b48-1
end-volume
volume replicate003-17
type cluster/replicate
subvolumes c003b17-1 c004b17-1
end-volume
.
<snip>
.
volume replicate003-48
type cluster/replicate
subvolumes c003b48-1 c004b48-1
end-volume
volume distribute
type cluster/distribute
subvolumes replicate001-17 replicate001-18 replicate001-19
replicate001-20 replicate001-21 replicate001-22 replicate001-23
replicate001-24 replicate001-25 replicate001-26 replicate001-27
replicate001-28 replicate001-29 replicate001-30 replicate001-31
replicate001-32 replicate001-33 replicate001-34 replicate001-35
replicate001-36 replicate001-37 replicate001-38 replicate001-39
replicate001-40 replicate001-41 replicate001-42 replicate001-43
replicate001-44 replicate001-45 replicate001-46 replicate001-47
replicate001-48 replicate003-17 replicate003-18 replicate003-19
replicate003-20 replicate003-21 replicate003-22 replicate003-23
replicate003-24 replicate003-25 replicate003-26 replicate003-27
replicate003-28 replicate003-29 replicate003-30 replicate003-31
replicate003-32 replicate003-33 replicate003-34 replicate003-35
replicate003-36 replicate003-37 replicate003-38 replicate003-39
replicate003-40 replicate003-41 replicate003-42 replicate003-43
replicate003-44 replicate003-45 replicate003-46 replicate003-47
replicate003-48
end-volume
volume writebehind
type performance/write-behind
option cache-size 64MB
option flush-behind on
subvolumes distribute
end-volume
volume readahead
type performance/read-ahead
option page-count 4
subvolumes writebehind
end-volume
volume iocache
type performance/io-cache
option cache-size 128MB
option cache-timeout 10
subvolumes readahead
end-volume
volume quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes iocache
end-volume
volume statprefetch
type performance/stat-prefetch
subvolumes quickread
end-volume
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users