Barry,
Just to clarify: the application that would cache files on GlusterFS would do so
through regular mount points, and not copy directly from the backend servers, right?
If that is the case, then that is fine.
Since you mentioned such a small partition, my guess is that you are using SSDs
on the 128 cache nodes. Is that correct?
Since you can re-generate or retrieve files from the upstream file server
seamlessly, I would recommend not using replication and instead configuring the
volume as pure distribute, which gives you twice the usable cache space. If there
are enough files and the application is caching files that are in demand, they
will spread out nicely over the 128 nodes and give you a good load-balancing
effect. A rough sketch of what that client volume would look like is below.
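For illustration only, here is a minimal sketch of a pure-distribute client
volume reusing the protocol/client bricks already defined in your config
(c001b17-1 through c004b48-1); the distribute translator simply takes the bricks
directly as subvolumes instead of the replicate pairs, and your existing
performance translators (write-behind, read-ahead, io-cache, quick-read,
stat-prefetch) can stack on top of it unchanged:

volume distribute
  type cluster/distribute
  # list all 128 protocol/client bricks here as subvolumes,
  # with no replicate pairs in between, e.g.:
  subvolumes c001b17-1 c002b17-1 c003b17-1 c004b17-1
end-volume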
With replication (two replicas, as you mentioned), a write goes to both replica
servers, while a read of a given file goes to one preferred server. There is no
load balancing per file per se. What I mean is: suppose 100 clients mount a
volume that is replicated across 2 servers; if all of them read the same file,
it will be served from the same server and will not be balanced across the 2
servers. This can be worked around by setting a preferred read server on each
client - but it has to be set on every client, and it only really works for a
replication count of 2. There is no way to give each client its own preference
list of servers - for example, with a replica count of 3, you cannot have one
client prefer s1, s2, s3, another prefer s2, s3, s1, the next prefer s3, s1, s2,
and so on. A sketch of how the per-client preference is set is below.
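If you do want to try it, the preference goes into each client's replicate
volume definition, something like the sketch below. I am assuming the
read-subvolume option of the replicate (AFR) translator here; check the
translator documentation for the exact option name in your release:

volume replicate001-17
  type cluster/replicate
  # this particular client prefers to read from c002b17-1
  option read-subvolume c002b17-1
  subvolumes c001b17-1 c002b17-1
end-volume

Each client's volfile would carry a different read-subvolume value, which is
exactly the per-client bookkeeping I mentioned above.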
At some point we intend to automate some of that, but since most users use a
replication count of only 2, it is manageable - except for the work required to
set the preference on each client. Again, if lots of different files are being
accessed, the load evens out anyway, so this becomes less of a concern and you
still get a load-balanced effect.
So, in summary: reads of the same file do not get balanced unless each client
sets a preference; however, when many different files are being accessed, the
load evens out and you get a load-balanced effect.
Since you are only going to write each file once, replication does not hurt
performance much either (a replicated write returns only after the write has
completed at both replica locations).
Since you are still in the testing phase, here is what you can do: create one
backend FS on each node. In it, create two directories - one called distribute
and the other called something like replica<volume><replica#>, so you can use
the name to group it with a matching directory on another node for replication.
The backend subvolumes exported from the servers can be directories, so you can
set up a distribute GlusterFS volume as well as the replicated GlusterFS
volumes, mount both on the clients, and test both. Once you decide to use one of
them, just unmount the other, delete its directory from the backend FS, and
that's it. A sketch of what the server-side export could look like is below.
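Roughly, each server's volfile could export the two directories as separate
bricks as in the sketch below. This is only a sketch - the /data/gluster/...
paths and the brick names are placeholders for whatever layout you choose, and
you may want a features/locks layer on each export as the generated volfiles
usually have:

volume posix-distribute
  type storage/posix
  option directory /data/gluster/distribute
end-volume

volume posix-replica
  type storage/posix
  option directory /data/gluster/replica001-1
end-volume

volume brick-distribute
  type features/locks
  subvolumes posix-distribute
end-volume

volume brick-replica
  type features/locks
  subvolumes posix-replica
end-volume

volume server
  type protocol/server
  option transport-type tcp
  # allow your render clients; tighten as needed
  option auth.addr.brick-distribute.allow *
  option auth.addr.brick-replica.allow *
  subvolumes brick-distribute brick-replica
end-volume

The clients then point their protocol/client volumes at brick-distribute or
brick-replica via option remote-subvolume, one client volfile for each of the
two test volumes.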
If you do have SSDs, as I assumed, you would actually be decreasing the wear per
unit of cached data (if there were such a term :-) ) by not using replication,
since each cached file gets written only once instead of twice.
Let me know if you have any questions on this.
Regards,
Tejas.
----- Original Message -----
From: "Barry Robison" <barry.robison at drdstudios.com>
To: gluster-users at gluster.org
Sent: Wednesday, March 10, 2010 5:28:24 AM GMT +05:30 Chennai, Kolkata, Mumbai,
New Delhi
Subject: [Gluster-users] advice on optimal configuration
Hello,
I have 128 physically identical blades, with 1GbE uplink per blade,
and 10GbE between chassis (32 blades per chassis). Each node will
have an 80GB gluster partition. Dual quad-core Intel Xeons, 24GB RAM.
The goal is to use gluster as a cache for files used by render
applications. All files in gluster could be re-generated or retrieved
from the upstream file server.
My first volume config attempt is 64 replicated volumes with partner
pairs on different chassis.
Is replicating a performance hit? Do reads balance between replication nodes?
Would NUFA make more sense for this set-up?
Here is my config, any advice appreciated.
Thank you,
-Barry
>>>>
volume c001b17-1
type protocol/client
option transport-type tcp
option remote-host c001b17
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
option ping-timeout 5
end-volume
.
<snip>
.
volume c004b48-1
type protocol/client
option transport-type tcp
option remote-host c004b48
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
option ping-timeout 5
end-volume
volume replicate001-17
type cluster/replicate
subvolumes c001b17-1 c002b17-1
end-volume
.
<snip>
.
volume replicate001-48
type cluster/replicate
subvolumes c001b48-1 c002b48-1
end-volume
volume replicate003-17
type cluster/replicate
subvolumes c003b17-1 c004b17-1
end-volume
.
<snip>
.
volume replicate003-48
type cluster/replicate
subvolumes c003b48-1 c004b48-1
end-volume
volume distribute
type cluster/distribute
subvolumes replicate001-17 replicate001-18 replicate001-19
replicate001-20 replicate001-21 replicate001-22 replicate001-23
replicate001-24 replicate001-25 replicate001-26 replicate001-27
replicate001-28 replicate001-29 replicate001-30 replicate001-31
replicate001-32 replicate001-33 replicate001-34 replicate001-35
replicate001-36 replicate001-37 replicate001-38 replicate001-39
replicate001-40 replicate001-41 replicate001-42 replicate001-43
replicate001-44 replicate001-45 replicate001-46 replicate001-47
replicate001-48 replicate003-17 replicate003-18 replicate003-19
replicate003-20 replicate003-21 replicate003-22 replicate003-23
replicate003-24 replicate003-25 replicate003-26 replicate003-27
replicate003-28 replicate003-29 replicate003-30 replicate003-31
replicate003-32 replicate003-33 replicate003-34 replicate003-35
replicate003-36 replicate003-37 replicate003-38 replicate003-39
replicate003-40 replicate003-41 replicate003-42 replicate003-43
replicate003-44 replicate003-45 replicate003-46 replicate003-47
replicate003-48
end-volume
volume writebehind
type performance/write-behind
option cache-size 64MB
option flush-behind on
subvolumes distribute
end-volume
volume readahead
type performance/read-ahead
option page-count 4
subvolumes writebehind
end-volume
volume iocache
type performance/io-cache
option cache-size 128MB
option cache-timeout 10
subvolumes readahead
end-volume
volume quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes iocache
end-volume
volume statprefetch
type performance/stat-prefetch
subvolumes quickread
end-volume
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users