thr3ads.net - Gluster users - [Gluster-users] Native client reads from a replication volume [Sep 2013]

Kal Black

2013-Sep-04 20:54 UTC

[Gluster-users] Native client reads from a replication volume

Hi All,
I am new to Gluster and trying to figure out how things actually work.
I have been reading the docs, lists and blogs but still have some doubts. I
ran a small test to understand how a native client reads from a
replicated volume, and I need some help interpreting the results.

My setup:
0. Gluster 3.4.0 on CentOS 6.4 64-bit servers.
1. Replicated gluster volume, using two bricks on two different gluster
servers.
2. Dedicated apache web server, mounting the replicated gluster volume
with the gluster native client. Its document root directory, holding
10,000 unique 100K files, is on the gluster mount.
3. Dedicated workstation that requests each unique 100K file from the web
server using httperf:
httperf --hog --server=192.168.29.45 --uri= --port=80 \
  --wlog y,/var/tmp/100K_urls --num-calls 1 --timeout 5 \
  --num-conns 10000 --rate 100

Findings:

1. The gluster servers cache the files and then serve subsequent
requests from the cache (no disk I/O is issued). Clearing the memory cache
(sync; echo "3" >/proc/sys/vm/drop_caches) triggers reading the files
from disk again.
1.1 I am not sure whether the caching is done by gluster (the
performance/io-cache translator is enabled by default) or by the
operating system.
1.2 During the initial caching on the gluster servers, the load (top) on
the web server goes above 200. It returns to a normal level once the I/O
on the gluster servers goes down (the servers start serving requests from
the cache). There are few to no blocked processes on the web server
(vmstat) and performance looks good. Apparently the load should be
attributed somehow to the gluster client, but why?

2. In order to figure out how the gluster client reads from the gluster
servers, I started iptraf on the gluster client (the apache server) and on
both gluster servers. After that, I started httperf from the
workstation, requesting each of the 10,000 100K files located on the
gluster volume (a total of about 1G). My understanding was that the client
would make sure the requested file is the right one by checking the file's
metadata on all gluster servers that are part of the replicated volume, but
would fetch it from only one of them.

To my surprise, according to iptraf, each gluster server sent 1038M (close
to the total size of all files) to the gluster client, and the gluster
client received roughly 980000K from each gluster server:

Proto/Port     Pkts    Bytes  PktsTo  BytesTo  PktsFrom  BytesFrom

Gluster server 1:
TCP/1020     543629    1072M  101758    1038M    441871   34578324

Gluster server 2:
TCP/1019     546148    1072M  103366    1038M    442782   34625676

Gluster client:
TCP/1020    1087627  981701K  697846  949912K    389781   31789104
TCP/1019    1087516  981973K  697993  950220K    389523   31753440
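
As a rough sanity check on these counters, the sizes can be compared against the expected payload of 10,000 files of 100K each. The Python sketch below is only illustrative. It assumes that iptraf's K and M suffixes are 1024-based and that the BytesTo column is the data flowing towards the client (my reading of the output, not something I have verified). The helper name parse_size is made up for this example.

```python
# Figures copied from the iptraf output above.
# ASSUMPTION: the K and M suffixes are binary multiples (1024-based).
KIB = 1024
MIB = 1024 * KIB


def parse_size(text):
    """Convert a size such as '1038M', '949912K' or '34578324' to bytes."""
    units = {"K": KIB, "M": MIB}
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


# Expected payload: 10,000 files of 100K each.
dataset = 10_000 * 100 * KIB

# ASSUMPTION: BytesTo is the traffic towards the client on each connection.
sent_by_server = {"server 1": "1038M", "server 2": "1038M"}
received_by_client = {"TCP/1020": "949912K", "TCP/1019": "950220K"}

for name, size in {**sent_by_server, **received_by_client}.items():
    ratio = parse_size(size) / dataset
    print(f"{name}: {ratio:.2f}x the dataset")

total_received = sum(parse_size(s) for s in received_by_client.values())
print(f"client total: {total_received / dataset:.2f}x the dataset")
```

Under these assumptions each connection carries roughly one full copy of the dataset, so the client appears to receive close to twice the data it actually serves.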

On the other hand, each gluster server cached ~500MB while the gluster
client cached ~1GB:

Gluster server 1:
# free -m
             total       used       free     shared    buffers     cached
Mem:          3821       1072       2748          0         66        520
-/+ buffers/cache:        485       3336
Swap:         3951          0       3951

Gluster server 2:
# free -m
             total       used       free     shared    buffers     cached
Mem:          3821       1085       2735          0         65        522
-/+ buffers/cache:        497       3324
Swap:         3951          3       3948

Gluster client:
# free -m
             total       used       free     shared    buffers     cached
Mem:          3821       1330       2490          0          1        993
-/+ buffers/cache:        336       3485
Swap:         3951          0       3951
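
To put the cached column in proportion, here is a small Python sketch that compares it with the size of the dataset. It is only a rough estimate: the cached figure also includes whatever else was in the page cache before the test, and I am assuming that the caches were empty enough for the comparison to be meaningful.

```python
# "cached" column from the free -m output above, in MiB.
cached_mib = {"server 1": 520, "server 2": 522, "client": 993}

# Expected payload: 10,000 files of 100K each, expressed in MiB.
dataset_mib = 10_000 * 100 / 1024

for host, cached in cached_mib.items():
    print(f"{host}: {cached} MiB cached = {cached / dataset_mib:.2f}x the dataset")

# ASSUMPTION: nothing else of significance was cached on the servers.
servers_total = cached_mib["server 1"] + cached_mib["server 2"]
print(f"both servers: {servers_total / dataset_mib:.2f}x the dataset")
```

By this estimate each server holds about half of the dataset in its cache and the two servers together hold about one full copy, which does not obviously match the network counters.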

Can someone explain how exactly a single native gluster client reads
from a replicated volume?
Is there any mechanism (any details) that spreads requests from a
single client among all servers taking part in a replica volume, making
reads faster and spreading the load among all servers?
Does a client decide where to fetch a file from each time, based
on some criteria (details), or is it bound to a given server all the
time?

Is there any low-level documentation? I find the one provided on the
gluster site to be very basic.

Thanks!
Kal

Kal Black

2013-Sep-11 14:09 UTC

[Gluster-users] Native client reads from a replication volume

Hello,
I asked this question some days ago but did not get any answer so far so I
am posting it again.

Can someone explain how the Gluster native client reads from a Gluster
replicated volume?

1. Is there a mechanism that spreads requests from a single client
among all bricks/servers taking part in a replica volume?

2. Does the native client decide where (brick/server) to fetch a
file from based on some criteria, or is it bound to a given brick/server
all the time?

3. Is a file downloaded from just one of the gluster servers?

I did some tests, trying to measure network traffic between the gluster
servers and a client using "iptraf" and "gluster volume status Volname
clients", which gave confusing results, so I would like someone to clear
this up for me if possible.

Thank you!
Kal
