Lawrence, here are my answers to your questions, based on the analysis that
follows them.
Q1: Why is read faster than write? Is that OK?
A1: A less-than-optimal bonding mode is the reason for the difference.
Q2: Based on experience, is this benchmark result good or bad?
A2: The results are consistent with past experience with bonding mode 0
(round-robin).
Q3: How can Gluster be tuned to improve the benchmark?
A3: I suspect your network configuration needs tuning; also, you may not need
the striping feature at all.
Q4: How can I get benchmark results from others to compare against?
A4: I'm not sure I understand the question, but iozone is what I use for
sequential I/O benchmarking.
Configuration -- it appears that you have 2 servers and 6 clients, and your
gluster volume is distributed-stripe, with no replication.
servers:
each server has 4 cores
each server has 4-way bonding mode 0
each server has 12 1-TB drives configured as 2 6-drive RAID volumes (what kind
of RAID?)
clients:
each has 2-way bonding mode 0
client disks are irrelevant since gluster does not use them
results:
You ran the cluster iozone sequential write test followed by the sequential
read test with 25 threads, a 50-GB file size (-s 50g), and a 1-MB transfer
size, including fsync and close in the throughput calculation (-e -c).
Results are:
initial write: 414 MB/s
re-write: 447 MB/s
initial read: 655 MB/s
re-read: 778 MB/s
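(For reference, these are the KB/sec figures from your iozone report divided
by 1024, e.g. 424258.41 KB/sec / 1024 is about 414 MB/s.)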
Your 6 clients have a total of 12 1-GbE NICs and your 2 servers have 8, so the
cross-sectional bandwidth between clients and servers is limited by the server
side to roughly 800 MB/s (8 links at roughly 100 MB/s of usable bandwidth
each). For your volume type, treat ~800 MB/s as the practical upper limit on
your iozone throughput. You have enough threads and enough disk drives in your
servers that storage should not be the bottleneck. With this I/O transfer size
and volume type a server CPU bottleneck is less likely, but it is still worth
checking.
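If you want to confirm where the bottleneck is, watch CPU and per-NIC traffic
on the servers while iozone runs. A minimal sketch using standard CentOS 6
tools (the bond name is just an example):

  cat /proc/net/bonding/bond0   # confirm bonding mode and active slaves
  sar -n DEV 5                  # per-NIC receive/transmit rates, 5-sec samples
  sar -u 5                      # CPU utilization, 5-sec samples

If one server NIC is receiving most of the write traffic while the others sit
nearly idle, that is the bonding-mode problem described below.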
With Linux bonding, the biggest challenge is load-balancing INCOMING traffic
to the servers (writes) -- almost any mode can load-balance outgoing traffic
(reads) across the NICs in the bond. For writes, you are transmitting from the
many client NICs to the fewer server NICs. The problem with bonding mode 0 is
that the ARP protocol associates a single MAC address with a single IP
address, and bonding mode 0 assigns the SAME MAC address to all NICs in the
bond, so the network switch "learns" that the last port that transmitted with
that MAC address is the port where all traffic for that MAC address should be
delivered. This reduces the effectiveness of bonding at balancing receive load
across the available server NICs. For better load-balancing of receive traffic
by the switch and servers, try one of the following:
bonding mode 6 (balance-alb) -- use this if clients and servers are mostly on
the same VLAN. In this mode the Linux bonding driver uses ARP to spread
clients across the available server NICs (the NICs retain unique MAC
addresses), so the network switch can deliver IP packets from different
clients to different server NICs. This can give near-optimal utilization of
the server NICs when the client/server ratio is larger than the number of
server NICs, usually with no switch configuration needed.
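For example, on CentOS 6 the mode is set through BONDING_OPTS in the bond's
ifcfg file. A minimal sketch, with placeholder device names and addresses:

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.1.10            # placeholder address
  NETMASK=255.255.255.0
  BONDING_OPTS="mode=balance-alb miimon=100"

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (one file per slave NIC)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

After a "service network restart", /proc/net/bonding/bond0 should report the
new mode.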
bonding mode 4 (802.3ad link aggregation, a.k.a. "trunking") -- if your switch
supports it, you can configure the switch and the servers to treat all of a
server's NICs as a single "trunk". An incoming IP packet destined for that
server can then be delivered by the switch on whichever of those NICs its
load-balancing hash selects, so different client connections land on different
server NICs. This works even when clients are on a different subnet and does
not depend on the ARP protocol, but both the servers and the switch must be
configured for it, and the switch side has historically been vendor-specific.
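On the server side the change is again just the BONDING_OPTS line -- a sketch,
assuming the same bond0 layout as above:

  BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"

Note that xmit_hash_policy only controls how the server spreads its own
transmits across the trunk; how the switch spreads traffic toward the server
depends on the switch's own hashing configuration.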
Also, I do not see how the gluster striping feature will improve performance
for this workload. Gluster striping would be expected to help under two
conditions:
- the Gluster client can read/write data much faster than any one server can
- the Gluster client is only reading/writing one or two files at a time
Neither of these conditions is satisfied by your workload and configuration:
with 25 iozone threads spread across 6 clients, plain distribution already
keeps all of your bricks busy.
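If you want to test that, a distribute-only volume is created the same way you
created the striped one, just without the stripe count. A sketch with
placeholder host and brick names:

  gluster volume create testvol transport tcp \
      server1:/export/brick1 server1:/export/brick2 \
      server2:/export/brick1 server2:/export/brick2
  gluster volume start testvol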
You achieved close to the network throughput limit on re-reads. The gap
between the initial-read and re-read results suggests that you might be able
to improve initial reads with better pre-fetching (read-ahead) on the server
block devices.
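One knob worth trying is the block-device read-ahead on the servers. A sketch
with a placeholder device name (use whatever devices back your bricks):

  blockdev --getra /dev/sdb        # current read-ahead, in 512-byte sectors
  blockdev --setra 4096 /dev/sdb   # raise to ~2 MB, then re-run the read test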
Ben England, Red Hat
----- Original Message -----
From: "Amar Tumballi" <amarts at redhat.com>
To: "Ben England" <bengland at redhat.com>
Sent: Saturday, May 19, 2012 2:22:03 AM
Subject: Fwd: [Gluster-users] This benchmark is OK?
Ben,
When you have time, can you have a look on this thread and respond?
-Amar
-------- Original Message --------
Subject: [Gluster-users] This benchmark is OK?
Date: Thu, 17 May 2012 00:11:48 +0800
From: soft_lawrency at hotmail.com <soft_lawrency at hotmail.com>
Reply-To: soft_lawrency <soft_lawrency at hotmail.com>
To: gluster-users <gluster-users at gluster.org>
CC: wangleiyf <wangleiyf at initdream.com>
Hi Amar,
here is my benchmark; please help me evaluate it.
1. [Env - Storage servers]
Gluster version: 3.3 beta3
OS : CentOS 6.1
2* Server : CPU : E5506 @ 2.13GHz 4*core
MEM: 16G
Disk : 1T * 12
Net: bond0 = 4 * 1Gb
2. [Env - Testing clients]
OS: CentOS 6.1
6* Server: CPU: 5150 @ 2.66GHz
MEM: 8G
Disk: RAID0 = 1T * 3
Net: bond0 = 2 * 1Gb
3. [Env - Switch]
1 * H3C Switch
4. [Env - Testing Tools]
iozone in cluster mode (-+m); the clients are driven via rsh.
5. Volume info
Type: distributed - stripe
bricks: 6 * 4 = 24
6. Iozone command:
./iozone -r 1m -s 50g -t 25 -i 0 -i 1 -+m /CZFS/client_list -R \
    -b report.xls -c -C -+k -e
then my benchmark results are:
"Initial write"   424258.41 KB/s
"Rewrite"         458250.97 KB/s
"Read"            671079.30 KB/s
"Re-read"         797509.20 KB/s
here I have 4 questions:
Q1: Why is read faster than write? Is that OK?
Q2: Based on experience, is this benchmark result good or bad?
Q3: How can Gluster be tuned to improve the benchmark?
Q4: How can I get benchmark results from others to compare against?
Thanks very much.
Regards,
Lawrence.