Jon Swanson
2010-Apr-02 07:10 UTC
[Gluster-users] Caching differences in Gluster vs Local Storage
Hello,
First off, thanks again for providing gluster. Awesome project.
This is a n00bish question. I thought that Gluster goes through the VFS
like any other filesystem, which is where most of the filesystem
caching takes place (somewhat simplified).
I'm seeing a major difference in benchmarks when comparing small-ish
files locally versus on Gluster: roughly a 40x difference on reads.
I don't really think this is a problem; I'm just seeking a greater
understanding. Client and servers are all on 3.0.3. The servers are two
machines in a replicate setup.
Client: CentOS 5.4 2.6.18-164.15.1.el5.centos
Servers: F12 2.6.32.9-70.fc12.x86_64
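(One way to take the page cache out of the local side of the comparison
would be something along these lines; /root/ddtest is just a placeholder
path, and the O_DIRECT run is only meant as a rough sketch, not a tuned test:)

# flush dirty pages and drop clean caches so the next run starts cold (root only)
sync
echo 3 > /proc/sys/vm/drop_caches

# compare a buffered write against an O_DIRECT write, which bypasses the page cache
dd if=/dev/zero of=/root/ddtest bs=1M count=512
dd if=/dev/zero of=/root/ddtest bs=1M count=512 oflag=direct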
-----------------------------------------
(To Gluster)
[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d .
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item | Time | Rate | Usr CPU | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 512 MBs | 16.7 s | 30.731 MB/s | 2.0 % | 33.5 % |
| Random Write 1024 MBs | 38.9 s | 26.314 MB/s | 1.8 % | 32.5 % |
| Read 512 MBs | 4.8 s | 107.145 MB/s | 4.0 % | 221.4 % |
| Random Read 1024 MBs | 4.2 s | 241.220 MB/s | 11.6 % | 543.4 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write | 7.747 ms | 240.730 ms | 0.00000 | 0.00000 |
| Random Write | 8.709 ms | 2425.524 ms | 0.00153 | 0.00000 |
| Read | 2.009 ms | 1575.232 ms | 0.00000 | 0.00000 |
| Random Read | 0.930 ms | 236.096 ms | 0.00000 | 0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total | 4.839 ms | 2425.524 ms | 0.00051 | 0.00000 |
`--------------+-----------------+-----------------+----------+-----------'
(To Local)
[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d ~
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item | Time | Rate | Usr CPU | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 512 MBs | 35.7 s | 14.361 MB/s | 0.5 % | 833.8 % |
| Random Write 1024 MBs | 100.6 s | 10.182 MB/s | 0.4 % | 379.5 % |
| Read 512 MBs | 0.1 s | 4043.978 MB/s | 74.2 % | 5832.1 % |
| Random Read 1024 MBs | 0.2 s | 4171.521 MB/s | 131.2 % | 6425.0 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write | 0.846 ms | 154.874 ms | 0.00000 | 0.00000 |
| Random Write | 0.185 ms | 265.350 ms | 0.00000 | 0.00000 |
| Read | 0.044 ms | 13.088 ms | 0.00000 | 0.00000 |
| Random Read | 0.043 ms | 16.019 ms | 0.00000 | 0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total | 0.224 ms | 265.350 ms | 0.00000 | 0.00000 |
`--------------+-----------------+-----------------+----------+-----------'
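(Working the ratios out from the two runs above, the big gap is on the read
side; on writes Gluster actually comes out ahead here. A quick check with awk:)

awk 'BEGIN { printf "reads: %.1fx  writes: %.1fx\n", 4043.978/107.145, 30.731/14.361 }'
reads: 37.7x  writes: 2.1x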
---------------------------------------
Volume Files. The machine in question is mounting the linuxdb1 volume.
Any criticisms of the way these files are set up are also extremely welcome.
[root at x2100-gfs1 glusterfs]# cat glusterfsd.vol
## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror
############# pdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by the test environment. Currently set up as
# a replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume pdb-posix
type storage/posix
option directory /data/gfs/pdb
end-volume
volume pdb-locks
type features/locks
subvolumes pdb-posix
end-volume
volume pdb-iothreads
type performance/io-threads
option thread-count 8
subvolumes pdb-locks
end-volume
############# linuxdb1 Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by linuxdb1. Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume linuxdb1-posix
type storage/posix
option directory /data/gfs/linuxdb1
end-volume
volume linuxdb1-locks
type features/locks
subvolumes linuxdb1-posix
end-volume
volume linuxdb1-iothreads
type performance/io-threads
option thread-count 8
subvolumes linuxdb1-locks
end-volume
############# vmmirror1 Mirror ###################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by stuff (archtest01). Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume vmmirror1-posix
type storage/posix
option directory /data/gfs/vmmirror1
end-volume
volume vmmirror1-locks
type features/locks
subvolumes vmmirror1-posix
end-volume
volume vmmirror1-iothreads
type performance/io-threads
option thread-count 8
subvolumes vmmirror1-locks
end-volume
############# GLOBAL SPECIFICATIONS ###############################
# TRANSPORT-TYPE tcp
# global options. Currently configured to export volumes linuxdb1
# and pdb.
##################################################################
volume server-tcp
type protocol/server
option transport-type tcp
option auth.addr.pdb-iothreads.allow *
option auth.addr.linuxdb1-iothreads.allow *
option auth.addr.vmmirror1-iothreads.allow *
option transport.socket.listen-port 6996
option transport.socket.nodelay on
subvolumes pdb-iothreads linuxdb1-iothreads vmmirror1-iothreads
end-volume
[root at x2100-gfs1 glusterfs]# cat glusterfs.vol
## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror
############# PDB Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for pdb test environment
# Volume-name: pdb
##############################################################
volume x2100-gfs1-pdb
type protocol/client
option transport-type tcp
option remote-host x2100-gfs1
option transport.socket.nodelay on
option remote-port 6996
option remote-subvolume pdb-iothreads
end-volume
volume x2100-gfs2-pdb
type protocol/client
option transport-type tcp
option remote-host x2100-gfs2
option transport.socket.nodelay on
option remote-port 6996
option remote-subvolume pdb-iothreads
end-volume
# Name of the volume as specified at mount time
volume pdb
type cluster/replicate
subvolumes x2100-gfs1-pdb x2100-gfs2-pdb
end-volume
volume pdb-writebehind
type performance/write-behind
option cache-size 4MB
subvolumes pdb
end-volume
volume pdb-readahead
type performance/read-ahead
option page-count 4
subvolumes pdb-writebehind
end-volume
volume pdb-iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes pdb-readahead
end-volume
volume pdb-quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes pdb-iocache
end-volume
volume pdb-statprefetch
type performance/stat-prefetch
subvolumes pdb-quickread
end-volume
############# linuxdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for linuxdb1 to mount
# Volume-name: linuxdb1
##################################################################
volume x2100-gfs1-linuxdb1
type protocol/client
option transport-type tcp
option remote-host x2100-gfs1
option transport.socket.nodelay on
option remote-port 6996
option remote-subvolume linuxdb1-iothreads
end-volume
volume x2100-gfs2-linuxdb1
type protocol/client
option transport-type tcp
option remote-host x2100-gfs2
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume linuxdb1-iothreads
end-volume
# Name of the volume as specified at mount time
volume linuxdb1
type cluster/replicate
subvolumes x2100-gfs1-linuxdb1 x2100-gfs2-linuxdb1
end-volume
volume linuxdb1-writebehind
type performance/write-behind
option cache-size 4MB
subvolumes linuxdb1
end-volume
volume linuxdb1-readahead
type performance/read-ahead
option page-count 4
subvolumes linuxdb1-writebehind
end-volume
volume linuxdb1-iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes linuxdb1-readahead
end-volume
volume linuxdb1-quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes linuxdb1-iocache
end-volume
volume linuxdb1-statprefetch
type performance/stat-prefetch
subvolumes linuxdb1-quickread
end-volume
############# Virtual Images Mirror ###############################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for vm testing servers to mount
# Volume-name: vmmirror1
##################################################################
volume x2100-gfs1-vmmirror1
type protocol/client
option transport-type tcp
option remote-host x2100-gfs1
option transport.socket.nodelay on
option remote-port 6996
option remote-subvolume vmmirror1-iothreads
end-volume
volume x2100-gfs2-vmmirror1
type protocol/client
option transport-type tcp
option remote-host x2100-gfs2
option transport.socket.nodelay on
option remote-port 6996
option remote-subvolume vmmirror1-iothreads
end-volume
# Name of the volume as specified at mount time
volume vmmirror1
type cluster/replicate
subvolumes x2100-gfs1-vmmirror1 x2100-gfs2-vmmirror1
end-volume
volume vmmirror1-writebehind
type performance/write-behind
option cache-size 4MB
subvolumes vmmirror1
end-volume
volume vmmirror1-readahead
type performance/read-ahead
option page-count 4
subvolumes vmmirror1-writebehind
end-volume
volume vmmirror1-iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes vmmirror1-readahead
end-volume
volume vmmirror1-quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes vmmirror1-iocache
end-volume
volume vmmirror1-statprefetch
type performance/stat-prefetch
subvolumes vmmirror1-quickread
end-volume
Thanks!
Jon Swanson
2010-Apr-02 07:38 UTC
[Gluster-users] Caching differences in Gluster vs Local Storage
Yeah, obviously it's not actually writing to physical disks. I'm assuming
that because the file size is small (32 MB per file), most of it is just
hitting the filesystem cache. What I'm curious about is why Gluster is not
seeing similar benefits from the filesystem cache. It is getting /some/
benefit:

[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d .
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item | Time | Rate | Usr CPU | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 512 MBs | 16.7 s | 30.731 MB/s | 2.0 % | 33.5 % |
| Random Write 1024 MBs | 38.9 s | 26.314 MB/s | 1.8 % | 32.5 % |
| Read 512 MBs | 4.8 s | 107.145 MB/s | 4.0 % | 221.4 % |
| Random Read 1024 MBs | 4.2 s | 241.220 MB/s | 11.6 % | 543.4 % |
`----------------------------------------------------------------------'

There's no way it's getting 241 MB/s over gigabit with Random Read. I'm
sure there's a reason for this; I'm just curious as to what it is.

On 04/02/2010 04:29 PM, Marcus Bointon wrote:
> On 2 Apr 2010, at 09:10, Jon Swanson wrote:
>
>> ,----------------------------------------------------------------------.
>> | Item | Time | Rate | Usr CPU | Sys CPU |
>> +-----------------------+----------+--------------+----------+---------+
>> | Write 512 MBs | 35.7 s | 14.361 MB/s | 0.5 % | 833.8 % |
>> | Random Write 1024 MBs | 100.6 s | 10.182 MB/s | 0.4 % | 379.5 % |
>> | Read 512 MBs | 0.1 s | 4043.978 MB/s | 74.2 % | 5832.1 % |
>> | Random Read 1024 MBs | 0.2 s | 4171.521 MB/s | 131.2 % | 6425.0 % |
>> `----------------------------------------------------------------------'
>
> Either these numbers or units are wrong or you have some outrageously fast disks! 4Gbytes/sec?? You have multiple FusionIOs or something?
>
> Marcus
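(One guess at what is going on with the better-than-gigabit Gluster reads,
offered as a guess rather than a definitive answer: the client vol file
above stacks read-ahead, io-cache and quick-read on top of the replicate
volume, and io-cache keeps recently read data inside the glusterfs client
process itself, so repeat reads within its cache-timeout window never touch
the wire. A quick sanity check of the numbers involved:)

# 1000 Mbit/s / 8 = 125 MB/s is the raw GigE ceiling, so 241 MB/s of random
# reads has to be coming from somewhere on the client side of the wire
awk 'BEGIN { print 1000/8 }'

# size (in MB) that the client-side io-cache is configured to use; this is
# the same pipeline embedded in the vol file's cache-size option, run standalone
grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.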