Jon Swanson
2010-Apr-02 07:10 UTC
[Gluster-users] Caching differences in Gluster vs Local Storage
Hello,

First off, thanks again for providing gluster. Awesome project.

This is a n00bish question. I thought that gluster goes through the VFS like any other filesystem, which is where most of the filesystem caching takes place (somewhat simplified). I'm seeing a major difference in benchmarks when comparing small-ish files locally versus on gluster; namely, about a 40x difference on reads. I don't really think this is a problem, but am just seeking a greater understanding (a cold-cache re-run sketch follows the volume files below).

Server / client are all on 3.0.3. Servers are two machines in a replicate setup.

Client:  CentOS 5.4  2.6.18-164.15.1.el5.centos
Servers: F12         2.6.32.9-70.fc12.x86_64

-----------------------------------------
(To Gluster)

[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d .
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU | Sys CPU  |
+-----------------------+----------+--------------+---------+----------+
| Write         512 MBs |   16.7 s |  30.731 MB/s |  2.0 %  |  33.5 %  |
| Random Write 1024 MBs |   38.9 s |  26.314 MB/s |  1.8 %  |  32.5 %  |
| Read          512 MBs |    4.8 s | 107.145 MB/s |  4.0 %  | 221.4 %  |
| Random Read  1024 MBs |    4.2 s | 241.220 MB/s | 11.6 %  | 543.4 %  |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        7.747 ms |      240.730 ms |  0.00000 |   0.00000 |
| Random Write |        8.709 ms |     2425.524 ms |  0.00153 |   0.00000 |
| Read         |        2.009 ms |     1575.232 ms |  0.00000 |   0.00000 |
| Random Read  |        0.930 ms |      236.096 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        4.839 ms |     2425.524 ms |  0.00051 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'

(To Local)

[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d ~
Tiotest results for 16 concurrent io threads:
,-----------------------------------------------------------------------.
| Item                  | Time     | Rate          | Usr CPU | Sys CPU  |
+-----------------------+----------+---------------+---------+----------+
| Write         512 MBs |   35.7 s |   14.361 MB/s |   0.5 % |  833.8 % |
| Random Write 1024 MBs |  100.6 s |   10.182 MB/s |   0.4 % |  379.5 % |
| Read          512 MBs |    0.1 s | 4043.978 MB/s |  74.2 % | 5832.1 % |
| Random Read  1024 MBs |    0.2 s | 4171.521 MB/s | 131.2 % | 6425.0 % |
`-----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        0.846 ms |      154.874 ms |  0.00000 |   0.00000 |
| Random Write |        0.185 ms |      265.350 ms |  0.00000 |   0.00000 |
| Read         |        0.044 ms |       13.088 ms |  0.00000 |   0.00000 |
| Random Read  |        0.043 ms |       16.019 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        0.224 ms |      265.350 ms |  0.00000 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'

---------------------------------------
Volume files. The machine in question is mounting the linuxdb1 volume. Any criticisms of the way these files are set up are also extremely welcome.
[root at x2100-gfs1 glusterfs]# cat glusterfsd.vol
## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror

############# pdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by test environment. Currently setup as
# a replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume pdb-posix
  type storage/posix
  option directory /data/gfs/pdb
end-volume

volume pdb-locks
  type features/locks
  subvolumes pdb-posix
end-volume

volume pdb-iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes pdb-locks
end-volume

############# linuxdb1 Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by linuxdb1. Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume linuxdb1-posix
  type storage/posix
  option directory /data/gfs/linuxdb1
end-volume

volume linuxdb1-locks
  type features/locks
  subvolumes linuxdb1-posix
end-volume

volume linuxdb1-iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes linuxdb1-locks
end-volume

############# vmmirror1 Mirror ###################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by stuff (archtest01). Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume vmmirror1-posix
  type storage/posix
  option directory /data/gfs/vmmirror1
end-volume

volume vmmirror1-locks
  type features/locks
  subvolumes vmmirror1-posix
end-volume

volume vmmirror1-iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes vmmirror1-locks
end-volume

############# GLOBAL SPECIFICATIONS ###############################
# TRANSPORT-TYPE tcp
# global options. Currently configured to export volumes linuxdb1
# and pdb.
##################################################################
volume server-tcp
  type protocol/server
  option transport-type tcp
  option auth.addr.pdb-iothreads.allow *
  option auth.addr.linuxdb1-iothreads.allow *
  option auth.addr.vmmirror1-iothreads.allow *
  option transport.socket.listen-port 6996
  option transport.socket.nodelay on
  subvolumes pdb-iothreads linuxdb1-iothreads vmmirror1-iothreads
end-volume

[root at x2100-gfs1 glusterfs]# cat glusterfs.vol
## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror

############# PDB Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for pdb test environment
# Volume-name: pdb
##############################################################
volume x2100-gfs1-pdb
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs1
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume pdb-iothreads
end-volume

volume x2100-gfs2-pdb
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs2
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume pdb-iothreads
end-volume

# Name of the volume as specified at mount time
volume pdb
  type cluster/replicate
  subvolumes x2100-gfs1-pdb x2100-gfs2-pdb
end-volume

volume pdb-writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes pdb
end-volume

volume pdb-readahead
  type performance/read-ahead
  option page-count 4
  subvolumes pdb-writebehind
end-volume

volume pdb-iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes pdb-readahead
end-volume

volume pdb-quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes pdb-iocache
end-volume

volume pdb-statprefetch
  type performance/stat-prefetch
  subvolumes pdb-quickread
end-volume

############# linuxdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for linuxdb1 to mount
# Volume-name: linuxdb1
##################################################################
volume x2100-gfs1-linuxdb1
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs1
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume linuxdb1-iothreads
end-volume

volume x2100-gfs2-linuxdb1
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs2
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume linuxdb1-iothreads
end-volume

# Name of the volume as specified at mount time
volume linuxdb1
  type cluster/replicate
  subvolumes x2100-gfs1-linuxdb1 x2100-gfs2-linuxdb1
end-volume

volume linuxdb1-writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes linuxdb1
end-volume

volume linuxdb1-readahead
  type performance/read-ahead
  option page-count 4
  subvolumes linuxdb1-writebehind
end-volume

volume linuxdb1-iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes linuxdb1-readahead
end-volume

volume linuxdb1-quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes linuxdb1-iocache
end-volume

volume linuxdb1-statprefetch
  type performance/stat-prefetch
  subvolumes linuxdb1-quickread
end-volume

############# Virtual Images Mirror ###############################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for vm testing servers to mount
# Volume-name: vmmirror1
##################################################################
volume x2100-gfs1-vmmirror1
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs1
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume vmmirror1-iothreads
end-volume

volume x2100-gfs2-vmmirror1
  type protocol/client
  option transport-type tcp
  option remote-host x2100-gfs2
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume vmmirror1-iothreads
end-volume

# Name of the volume as specified at mount time
volume vmmirror1
  type cluster/replicate
  subvolumes x2100-gfs1-vmmirror1 x2100-gfs2-vmmirror1
end-volume

volume vmmirror1-writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes vmmirror1
end-volume

volume vmmirror1-readahead
  type performance/read-ahead
  option page-count 4
  subvolumes vmmirror1-writebehind
end-volume

volume vmmirror1-iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes vmmirror1-readahead
end-volume

volume vmmirror1-quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes vmmirror1-iocache
end-volume

volume vmmirror1-statprefetch
  type performance/stat-prefetch
  subvolumes vmmirror1-quickread
end-volume

Thanks!
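
P.S. One way to make the local-vs-gluster comparison fairer would be to flush the kernel page cache before each run, so neither side starts with warm data. This is only a sketch, not something I ran for the numbers above, and /mnt/linuxdb1 and ~/tiobench-local are placeholder paths:

# flush dirty pages, then drop pagecache, dentries and inodes (kernel >= 2.6.16, run as root)
sync; echo 3 > /proc/sys/vm/drop_caches

# same tiotest invocation against the gluster mount...
tiotest -b 16384 -r 4096 -f 32 -t 16 -d /mnt/linuxdb1

# ...then flush again and hit the local disk
sync; echo 3 > /proc/sys/vm/drop_caches
tiotest -b 16384 -r 4096 -f 32 -t 16 -d ~/tiobench-local

# for the gluster run the servers' page cache matters too, so the sync/drop_caches
# pair would also need to be run on x2100-gfs1 and x2100-gfs2 beforehand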
Jon Swanson
2010-Apr-02 07:38 UTC
[Gluster-users] Caching differences in Gluster vs Local Storage
Yeah, obviously it's not actually writing to physical disks. I'm assuming that because the file size is small (32 MB per thread), most of it is just hitting the filesystem cache. What I'm curious about is why Gluster is not seeing similar benefit from the filesystem cache. It is getting /some/ benefit:

[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d .
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU | Sys CPU  |
+-----------------------+----------+--------------+---------+----------+
| Write         512 MBs |   16.7 s |  30.731 MB/s |  2.0 %  |  33.5 %  |
| Random Write 1024 MBs |   38.9 s |  26.314 MB/s |  1.8 %  |  32.5 %  |
| Read          512 MBs |    4.8 s | 107.145 MB/s |  4.0 %  | 221.4 %  |
| Random Read  1024 MBs |    4.2 s | 241.220 MB/s | 11.6 %  | 543.4 %  |

There's no way it's getting 241 MB/s of random reads over gigabit. I'm sure there's a reason for this, just curious as to what it is (a rough check is sketched after the quoted message below).

On 04/02/2010 04:29 PM, Marcus Bointon wrote:
> On 2 Apr 2010, at 09:10, Jon Swanson wrote:
>
>> ,-----------------------------------------------------------------------.
>> | Item                  | Time     | Rate          | Usr CPU | Sys CPU  |
>> +-----------------------+----------+---------------+---------+----------+
>> | Write         512 MBs |   35.7 s |   14.361 MB/s |   0.5 % |  833.8 % |
>> | Random Write 1024 MBs |  100.6 s |   10.182 MB/s |   0.4 % |  379.5 % |
>> | Read          512 MBs |    0.1 s | 4043.978 MB/s |  74.2 % | 5832.1 % |
>> | Random Read  1024 MBs |    0.2 s | 4171.521 MB/s | 131.2 % | 6425.0 % |
>> `-----------------------------------------------------------------------'
>>
> Either these numbers or units are wrong, or you have some outrageously fast disks! 4 Gbytes/sec?? You have multiple FusionIOs or something?
>
> Marcus
>
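
Some back-of-the-envelope checks (a sketch only, not from an actual re-run; the hostnames are the ones from the volfiles above):

# a single gigabit link tops out around 1000 Mbit/s / 8 = 125 MB/s before protocol overhead
echo $((1000 / 8))    # -> 125

# drop the page cache on the servers only, then repeat the test from the client:
ssh x2100-gfs1 'sync; echo 3 > /proc/sys/vm/drop_caches'
ssh x2100-gfs2 'sync; echo 3 > /proc/sys/vm/drop_caches'
tiotest -b 16384 -r 4096 -f 32 -t 16 -d .   # same gluster-mounted directory as before

# If read rates stay well above ~125 MB/s, the hits are presumably coming from the
# client side (kernel caching over FUSE and/or the io-cache/quick-read translators in
# glusterfs.vol); if they fall toward wire speed, the servers' page cache was probably
# what had been serving them.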