Joerg Hinz
2015-Dec-14 15:02 UTC
[Gluster-users] Heavy performance impact to local access (glusterfs 3.6.7)
I have a setup with 2 GlusterFS 3.6.7 servers that are connected over a WAN link:

root@r1:/daten_gluster# gluster pool list
UUID                                    Hostname    State
6b70b66c-866f-4222-826b-736a21a9fce1    willy       Connected
f1ba0eb9-b991-4c99-a177-a4ca7764ff52    localhost   Connected

PING willy (10.8.0.186) 56(84) bytes of data.
64 bytes from willy (10.8.0.186): icmp_seq=1 ttl=63 time=181 ms
64 bytes from willy (10.8.0.186): icmp_seq=2 ttl=63 time=69.0 ms
64 bytes from willy (10.8.0.186): icmp_seq=3 ttl=63 time=72.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=4 ttl=63 time=71.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=5 ttl=63 time=70.2 ms

As you can see, it is a typical WAN link with a latency of about 70 ms.

There is one shared gluster volume:

root@r1:/daten_gluster# gluster volume info

Volume Name: gv0
Type: Distribute
Volume ID: 5baeef5e-4fd4-472f-b313-b0fcd1baa17a
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: r1:/storage/gluster
Brick2: willy:/storage/gluster
Options Reconfigured:
nfs.export-volumes: off
cluster.readdir-optimize: on
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.write-behind-window-size: 64MB
performance.cache-size: 256MB
performance.client-io-threads: on
performance.cache-refresh-timeout: 10
nfs.addr-namelookup: off
cluster.min-free-disk: 1
cluster.data-self-heal-algorithm: full
performance.io-thread-count: 64
nfs.disable: true
performance.flush-behind: on

As you can see, I have already tried every performance option I could find that looked useful.

The problem: when working in the gluster-mounted directory (mounted with -t glusterfs; I also tried NFS, but it did not bring much of a performance win), EVERYTHING is dead slow:

root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01

real    0m4.734s
user    0m0.000s
sys     0m0.000s

When I disconnect the other node (willy):

root@r1:/daten_gluster# gluster volume remove-brick gv0 willy:/storage/gluster force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success

root@r1:/daten_gluster# time find test
test
test/4
[... same 30 entries as in the first run ...]
test/file01

real    0m0.017s
user    0m0.000s
sys     0m0.000s

5 seconds compared to 0.02 seconds... WHY? I am only doing local reads here (not even writes that might get distributed).
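For reference, the gluster mount on r1 is a plain FUSE mount, and the NFS test was an NFSv3 mount of the same volume (gluster's built-in NFS server only speaks NFSv3). Roughly like this, with the mount point from the prompts above and otherwise default options:

mount -t glusterfs r1:/gv0 /daten_gluster
mount -t nfs -o vers=3,tcp r1:/gv0 /daten_gluster    # the NFS variant I tried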
When I add willy back, everything slows down to a crawl again:

root@r1:/daten_gluster# gluster volume add-brick gv0 willy:/storage/gluster force
volume add-brick: success

root@r1:/daten_gluster# time find test
test
test/4
[... same 30 entries as above ...]
test/file01

real    0m5.226s
user    0m0.000s
sys     0m0.000s

And this is with only 30 files. I actually want to share 220,000 files via glusterfs...

Where is my configuration mistake? Can you please help me or give me a hint? I cannot believe that GlusterFS is really this problematic over WAN connections...?

Thank you very much!
Joerg
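P.S. In case it helps to reproduce this: the test tree is nothing special, just the 30 entries listed above (5 small files at the top level plus 4 subdirectories with 5 small files each). Something like the following recreates the layout on the mount point (the real files may differ in size and content, that does not seem to matter here):

mkdir -p test/1 test/2 test/3 test/4
for d in test test/1 test/2 test/3 test/4; do
    for i in 01 02 03 04 05; do
        echo x > "$d/file$i"
    done
done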