Ionescu, A.
2012-Apr-18 10:05 UTC
[Gluster-users] Performance issues with striped volume over Infiniband
Dear Gluster Users,

We are facing some severe performance issues with GlusterFS and we would very much appreciate any help in identifying the cause.

Our setup is extremely simple: two nodes interconnected with 40 Gb/s InfiniBand (plus 1 Gb/s Ethernet), running CentOS 6.2 and GlusterFS 3.2.6. Each node has four SATA drives in a RAID0 array that delivers ~750 MB/s of random-read bandwidth. The tool we use for measuring IO performance relies on O_DIRECT access, so we patched the FUSE kernel module: http://marc.info/?l=linux-fsdevel&m=132950081331043&w=2.

We created the following volume and mounted it at /mnt/gfs/ (a sketch of the equivalent create and mount commands appears after the log below):

Volume Name: GFS_RDMA_VOLUME
Type: Stripe
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: node01:/mnt/md0/gfs_storage
Brick2: node02:/mnt/md0/gfs_storage
Options Reconfigured:
cluster.stripe-block-size: *:2MB
performance.quick-read: on
performance.io-cache: on
performance.cache-size: 256MB
performance.cache-max-file-size: 128MB

We expected to see an IO bandwidth of about 1500 MB/s (measured with the exact same tool and parameters), but unfortunately we only get ~100 MB/s, which is very disappointing.

Please find below the output of # cat /var/log/glusterfs/mnt-gfs-.log. If you need any other information that I forgot to mention, please let me know.

Thanks,
Adrian

________________________________

[2012-04-18 11:59:42.847818] I [glusterfsd.c:1493:main] 0-/opt/glusterfs/3.2.6/sbin/glusterfs: Started running /opt/glusterfs/3.2.6/sbin/glusterfs version 3.2.6
[2012-04-18 11:59:42.862610] W [write-behind.c:3023:init] 0-GFS_RDMA_VOLUME-write-behind: disabling write-behind for first 0 bytes
[2012-04-18 11:59:43.318188] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-0: parent translators are ready, attempting connect on transport
[2012-04-18 11:59:43.321287] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume GFS_RDMA_VOLUME-client-0
  2:     type protocol/client
  3:     option remote-host node01
  4:     option remote-subvolume /mnt/md0/gfs_storage
  5:     option transport-type rdma
  6: end-volume
  7:
  8: volume GFS_RDMA_VOLUME-client-1
  9:     type protocol/client
 10:     option remote-host node02
 11:     option remote-subvolume /mnt/md0/gfs_storage
 12:     option transport-type rdma
 13: end-volume
 14:
 15: volume GFS_RDMA_VOLUME-stripe-0
 16:     type cluster/stripe
 17:     option block-size *:2MB
 18:     subvolumes GFS_RDMA_VOLUME-client-0 GFS_RDMA_VOLUME-client-1
 19: end-volume
 20:
 21: volume GFS_RDMA_VOLUME-write-behind
 22:     type performance/write-behind
 23:     subvolumes GFS_RDMA_VOLUME-stripe-0
 24: end-volume
 25:
 26: volume GFS_RDMA_VOLUME-read-ahead
 27:     type performance/read-ahead
 28:     subvolumes GFS_RDMA_VOLUME-write-behind
 29: end-volume
 30:
 31: volume GFS_RDMA_VOLUME-io-cache
 32:     type performance/io-cache
 33:     option max-file-size 128MB
 34:     option cache-size 256MB
 35:     subvolumes GFS_RDMA_VOLUME-read-ahead
 36: end-volume
 37:
 38: volume GFS_RDMA_VOLUME-quick-read
 39:     type performance/quick-read
 40:     option cache-size 256MB
 41:     subvolumes GFS_RDMA_VOLUME-io-cache
 42: end-volume
 43:
 44: volume GFS_RDMA_VOLUME-stat-prefetch
 45:     type performance/stat-prefetch
 46:     subvolumes GFS_RDMA_VOLUME-quick-read
 47: end-volume
 48:
 49: volume GFS_RDMA_VOLUME
 50:     type debug/io-stats
 51:     option latency-measurement off
 52:     option count-fop-hits off
 53:     subvolumes GFS_RDMA_VOLUME-stat-prefetch
 54: end-volume
+------------------------------------------------------------------------------+
[2012-04-18 11:59:43.326287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-1: failed to get the port number for remote subvolume
[2012-04-18 11:59:43.764287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-0: failed to get the port number for remote subvolume
[2012-04-18 11:59:46.868595] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-0: changing port to 24009 (from 0)
[2012-04-18 11:59:46.879292] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-1: changing port to 24009 (from 0)
[2012-04-18 11:59:50.872346] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-0: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.872760] I [client-handshake.c:913:client_setvolume_cbk] 0-GFS_RDMA_VOLUME-client-0: Connected to 192.168.0.101:24009, attached to remote volume '/mnt/md0/gfs_storage'.
[2012-04-18 11:59:50.874975] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-1: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.875290] I [client-handshake.c:913:client_setvolume_cbk] 0-GFS_RDMA_VOLUME-client-1: Connected to 192.168.0.103:24009, attached to remote volume '/mnt/md0/gfs_storage'.
[2012-04-18 11:59:50.878013] I [fuse-bridge.c:3339:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-04-18 11:59:50.878321] I [fuse-bridge.c:2927:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13

________________________________
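For reference, a striped RDMA volume with the layout shown in the gluster volume info output above would normally be created and mounted along the following lines. This is only a sketch assembled from the hostnames, brick paths, and reconfigured options above, not the exact invocations; the dd command at the end is merely a stand-in for the O_DIRECT benchmark tool, and /mnt/gfs/testfile is a placeholder path:

# create the 2-brick striped volume over the RDMA transport
gluster volume create GFS_RDMA_VOLUME stripe 2 transport rdma \
    node01:/mnt/md0/gfs_storage node02:/mnt/md0/gfs_storage

# apply the options listed under "Options Reconfigured"
gluster volume set GFS_RDMA_VOLUME cluster.stripe-block-size '*:2MB'
gluster volume set GFS_RDMA_VOLUME performance.quick-read on
gluster volume set GFS_RDMA_VOLUME performance.io-cache on
gluster volume set GFS_RDMA_VOLUME performance.cache-size 256MB
gluster volume set GFS_RDMA_VOLUME performance.cache-max-file-size 128MB

gluster volume start GFS_RDMA_VOLUME

# mount through the native FUSE client; the fetched volfile already selects rdma
mount -t glusterfs node01:/GFS_RDMA_VOLUME /mnt/gfs

# stand-in for the O_DIRECT read test (needs the FUSE O_DIRECT patch mentioned above
# and a large test file already written at the placeholder path)
dd if=/mnt/gfs/testfile of=/dev/null bs=2M iflag=direct

The options here simply mirror what gluster volume info reports; they are not additional tuning suggestions.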
Sabuj Pattanayek
2012-Apr-18 12:48 UTC
[Gluster-users] Performance issues with striped volume over Infiniband
I've seen the same ~100 MB/s limit (depending on the transfer block size) with 5 bricks in a stripe, and have yet to try IPoIB, which I hear improves performance over RDMA for some reason.

On Wed, Apr 18, 2012 at 5:05 AM, Ionescu, A. <a.ionescu at student.vu.nl> wrote:
> Dear Gluster Users,
>
> We are facing some severe performance issues with GlusterFS and we would
> very much appreciate any help on identifying the cause of this.
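If anyone does try the IPoIB route, here is a rough sketch of what a comparison test could look like. The node01-ib/node02-ib names, the 192.168.10.x addresses, the brick directory, and the mount point are all placeholders rather than values from this thread; the idea is simply to address the bricks via the IPoIB interface and use the tcp transport, so the same IB link carries IP traffic:

# load IPoIB and give the ib0 interface an address on each node (example addresses)
modprobe ib_ipoib
ip addr add 192.168.10.1/24 dev ib0    # on node01
ip addr add 192.168.10.2/24 dev ib0    # on node02

# the -ib names must resolve to the IPoIB addresses (e.g. via /etc/hosts) and may
# need to be peer-probed so glusterd recognizes them as cluster members
gluster volume create GFS_IPOIB_VOLUME stripe 2 transport tcp \
    node01-ib:/mnt/md0/gfs_storage_ipoib node02-ib:/mnt/md0/gfs_storage_ipoib
gluster volume start GFS_IPOIB_VOLUME

# mount and rerun the same O_DIRECT benchmark for an rdma vs ipoib comparison
mount -t glusterfs node01-ib:/GFS_IPOIB_VOLUME /mnt/gfs_ipoib

Running the same benchmark against both mounts should show whether the ~100 MB/s ceiling follows the rdma transport or persists regardless of transport.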