David Coulson
2012-Jun-03 15:05 UTC
[Gluster-users] File IO issues during brick unreachable in replica config
I have a volume in a 4-way replica configuration running 3.3.0. Two bricks are in one datacenter and two are in the other. We had some sort of connectivity issue between the two facilities this morning, and applications using gluster mounts (via NFS; in this case a read-only workload) experienced IO timeouts.

I have a 5s network timeout on the volume and a 20s timeout on the application. Even if a read went through 3 bricks before it found a good one, I'd expect it to take 10s.

What is the expectation for a read which occurs while a brick is in the process of failing? Should the IO fail, or should it be re-routed to an available brick? I don't see anything specific in nfs.log indicating that a particular read failed, just that the bricks went up/down.

Info is below. Let me know if there are other logs I need to look at.

[root@dresproddns02 glusterfs]# gluster volume info svn

Volume Name: svn
Type: Replicate
Volume ID: fabe320d-5ef2-4f35-9720-eab617e13dde
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: rhesproddns01:/gluster/svn
Brick2: rhesproddns02:/gluster/svn
Brick3: dresproddns01:/gluster/svn
Brick4: dresproddns02:/gluster/svn
Options Reconfigured:
performance.write-behind-window-size: 128Mb
performance.cache-size: 256Mb
auth.allow: 10.250.53.*,10.252.248.*,169.254.*,127.0.0.1
nfs.register-with-portmap: on
nfs.disable: off
performance.stat-prefetch: 1
network.ping-timeout: 5
performance.flush-behind: on
performance.client-io-threads: 1
nfs.rpc-auth-allow: 127.0.0.1

nfs.log output is here: http://pastebin.com/CNmP4s32
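
To make the timeout expectation above concrete, here is a minimal Python sketch of the arithmetic. It rests on an assumption that is not confirmed Gluster behaviour: that a read is retried against bricks one at a time, and that each unreachable brick costs one full network.ping-timeout before the next brick is tried. The function name and its parameters are hypothetical, made up for illustration only.

```python
def worst_case_read_delay(ping_timeout_s: float, unreachable_bricks: int) -> float:
    """Upper bound on read delay, ASSUMING sequential retries where each
    unreachable brick costs one full ping timeout (an assumption, not
    documented Gluster behaviour)."""
    if ping_timeout_s < 0 or unreachable_bricks < 0:
        raise ValueError("inputs must be non-negative")
    return ping_timeout_s * unreachable_bricks


PING_TIMEOUT_S = 5   # network.ping-timeout from the volume info
APP_TIMEOUT_S = 20   # application-side timeout mentioned in the post

# Two unreachable bricks (the remote datacenter), then the full worst
# case of three unreachable bricks out of four.
for failed in (2, 3):
    delay = worst_case_read_delay(PING_TIMEOUT_S, failed)
    within = delay < APP_TIMEOUT_S
    print(f"{failed} unreachable bricks: {delay}s, under app timeout: {within}")
```

Under that model, both cases stay below the 20s application timeout, which is why the IO timeouts seen by the applications were unexpected.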