Emir Imamagic
2011-Apr-16 19:01 UTC
[Gluster-users] Gluster native client in case of distributed volume node failure
Hello,

I am trying to find a precise definition of the Gluster native client's behavior when a node of a distributed volume fails. Some information is provided in the FAQ:

http://www.gluster.com/community/documentation/index.php/GlusterFS_Technical_FAQ#What_happens_if_a_GlusterFS_brick_crashes.3F

but it does not go into detail. The other information I managed to find is this stale document:

http://www.gluster.com/community/documentation/index.php/Understanding_DHT_Translator

It says that files on the failed node will not be visible to the client, but it does not describe the behavior of already-open file handles.

I ran a couple of simple tests with cp and sha1sum to see what happens. Server configuration:

Volume Name: test
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/data
Brick2: gluster2:/data
Options Reconfigured:
performance.stat-prefetch: off
performance.write-behind-window-size: 4MB
performance.io-thread-count: 8

On the client side I use the default mount without any additional options.

*File read*: Both cp and sha1sum read up to the point where the node fails and then exit without reporting an error. sha1sum prints an incorrect hash and cp copies only part of the file. The Gluster client log shows errors indicating the node failure, but the commands themselves report nothing.

*File write*: The write case is slightly better: cp reports that the endpoint is not connected and then fails:

# cp testfile /gluster/; echo $?
cp: writing `testfile': Transport endpoint is not connected
cp: closing `testfile': Transport endpoint is not connected
1

Another interesting detail is that the client log shows the file being reopened when the storage node comes back online:

[2011-04-16 14:03:04.909540] I [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on /testfile succeeded (remote-fd = 0)
[2011-04-16 14:03:04.909782] I [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on /testfile succeeded (remote-fd = 1)

By that time, however, the command has already finished. What is the purpose of this reopen?

Is this expected behavior? Could you please point me to documentation, if any exists?

Is it possible to tune this behavior to be more NFS-like, i.e. put processes into I/O wait until the node comes back?

Thanks in advance
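In case it helps, this is roughly how the volume was set up and how the read test can be reproduced. Treat it as a sketch rather than a transcript: the hostnames, brick paths and mount point are from my environment, the 1 GB file size is arbitrary, and the exact way of taking the brick down does not matter as long as the node becomes unreachable mid-read.

# on gluster1: create and start the distributed volume
# (glusterd running on both nodes, peers already probed, /data present on both)
gluster volume create test transport tcp gluster1:/data gluster2:/data
gluster volume set test performance.stat-prefetch off
gluster volume set test performance.write-behind-window-size 4MB
gluster volume set test performance.io-thread-count 8
gluster volume start test

# on the client: default mount, no additional options
mount -t glusterfs gluster1:/test /gluster

# read test: write a large file and record its checksum
dd if=/dev/urandom of=/gluster/testfile bs=1M count=1024
sha1sum /gluster/testfile > /tmp/testfile.sha1

# read it back in the background, printing the exit code at the end
( sha1sum /gluster/testfile; echo "sha1sum exit code: $?" ) &

# while the read is still running, take down the brick that actually holds
# testfile (a distribute volume keeps the whole file on one brick; check
# with ls /data on each server), e.g. by powering the node off or killing
# its glusterfsd brick process

# observed result: sha1sum finishes with exit code 0 but prints a hash that
# does not match /tmp/testfile.sha1 - the short read is completely silent

The write test is the same idea, only with cp writing into /gluster while the brick goes down, which is where the "Transport endpoint is not connected" errors above come from.

--
Emir Imamagic
www.srce.hr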
Emir Imamagic
2011-May-12 09:47 UTC
[Gluster-users] Gluster native client in case of distributed volume node failure
Hello,

does anyone have any comments on the issues I described in my previous message? Any feedback would be more than welcome.

Thanks

--
Emir Imamagic
www.srce.hr