Armen Baghumian
2015-Sep-14 23:31 UTC
[Gluster-users] Stale write/read during self-healing
Hey guys,

I'm trying to run an experiment with glusterfs and I'm hitting a problem; I'd appreciate it if someone could point out what I'm doing wrong.

I've created a replicated volume across two nodes:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: fe8f42b7-2a4b-4cf0-b22e-082bd5c8ed60
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.31.13.220:/srv/gluster-bricks/xvdf/gv0
Brick2: 172.31.20.154:/srv/gluster-bricks/xvdf/gv0

This is the scenario I'm testing. Both servers are up and running:

# gluster peer status
Number of Peers: 1

Hostname: 172.31.13.220
Uuid: 3a7fc90b-47b3-4a80-950c-e27afbb64e06
State: Peer in Cluster (Connected)

And:

Number of Peers: 1

Hostname: 172.31.20.154
Uuid: aceb2dce-db05-4368-98b4-398338d9de1a
State: Peer in Cluster (Connected)

There are two other servers that mount the volume with the native glusterfs (FUSE) client. On both of them I run a script that creates 0-sized files in a loop to measure the latency of fail-over. I've attached the script; the following cut is the important part:

--CUT--
while true
do
    epoch=$(date +"%s%N")
    touch /home/armen/testdata/$sampledata/$epoch &
    pid=$!
    wait $pid
    if [ $? -eq 0 ]; then
        echo "mountpoint is up (${pid})"
    fi
    echo "Testing $sampledata..."
done
--CUT--

When I reboot one of the servers, I see the following lines in the client logs:

[2015-09-14 22:41:38.006195] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f78b08194c6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f78b05ec01e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f78b05ec12e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f78b05eda92] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f78b05ee248] ))))) 14-gv0-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-09-14 22:41:37.973664 (xid=0x2d2efc)
[2015-09-14 22:41:38.006387] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 14-gv0-client-1: remote operation failed: Transport endpoint is not connected. Path: <gfid:f106d7a3-a7b1-4cf8-8402-40e19993ece6>/testdata (72d1db26-1e6e-460f-a1ba-d428bd09b4d7)
[2015-09-14 22:41:38.006940] I [client.c:2215:client_rpc_notify] 14-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-09-14 22:41:38.009152] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 14-gv0-client-1: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)

And of course:

# gluster peer status
Number of Peers: 1

Hostname: 172.31.20.154
Uuid: aceb2dce-db05-4368-98b4-398338d9de1a
State: Peer in Cluster (Disconnected)

The fail-over works smoothly; I don't see any latency or interruption in the scripts.
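(For clarity on what I mean by latency: each iteration embeds a nanosecond timestamp in the file name, so a stalled create shows up as a gap between consecutive timestamps. The snippet below is a stripped-down sketch of that timing idea rather than the attached test.sh verbatim; the sample directory name and the per-create echo are just illustrative.)

#!/bin/bash
# Simplified timing sketch (not the attached test.sh verbatim): time each
# create on the glusterfs mount so that any stall during fail-over or
# self-heal shows up as a large per-iteration latency.

dir=/home/armen/testdata/$(date +%s)   # per-run sample directory (illustrative)
mkdir -p "$dir"

while true
do
    start=$(date +"%s%N")              # nanoseconds before the create
    touch "$dir/$start"                # 0-sized file named after the timestamp
    end=$(date +"%s%N")                # nanoseconds after the create returns
    echo "create took $(( (end - start) / 1000000 )) ms"
done

Timing the touch directly, instead of only checking its exit status, is what makes a stall visible as a number rather than just a pause in the output.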
When the rebooted server comes back online, it obviously performs the self-heal:

[2015-09-14 22:42:51.483091] E [client-handshake.c:1496:client_query_portmap_cbk] 14-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-09-14 22:42:51.483387] I [client.c:2215:client_rpc_notify] 14-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-09-14 22:42:54.387194] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 14-gv0-client-1: changing port to 49152 (from 0)
[2015-09-14 22:42:54.395019] I [client-handshake.c:1413:select_server_supported_programs] 14-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 22:42:54.403842] I [client-handshake.c:1200:client_setvolume_cbk] 14-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/srv/gluster-bricks/xvdf/gv0'.
[2015-09-14 22:42:54.403986] I [client-handshake.c:1210:client_setvolume_cbk] 14-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-14 22:42:54.404508] I [client-handshake.c:188:client_set_lk_version_cbk] 14-gv0-client-1: Server lk version = 1
[2015-09-14 22:42:54.438649] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 14-gv0-replicate-0: performing entry selfheal on 6dbd77a9-398c-4c90-b7a0-f6c36337fb2a

But as soon as the self-heal starts, it blocks the bash script(s):

root      6343  0.0  0.0 107920   352 pts/0  S+  22:42  0:00 touch /home/armen/testdata/1442270482/1442270574421803041
root     30929  0.5  0.0 113248  1620 pts/0  S+  22:41  0:01 /bin/bash ./test.sh

Stracing the glusterfs process shows:

open("/proc/16343/status", O_RDONLY) = 11
fstat(11, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f78b0cb2000
read(11, "Name:\ttouch\nState:\tS (sleeping)\n"..., 1024) = 1024
close(11) = 0
munmap(0x7f78b0cb2000, 4096) = 0
gettimeofday({1442207605, 814096}, NULL) = 0
writev(14, [{"\200\0\2x", 4}, {"\0\25\250\36\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\33\0\5\363\227\0\0\0 "..., 72}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\215\20\317\226\344\222M\344\274\376g\202i<x\33"..., 560}], 3) = 636
gettimeofday({1442207605, 814221}, NULL) = 0
readv(8,

And checking FD 8 with lsof reveals:

glusterfs 1563 root 8u CHR 10,229 0t0 7141 /dev/fuse

So it seems it is blocked reading from /dev/fuse. At this point, if I start another instance of the bash script it works fine; only the processes that were already running when the self-heal started stay blocked. Then, when the following line appears in the logs:

[2015-09-14 22:45:29.847199] I [afr-self-heal-common.c:476:afr_log_selfheal] 14-gv0-replicate-0: Completed entry selfheal on 6dbd77a9-398c-4c90-b7a0-f6c36337fb2a. source=0 sinks=1

the blocked processes are released.

My question is: why are the processes that were running when the self-heal kicked in blocked, while processes started after the self-heal has begun work fine?

This is the glusterfs version I'm using, on CentOS:

# glusterfs -V
glusterfs 3.6.2 built on Feb 4 2015 20:41:30

# glusterd -V
glusterfs 3.6.2 built on Feb 4 2015 20:41:30

# uname -r
3.10.0-123.8.1.el7.x86_64

Cheers,
Armen

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sh
Type: application/x-shellscript
Size: 1256 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150915/b63b4178/attachment.bin>