Xin - I appreciate your patience. I'd need some more time to pick this item up
from my backlog. I believe we have a workaround applicable here too.
songxin
2016-Nov-25 06:33 UTC
[Gluster-users] not to reconnect between client and server because of race condition
Hi Atin
I found a problem: the client (glusterfs) does not try to reconnect to the
server (glusterfsd) after a disconnect.
It seems to be caused by a race condition.
Precondition
The glusterfs version is 3.7.6.
I created a replicate volume using two nodes, node A and node B. One brick is on
node A and the other brick is on node B.
Node A IP: 10.32.1.144
Node B IP: 10.32.0.48
The phenomenon is as follows.
First, the client (glusterfs) on node A disconnects from the server (glusterfsd)
on node B. The log is as follows.
...
readv on 10.32.0.48:49309 failed (No data available)
...
Then the client (glusterfs) on node A disconnects from the server (glusterfsd)
on node A. The log is as follows.
...
readv on 10.32.1.144:49391 failed (Connection reset by peer)
...
After that, all operations on the mount point fail with "Transport endpoint is
not connected" until the client reconnects to the server (glusterfsd) on node B.
The client log is as follows; I have highlighted the important lines.
...
[2016-10-31 04:06:03.626047] W [socket.c:588:__socket_rwv]
0-c_glusterfs-client-9: readv on 10.32.1.144:49391 failed (Connection reset by
peer)
[2016-10-31 04:06:03.627345] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (
-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (
-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (
-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18] (
-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )))))
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3)
op(FINODELK(30)) called at 2016-10-31 04:06:03.626033 (xid=0x7f5e)
[2016-10-31 04:06:03.627395] E [MSGID: 114031]
[client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote
operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.628381] I [socket.c:3308:socket_submit_request]
0-c_glusterfs-client-9: not connected (priv->connected = 0)
[2016-10-31 04:06:03.628432] W [rpc-clnt.c:1586:rpc_clnt_submit]
0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f5f Program:
GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (c_glusterfs-client-9)
[2016-10-31 04:06:03.628466] E [MSGID: 114031]
[client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote
operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.628475] I [MSGID: 108019]
[afr-lk-common.c:1086:afr_lock_blocking] 0-c_glusterfs-replicate-0: unable to
lock on even one child
[2016-10-31 04:06:03.628539] I [MSGID: 108019]
[afr-transaction.c:1224:afr_post_blocking_inodelk_cbk]
0-c_glusterfs-replicate-0: Blocking inodelks failed.
[2016-10-31 04:06:03.628630] W [fuse-bridge.c:1282:fuse_err_cbk]
0-glusterfs-fuse: 20790: FLUSH() ERR => -1 (Transport endpoint is not
connected)
[2016-10-31 04:06:03.629149] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )))))
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at 2016-10-31 04:06:03.624346 (xid=0x7f5a)
[2016-10-31 04:06:03.629183] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
0-c_glusterfs-client-9: changing port to 49391 (from 0)
[2016-10-31 04:06:03.629210] W [MSGID: 114031]
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews
(b0e5a94e-a432-4dce-b86f-a551555780a2) [Transport endpoint is not connected]
[2016-10-31 04:06:03.629266] I [socket.c:3308:socket_submit_request]
0-c_glusterfs-client-9: not connected (priv->connected = 255)
[2016-10-31 04:06:03.629277] I [MSGID: 109063]
[dht-layout.c:702:dht_layout_normalize] 0-c_glusterfs-dht: Found anomalies in
/loadmodules_norepl/CXC1725605_P93A001/cello/emasviews (gfid =
b0e5a94e-a432-4dce-b86f-a551555780a2). Holes=1 overlaps=0
[2016-10-31 04:06:03.629293] W [rpc-clnt.c:1586:rpc_clnt_submit]
0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f62 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 41) to rpc-transport (c_glusterfs-client-9)
[2016-10-31 04:06:03.629333] W [fuse-resolve.c:149:fuse_resolve_gfid_cbk]
0-fuse: b0e5a94e-a432-4dce-b86f-a551555780a2: failed to resolve (Transport
endpoint is not connected)
[2016-10-31 04:06:03.629363] W [fuse-bridge.c:3385:fuse_getxattr_resume]
0-glusterfs-fuse: 20784: GETXATTR
b0e5a94e-a432-4dce-b86f-a551555780a2/70366685109140 (system.posix_acl_access)
resolution failed
[2016-10-31 04:06:03.629729] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )))))
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at 2016-10-31 04:06:03.624568 (xid=0x7f5b)
[2016-10-31 04:06:03.629779] W [MSGID: 114031]
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /pmd/2 (fa866805-ab22-4afe-8acb-528d15ce4117) [Transport
endpoint is not connected]
[2016-10-31 04:06:03.629784] E [MSGID: 114031]
[client-rpc-fops.c:2883:client3_3_opendir_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews
(b0e5a94e-a432-4dce-b86f-a551555780a2) [Transport endpoint is not connected]
[2016-10-31 04:06:03.629836] I [MSGID: 109063]
[dht-layout.c:702:dht_layout_normalize] 0-c_glusterfs-dht: Found anomalies in
/pmd/2 (gfid = fa866805-ab22-4afe-8acb-528d15ce4117). Holes=1 overlaps=0
[2016-10-31 04:06:03.629857] W [fuse-bridge.c:980:fuse_fd_cbk] 0-glusterfs-fuse:
20794: OPENDIR() /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews => -1
(Transport endpoint is not connected)
[2016-10-31 04:06:03.629896] W [fuse-resolve.c:149:fuse_resolve_gfid_cbk]
0-fuse: fa866805-ab22-4afe-8acb-528d15ce4117: failed to resolve (Transport
endpoint is not connected)
[2016-10-31 04:06:03.629990] E [fuse-bridge.c:787:fuse_getattr_resume]
0-glusterfs-fuse: 20785: GETATTR 70366685017568
(fa866805-ab22-4afe-8acb-528d15ce4117) resolution failed
[2016-10-31 04:06:03.630164] W [defaults.c:2212:default_releasedir]
(-->/usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client_local_wipe-0x42d64)
[0x3fff86c9bc5c] -->/usr/lib64/libglusterfs.so.0(fd_unref-0x7ba98)
[0x3fff8abb7ad0] -->/usr/lib64/libglusterfs.so.0(default_releasedir-0x9d6d8)
[0x3fff8ab93c58] ) 0-fuse: xlator does not implement releasedir_cbk
[2016-10-31 04:06:03.630367] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )))))
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at 2016-10-31 04:06:03.625675 (xid=0x7f5c)
[2016-10-31 04:06:03.630368] W [MSGID: 114031]
[client-rpc-fops.c:1569:client3_3_fstat_cbk] 0-c_glusterfs-client-9: remote
operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.630497] W [MSGID: 114031]
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /configuration/oamrdncy.cfg
(200b0bb2-d903-4924-b025-262d805c0f2f) [Transport endpoint is not connected]
[2016-10-31 04:06:03.630871] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-c_glusterfs-client-9: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2016-10-31 04:06:03.631488] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-c_glusterfs-client-9: Connected
to c_glusterfs-client-9, attached to remote volume
'/opt/lvmdir/c2/brick'.
[2016-10-31 04:06:03.631523] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-c_glusterfs-client-9: Server
and Client lk-version numbers are not same, reopening the fds
[2016-10-31 04:06:03.631555] I [MSGID: 114042]
[client-handshake.c:1056:client_post_handshake] 0-c_glusterfs-client-9: 6 fds
open - Delaying child_up until they are re-opened
[2016-10-31 04:06:03.631475] W [MSGID: 114031]
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /configuration/ethmac.cfg
(00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-10-31 04:06:03.632256] I [MSGID: 114060]
[client-handshake.c:820:client3_3_reopendir_cbk] 0-c_glusterfs-client-9:
reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd =
1)
[2016-10-31 04:06:03.632728] I [MSGID: 114060]
[client-handshake.c:820:client3_3_reopendir_cbk] 0-c_glusterfs-client-9:
reopendir on <gfid:257add5a-9b87-4014-8a3a-09dd0a699eec> succeeded (fd =
2)
[2016-10-31 04:06:03.632790] I [MSGID: 114060]
[client-handshake.c:820:client3_3_reopendir_cbk] 0-c_glusterfs-client-9:
reopendir on <gfid:6fd47fe9-a566-47c7-8df7-e7a5e379101f> succeeded (fd =
3)
[2016-10-31 04:06:03.632847] I [MSGID: 114060]
[client-handshake.c:820:client3_3_reopendir_cbk] 0-c_glusterfs-client-9:
reopendir on <gfid:fa866805-ab22-4afe-8acb-528d15ce4117> succeeded (fd =
4)
[2016-10-31 04:06:03.632905] I [MSGID: 114060]
[client-handshake.c:820:client3_3_reopendir_cbk] 0-c_glusterfs-client-9:
reopendir on <gfid:139f5034-5b42-4fb8-9372-6dd7cac1e674> succeeded (fd =
5)
[2016-10-31 04:06:03.632935] I [MSGID: 114041]
[client-handshake.c:678:client_child_up_reopen_done] 0-c_glusterfs-client-9:
last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2016-10-31 04:06:03.633028] I [MSGID: 108005] [afr-common.c:3841:afr_notify]
0-c_glusterfs-replicate-0: Subvolume 'c_glusterfs-client-9' came back
up; going online.
[2016-10-31 04:06:03.633386] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-c_glusterfs-client-9:
Server lk version = 1
[2016-10-31 04:06:03.634579] W [fuse-bridge.c:758:fuse_attr_cbk]
0-glusterfs-fuse: 20795: FSTAT() /configuration/oamrdncy.cfg => -1 (Transport
endpoint is not connected)
[2016-10-31 04:06:03.634889] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )))))
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3)
op(XATTROP(33)) called at 2016-10-31 04:06:03.625716 (xid=0x7f5d)
[2016-10-31 04:06:03.634950] W [MSGID: 114031]
[client-rpc-fops.c:1845:client3_3_xattrop_cbk] 0-c_glusterfs-client-9: remote
operation failed. Path: /configuration/oamrdncy.cfg
(200b0bb2-d903-4924-b025-262d805c0f2f)
[2016-10-31 04:06:03.635134] W [MSGID: 108008] [afr-read-txn.c:250:afr_read_txn]
0-c_glusterfs-replicate-0: Unreadable subvolume -1 found with event generation 3
for gfid 200b0bb2-d903-4924-b025-262d805c0f2f. (Possible split-brain)
[2016-10-31 04:06:03.635364] I [MSGID: 114018] [client.c:2042:client_rpc_notify]
0-c_glusterfs-client-9: disconnected from c_glusterfs-client-9. Client process
will keep trying to connect to glusterd until brick's port is available
[2016-10-31 04:06:03.635420] E [MSGID: 108006] [afr-common.c:3880:afr_notify]
0-c_glusterfs-replicate-0: All subvolumes are down. Going offline until atleast
one of them comes back up.
[2016-10-31 04:06:03.635762] I [MSGID: 108006]
[afr-common.c:4008:afr_local_init] 0-c_glusterfs-replicate-0: no subvolumes up
[2016-10-31 04:06:03.641160] W [fuse-bridge.c:462:fuse_entry_cbk]
0-glusterfs-fuse: 20808: LOOKUP() /license => -1 (Transport endpoint is not
connected)
The message "I [MSGID: 108006] [afr-common.c:4008:afr_local_init]
0-c_glusterfs-replicate-0: no subvolumes up" repeated 6 times between
[2016-10-31 04:06:03.640564] and [2016-10-31 04:06:03.641444]
[2016-10-31 04:06:03.641484] E [MSGID: 114031]
[client-rpc-fops.c:1621:client3_3_inodelk_cbk] 0-c_glusterfs-client-9: remote
operation failed [Invalid argument]
[2016-10-31 04:06:03.641562] I [MSGID: 108006]
[afr-common.c:4008:afr_local_init] 0-c_glusterfs-replicate-0: no subvolumes up
[2016-10-31 04:06:03.641573] E [MSGID: 108010]
[afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-c_glusterfs-replicate-0:
path=/configuration/oamrdncy.cfg gfid=200b0bb2-d903-4924-b025-262d805c0f2f:
unlock failed on subvolume c_glusterfs-client-9 with lock owner 6cd0c189ff3f0000
[Invalid argument]
[2016-10-31 04:06:03.641655] I [MSGID: 108006]
[afr-common.c:4008:afr_local_init] 0-c_glusterfs-replicate-0: no subvolumes up
[2016-10-31 04:06:03.641716] W [fuse-bridge.c:462:fuse_entry_cbk]
0-glusterfs-fuse: 20809: LOOKUP() /lost+found => -1 (Transport endpoint is
not connected)
[2016-10-31 04:06:03.642387] W [fuse-bridge.c:462:fuse_entry_cbk]
0-glusterfs-fuse: 20813: LOOKUP() /lost+found => -1 (Transport endpoint is
not connected)
...
Analysis
I think the disconnection and reconnection between the client (glusterfs on node
A) and the server (glusterfsd on node A) hit a race condition between two
threads, as below.

Process 1:
[2016-10-31 04:06:03.626047] The readv error shows that a disconnect happened
between client and server.
[2016-10-31 04:06:03.627345] rpc_clnt_notify() is called. conn->connected is set
to 0 in rpc_clnt_connection_cleanup(), and a timer is created which will
reconnect after 10 seconds.

Process 2:
[2016-10-31 04:06:03.628381] rpc_clnt_submit() is called. It calls
rpc_transport_connect() because conn->connected is 0.
[2016-10-31 04:06:03.631488] The new connection succeeds.
[2016-10-31 04:06:03.633028] afr_notify() is called. In the GF_EVENT_CHILD_UP
case the new connection is marked up by "priv->child_up[idx] = 1".

Process 1:
[2016-10-31 04:06:03.635364] client_rpc_notify() continues to run and notifies
RPC_CLNT_DISCONNECT for the old connection.
[2016-10-31 04:06:03.635420] In the RPC_CLNT_DISCONNECT case afr_notify() sets
"priv->child_up[idx] = 0", which now refers to the new connection, so all the
subvolumes are marked down.
[2016-10-31 04:06:03.635762] All further access fails because no subvolumes are
up, and rpc_transport_connect() will not be called again because conn->connected
has already been set to 1 in process 2.
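To make the interleaving easier to follow, here is a minimal standalone sketch
of it (the struct, locking, and timing below are hypothetical simplifications
for illustration, not the real rpc-clnt/AFR code). Compiled with gcc -pthread,
it ends with connected=1 but child_up=0, which is the state the analysis above
describes: every fop fails, and per the analysis no further
rpc_transport_connect() is attempted.

/* A minimal, simplified illustration of the interleaving above (hypothetical
 * names, not the real gluster code): the disconnect cleanup and the submit
 * path both act on shared connection state, so a stale DISCONNECT
 * notification can mark the *new* connection down. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct conn {
        pthread_mutex_t lock;
        int connected;   /* rpc-level connection flag        */
        int child_up;    /* what AFR thinks about this child */
};

static struct conn c = { PTHREAD_MUTEX_INITIALIZER, 1, 1 };

/* "Process 1": the thread that saw the readv error on the old socket. */
static void *cleanup_and_notify (void *arg)
{
        (void) arg;
        pthread_mutex_lock (&c.lock);
        c.connected = 0;          /* like rpc_clnt_connection_cleanup() */
        pthread_mutex_unlock (&c.lock);

        /* frames are unwound here; meanwhile "process 2" runs */
        sleep (1);

        /* the DISCONNECT event reaches AFR only now, after the reconnect,
         * so it tears down the state of the *new* connection */
        pthread_mutex_lock (&c.lock);
        c.child_up = 0;           /* like afr_notify() on RPC_CLNT_DISCONNECT */
        pthread_mutex_unlock (&c.lock);
        return NULL;
}

/* "Process 2": a thread submitting a request on the same client. */
static void *submit_and_reconnect (void *arg)
{
        (void) arg;
        pthread_mutex_lock (&c.lock);
        if (!c.connected) {       /* like rpc_clnt_submit() seeing connected == 0 */
                c.connected = 1;  /* like a successful rpc_transport_connect()    */
                c.child_up = 1;   /* like afr_notify() on GF_EVENT_CHILD_UP       */
        }
        pthread_mutex_unlock (&c.lock);
        return NULL;
}

int main (void)
{
        pthread_t t1, t2;

        pthread_create (&t1, NULL, cleanup_and_notify, NULL);
        usleep (100 * 1000);      /* let the cleanup half run first */
        pthread_create (&t2, NULL, submit_and_reconnect, NULL);
        pthread_join (t1, NULL);
        pthread_join (t2, NULL);

        /* end state: connected == 1 (so, per the analysis, no reconnect is
         * triggered again) but child_up == 0, so every fop fails */
        printf ("connected=%d child_up=%d\n", c.connected, c.child_up);
        return 0;
}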
Please help me to check my analysis.
Thanks,
Xin
Hi Atin,
Do you mean that you have a workaround available now?
Or will it take time to design the workaround?
If you have a workaround now, could you share it with me?
Thanks,
Xin
At 2016-11-24 19:12:07, "Atin Mukherjee" <amukherj at redhat.com> wrote:
Xin - I appreciate your patience. I'd need some more time to pick this item
up from my backlog. I believe we have a workaround applicable here too.
On Thu, 24 Nov 2016 at 14:24, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Actually, glusterfs is used in my project,
and our test team found this issue.
So I want to make sure whether you plan to fix it.
If you have a plan I will wait for you, because your method should be better than mine.
Thanks,
Xin
At 2016-11-21 10:00:36, "Atin Mukherjee" <atin.mukherjee83 at gmail.com> wrote:
Hi Xin,
I've not got a chance to look into it yet. The delete stale volume function is
in place to take care of wiping off volume configuration data which has been
deleted from the cluster. However, we need to revisit this code to see if this
function is still needed, given we recently added a validation to fail a delete
request if one of the glusterds is down. I'll get back to you on this.
On Mon, 21 Nov 2016 at 07:24, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Thank you for your support.
And any conclusions about this issue?
Thanks,
Xin
At 2016-11-16 20:59:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Tue, Nov 15, 2016 at 1:53 PM, songxin <songxin_1980 at 126.com> wrote:
ok, thank you.
At 2016-11-15 16:12:34, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Tue, Nov 15, 2016 at 12:47 PM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
I think the root cause is in the function glusterd_import_friend_volume as
below.
int32_t
glusterd_import_friend_volume (dict_t *peer_data, size_t count)
{
        ...
        ret = glusterd_volinfo_find (new_volinfo->volname, &old_volinfo);
        if (0 == ret) {
                (void) gd_check_and_update_rebalance_info (old_volinfo,
                                                           new_volinfo);
                (void) glusterd_delete_stale_volume (old_volinfo, new_volinfo);
        }
        ...
        ret = glusterd_store_volinfo (new_volinfo,
                                      GLUSTERD_VOLINFO_VER_AC_NONE);
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        GD_MSG_VOLINFO_STORE_FAIL, "Failed to store "
                        "volinfo for volume %s", new_volinfo->volname);
                goto out;
        }
        ...
}
glusterd_delete_stale_volume() will remove the info file and bricks/*, and
glusterd_store_volinfo() will create the new ones.
But if glusterd is killed before the rename, the info file is left empty.
And glusterd will fail to start the next time, because the info file is empty.
Any idea? Atin?
Give me some time, I will check it out. But reading this analysis it looks very
much possible: a volume is changed while glusterd is down on node A, then when
that node comes up we update the volinfo during the peer handshake, and during
that time glusterd goes down once again. I'll confirm it by tomorrow.
I checked the code and it does look like you have got the right RCA for the
issue which you simulated through those two scripts. However, this can happen
even when you create a fresh volume: if glusterd goes down while writing the
content into the store, before renaming the info.tmp file, you get into the
same situation.
I'd really need to think through if this can be fixed. Suggestions are
always appreciated.
BTW, excellent work Xin!
Thanks,
Xin
At 2016-11-15 12:07:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
I have some clues about this issue.
I could reproduce this issue using the script mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1308487 .
I really appreciate your help in trying to nail down this issue. While I am on
your email and going through the code to figure out the possible cause,
unfortunately I don't see any script in the attachment of the bug. Could
you please cross check?
After I added some debug prints (like below) in glusterd-store.c, I found that
/var/lib/glusterd/vols/xxx/info and /var/lib/glusterd/vols/xxx/bricks/*
are removed.
But other files in /var/lib/glusterd/vols/xxx/ are not removed.
int32_t
glusterd_store_volinfo (glusterd_volinfo_t *volinfo, glusterd_volinfo_ver_ac_t ac)
{
        int32_t     ret = -1;
        struct stat buf = {0,};   /* local added for the debug stat() call */

        GF_ASSERT (volinfo);

        /* debug: check whether the info file still exists and report its size */
        ret = access ("/var/lib/glusterd/vols/gv0/info", F_OK);
        if (ret < 0) {
                gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
                        "info does not exist (%d)", errno);
        } else {
                ret = stat ("/var/lib/glusterd/vols/gv0/info", &buf);
                if (ret < 0) {
                        gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
                                "stat info error");
                } else {
                        gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
                                "info size is %lu, inode num is %lu",
                                buf.st_size, buf.st_ino);
                }
        }

        glusterd_perform_volinfo_version_action (volinfo, ac);
        ret = glusterd_store_create_volume_dir (volinfo);
        if (ret)
                goto out;

        ...
}
So it is easy to understand why info or 10.32.1.144.-opt-lvmdir-c2-brick is
sometimes empty.
It is because the info file does not exist, so it is created by "fd = open
(path, O_RDWR | O_CREAT | O_APPEND, 0600);" in the function gf_store_handle_new.
The info file is empty before the rename.
So the info file stays empty if glusterd shuts down before the rename.
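A tiny standalone demo of that window (the file names and contents here are
just examples, not the real glusterd store layout): killing the process between
step 1 and step 3 leaves a zero-byte "info" on disk while "info.tmp" may
already hold the full contents.

/* Demo of the empty-info window: open(O_CREAT) makes an empty "info",
 * and only the final rename() of the fully written tmp file fills it. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main (void)
{
        const char *path = "info";
        const char *tmp  = "info.tmp";
        const char *data = "type=2\ncount=2\n";   /* example contents only */
        int fd, tfd;

        /* step 1: like gf_store_handle_new(): creates an empty file if absent */
        fd = open (path, O_RDWR | O_CREAT | O_APPEND, 0600);
        if (fd < 0) { perror ("open info"); return 1; }
        close (fd);

        /* a kill -9 here leaves a zero-byte "info" behind */

        /* step 2: write the real contents to the tmp file */
        tfd = open (tmp, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (tfd < 0) { perror ("open info.tmp"); return 1; }
        if (write (tfd, data, strlen (data)) < 0) { perror ("write"); return 1; }
        close (tfd);

        /* step 3: atomic replace; only now does "info" become non-empty */
        if (rename (tmp, path) < 0) { perror ("rename"); return 1; }
        return 0;
}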
My questions are the following.
1. I did not find the point where info is removed. Could you tell me the point
where info and bricks/* are removed?
2. Why are the files info and bricks/* removed, but other files in
/var/lib/glusterd/vols/xxx/ are not removed?
AFAIK, we never delete the info file, and hence this file is opened with the
O_APPEND flag. As I said, I will go back and cross check the code once again.
Thanks,
Xin
At 2016-11-11 20:34:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Thank you for your support.
Sincerely wait for your reply.
By the way, could you confirm that the issue of the info file being empty is
caused by the rename being interrupted in the kernel?
As per my RCA on that bug, it looked to be.
Thanks,
Xin
At 2016-11-11 15:49:02, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Thank you for your reply.
Actually it is very difficult to reproduce because I don't know when there
was an ongoing commit happening. It is just a coincidence.
But I want to be sure of the root cause.
I'll give it another try and see if this situation can be
simulated/reproduced and will keep you posted.
So I would be grateful if you could answer my questions below.
You said that "This issue is hit at part of the negative testing where
while gluster volume set was executed at the same point of time glusterd in
another instance was brought down. In the faulty node we could see
/var/lib/glusterd/vols/<volname>info file been empty whereas the info.tmp
file has the correct contents." in comment.
I have two questions for you.
1. Could you reproduce this issue by running gluster volume set while glusterd
was brought down?
2. Could you be certain that this issue is caused by the rename being
interrupted in the kernel?
In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are
both empty.
But in my view only one rename can be running at a time because of the
big lock.
Why are both files empty?
Could rename("info.tmp", "info") and
rename("xxx-brick.tmp", "xxx-brick") be running in two
threads?
Thanks,
Xin
At 2016-11-11 15:27:03, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Thank you for your reply.
As you said, the info file can only be changed in
glusterd_store_volinfo(), sequentially, because of the big lock.
I have found the similar issue as below that you mentioned.
https://bugzilla.redhat.com/show_bug.cgi?id=1308487
Great, so this is what I was actually trying to refer to in my first email when
I said I saw a similar issue. Have you got a chance to look at
https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in your case, did
you try to bring down glusterd when there was an ongoing commit happening?
You said that "This issue is hit at part of the negative testing where
while gluster volume set was executed at the same point of time glusterd in
another instance was brought down. In the faulty node we could see
/var/lib/glusterd/vols/<volname>info file been empty whereas the info.tmp
file has the correct contents." in comment.
I have two questions for you.
1. Could you reproduce this issue by running gluster volume set while glusterd
was brought down?
2. Could you be certain that this issue is caused by the rename being
interrupted in the kernel?
In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are
both empty.
But in my view only one rename can be running at a time because of the
big lock.
Why are both files empty?
Could rename("info.tmp", "info") and
rename("xxx-brick.tmp", "xxx-brick") be running in two
threads?
Thanks,
Xin
At 2016-11-11 14:36:40, "Atin Mukherjee" <amukherj at redhat.com> wrote:
On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at 126.com> wrote:
Hi Atin,
Thank you for your reply.
I have two questions for you.
1. Are the two files info and info.tmp only created or changed in the function
glusterd_store_volinfo()? I did not find any other point at which the two
files are changed.
If we are talking about the volume's info file then yes, the mentioned function
actually takes care of it.
2. I found that glusterd_store_volinfo() is called at many points by glusterd.
Is there a problem of thread synchronization? If so, one thread may open the
same info.tmp file with the O_TRUNC flag while another thread is writing
info.tmp. Could this case happen?
In glusterd, threads are big-lock protected and I don't see a possibility
(theoretically) of two glusterd_store_volinfo() calls at a given point in
time.
Thanks,
Xin
At 2016-11-10 21:41:06, "Atin Mukherjee" <amukherj at redhat.com> wrote:
Did you run out of disk space by any chance? AFAIK, the code is such that we
write new stuff to the .tmp file and rename it to the original file. In case of
a disk space issue I would expect both files to be of non-zero size. But having
said that, I vaguely remember a similar issue (in the form of a bug or an email)
landing up once, but we couldn't reproduce it, so my guess is that something is
wrong with the atomic update here. I'll be glad if you have a reproducer
for the same and then we can dig into it further.
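For reference, the usual crash-safe variant of that pattern looks roughly like
the sketch below (plain POSIX, written only as an illustration, not the actual
glusterd store code): fsync the tmp file before the rename and the parent
directory after it, so a crash at any point leaves either the complete old
file or the complete new one, never an empty one.

/* Generic durable-replace sketch (illustration only, not glusterd code). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int atomic_replace (const char *dir, const char *tmp_path,
                           const char *final_path,
                           const char *buf, size_t len)
{
        int fd, dfd, ret;

        fd = open (tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
                return -1;
        if (write (fd, buf, len) != (ssize_t) len || fsync (fd) < 0) {
                close (fd);
                return -1;
        }
        close (fd);

        /* rename(2) is atomic: readers see either the old or the new file */
        if (rename (tmp_path, final_path) < 0)
                return -1;

        /* persist the directory entry change made by the rename */
        dfd = open (dir, O_RDONLY);
        if (dfd < 0)
                return -1;
        ret = fsync (dfd);
        close (dfd);
        return ret;
}

int main (void)
{
        const char *data = "type=2\ncount=2\n";   /* example contents only */

        if (atomic_replace (".", "info.tmp", "info", data, strlen (data)) < 0) {
                perror ("atomic_replace");
                return 1;
        }
        return 0;
}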
On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980 at 126.com> wrote:
Hi,
When I started glusterd, some errors happened.
The log is as follows.
[2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init]
0-management: Maximum allowed open file descriptors set to 65536
[2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init]
0-management: Using /system/glusterd as working directory
[2016-11-08 07:58:35.024508] I [MSGID: 106514]
[glusterd-store.c:2075:glusterd_restore_op_version] 0-management: Upgrade
detected. Setting op-version to minimum : 1
[2016-11-08 07:58:35.025356] E [MSGID: 106206]
[glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed to
get next store iter
[2016-11-08 07:58:35.025401] E [MSGID: 106207]
[glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed to
update volinfo for c_glusterfs volume
[2016-11-08 07:58:35.025463] E [MSGID: 106201]
[glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to
restore volume: c_glusterfs
[2016-11-08 07:58:35.025544] E [MSGID: 101019] [xlator.c:428:xlator_init]
0-management: Initialization of volume 'management' failed, review your
volfile again
[2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init] 0-management:
initializing translator failed
[2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate] 0-graph:
init failed
[2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
-->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
-->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-: received
signum (0), shutting down
Then I found that the size of vols/volume_name/info is 0. This causes glusterd
to shut down.
But I found that vols/volume_name/info.tmp is not 0.
And I found that there is a brick file vols/volume_name/bricks/xxxx.brick whose
size is 0, but vols/volume_name/bricks/xxxx.brick.tmp is not 0.
I read the code of glusterd_store_volinfo () in glusterd-store.c.
I know that info.tmp is renamed to info in the function
glusterd_store_volume_atomic_update().
But my question is: why is the info file 0 bytes while info.tmp is not?
Thanks,
Xin
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
--
~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
--
--Atin
--
- Atin (atinm)