On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1980 at 126.com> wrote:
> Hi Atin,
> I have some clues about this issue.
> I could reproduce this issue use the scrip that mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1308487 .
>
I really appreciate your help in trying to nail down this issue. While I am
at your email and going through the code to figure out the possible cause
for it, unfortunately I don't see any script in the attachment of the bug.
Could you please cross check?
>
> After I added some debug print,which like below, in glusterd-store.c and I
> found that the /var/lib/glusterd/vols/xxx/info and
/var/lib/glusterd/vols/xxx/bricks/*
> are removed.
> But other files in /var/lib/glusterd/vols/xxx/ will not be remove.
>
> int32_t
> glusterd_store_volinfo (glusterd_volinfo_t *volinfo,
> glusterd_volinfo_ver_ac_t ac)
> {
> int32_t ret = -1;
>
> GF_ASSERT (volinfo)
>
> ret = access("/var/lib/glusterd/vols/gv0/info", F_OK);
> if(ret < 0)
> {
> gf_msg (THIS->name, GF_LOG_ERROR, 0, 0, "info is
not
> exit(%d)", errno);
> }
> else
> {
> ret = stat("/var/lib/glusterd/vols/gv0/info",
&buf);
> if(ret < 0)
> {
> gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
"stat info
> error");
> }
> else
> {
> gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
"info size
> is %lu, inode num is %lu", buf.st_size, buf.st_ino);
> }
> }
>
> glusterd_perform_volinfo_version_action (volinfo, ac);
> ret = glusterd_store_create_volume_dir (volinfo);
> if (ret)
> goto out;
>
> ...
> }
>
> So it is easy to understand why the info or 10.32.1.144.-opt-lvmdir-c2-
> brick sometimes is empty.
> It is becaue the info file is not exist, and it will be create by ?fd >
open (path, O_RDWR | O_CREAT | O_APPEND, 0600);? in function
> gf_store_handle_new.
> And the info file is empty before rename.
> So the info file is empty if glusterd shutdown before rename.
>
>
> My question is following.
> 1.I did not find the point the info is removed.Could you tell me the point
> where the info and /bricks/* are removed?
> 2.why the file info and bricks/* is removed?But other files in
var/lib/glusterd/vols/xxx/
> are not be removed?
>
AFAIK, we never delete the info file and hence this file is opened with
O_APPEND flag. As I said I will go back and cross check the code once again.
> Thanks,
> Xin
>
>
> ? 2016-11-11 20:34:05?"Atin Mukherjee" <amukherj at
redhat.com> ???
>
>
>
> On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com>
wrote:
>
>> Hi Atin,
>>
>> Thank you for your support.
>> Sincerely wait for your reply.
>>
>> By the way, could you make sure that the issue, file info is empty,
cause
>> by rename is interrupted in kernel?
>>
>
> As per my RCA on that bug, it looked to be.
>
>
>>
>> Thanks,
>> Xin
>>
>> ? 2016-11-11 15:49:02?"Atin Mukherjee" <amukherj at
redhat.com> ???
>>
>>
>>
>> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at
126.com> wrote:
>>
>>> Hi Atin,
>>> Thank you for your reply.
>>> Actually it is very difficult to reproduce because I don't know
when there
>>> was an ongoing commit happening.It is just a coincidence.
>>> But I want to make sure the root cause.
>>>
>>
>> I'll give it a another try and see if this situation can be
>> simulated/reproduced and will keep you posted.
>>
>>
>>>
>>> So I would be grateful if you could answer my questions below.
>>>
>>> You said that "This issue is hit at part of the negative
testing where
>>> while gluster volume set was executed at the same point of time
glusterd in
>>> another instance was brought down. In the faulty node we could see
>>> /var/lib/glusterd/vols/<volname>info file been empty whereas
the
>>> info.tmp file has the correct contents." in comment.
>>>
>>> I have two questions for you.
>>>
>>> 1.Could you reproduce this issue by gluster volume set glusterd
which was brought down?
>>> 2.Could you be certain that this issue is cause by rename is
interrupted in kernel?
>>>
>>> In my case there are two files, info and
10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>>> But in my view only one rename can be running at the same time
because of the big lock.
>>> Why there are two files are empty?
>>>
>>>
>>> Could rename("info.tmp", "info") and
rename("xxx-brick.tmp", "xxx-brick") be running in two
thread?
>>>
>>> Thanks,
>>> Xin
>>>
>>>
>>> ? 2016-11-11 15:27:03?"Atin Mukherjee" <amukherj at
redhat.com> ???
>>>
>>>
>>>
>>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at
126.com> wrote:
>>>
>>>>
>>>> Hi Atin,
>>>> Thank you for your reply.
>>>>
>>>> As you said that the info file can only be changed in the
glusterd_store_volinfo()
>>>> sequentially because of the big lock.
>>>>
>>>> I have found the similar issue as below that you mentioned.
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>>>
>>>
>>> Great, so this is what I was actually trying to refer in my first
email
>>> that I saw a similar issue. Have you got a chance to look at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in
your
>>> case, did you try to bring down glusterd when there was an ongoing
commit
>>> happening?
>>>
>>>
>>>>
>>>> You said that "This issue is hit at part of the negative
testing where
>>>> while gluster volume set was executed at the same point of time
glusterd in
>>>> another instance was brought down. In the faulty node we could
see
>>>> /var/lib/glusterd/vols/<volname>info file been empty
whereas the
>>>> info.tmp file has the correct contents." in comment.
>>>>
>>>> I have two questions for you.
>>>>
>>>> 1.Could you reproduce this issue by gluster volume set glusterd
which was brought down?
>>>> 2.Could you be certain that this issue is cause by rename is
interrupted in kernel?
>>>>
>>>> In my case there are two files, info and
10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>>>> But in my view only one rename can be running at the same time
because of the big lock.
>>>> Why there are two files are empty?
>>>>
>>>>
>>>> Could rename("info.tmp", "info") and
rename("xxx-brick.tmp", "xxx-brick") be running in two
thread?
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>>
>>>>
>>>> ? 2016-11-11 14:36:40?"Atin Mukherjee" <amukherj
at redhat.com> ???
>>>>
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at
126.com> wrote:
>>>>
>>>>> Hi Atin,
>>>>>
>>>>> Thank you for your reply.
>>>>> I have two questions for you.
>>>>>
>>>>> 1.Are the two files info and info.tmp are only to be
created or
>>>>> changed in function glusterd_store_volinfo()? I did not
find other point in
>>>>> which the two file are changed.
>>>>>
>>>>
>>>> If we are talking about info file volume then yes, the
mentioned
>>>> function actually takes care of it.
>>>>
>>>>
>>>>> 2.I found that glusterd_store_volinfo() will be call in
many point by
>>>>> glusterd.Is there a problem of thread synchronization?If
so, one thread may
>>>>> open a same file info.tmp using O_TRUNC flag when another
thread is
>>>>> writing the info,tmp.Could this case happen?
>>>>>
>>>>
>>>> In glusterd threads are big lock protected and I don't see
a
>>>> possibility (theoretically) to have two glusterd_store_volinfo
() calls at
>>>> a given point of time.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-10 21:41:06, "Atin Mukherjee"
<amukherj at redhat.com> wrote:
>>>>>
>>>>> Did you run out of disk space by any chance? AFAIK, the
code is like
>>>>> we write new stuffs to .tmp file and rename it back to the
original file.
>>>>> In case of a disk space issue I expect both the files to be
of non zero
>>>>> size. But having said that I vaguely remember a similar
issue (in the form
>>>>> of a bug or an email) landed up once but we couldn't
reproduce it, so
>>>>> something is wrong with the atomic update here is what I
guess. I'll be
>>>>> glad if you have a reproducer for the same and then we can
dig into it
>>>>> further.
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980
at 126.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> When I start the glusterd some error happened.
>>>>>> And the log is following.
>>>>>>
>>>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030]
>>>>>> [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started
running
>>>>>> /usr/sbin/glusterd version 3.7.6 (args:
/usr/sbin/glusterd -p
>>>>>> /var/run/glusterd.pid --log-level INFO)
>>>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478]
[glusterd.c:1350:init]
>>>>>> 0-management: Maximum allowed open file descriptors set
to 65536
>>>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479]
[glusterd.c:1399:init]
>>>>>> 0-management: Using /system/glusterd as working
directory
>>>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514]
>>>>>> [glusterd-store.c:2075:glusterd_restore_op_version]
0-management:
>>>>>> Upgrade detected. Setting op-version to minimum : 1
>>>>>> *[2016-11-08 07:58:35.025356] E [MSGID: 106206]
>>>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo]
0-management: Failed
>>>>>> to get next store iter *
>>>>>> *[2016-11-08 07:58:35.025401] E [MSGID: 106207]
>>>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume]
0-management: Failed
>>>>>> to update volinfo for c_glusterfs volume *
>>>>>> *[2016-11-08 07:58:35.025463] E [MSGID: 106201]
>>>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes]
0-management:
>>>>>> Unable to restore volume: c_glusterfs *
>>>>>> *[2016-11-08 07:58:35.025544] E [MSGID: 101019]
>>>>>> [xlator.c:428:xlator_init] 0-management: Initialization
of volume
>>>>>> 'management' failed, review your volfile again
*
>>>>>> *[2016-11-08 07:58:35.025582] E
[graph.c:322:glusterfs_graph_init]
>>>>>> 0-management: initializing translator failed *
>>>>>> *[2016-11-08 07:58:35.025629] E
>>>>>> [graph.c:661:glusterfs_graph_activate] 0-graph: init
failed *
>>>>>> [2016-11-08 07:58:35.026109] W
[glusterfsd.c:1236:cleanup_and_exit]
>>>>>>
(-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
>>>>>>
-->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
>>>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c)
[0x100098bc] ) 0-:
>>>>>> received signum (0), shutting down
>>>>>>
>>>>>>
>>>>>> And then I found that the size of vols/volume_name/info
is 0.It cause
>>>>>> glusterd shutdown.
>>>>>> But I found that vols/volume_name_info.tmp is not 0.
>>>>>> And I found that there is a brick file
vols/volume_name/bricks/xxxx.brick
>>>>>> is 0, but vols/volume_name/bricks/xxxx.brick.tmp is not
0.
>>>>>>
>>>>>> I read the function code glusterd_store_volinfo () in
>>>>>> glusterd-store.c .
>>>>>> I know that the info.tmp will be rename to info in
function
>>>>>> glusterd_store_volume_atomic_update().
>>>>>>
>>>>>> But my question is that why the info file is 0 but
info.tmp is not 0.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Xin
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ~ Atin (atinm)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>>
>>
>>
>>
>>
>
>
>
> --
>
> ~ Atin (atinm)
>
>
>
>
>
--
~ Atin (atinm)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20161115/a7c7a261/attachment.html>