On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com> wrote:
> Hi Atin,
>
> Thank you for your support.
> I look forward to your reply.
>
> By the way, can you confirm that the issue (the info file being empty) is
> caused by a rename being interrupted in the kernel?
>
Based on my RCA of that bug, it appeared to be.
>
> Thanks,
> Xin
>
> On 2016-11-11 15:49:02, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>
>
>
> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at 126.com> wrote:
>
>> Hi Atin,
>> Thank you for your reply.
>> Actually, it is very difficult to reproduce, because I don't know whether
>> a commit was in progress at that moment. It was just a coincidence.
>> But I want to be sure of the root cause.
>>
>
> I'll give it another try, see whether this situation can be
> simulated/reproduced, and keep you posted.
>
>
>>
>> So I would be grateful if you could answer my questions below.
>>
>> You said in the bug comment: "This issue is hit at part of the negative
>> testing where while gluster volume set was executed at the same point of
>> time glusterd in another instance was brought down. In the faulty node we
>> could see /var/lib/glusterd/vols/<volname>info file been empty whereas the
>> info.tmp file has the correct contents."
>>
>> I have two questions for you.
>>
>> 1. Were you able to reproduce this issue by running gluster volume set
>> while glusterd on another node was brought down?
>> 2. Are you certain that this issue is caused by a rename being
>> interrupted in the kernel?
>>
>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both
>> empty. But as I understand it, only one rename can be running at a time
>> because of the big lock. Why are two files empty?
>>
>>
>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
>> be running in two threads?
>>
>> Thanks,
>> Xin
>>
>>
>> On 2016-11-11 15:27:03, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>>>
>>> Hi Atin,
>>> Thank you for your reply.
>>>
>>> As you said, the info file can only be changed in
>>> glusterd_store_volinfo(), and only sequentially, because of the big lock.
>>>
>>> I have found the similar issue that you mentioned:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>>
>>
>> Great, this is what I was referring to in my first email when I said I had
>> seen a similar issue. Have you had a chance to look at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? In your case,
>> though, did you bring down glusterd while a commit was in progress?
>>
>>
>>>
>>> You said in the bug comment: "This issue is hit at part of the negative
>>> testing where while gluster volume set was executed at the same point of
>>> time glusterd in another instance was brought down. In the faulty node we
>>> could see /var/lib/glusterd/vols/<volname>info file been empty whereas the
>>> info.tmp file has the correct contents."
>>>
>>> I have two questions for you.
>>>
>>> 1. Were you able to reproduce this issue by running gluster volume set
>>> while glusterd on another node was brought down?
>>> 2. Are you certain that this issue is caused by a rename being
>>> interrupted in the kernel?
>>>
>>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both
>>> empty. But as I understand it, only one rename can be running at a time
>>> because of the big lock. Why are two files empty?
>>>
>>>
>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
>>> be running in two threads?
>>>
>>> Thanks,
>>> Xin
>>>
>>>
>>>
>>>
>>> On 2016-11-11 14:36:40, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>
>>>
>>>
>>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at 126.com> wrote:
>>>
>>>> Hi Atin,
>>>>
>>>> Thank you for your reply.
>>>> I have two questions for you.
>>>>
>>>> 1. Are the two files info and info.tmp created or changed only in the
>>>> function glusterd_store_volinfo()? I did not find any other place where
>>>> these two files are changed.
>>>>
>>>
>>> If we are talking about the volume's info file, then yes, that function
>>> takes care of it.
>>>
>>>
>>>> 2. I found that glusterd_store_volinfo() is called from many places in
>>>> glusterd. Is there a thread-synchronization problem? If so, one thread
>>>> might open the same file, info.tmp, with the O_TRUNC flag while another
>>>> thread is writing info.tmp. Could this happen?
>>>>
>>>
>>> In glusterd, threads are protected by the big lock, and I don't see any
>>> possibility (theoretically) of two glusterd_store_volinfo() calls running
>>> at the same time.
>>>
>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>> Did you run out of disk space by any chance? AFAIK, the code writes the
>>>> new contents to a .tmp file and then renames it back to the original
>>>> file. In case of a disk space issue I would expect both files to be of
>>>> non-zero size. Having said that, I vaguely remember a similar issue (in
>>>> the form of a bug or an email) that came up once but that we couldn't
>>>> reproduce, so my guess is that something is wrong with the atomic update
>>>> here. I'd be glad if you have a reproducer for it; then we can dig into
>>>> it further.
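The write-to-.tmp-then-rename pattern described above can be sketched as follows. This is a hedged illustration, not a transcription of glusterd_store_volume_atomic_update(). `atomic_update` is a hypothetical helper. The explicit `fsync()` calls show what a crash-safe variant would add, and they may or may not match what glusterd 3.7.6 does. rename(2) makes the replacement atomic for readers. However, on some filesystems a crash or power loss shortly after a rename that was not preceded by an fsync() of the new file can leave the target empty.

```python
import os
import tempfile


def atomic_update(path: str, contents: bytes) -> None:
    """Replace `path` with `contents` via the write-.tmp-then-rename pattern.

    Hypothetical helper: the fsync() calls show what a crash-safe variant
    would do; this is not glusterd's actual implementation.
    """
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(contents)
        while view:  # os.write() may write fewer bytes than requested
            view = view[os.write(fd, view):]
        os.fsync(fd)  # data must be on disk before it becomes visible as `path`
    finally:
        os.close(fd)

    # rename(2): readers see either the old file or the new one, never a mix.
    os.replace(tmp_path, path)

    # Flush the directory so the rename itself survives a crash (POSIX/Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    info = os.path.join(tempfile.mkdtemp(), "info")
    atomic_update(info, b"type=2\ncount=2\n")  # made-up contents
    print(os.path.getsize(info), os.path.exists(info + ".tmp"))
```

The ordering matters most. The data is flushed before the rename makes it visible under the real name, and the directory is flushed afterwards so the rename itself is durable.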
>>>>
>>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>>> Hi,
>>>>> When I started glusterd, an error occurred.
>>>>> The log follows.
>>>>>
>>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /system/glusterd as working directory
>>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514] [glusterd-store.c:2075:glusterd_restore_op_version] 0-management: Upgrade detected. Setting op-version to minimum : 1
>>>>> [2016-11-08 07:58:35.025356] E [MSGID: 106206] [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed to get next store iter
>>>>> [2016-11-08 07:58:35.025401] E [MSGID: 106207] [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for c_glusterfs volume
>>>>> [2016-11-08 07:58:35.025463] E [MSGID: 106201] [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: c_glusterfs
>>>>> [2016-11-08 07:58:35.025544] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>> [2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
>>>>> [2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
>>>>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718] -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8] -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-: received signum (0), shutting down
>>>>>
>>>>>
>>>>> Then I found that the size of vols/volume_name/info is 0, which caused
>>>>> glusterd to shut down. However, vols/volume_name/info.tmp is not 0.
>>>>> I also found that a brick file, vols/volume_name/bricks/xxxx.brick, is 0,
>>>>> but vols/volume_name/bricks/xxxx.brick.tmp is not 0.
>>>>>
>>>>> I read the code of glusterd_store_volinfo() in glusterd-store.c.
>>>>> I know that info.tmp is renamed to info in
>>>>> glusterd_store_volume_atomic_update().
>>>>>
>>>>> My question is: why is the info file 0 bytes while info.tmp is not?
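A small scan over the glusterd working directory can show whether other store files are in the same state: a zero-byte file sitting next to a non-empty `.tmp` sibling. `find_truncated_store_files` is a hypothetical helper, not part of gluster. The log above reports /system/glusterd, not /var/lib/glusterd, as the working directory, so the sketch takes the root directory as an argument.

```python
import os
import sys
from typing import List


def find_truncated_store_files(root: str) -> List[str]:
    """Return files under `root` that are empty while `<file>.tmp` is not.

    Hypothetical diagnostic helper; point it at <workdir>/vols.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            if name.endswith(".tmp"):
                continue  # only consider the "real" file of each pair
            tmp_name = name + ".tmp"
            if tmp_name not in names:
                continue
            target = os.path.join(dirpath, name)
            tmp = os.path.join(dirpath, tmp_name)
            if os.path.getsize(target) == 0 and os.path.getsize(tmp) > 0:
                suspects.append(target)
    return sorted(suspects)


if __name__ == "__main__":
    for path in find_truncated_store_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

For example, running the script against /system/glusterd/vols would list info and any brick files that match this pattern.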
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ~ Atin (atinm)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>>
>>
>>
>>
>>
>
>
>
> --
>
> ~ Atin (atinm)
>
>
>
>
>
--
~ Atin (atinm)