Atin Mukherjee
2017-May-09 12:50 UTC
[Gluster-users] Empty info file preventing glusterd from starting
On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:

> Hi Atin,
>
> Thanks for your reply.
>
> It is urgent because this error is only rarely reproducible; we have seen
> it two or three times in our system so far.
>
> We have a delivery in the near future, so we want the fix as soon as
> possible. Please try to review it internally.

I don't think your statements justify the urgency, because (a) you have
mentioned that it is *rarely* reproducible, and (b) I am still waiting for a
real use case in which glusterd would go through multiple restarts in a loop.

> Regards,
> Abhishek
>
> On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>
>>> + Muthu-vingeshwaran
>>>
>>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>
>>>> Hi Atin/Team,
>>>>
>>>> We are using gluster-3.7.6 with a two-brick setup, and after a system
>>>> restart I have seen that the glusterd daemon fails to start.
>>>>
>>>> While analyzing the logs in the etc-glusterfs.......log file I found
>>>> the following:
>>>>
>>>> [2017-05-06 03:33:39.798087] I [MSGID: 100030] [glusterfsd.c:2348:main]
>>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>> [2017-05-06 03:33:39.807859] I [MSGID: 106478] [glusterd.c:1350:init]
>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2017-05-06 03:33:39.807974] I [MSGID: 106479] [glusterd.c:1399:init]
>>>> 0-management: Using /system/glusterd as working directory
>>>> [2017-05-06 03:33:39.826833] I [MSGID: 106513]
>>>> [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
>>>> retrieved op-version: 30706
>>>> [2017-05-06 03:33:39.827515] E [MSGID: 106206]
>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management:
>>>> Failed to get next store iter
>>>> [2017-05-06 03:33:39.827563] E [MSGID: 106207]
>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management:
>>>> Failed to update volinfo for c_glusterfs volume
>>>> [2017-05-06 03:33:39.827625] E [MSGID: 106201]
>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>>>> Unable to restore volume: c_glusterfs
>>>> [2017-05-06 03:33:39.827722] E [MSGID: 101019]
>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>>>> 'management' failed, review your volfile again
>>>> [2017-05-06 03:33:39.827762] E [graph.c:322:glusterfs_graph_init]
>>>> 0-management: initializing translator failed
>>>> [2017-05-06 03:33:39.827784] E [graph.c:661:glusterfs_graph_activate]
>>>> 0-graph: init failed
>>>> [2017-05-06 03:33:39.828396] W [glusterfsd.c:1238:cleanup_and_exit]
>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b0b8) [0x1000a648]
>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b210) [0x1000a4d8]
>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1beac) [0x100097ac] ) 0-:
>>>> received signum (0), shutting down
>>
>> Abhishek,
>>
>> This patch needs to be thoroughly reviewed to ensure that it doesn't
>> cause any regression, given that it touches the core store management
>> functionality of glusterd. AFAICT, we get an empty info file only when a
>> volume set operation is executed while, in parallel, one of the glusterd
>> instances on the other nodes has been brought down, and this whole
>> sequence of operations happens in a loop. The test case through which you
>> can get into this situation is not something you'd hit in production.
>> Please help me to understand the urgency here.
>>
>> Also, in one of the earlier threads I mentioned the workaround for this
>> issue to Xin:
>> http://lists.gluster.org/pipermail/gluster-users/2017-January/029600.html
>>
>> "If you end up in having a 0 byte info file you'd need to copy the same
>> info file from other node and put it there and restart glusterd"
>>
>>>> I have found that there is already an existing case for this and a
>>>> patch is available, but the status of that patch is "cannot merge".
>>>> Also, the "info" file is empty and an "info.tmp" file is present in
>>>> the "lib/glusterd/vol" directory.
>>>>
>>>> Below is the link to the existing case:
>>>>
>>>> https://review.gluster.org/#/c/16279/5
>>>>
>>>> Please let me know the community's plan for providing a fix for this
>>>> problem, and in which version it will land.
>>>>
>>>> Regards
>>>> Abhishek Paliwal
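A note on the workaround quoted above: in practice it amounts to checking
whether the volume's info file on the failing node is zero bytes and, if so,
copying a healthy copy from a peer and restarting glusterd. The Python sketch
below only illustrates that manual procedure; the glusterd working directory
(the logs above show /system/glusterd rather than the usual /var/lib/glusterd),
the peer hostname, and the use of scp/systemctl are assumptions made for the
example, not part of the original report.

#!/usr/bin/env python3
# Sketch of the manual workaround from this thread: if a volume's 'info'
# file is 0 bytes, fetch a healthy copy from a peer and restart glusterd.
# GLUSTERD_WORKDIR, PEER, and the scp/systemctl calls are illustrative
# assumptions, not values taken from the original report.
import os
import subprocess
import sys

GLUSTERD_WORKDIR = "/var/lib/glusterd"   # the logs above show /system/glusterd
PEER = "peer-node"                       # hypothetical healthy peer
VOLUME = "c_glusterfs"                   # volume name from the logs above


def info_path(workdir, volume):
    return os.path.join(workdir, "vols", volume, "info")


def main():
    path = info_path(GLUSTERD_WORKDIR, VOLUME)
    if os.path.exists(path) and os.path.getsize(path) > 0:
        print("%s looks intact (%d bytes); nothing to do"
              % (path, os.path.getsize(path)))
        return 0

    print("%s is missing or empty; copying it from %s" % (path, PEER))
    # Copy the same info file from the healthy peer (needs root SSH access).
    subprocess.run(["scp", "%s:%s" % (PEER, path), path], check=True)

    # Restart glusterd so it re-reads the restored volume store.
    subprocess.run(["systemctl", "restart", "glusterd"], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())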
ABHISHEK PALIWAL
2017-May-09 13:07 UTC
[Gluster-users] Empty info file preventing glusterd from starting
Actually, it is very risky if this reproduces in production; that is why I
said it is high priority, as we want to resolve it before we go into
production.

On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> I don't think your statements justify the urgency, because (a) you have
> mentioned that it is *rarely* reproducible, and (b) I am still waiting for
> a real use case in which glusterd would go through multiple restarts in a
> loop.
>
> [...]

--

Regards
Abhishek Paliwal
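One more note on the empty info / leftover info.tmp symptom mentioned earlier
in the thread: the file names suggest a write-to-temporary-file-then-rename
update (glusterd's real store code is C, in glusterd-store.c; nothing below is
taken from it). The generic Python sketch below shows that pattern and, in its
comments, the synchronization steps whose absence can leave a zero-byte
destination file after a crash or power loss.

import os

def atomic_write(path, data):
    """Write bytes 'data' to 'path' via a temporary file and an atomic
    rename. Generic illustration only; not glusterd's implementation."""
    tmp = path + ".tmp"                       # e.g. info.tmp next to info
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        # Without this fsync, a crash shortly after the rename below can
        # leave the new file with no data blocks on disk, i.e. 0 bytes.
        os.fsync(fd)
    finally:
        os.close(fd)

    # rename() is atomic on POSIX filesystems: readers see either the old
    # contents or the new ones, never a half-written file.
    os.rename(tmp, path)

    # fsync the directory so the rename itself survives a power loss.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

If an update loop is interrupted at the wrong moment in a variant of this
sequence that skips the synchronization steps, the result can look like the
report above: an empty info with a stray info.tmp sitting next to it.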