Atin Mukherjee
2017-May-09 12:50 UTC
[Gluster-users] Empty info file preventing glusterd from starting
On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:

> Hi Atin,
>
> Thanks for your reply.
>
> It is urgent because this error is only rarely reproducible; we have seen
> it two or three times in our system so far.
>
> We have a delivery in the near future, so we want the fix as soon as
> possible. Please try to review it internally.

I don't think your statements justify the urgency, because (a) you have
mentioned that it is *rarely* reproducible, and (b) I am still waiting for a
real use case in which glusterd would go through multiple restarts in a loop.

> Regards,
> Abhishek
>
> On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>
>>> + Muthu-vingeshwaran
>>>
>>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>
>>>> Hi Atin/Team,
>>>>
>>>> We are using gluster-3.7.6 with a two-brick setup, and after a system
>>>> restart I have seen that the glusterd daemon fails to start.
>>>>
>>>> While analyzing the logs in the etc-glusterfs.......log file I found
>>>> the following:
>>>>
>>>> [2017-05-06 03:33:39.798087] I [MSGID: 100030] [glusterfsd.c:2348:main]
>>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>> [2017-05-06 03:33:39.807859] I [MSGID: 106478] [glusterd.c:1350:init]
>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2017-05-06 03:33:39.807974] I [MSGID: 106479] [glusterd.c:1399:init]
>>>> 0-management: Using /system/glusterd as working directory
>>>> [2017-05-06 03:33:39.826833] I [MSGID: 106513]
>>>> [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
>>>> retrieved op-version: 30706
>>>> [2017-05-06 03:33:39.827515] E [MSGID: 106206]
>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management:
>>>> Failed to get next store iter
>>>> [2017-05-06 03:33:39.827563] E [MSGID: 106207]
>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management:
>>>> Failed to update volinfo for c_glusterfs volume
>>>> [2017-05-06 03:33:39.827625] E [MSGID: 106201]
>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>>>> Unable to restore volume: c_glusterfs
>>>> [2017-05-06 03:33:39.827722] E [MSGID: 101019]
>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>>>> 'management' failed, review your volfile again
>>>> [2017-05-06 03:33:39.827762] E [graph.c:322:glusterfs_graph_init]
>>>> 0-management: initializing translator failed
>>>> [2017-05-06 03:33:39.827784] E [graph.c:661:glusterfs_graph_activate]
>>>> 0-graph: init failed
>>>> [2017-05-06 03:33:39.828396] W [glusterfsd.c:1238:cleanup_and_exit]
>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b0b8) [0x1000a648]
>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b210) [0x1000a4d8]
>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1beac) [0x100097ac] ) 0-:
>>>> received signum (0), shutting down
>>
>> Abhishek,
>>
>> This patch needs to be thoroughly reviewed to ensure that it doesn't
>> cause any regression, given that it touches the core store management
>> functionality of glusterd. AFAICT, we get an empty info file only when a
>> volume set operation is executed while, in parallel, one of the glusterd
>> instances on the other nodes has been brought down, and this whole
>> sequence of operations happens in a loop. The test case through which you
>> can get into this situation is not something you'd hit in production.
>> Please help me to understand the urgency here.
>>
>> Also, in one of the earlier threads I mentioned the workaround for this
>> issue to Xin:
>> http://lists.gluster.org/pipermail/gluster-users/2017-January/029600.html
>>
>> "If you end up in having a 0 byte info file you'd need to copy the same
>> info file from other node and put it there and restart glusterd"
>>
>>>> I have found that there is already an existing case for this and a
>>>> patch is available, but the status of that patch is "cannot merge".
>>>> Also, the "info" file is empty and an "info.tmp" file is present in
>>>> the "lib/glusterd/vol" directory.
>>>>
>>>> Below is the link to the existing case:
>>>>
>>>> https://review.gluster.org/#/c/16279/5
>>>>
>>>> Please let me know the community's plan for providing a fix for this
>>>> problem, and in which version it will land.
>>>>
>>>> Regards
>>>> Abhishek Paliwal
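A note on the workaround quoted above: in practice it amounts to checking
whether the volume's info file on the failing node is zero bytes and, if so,
copying a healthy copy from a peer and restarting glusterd. The Python sketch
below only illustrates that manual procedure; the glusterd working directory
(the logs above show /system/glusterd rather than the usual /var/lib/glusterd),
the peer hostname, and the use of scp/systemctl are assumptions made for the
example, not part of the original report.

#!/usr/bin/env python3
# Sketch of the manual workaround from this thread: if a volume's 'info'
# file is 0 bytes, fetch a healthy copy from a peer and restart glusterd.
# GLUSTERD_WORKDIR, PEER, and the scp/systemctl calls are illustrative
# assumptions, not values taken from the original report.
import os
import subprocess
import sys

GLUSTERD_WORKDIR = "/var/lib/glusterd"   # the logs above show /system/glusterd
PEER = "peer-node"                       # hypothetical healthy peer
VOLUME = "c_glusterfs"                   # volume name from the logs above


def info_path(workdir, volume):
    return os.path.join(workdir, "vols", volume, "info")


def main():
    path = info_path(GLUSTERD_WORKDIR, VOLUME)
    if os.path.exists(path) and os.path.getsize(path) > 0:
        print("%s looks intact (%d bytes); nothing to do"
              % (path, os.path.getsize(path)))
        return 0

    print("%s is missing or empty; copying it from %s" % (path, PEER))
    # Copy the same info file from the healthy peer (needs root SSH access).
    subprocess.run(["scp", "%s:%s" % (PEER, path), path], check=True)

    # Restart glusterd so it re-reads the restored volume store.
    subprocess.run(["systemctl", "restart", "glusterd"], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())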
ABHISHEK PALIWAL
2017-May-09 13:07 UTC
[Gluster-users] Empty info file preventing glusterd from starting
Actually, it is very risky if this reproduces in production; that is why I
said it is high priority, as we want to resolve it before we go into
production.

On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> I don't think your statements justify the urgency, because (a) you have
> mentioned that it is *rarely* reproducible, and (b) I am still waiting for
> a real use case in which glusterd would go through multiple restarts in a
> loop.
>
> [...]

--

Regards
Abhishek Paliwal
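One more note on the empty info / leftover info.tmp symptom mentioned earlier
in the thread: the file names suggest a write-to-temporary-file-then-rename
update (glusterd's real store code is C, in glusterd-store.c; nothing below is
taken from it). The generic Python sketch below shows that pattern and, in its
comments, the synchronization steps whose absence can leave a zero-byte
destination file after a crash or power loss.

import os

def atomic_write(path, data):
    """Write bytes 'data' to 'path' via a temporary file and an atomic
    rename. Generic illustration only; not glusterd's implementation."""
    tmp = path + ".tmp"                       # e.g. info.tmp next to info
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        # Without this fsync, a crash shortly after the rename below can
        # leave the new file with no data blocks on disk, i.e. 0 bytes.
        os.fsync(fd)
    finally:
        os.close(fd)

    # rename() is atomic on POSIX filesystems: readers see either the old
    # contents or the new ones, never a half-written file.
    os.rename(tmp, path)

    # fsync the directory so the rename itself survives a power loss.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

If an update loop is interrupted at the wrong moment in a variant of this
sequence that skips the synchronization steps, the result can look like the
report above: an empty info with a stray info.tmp sitting next to it.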