We do use a few static code analysis tools such as Coverity and cppcheck. Vijay
would have more details on how frequently we run them. Is there a specific
issue you want to report, or is this just a general query?
On Tue, Jan 17, 2017 at 7:42 AM, songxin <songxin_1980 at 126.com> wrote:
>
> Hi Atin,
> Thank you for your reply. It is very helpful.
> And I have a question for you.
>
> I want to know whether the Gluster community ever runs static code analysis
> tools, such as valgrind, to check glusterfs.
>
> Thanks,
> Xin
>
>
>
>
> At 2017-01-10 12:02:12, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>
> Xin,
>
> There is a patch [1] that attempts to handle this case; it is under review.
>
> [1] http://review.gluster.org/#/c/16279
>
>
>
> On Tue, Jan 10, 2017 at 7:15 AM, songxin <songxin_1980 at 126.com> wrote:
>
>> Hi Atin,
>>
>> Have you fixed this issue?
>>
>> Thanks,
>> Xin
>>
>>
>>
>> At 2016-11-25 15:46:25, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 25, 2016 at 1:14 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>>> Hi Atin,
>>> It seems that this workaround has to be done manually.
>>> Is that right?
>>> And even the files in bricks/* may be empty, too.
>>>
>>
>> Yes, that's right
>>
>>
>>>
>>> Do you have a workaround, which is implemented in glusterfs code?
>>>
>>
>> A workaround is by nature manual; anything done through code should be
>> considered a fix, not a workaround. :)
>>
>>
>>>
>>> Thanks,
>>> Xin
>>>
>>>
>>>
>>>
>>>
>>> At 2016-11-25 15:36:29, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>
>>>
>>>
>>> On Fri, Nov 25, 2016 at 12:06 PM, songxin <songxin_1980 at 126.com> wrote:
>>>
>>>> Hi Atin,
>>>> Do you mean that you have a workaround available now?
>>>> Or will it take time to design the workaround?
>>>>
>>>> If you have the workaround now, could you share it with me?
>>>>
>>>
>>> If you end up with a 0-byte info file, you need to copy the same info file
>>> from another node, put it there, and restart glusterd.
>>>
>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-11-24 19:12:07, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>> Xin - I appreciate your patience. I'd need some more time to pick this item
>>>> up from my backlog. I believe we have a workaround applicable here too.
>>>>
>>>> On Thu, 24 Nov 2016 at 14:24, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Atin,
>>>>> Actually, glusterfs is used in my project,
>>>>> and our test team found this issue.
>>>>> So I want to know whether you plan to fix it.
>>>>> If you have a plan I will wait for you, because your method should be
>>>>> better than mine.
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-21 10:00:36, "Atin Mukherjee" <atin.mukherjee83 at gmail.com> wrote:
>>>>>
>>>>> Hi Xin,
>>>>>
>>>>> I haven't got a chance to look into it yet. The delete-stale-volume
>>>>> function is in place to take care of wiping off volume configuration data
>>>>> that has been deleted from the cluster. However, we need to revisit this
>>>>> code to see whether this function is still needed, given that we recently
>>>>> added a validation to fail a delete request if one of the glusterds is
>>>>> down. I'll get back to you on this.
>>>>>
>>>>> On Mon, 21 Nov 2016 at 07:24, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi Atin,
>>>>> Thank you for your support.
>>>>>
>>>>> Are there any conclusions about this issue yet?
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2016-11-16 20:59:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 15, 2016 at 1:53 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> ok, thank you.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2016-11-15 16:12:34, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 15, 2016 at 12:47 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>>
>>>>> Hi Atin,
>>>>>
>>>>> I think the root cause is in the function
>>>>> glusterd_import_friend_volume, shown below.
>>>>>
>>>>> int32_t
>>>>> glusterd_import_friend_volume (dict_t *peer_data, size_t count)
>>>>> {
>>>>> ...
>>>>>         ret = glusterd_volinfo_find (new_volinfo->volname, &old_volinfo);
>>>>>         if (0 == ret) {
>>>>>                 (void) gd_check_and_update_rebalance_info (old_volinfo,
>>>>>                                                            new_volinfo);
>>>>>                 (void) glusterd_delete_stale_volume (old_volinfo,
>>>>>                                                      new_volinfo);
>>>>>         }
>>>>> ...
>>>>>         ret = glusterd_store_volinfo (new_volinfo,
>>>>>                                       GLUSTERD_VOLINFO_VER_AC_NONE);
>>>>>         if (ret) {
>>>>>                 gf_msg (this->name, GF_LOG_ERROR, 0,
>>>>>                         GD_MSG_VOLINFO_STORE_FAIL, "Failed to store "
>>>>>                         "volinfo for volume %s", new_volinfo->volname);
>>>>>                 goto out;
>>>>>         }
>>>>> ...
>>>>> }
>>>>>
>>>>> glusterd_delete_stale_volume will remove the info file and bricks/*, and
>>>>> glusterd_store_volinfo will create the new ones.
>>>>> But if glusterd is killed before the rename, the info file is left empty.
>>>>>
>>>>> And glusterd will fail to start the next time you start it, because the
>>>>> info file is empty.
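>>>>>
>>>>> To make the window concrete, here is a minimal, self-contained C sketch of
>>>>> that sequence (my own illustration, not glusterd code; the file name and
>>>>> the contents are made up):
>>>>>
>>>>> #include <fcntl.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <unistd.h>
>>>>>
>>>>> int
>>>>> main (void)
>>>>> {
>>>>>         const char *data = "type=2\ncount=2\n";   /* placeholder contents */
>>>>>         int         fd   = -1;
>>>>>         int         tmp  = -1;
>>>>>
>>>>>         /* 1. The store handle opens (and, if missing, creates) "info",
>>>>>          *    which leaves a zero-byte file on disk. */
>>>>>         fd = open ("info", O_RDWR | O_CREAT | O_APPEND, 0600);
>>>>>         if (fd < 0) { perror ("open info"); return 1; }
>>>>>         close (fd);
>>>>>
>>>>>         /* 2. The new contents are written to "info.tmp" first. */
>>>>>         tmp = open ("info.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
>>>>>         if (tmp < 0) { perror ("open info.tmp"); return 1; }
>>>>>         if (write (tmp, data, strlen (data)) < 0) { perror ("write"); return 1; }
>>>>>         close (tmp);
>>>>>
>>>>>         /* 3. Crash window: kill the process here and "info" stays 0 bytes
>>>>>          *    while "info.tmp" already holds the full contents.
>>>>>          *    Uncomment to simulate the kill: */
>>>>>         /* _exit (1); */
>>>>>
>>>>>         /* 4. Only this rename publishes the update atomically. */
>>>>>         if (rename ("info.tmp", "info") < 0) { perror ("rename"); return 1; }
>>>>>         return 0;
>>>>> }
>>>>>
>>>>> If the process dies at step 3, the next start finds a 0-byte info file
>>>>> while info.tmp still holds the full contents.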
>>>>>
>>>>> Any idea? Atin?
>>>>>
>>>>>
>>>>> Give me some time, I will check it out. But reading this analysis, it looks
>>>>> very well possible if a volume is changed while glusterd is down on node A,
>>>>> and when the same node comes up, we update the volinfo during the peer
>>>>> handshake and glusterd goes down once again at that point. I'll confirm it
>>>>> by tomorrow.
>>>>>
>>>>>
>>>>> I checked the code, and it does look like you have got the right RCA for
>>>>> the issue you simulated through those two scripts. However, this can happen
>>>>> even when you try to create a fresh volume: if glusterd goes down while it
>>>>> tries to write the content into the store, before renaming the info.tmp
>>>>> file, you get into the same situation.
>>>>>
>>>>> I'd really need to think through whether this can be fixed. Suggestions
>>>>> are always appreciated.
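>>>>>
>>>>> One possible direction, offered only as a rough sketch on my side (not the
>>>>> current glusterd implementation; the helper name and file names are
>>>>> illustrative): never open or truncate the destination file at all, write
>>>>> the complete new contents to the temp file, fsync it, and let a single
>>>>> rename() replace the old file, so a crash at any point leaves either the
>>>>> old complete file or the new one:
>>>>>
>>>>> #include <fcntl.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <unistd.h>
>>>>>
>>>>> /* Write buf to tmppath, fsync it, then atomically rename it over path.
>>>>>  * The existing file at 'path' is never opened or truncated, so its old
>>>>>  * contents stay intact until the single rename() step succeeds. */
>>>>> static int
>>>>> store_file_atomic (const char *path, const char *tmppath,
>>>>>                    const char *buf, size_t len)
>>>>> {
>>>>>         int fd = open (tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
>>>>>         if (fd < 0)
>>>>>                 return -1;
>>>>>
>>>>>         if (write (fd, buf, len) != (ssize_t) len || fsync (fd) < 0) {
>>>>>                 close (fd);
>>>>>                 unlink (tmppath);
>>>>>                 return -1;
>>>>>         }
>>>>>         close (fd);
>>>>>
>>>>>         return rename (tmppath, path);
>>>>> }
>>>>>
>>>>> int
>>>>> main (void)
>>>>> {
>>>>>         const char *data = "type=2\ncount=2\n";   /* placeholder contents */
>>>>>
>>>>>         return store_file_atomic ("info", "info.tmp",
>>>>>                                   data, strlen (data)) ? 1 : 0;
>>>>> }
>>>>>
>>>>> If I read the thread correctly, it is the early O_CREAT open of the
>>>>> destination in the store path that leaves the 0-byte file behind when
>>>>> glusterd is killed before the rename.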
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> BTW, excellent work Xin!
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-15 12:07:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi Atin,
>>>>> I have some clues about this issue.
>>>>> I could reproduce this issue using the script mentioned in
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487 .
>>>>>
>>>>>
>>>>> I really appreciate your help in trying to nail down this issue. While I
>>>>> am at your email and going through the code to figure out the possible
>>>>> cause for it, unfortunately I don't see any script in the attachment of
>>>>> the bug. Could you please cross check?
>>>>>
>>>>>
>>>>>
>>>>> After I added some debug prints, shown below, in glusterd-store.c,
>>>>> I found that /var/lib/glusterd/vols/xxx/info and
>>>>> /var/lib/glusterd/vols/xxx/bricks/* are removed,
>>>>> but other files in /var/lib/glusterd/vols/xxx/ are not removed.
>>>>>
>>>>> int32_t
>>>>> glusterd_store_volinfo (glusterd_volinfo_t *volinfo,
>>>>>                         glusterd_volinfo_ver_ac_t ac)
>>>>> {
>>>>>         int32_t     ret = -1;
>>>>>         struct stat buf = {0,};
>>>>>
>>>>>         GF_ASSERT (volinfo);
>>>>>
>>>>>         ret = access ("/var/lib/glusterd/vols/gv0/info", F_OK);
>>>>>         if (ret < 0) {
>>>>>                 gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>>                         "info does not exist (%d)", errno);
>>>>>         } else {
>>>>>                 ret = stat ("/var/lib/glusterd/vols/gv0/info", &buf);
>>>>>                 if (ret < 0) {
>>>>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>>                                 "stat info error");
>>>>>                 } else {
>>>>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>>                                 "info size is %lu, inode num is %lu",
>>>>>                                 buf.st_size, buf.st_ino);
>>>>>                 }
>>>>>         }
>>>>>
>>>>>         glusterd_perform_volinfo_version_action (volinfo, ac);
>>>>>         ret = glusterd_store_create_volume_dir (volinfo);
>>>>>         if (ret)
>>>>>                 goto out;
>>>>>
>>>>> ...
>>>>> }
>>>>>
>>>>> So it is easy to understand why info or
>>>>> 10.32.1.144.-opt-lvmdir-c2-brick is sometimes empty.
>>>>> It is because the info file does not exist, and it is created by
>>>>> "fd = open (path, O_RDWR | O_CREAT | O_APPEND, 0600);" in the function
>>>>> gf_store_handle_new.
>>>>> The info file is empty before the rename,
>>>>> so it stays empty if glusterd shuts down before the rename.
>>>>>
>>>>>
>>>>>
>>>>> My questions are the following.
>>>>> 1. I did not find the point where info is removed. Could you tell me the
>>>>> point where info and bricks/* are removed?
>>>>> 2. Why are info and bricks/* removed, while other files in
>>>>> /var/lib/glusterd/vols/xxx/ are not?
>>>>>
>>>>>
>>>>> AFAIK, we never delete the info file, and hence this file is opened with
>>>>> the O_APPEND flag. As I said, I will go back and cross-check the code once
>>>>> again.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-11 20:34:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi Atin,
>>>>>
>>>>> Thank you for your support.
>>>>> Sincerely wait for your reply.
>>>>>
>>>>> By the way, can you confirm that the issue (the info file being empty) is
>>>>> caused by the rename being interrupted in the kernel?
>>>>>
>>>>>
>>>>> As per my RCA on that bug, it looked to be the case.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>> At 2016-11-11 15:49:02, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi Atin,
>>>>> Thank you for your reply.
>>>>> Actually, it is very difficult to reproduce, because I don't know when
>>>>> there was an ongoing commit happening. It was just a coincidence.
>>>>> But I want to confirm the root cause.
>>>>>
>>>>>
>>>>> I'll give it another try and see if this situation can be
>>>>> simulated/reproduced, and I will keep you posted.
>>>>>
>>>>>
>>>>>
>>>>> So I would be grateful if you could answer my questions below.
>>>>>
>>>>> You said in a comment: "This issue is hit as part of the negative testing
>>>>> where, while gluster volume set was executed, at the same point of time
>>>>> glusterd in another instance was brought down. In the faulty node we could
>>>>> see the /var/lib/glusterd/vols/<volname>/info file being empty whereas the
>>>>> info.tmp file has the correct contents."
>>>>>
>>>>> I have two questions for you.
>>>>>
>>>>> 1. Could you reproduce this issue with gluster volume set while glusterd
>>>>> was brought down?
>>>>> 2. Can you be certain that this issue is caused by the rename being
>>>>> interrupted in the kernel?
>>>>>
>>>>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both
>>>>> empty. But in my view only one rename can be running at a time because of
>>>>> the big lock. Why are both files empty?
>>>>>
>>>>>
>>>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
>>>>> be running in two threads?
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-11 15:27:03, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>>
>>>>> Hi Atin,
>>>>> Thank you for your reply.
>>>>>
>>>>> As you said, the info file can only be changed in glusterd_store_volinfo(),
>>>>> sequentially, because of the big lock.
>>>>>
>>>>> I have found the similar issue that you mentioned, below:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>>>>
>>>>>
>>>>> Great, so this is what I was actually trying to refer to in my first
>>>>> email, that I saw a similar issue. Have you got a chance to look at
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in your case,
>>>>> did you try to bring down glusterd when there was an ongoing commit
>>>>> happening?
>>>>>
>>>>>
>>>>>
>>>>> You said in a comment: "This issue is hit as part of the negative testing
>>>>> where, while gluster volume set was executed, at the same point of time
>>>>> glusterd in another instance was brought down. In the faulty node we could
>>>>> see the /var/lib/glusterd/vols/<volname>/info file being empty whereas the
>>>>> info.tmp file has the correct contents."
>>>>>
>>>>> I have two questions for you.
>>>>>
>>>>> 1. Could you reproduce this issue with gluster volume set while glusterd
>>>>> was brought down?
>>>>> 2. Can you be certain that this issue is caused by the rename being
>>>>> interrupted in the kernel?
>>>>>
>>>>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both
>>>>> empty. But in my view only one rename can be running at a time because of
>>>>> the big lock. Why are both files empty?
>>>>>
>>>>>
>>>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
>>>>> be running in two threads?
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2016-11-11 14:36:40, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi Atin,
>>>>>
>>>>> Thank you for your reply.
>>>>> I have two questions for you.
>>>>>
>>>>> 1. Are the two files info and info.tmp only created or changed in the
>>>>> function glusterd_store_volinfo()? I did not find any other point where
>>>>> the two files are changed.
>>>>>
>>>>>
>>>>> If we are talking about the volume's info file, then yes, the mentioned
>>>>> function takes care of it.
>>>>>
>>>>>
>>>>> 2. I found that glusterd_store_volinfo() is called from many points in
>>>>> glusterd. Is there a thread-synchronization problem? If so, one thread may
>>>>> open the same info.tmp file with the O_TRUNC flag while another thread is
>>>>> writing to info.tmp. Could this case happen?
>>>>>
>>>>>
>>>>> In glusterd, threads are protected by the big lock, and I don't see a
>>>>> possibility (theoretically) of two glusterd_store_volinfo () calls
>>>>> happening at a given point of time.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>
>>>>> Did you run out of disk space by any chance? AFAIK, the code works like
>>>>> this: we write the new contents to a .tmp file and rename it back to the
>>>>> original file. In case of a disk-space issue I expect both files to be of
>>>>> non-zero size. Having said that, I vaguely remember a similar issue (in
>>>>> the form of a bug or an email) landing up once, but we couldn't reproduce
>>>>> it, so my guess is that something is wrong with the atomic update here.
>>>>> I'll be glad if you have a reproducer, and then we can dig into it
>>>>> further.
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>>
>>>>> Hi,
>>>>> When I start glusterd, some errors occur.
>>>>> The log is as follows.
>>>>>
>>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main]
>>>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>>>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init]
>>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init]
>>>>> 0-management: Using /system/glusterd as working directory
>>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514]
>>>>> [glusterd-store.c:2075:glusterd_restore_op_version] 0-management:
>>>>> Upgrade detected. Setting op-version to minimum : 1
>>>>> [2016-11-08 07:58:35.025356] E [MSGID: 106206]
>>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed
>>>>> to get next store iter
>>>>> [2016-11-08 07:58:35.025401] E [MSGID: 106207]
>>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed
>>>>> to update volinfo for c_glusterfs volume
>>>>> [2016-11-08 07:58:35.025463] E [MSGID: 106201]
>>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>>>>> Unable to restore volume: c_glusterfs
>>>>> [2016-11-08 07:58:35.025544] E [MSGID: 101019] [xlator.c:428:xlator_init]
>>>>> 0-management: Initialization of volume 'management' failed, review your
>>>>> volfile again
>>>>> [2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init]
>>>>> 0-management: initializing translator failed
>>>>> [2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate]
>>>>> 0-graph: init failed
>>>>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit]
>>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
>>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
>>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-:
>>>>> received signum (0), shutting down
>>>>>
>>>>>
>>>>> Then I found that the size of vols/volume_name/info is 0, which causes
>>>>> glusterd to shut down.
>>>>> But vols/volume_name/info.tmp is not 0.
>>>>> I also found that a brick file, vols/volume_name/bricks/xxxx.brick, is 0,
>>>>> but vols/volume_name/bricks/xxxx.brick.tmp is not 0.
>>>>>
>>>>> I read the code of the function glusterd_store_volinfo () in
>>>>> glusterd-store.c.
>>>>> I know that info.tmp is renamed to info in the function
>>>>> glusterd_store_volume_atomic_update().
>>>>>
>>>>> But my question is: why is the info file 0 bytes while info.tmp is not?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> --Atin
>>>>>
>>>>> --
>>>> - Atin (atinm)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>>
>>
>>
>>
>>
>
>
>
> --
>
> ~ Atin (atinm)
>
>
>
>
>
--
~ Atin (atinm)