thr3ads.net - Gluster users - [Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost [Jul 2017]

Pranith Kumar Karampuri

2017-Jul-08 01:36 UTC

[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost

Ram,
       As per the code, self-heal was the only candidate which *can* do it.
Could you check logs of self-heal daemon and the mount to check if there
are any metadata heals on root?


+Sanoj

Sanoj,
       Is there any systemtap script we can use to detect which process is
removing these xattrs?

On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
>
>
> [root at glusterfs2 Log_Files]# gluster volume info
>
>
>
> Volume Name: StoragePool
>
> Type: Distributed-Disperse
>
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>
> Status: Started
>
> Number of Bricks: 20 x (2 + 1) = 60
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
>
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
>
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
>
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
>
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
>
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
>
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
>
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
>
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
>
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
>
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
>
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
>
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
>
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
>
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
>
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
>
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
>
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
>
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
>
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
>
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
>
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
>
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
>
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
>
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
>
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
>
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
>
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
>
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
>
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
>
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
>
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
>
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
>
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
>
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
>
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> diagnostics.client-log-level: INFO
>
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
>
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 12:15 PM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> 3.7.19
>
>
>
> These are the only callers of removexattr, and only _posix_remove_xattr
> has the potential to remove these xattrs, as posix_removexattr already makes
> sure that the key is not gfid/volume-id. And surprise, surprise:
> _posix_remove_xattr is called only from the healing code of afr/ec. And this
> can only happen if the source brick doesn't have a gfid, which doesn't seem
> to match the situation you explained.
>
>    #   line  filename / context / line
>    1   1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>              ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
>    2   1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>              ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
>    3   6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
>              sys_lremovexattr (path, "trusted.glusterfs.test");
>    4     80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
>              op_ret = sys_lremovexattr (path, key); \
>    5   5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
>              op_ret = sys_lremovexattr (filler->real_path, key);
>    6   5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
>              op_ret = sys_lremovexattr (real_path, name);
>    7   6811  xlators/storage/posix/src/posix.c <<init>>
>              sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
>
> So there are only two possibilities:
>
> 1) Source directory in ec/afr doesn't have gfid
>
> 2) Something else removed these xattrs.
>
> What is your volume info? Maybe that will give more clues.
>
>
>
> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
> doesn't seem to be the culprit, as it also has checks to guard against
> gfid/volume-id removal.
>
>
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 11:54 AM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> Pranith,
>
> Thanks for looking into the issue. The bricks were mounted after the
> reboot. One more thing I noticed: when the attributes were set manually
> while glusterd was up, they were lost again on starting the volume. I had
> to stop glusterd, set the attributes, and then start glusterd. After that
> the volume start succeeded.
>
>
>
> Which version is this?
>
>
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 11:46 AM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path:
> posix_removexattr() has:
>         if (!strcmp (GFID_XATTR_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on gfid for file %s",
>                         real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>         if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on volume-id for file %s",
>                         real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
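
For readers who want to try the guard quoted above outside of GlusterFS, here is a minimal standalone sketch. It is not the posix.c code: the function names is_protected_xattr and guarded_removexattr are made up for illustration, and reporting EPERM is only an assumption about how the op_errno gap mentioned above could be closed. Only lremovexattr(2) and the two key strings come from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/xattr.h>

/* Key strings as named in this thread. */
#define GFID_XATTR_KEY      "trusted.gfid"
#define GF_XATTR_VOL_ID_KEY "trusted.glusterfs.volume-id"

/* Returns 1 if the key must never be removed from a brick, 0 otherwise. */
int
is_protected_xattr (const char *name)
{
        assert (name != NULL);
        return !strcmp (GFID_XATTR_KEY, name) ||
               !strcmp (GF_XATTR_VOL_ID_KEY, name);
}

/*
 * Hypothetical wrapper (not a GlusterFS function): refuses to remove a
 * protected key and also sets errno, so the caller can tell why the
 * call failed. Any other key is passed on to lremovexattr(2).
 */
int
guarded_removexattr (const char *path, const char *name)
{
        assert (path != NULL);
        if (is_protected_xattr (name)) {
                errno = EPERM;  /* assumption: EPERM is the error to report */
                return -1;
        }
        return lremovexattr (path, name);
}
```

With this shape, a call such as guarded_removexattr (brick, "trusted.gfid") fails with errno set to EPERM before any system call is made.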
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> So one thing that used to happen was that sometimes when machines reboot,
> the brick mounts wouldn't happen and this would lead to absence of both
> trusted.gfid and volume-id. So at the moment this is my wild guess.
>
>
>
>
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> Hi,
>
> We faced an issue in production today. We had to stop the volume and
> reboot all the servers in the cluster. Once the servers had rebooted,
> starting the volume failed because the following extended attributes
> were not present on any of the bricks on 2 servers.
>
> 1)      trusted.gfid
>
> 2)      trusted.glusterfs.volume-id
>
>
>
> We had to set these extended attributes manually to start the volume. Are
> there any known issues like this?
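
A quick presence check for the two attributes listed above can be sketched with lgetxattr(2). The function names has_xattr and brick_has_required_xattrs are made up for illustration. Note that attributes in the trusted namespace are only visible to privileged processes, so the check must run as root; run as an ordinary user, it reports the attributes as absent even when they exist.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Returns 1 if the xattr can be read from path, 0 if the lookup fails. */
int
has_xattr (const char *path, const char *name)
{
        assert (path != NULL && name != NULL);
        /* A NULL buffer with size 0 only asks for the value length. */
        return lgetxattr (path, name, NULL, 0) >= 0;
}

/* Returns 1 only if both attributes named in this thread are present. */
int
brick_has_required_xattrs (const char *brick_root)
{
        return has_xattr (brick_root, "trusted.gfid") &&
               has_xattr (brick_root, "trusted.glusterfs.volume-id");
}
```

Calling brick_has_required_xattrs ("/ws/disk1/ws_brick") on each brick after a reboot, before starting the volume, would show which bricks are affected.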
>
>
>
> Thanks and Regards,
>
> Ram
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the
> sole use of the intended recipient. Any unauthorized review, use or distribution
> by others is strictly prohibited. If you have received the message by mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> --
>
> Pranith


-- 
Pranith
Sanoj Unnikrishnan

2017-Jul-10 09:26 UTC

@Pranith, yes. We can get the PID on every removexattr call and also print
the backtrace of the glusterfsd process when it triggers the xattr removal.
I will write the script and reply back.

Sanoj Unnikrishnan

2017-Jul-10 11:49 UTC

Please use the systemtap script
(https://paste.fedoraproject.org/paste/EGDa0ErwX0LV3y-gBYpfNA) to check
which process is invoking removexattr calls.
It prints the pid, tid and arguments of all removexattr calls.
I have checked for these fops at the protocol/client and posix translators.

To run the script:
1) Install systemtap and its dependencies.
2) Install glusterfs-debuginfo.
3) Change the translator paths in the systemtap script to the appropriate
values for your system
(change "/usr/lib64/glusterfs/3.12dev/xlator/protocol/client.so" and
"/usr/lib64/glusterfs/3.12dev/xlator/storage/posix.so").
4) Run the script as follows:
# stap -v fop_trace.stp

The output looks like the following. The arguments are also dumped if
glusterfs-debuginfo is installed (I had not installed it here).
pid-958:     0 glusterfsd(3893):->posix_setxattr
pid-958:    47 glusterfsd(3893):<-posix_setxattr
pid-966:     0 glusterfsd(5033):->posix_setxattr
pid-966:    57 glusterfsd(5033):<-posix_setxattr
pid-1423:     0 glusterfs(1431):->client_setxattr
pid-1423:    37 glusterfs(1431):<-client_setxattr
pid-1423:     0 glusterfs(1431):->client_setxattr
pid-1423:    41 glusterfs(1431):<-client_setxattr
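
If the trace is captured to a file, lines in the format shown above can be picked apart with a small parser. This is only a sketch built from the sample output: it assumes the number after "pid-" is the process ID, the second column is a relative time, and the number in parentheses is the thread ID. The struct and function names are made up for illustration and are not part of the systemtap script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct trace_event {
        int  pid;           /* number after "pid-" (assumed process ID) */
        long elapsed;       /* second column (assumed relative time) */
        char execname[64];  /* e.g. "glusterfsd" */
        int  tid;           /* number in parentheses (assumed thread ID) */
        int  is_entry;      /* 1 for "->" (call), 0 for "<-" (return) */
        char func[128];     /* e.g. "posix_setxattr" */
};

/* Parses one trace line into ev. Returns 0 on success, -1 otherwise. */
int
parse_trace_line (const char *line, struct trace_event *ev)
{
        char dir[3];

        assert (line != NULL && ev != NULL);
        if (sscanf (line, "pid-%d: %ld %63[^(](%d):%2[-<>]%127s",
                    &ev->pid, &ev->elapsed, ev->execname, &ev->tid,
                    dir, ev->func) != 6)
                return -1;
        if (strcmp (dir, "->") == 0)
                ev->is_entry = 1;
        else if (strcmp (dir, "<-") == 0)
                ev->is_entry = 0;
        else
                return -1;
        return 0;
}

/* Returns 1 if the event is the entry into a *removexattr function. */
int
is_removexattr_entry (const struct trace_event *ev)
{
        assert (ev != NULL);
        return ev->is_entry && strstr (ev->func, "removexattr") != NULL;
}
```

Feeding each line of the capture through parse_trace_line and keeping the events for which is_removexattr_entry returns 1 gives the pid and tid of every caller to look at.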

Regards,
Sanoj



On Mon, Jul 10, 2017 at 2:56 PM, Sanoj Unnikrishnan <sunnikri at
redhat.com>
wrote:
> @ pranith , yes . we can get the pid on all removexattr call and also
> print the backtrace of the glusterfsd process when trigerring removing
> xattr.
> I will write the script and reply back.
>
> On Sat, Jul 8, 2017 at 7:06 AM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>> Ram,
>>        As per the code, self-heal was the only candidate which *can* do
>> it. Could you check logs of self-heal daemon and the mount to check if
>> there are any metadata heals on root?
>>
>>
>> +Sanoj
>>
>> Sanoj,
>>        Is there any systemtap script we can use to detect which process
>> is removing these xattrs?
>>
>> On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy <
>> areddy at commvault.com> wrote:
>>
>>> We lost the attributes on all the bricks on servers glusterfs2 and
>>> glusterfs3 again.
>>>
>>>
>>>
>>> [root at glusterfs2 Log_Files]# gluster volume info
>>>
>>>
>>>
>>> Volume Name: StoragePool
>>>
>>> Type: Distributed-Disperse
>>>
>>> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>>>
>>> Status: Started
>>>
>>> Number of Bricks: 20 x (2 + 1) = 60
>>>
>>> Transport-type: tcp
>>>
>>> Bricks:
>>>
>>> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>>>
>>> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>>>
>>> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>>>
>>> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>>>
>>> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>>>
>>> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>>>
>>> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>>>
>>> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>>>
>>> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>>>
>>> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>>>
>>> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>>>
>>> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>>>
>>> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>>>
>>> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>>>
>>> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>>>
>>> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>>>
>>> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>>>
>>> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>>>
>>> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>>>
>>> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>>>
>>> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>>>
>>> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>>>
>>> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>>>
>>> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>>>
>>> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
>>>
>>> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
>>>
>>> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
>>>
>>> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
>>>
>>> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
>>>
>>> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
>>>
>>> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
>>>
>>> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
>>>
>>> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
>>>
>>> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
>>>
>>> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
>>>
>>> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
>>>
>>> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
>>>
>>> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
>>>
>>> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
>>>
>>> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
>>>
>>> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
>>>
>>> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
>>>
>>> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
>>>
>>> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
>>>
>>> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
>>>
>>> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
>>>
>>> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
>>>
>>> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
>>>
>>> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
>>>
>>> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
>>>
>>> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
>>>
>>> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
>>>
>>> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
>>>
>>> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
>>>
>>> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
>>>
>>> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
>>>
>>> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
>>>
>>> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
>>>
>>> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
>>>
>>> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
>>>
>>> Options Reconfigured:
>>>
>>> performance.readdir-ahead: on
>>>
>>> diagnostics.client-log-level: INFO
>>>
>>> auth.allow:
glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.comm
>>> vault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Ram
>>>
>>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
>>> *Sent:* Friday, July 07, 2017 12:15 PM
>>>
>>> *To:* Ankireddypalle Reddy
>>> *Cc:* Gluster Devel (gluster-devel at gluster.org);
>>> gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended
attributes
>>> lost
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <
>>> areddy at commvault.com> wrote:
>>>
>>> 3.7.19
>>>
>>>
>>>
>>> These are the only callers of removexattr, and only _posix_remove_xattr
>>> has the potential to remove these xattrs, because posix_removexattr already
>>> makes sure that the key is not gfid/volume-id. And surprise, surprise:
>>> _posix_remove_xattr is called only from the healing code of afr/ec. That
>>> can only happen if the source brick doesn't have a gfid, which doesn't
>>> seem to match the situation you explained.
>>>
>>>    #   line  filename / context / line
>>>    1   1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>>>              ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
>>>    2   1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>>>              ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
>>>    3   6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
>>>              sys_lremovexattr (path, "trusted.glusterfs.test");
>>>    4     80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
>>>              op_ret = sys_lremovexattr (path, key); \
>>>    5   5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
>>>              op_ret = sys_lremovexattr (filler->real_path, key);
>>>    6   5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
>>>              op_ret = sys_lremovexattr (real_path, name);
>>>    7   6811  xlators/storage/posix/src/posix.c <<init>>
>>>              sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
>>>
>>> So there are only two possibilities:
>>>
>>> 1) Source directory in ec/afr doesn't have gfid
>>>
>>> 2) Something else removed these xattrs.
>>>
>>> What is your volume info? Maybe that will give more clues.
>>>
>>>
>>>
>>> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
>>> doesn't seem to be the culprit, as it also has checks to guard against
>>> gfid/volume-id removal.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Ram
>>>
>>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
>>> *Sent:* Friday, July 07, 2017 11:54 AM
>>> *To:* Ankireddypalle Reddy
>>> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>>>
>>> Pranith,
>>>
>>> Thanks for looking into the issue. The bricks were mounted after the
>>> reboot. One more thing I noticed: when the attributes were set manually
>>> while glusterd was up, they were lost again on starting the volume. We
>>> had to stop glusterd, set the attributes, and then start glusterd. After
>>> that, the volume start succeeded.
>>>
>>>
>>>
>>> Which version is this?
>>>
>>>
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Ram
>>>
>>>
>>>
>>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
>>> *Sent:* Friday, July 07, 2017 11:46 AM
>>> *To:* Ankireddypalle Reddy
>>> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>>>
>>>
>>>
>>> Did anything special happen on these two bricks? The removal can't
>>> happen in the I/O path, because posix_removexattr() has:
>>>         if (!strcmp (GFID_XATTR_KEY, name)) {
>>>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>>>                         P_MSG_XATTR_NOT_REMOVED,
>>>                         "Remove xattr called on gfid for file %s",
>>>                         real_path);
>>>                 op_ret = -1;
>>>                 goto out;
>>>         }
>>>
>>>         if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>>>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>>>                         P_MSG_XATTR_NOT_REMOVED,
>>>                         "Remove xattr called on volume-id for file %s",
>>>                         real_path);
>>>                 op_ret = -1;
>>>                 goto out;
>>>         }
>>>
>>> I just found that op_errno is not set correctly, but the removal can't
>>> happen in the I/O path, so self-heal/rebalance are off the hook.
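
The guard quoted above returns -1 without filling in op_errno. A minimal standalone sketch of the same check with an error code set might look like the following. The helper name `check_removable_xattr` and the choice of EPERM are assumptions made for illustration, not the actual GlusterFS fix; the key strings are the two attribute names reported in this thread.

```c
#include <errno.h>
#include <string.h>

/* Attribute names as reported in this thread. */
#define GFID_XATTR_KEY      "trusted.gfid"
#define GF_XATTR_VOL_ID_KEY "trusted.glusterfs.volume-id"

/*
 * Hypothetical helper (not a GlusterFS function): returns 0 if `name`
 * may be removed, or -1 with *op_errno set when `name` is one of the
 * two protected keys. EPERM is an assumed error code for this sketch.
 */
static int
check_removable_xattr (const char *name, int *op_errno)
{
        if (!strcmp (GFID_XATTR_KEY, name) ||
            !strcmp (GF_XATTR_VOL_ID_KEY, name)) {
                *op_errno = EPERM;
                return -1;
        }

        *op_errno = 0;
        return 0;
}
```

A caller would run this check before calling sys_lremovexattr and pass the resulting error code back to the client instead of leaving it unset.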
>>>
>>> I also grepped for any removexattr of trusted.gfid from glusterd and
>>> didn't find any.
>>>
>>> So one thing that used to happen was that sometimes, when machines
>>> reboot, the brick mounts wouldn't happen, and this would lead to the
>>> absence of both trusted.gfid and volume-id. So at the moment this is my
>>> wild guess.
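
One way to test the missing-mount guess is to check whether the brick's filesystem is actually mounted before starting glusterd. The sketch below is an assumption-laden illustration, not part of GlusterFS: the helper name `is_mount_point` is hypothetical, and it uses the common heuristic of comparing a directory's device number with its parent's. It will not detect a bind mount from the same filesystem.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if `path` is a directory that is a
 * mount point, 0 if it is not, and -1 on error. A directory counts as
 * a mount point when it is on a different device than its parent, or
 * when it is its own parent (the filesystem root).
 */
static int
is_mount_point (const char *path)
{
        char        parent[4096];
        struct stat st, pst;
        int         n;

        if (stat (path, &st) != 0)
                return -1;

        /* Only directories are considered in this sketch. */
        if (!S_ISDIR (st.st_mode))
                return 0;

        n = snprintf (parent, sizeof (parent), "%s/..", path);
        if (n < 0 || (size_t) n >= sizeof (parent))
                return -1;

        if (stat (parent, &pst) != 0)
                return -1;

        if (st.st_dev != pst.st_dev)
                return 1;

        return (st.st_ino == pst.st_ino) ? 1 : 0;
}
```

The argument would be the directory where the brick's disk is expected to be mounted. The thread does not say which directory that is for these bricks, so the caller has to supply it.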
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>>>
>>> Hi,
>>>
>>> We faced an issue in production today. We had to stop the volume and
>>> reboot all the servers in the cluster. Once the servers rebooted, starting
>>> the volume failed because the following extended attributes were missing
>>> from all the bricks on 2 servers.
>>>
>>> 1)      trusted.gfid
>>>
>>> 2)      trusted.glusterfs.volume-id
>>>
>>>
>>>
>>> We had to manually set these extended attributes to start the volume.
>>> Are there any such known issues?
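
Before setting the attributes by hand, it can help to confirm which bricks are actually missing them. The following is a minimal Linux-only sketch using lgetxattr(2); the helper name `brick_has_xattr` is hypothetical. It assumes the program runs as root, because the kernel hides trusted.* attributes from unprivileged processes and the check would then report them as absent.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper: returns 1 if extended attribute `name` is set
 * on `brick_path`, 0 if it is absent (ENODATA), and -1 on any other
 * error (for example, the path does not exist).
 */
static int
brick_has_xattr (const char *brick_path, const char *name)
{
        /* A zero-length buffer asks only for the attribute's size. */
        ssize_t len = lgetxattr (brick_path, name, NULL, 0);

        if (len >= 0)
                return 1;

        if (errno == ENODATA)
                return 0;

        return -1;
}
```

Calling it with a brick directory and each of "trusted.gfid" and "trusted.glusterfs.volume-id" shows whether that brick has lost either attribute.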
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Ram
>>>
>>> ***************************Legal Disclaimer***************************
>>> "This communication may contain confidential and privileged material for the
>>> sole use of the intended recipient. Any unauthorized review, use or distribution
>>> by others is strictly prohibited. If you have received the message by mistake,
>>> please advise the sender by reply email and delete the message. Thank you."
>>> **********************************************************************
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Pranith
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Pranith
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Pranith
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
