Pranith Kumar Karampuri
2017-Jul-08 01:36 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
Ram,
As per the code, self-heal is the only candidate that *can* do this. Could you check the logs of the self-heal daemon and of the mount for any metadata heals on the root directory?

+Sanoj

Sanoj,
Is there any systemtap script we can use to detect which process is removing these xattrs?

On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
> [root at glusterfs2 Log_Files]# gluster volume info
>
> Volume Name: StoragePool
> Type: Distributed-Disperse
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
> Status: Started
> Number of Bricks: 20 x (2 + 1) = 60
> Transport-type: tcp
> Bricks:
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
> Options Reconfigured:
> performance.readdir-ahead: on
> diagnostics.client-log-level: INFO
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Friday, July 07, 2017 12:15 PM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> 3.7.19
>
> These are the only callers of removexattr, and only _posix_remove_xattr
> has the potential to remove these xattrs, since posix_removexattr already
> makes sure that the key is not gfid/volume-id. And, surprise surprise,
> _posix_remove_xattr is called only from the healing code of afr/ec. That
> can only happen if the source brick doesn't have a gfid, which doesn't
> seem to match the situation you described.
>
>   #  line  filename / context / line
>   1  1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>            ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
>   2  1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>            ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
>   3  6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
>            sys_lremovexattr (path, "trusted.glusterfs.test");
>   4    80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
>            op_ret = sys_lremovexattr (path, key); \
>   5  5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
>            op_ret = sys_lremovexattr (filler->real_path, key);
>   6  5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
>            op_ret = sys_lremovexattr (real_path, name);
>   7  6811  xlators/storage/posix/src/posix.c <<init>>
>            sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
>
> So there are only two possibilities:
>
> 1) The source directory in ec/afr doesn't have a gfid.
> 2) Something else removed these xattrs.
>
> What is your volume info? Maybe that will give more clues.
>
> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
> doesn't seem to be the culprit, as it also has checks to guard against
> gfid/volume-id removal.
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Friday, July 07, 2017 11:54 AM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> Pranith,
> Thanks for looking into the issue. The bricks were mounted after the
> reboot. One more thing I noticed: when the attributes were set manually
> while glusterd was up, they were lost again on starting the volume. We
> had to stop glusterd, set the attributes, and then start glusterd. After
> that the volume start succeeded.
>
> Which version is this?
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Friday, July 07, 2017 11:46 AM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path: posix_removexattr() has:
>
>         if (!strcmp (GFID_XATTR_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on gfid for file %s",
>                         real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>         if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on volume-id for file %s",
>                         real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> One thing that used to happen was that sometimes, when machines reboot,
> the brick mounts wouldn't happen, and this would lead to the absence of
> both trusted.gfid and volume-id. So at the moment this is my wild guess.
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>
> Hi,
> We faced an issue in production today. We had to stop the volume and
> reboot all the servers in the cluster. Once the servers rebooted,
> starting the volume failed because the following extended attributes
> were not present on any of the bricks on 2 servers.
>
> 1) trusted.gfid
> 2) trusted.glusterfs.volume-id
>
> We had to manually set these extended attributes to start the volume.
> Are there any such known issues?
>
> Thanks and Regards,
> Ram
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the
> sole use of the intended recipient. Any unauthorized review, use or distribution
> by others is strictly prohibited. If you have received the message by mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel

--
Pranith
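For anyone who has to restore these two attributes by hand, as Ram did, the following is a minimal sketch of how the expected values could be derived and checked. It rests on assumptions that are not stated in the thread and should be verified against a healthy brick first: that trusted.glusterfs.volume-id holds the 16 raw bytes of the Volume ID shown by gluster volume info, and that the brick root's trusted.gfid is the fixed root gfid 00000000-0000-0000-0000-000000000001. The function names are hypothetical, and the sketch only builds setfattr command strings from saved `getfattr -d -m . -e hex <brick>` output; it never touches a brick itself.

```python
import uuid

# Assumption: the gfid of a brick root directory is this fixed value.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"
REQUIRED_KEYS = ("trusted.gfid", "trusted.glusterfs.volume-id")


def xattr_hex(uuid_text):
    """Return a UUID as the 0x-prefixed hex string form used by setfattr -v."""
    return "0x" + uuid.UUID(uuid_text).hex


def missing_xattrs(getfattr_output, required=REQUIRED_KEYS):
    """List the required keys that do not appear in saved getfattr output."""
    present = set()
    for line in getfattr_output.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header that getfattr prints.
        if line and not line.startswith("#") and "=" in line:
            present.add(line.split("=", 1)[0])
    return [key for key in required if key not in present]


def restore_commands(brick_path, volume_id, getfattr_output):
    """Build (but do not run) setfattr commands for the missing keys."""
    values = {
        "trusted.gfid": xattr_hex(ROOT_GFID),
        "trusted.glusterfs.volume-id": xattr_hex(volume_id),
    }
    return [
        f"setfattr -n {key} -v {values[key]} {brick_path}"
        for key in missing_xattrs(getfattr_output)
    ]


if __name__ == "__main__":
    saved = "# file: ws/disk1/ws_brick\n"  # a brick with no gluster xattrs left
    for cmd in restore_commands(
        "/ws/disk1/ws_brick", "149e976f-4e21-451c-bf0f-f5691208531f", saved
    ):
        print(cmd)
```

Per Ram's report in the quoted thread, attributes set while glusterd was running were lost again on volume start, so any such commands would presumably be run with glusterd stopped.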
Sanoj Unnikrishnan
2017-Jul-10 09:26 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
@Pranith, yes. We can get the pid on every removexattr call and also print the backtrace of the glusterfsd process when it triggers the xattr removal.
I will write the script and reply back.
Sanoj Unnikrishnan
2017-Jul-10 11:49 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
Please use the systemtap script( https://paste.fedoraproject.org/paste/EGDa0ErwX0LV3y-gBYpfNA) to check which process is invoking remove xattr calls. It prints the pid, tid and arguments of all removexattr calls. I have checked for these fops at the protocol/client and posix translators. To run the script .. 1) install systemtap and dependencies. 2) install glusterfs-debuginfo 3) change the path of the translator in the systemtap script to appropriate values for your system (change "/usr/lib64/glusterfs/3.12dev/xlator/protocol/client.so" and "/usr/lib64/glusterfs/3.12dev/xlator/storage/posix.so") 4) run the script as follows #stap -v fop_trace.stp The o/p would look like these .. additionally arguments will also be dumped if glusterfs-debuginfo is also installed (i had not done it here.) pid-958: 0 glusterfsd(3893):->posix_setxattr pid-958: 47 glusterfsd(3893):<-posix_setxattr pid-966: 0 glusterfsd(5033):->posix_setxattr pid-966: 57 glusterfsd(5033):<-posix_setxattr pid-1423: 0 glusterfs(1431):->client_setxattr pid-1423: 37 glusterfs(1431):<-client_setxattr pid-1423: 0 glusterfs(1431):->client_setxattr pid-1423: 41 glusterfs(1431):<-client_setxattr Regards, Sanoj On Mon, Jul 10, 2017 at 2:56 PM, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:> @ pranith , yes . we can get the pid on all removexattr call and also > print the backtrace of the glusterfsd process when trigerring removing > xattr. > I will write the script and reply back. > > On Sat, Jul 8, 2017 at 7:06 AM, Pranith Kumar Karampuri < > pkarampu at redhat.com> wrote: > >> Ram, >> As per the code, self-heal was the only candidate which *can* do >> it. Could you check logs of self-heal daemon and the mount to check if >> there are any metadata heals on root? >> >> >> +Sanoj >> >> Sanoj, >> Is there any systemtap script we can use to detect which process >> is removing these xattrs? 
>> >> On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy < >> areddy at commvault.com> wrote: >> >>> We lost the attributes on all the bricks on servers glusterfs2 and >>> glusterfs3 again. >>> >>> >>> >>> [root at glusterfs2 Log_Files]# gluster volume info >>> >>> >>> >>> Volume Name: StoragePool >>> >>> Type: Distributed-Disperse >>> >>> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f >>> >>> Status: Started >>> >>> Number of Bricks: 20 x (2 + 1) = 60 >>> >>> Transport-type: tcp >>> >>> Bricks: >>> >>> Brick1: glusterfs1sds:/ws/disk1/ws_brick >>> >>> Brick2: glusterfs2sds:/ws/disk1/ws_brick >>> >>> Brick3: glusterfs3sds:/ws/disk1/ws_brick >>> >>> Brick4: glusterfs1sds:/ws/disk2/ws_brick >>> >>> Brick5: glusterfs2sds:/ws/disk2/ws_brick >>> >>> Brick6: glusterfs3sds:/ws/disk2/ws_brick >>> >>> Brick7: glusterfs1sds:/ws/disk3/ws_brick >>> >>> Brick8: glusterfs2sds:/ws/disk3/ws_brick >>> >>> Brick9: glusterfs3sds:/ws/disk3/ws_brick >>> >>> Brick10: glusterfs1sds:/ws/disk4/ws_brick >>> >>> Brick11: glusterfs2sds:/ws/disk4/ws_brick >>> >>> Brick12: glusterfs3sds:/ws/disk4/ws_brick >>> >>> Brick13: glusterfs1sds:/ws/disk5/ws_brick >>> >>> Brick14: glusterfs2sds:/ws/disk5/ws_brick >>> >>> Brick15: glusterfs3sds:/ws/disk5/ws_brick >>> >>> Brick16: glusterfs1sds:/ws/disk6/ws_brick >>> >>> Brick17: glusterfs2sds:/ws/disk6/ws_brick >>> >>> Brick18: glusterfs3sds:/ws/disk6/ws_brick >>> >>> Brick19: glusterfs1sds:/ws/disk7/ws_brick >>> >>> Brick20: glusterfs2sds:/ws/disk7/ws_brick >>> >>> Brick21: glusterfs3sds:/ws/disk7/ws_brick >>> >>> Brick22: glusterfs1sds:/ws/disk8/ws_brick >>> >>> Brick23: glusterfs2sds:/ws/disk8/ws_brick >>> >>> Brick24: glusterfs3sds:/ws/disk8/ws_brick >>> >>> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick >>> >>> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick >>> >>> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick >>> >>> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick >>> >>> Brick29: 
glusterfs5sds.commvault.com:/ws/disk10/ws_brick >>> >>> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick >>> >>> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick >>> >>> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick >>> >>> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick >>> >>> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick >>> >>> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick >>> >>> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick >>> >>> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick >>> >>> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick >>> >>> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick >>> >>> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick >>> >>> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick >>> >>> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick >>> >>> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick >>> >>> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick >>> >>> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick >>> >>> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick >>> >>> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick >>> >>> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick >>> >>> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick >>> >>> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick >>> >>> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick >>> >>> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick >>> >>> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick >>> >>> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick >>> >>> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick >>> >>> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick >>> >>> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick >>> >>> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick >>> >>> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick >>> >>> Brick60: 
glusterfs6sds.commvault.com:/ws/disk9/ws_brick >>> >>> Options Reconfigured: >>> >>> performance.readdir-ahead: on >>> >>> diagnostics.client-log-level: INFO >>> >>> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.comm >>> vault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com >>> >>> >>> >>> Thanks and Regards, >>> >>> Ram >>> >>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com] >>> *Sent:* Friday, July 07, 2017 12:15 PM >>> >>> *To:* Ankireddypalle Reddy >>> *Cc:* Gluster Devel (gluster-devel at gluster.org); >>> gluster-users at gluster.org >>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes >>> lost >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy < >>> areddy at commvault.com> wrote: >>> >>> 3.7.19 >>> >>> >>> >>> These are the only callers for removexattr and only _posix_remove_xattr >>> has the potential to do removexattr as posix_removexattr already makes sure >>> that it is not gfid/volume-id. And surprise surprise _posix_remove_xattr >>> happens only from healing code of afr/ec. And this can only happen if the >>> source brick doesn't have gfid, which doesn't seem to match with the >>> situation you explained. 
>>> >>> # line filename / context / line >>> 1 1234 xlators/mgmt/glusterd/src/glusterd-quota.c >>> <<glusterd_remove_quota_limit>> >>> ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY); >>> 2 1243 xlators/mgmt/glusterd/src/glusterd-quota.c >>> <<glusterd_remove_quota_limit>> >>> ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY); >>> 3 6102 xlators/mgmt/glusterd/src/glusterd-utils.c >>> <<glusterd_check_and_set_brick_xattr>> >>> sys_lremovexattr (path, "trusted.glusterfs.test"); >>> 4 80 xlators/storage/posix/src/posix-handle.h >>> <<REMOVE_PGFID_XATTR>> >>> op_ret = sys_lremovexattr (path, key); \ >>> 5 5026 xlators/storage/posix/src/posix.c <<_posix_remove_xattr>> >>> op_ret = sys_lremovexattr (filler->real_path, key); >>> 6 5101 xlators/storage/posix/src/posix.c <<posix_removexattr>> >>> op_ret = sys_lremovexattr (real_path, name); >>> 7 6811 xlators/storage/posix/src/posix.c <<init>> >>> sys_lremovexattr (dir_data->data, "trusted.glusterfs.test"); >>> >>> So there are only two possibilities: >>> >>> 1) Source directory in ec/afr doesn't have gfid >>> >>> 2) Something else removed these xattrs. >>> >>> What is your volume info? May be that will give more clues. >>> >>> >>> >>> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that >>> doesn't seem to be the culprit as it also have checks to guard against >>> gfid/volume-id removal. >>> >>> >>> >>> Thanks and Regards, >>> >>> Ram >>> >>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com] >>> *Sent:* Friday, July 07, 2017 11:54 AM >>> >>> >>> *To:* Ankireddypalle Reddy >>> *Cc:* Gluster Devel (gluster-devel at gluster.org); >>> gluster-users at gluster.org >>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes >>> lost >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy < >>> areddy at commvault.com> wrote: >>> >>> Pranith, >>> >>> Thanks for looking in to the issue. The bricks were >>> mounted after the reboot. 
>>> One more thing I noticed: when the attributes were set manually while
>>> glusterd was up, they were lost again on starting the volume. I had to
>>> stop glusterd, set the attributes, and then start glusterd. After that
>>> the volume start succeeded.
>>>
>>> Which version is this?
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
>>> *Sent:* Friday, July 07, 2017 11:46 AM
>>> *To:* Ankireddypalle Reddy
>>> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>>>
>>> Did anything special happen on these two bricks? It can't happen in the
>>> I/O path:
>>> posix_removexattr() has:
>>>
>>>         if (!strcmp (GFID_XATTR_KEY, name)) {
>>>                 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>>>                         "Remove xattr called on gfid for file %s", real_path);
>>>                 op_ret = -1;
>>>                 goto out;
>>>         }
>>>         if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>>>                 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>>>                         "Remove xattr called on volume-id for file %s",
>>>                         real_path);
>>>                 op_ret = -1;
>>>                 goto out;
>>>         }
>>>
>>> I just found that op_errno is not set correctly, but it can't happen in
>>> the I/O path, so self-heal/rebalance are off the hook.
>>>
>>> I also grepped for any removexattr of trusted.gfid from glusterd and
>>> didn't find any.
>>>
>>> So one thing that used to happen was that sometimes when machines
>>> reboot, the brick mounts wouldn't happen and this would lead to absence of
>>> both trusted.gfid and volume-id. So at the moment this is my wild guess.
>>>
>>> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:
>>>
>>> Hi,
>>>
>>> We faced an issue in the production today.
>>> We had to stop the volume and reboot all the servers in the cluster.
>>> Once the servers rebooted, starting the volume failed because the
>>> following extended attributes were missing from all the bricks on 2 servers.
>>>
>>> 1) trusted.gfid
>>> 2) trusted.glusterfs.volume-id
>>>
>>> We had to manually set these extended attributes to start the volume.
>>> Are there any such known issues?
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> ***************************Legal Disclaimer***************************
>>> "This communication may contain confidential and privileged material for the
>>> sole use of the intended recipient. Any unauthorized review, use or distribution
>>> by others is strictly prohibited. If you have received the message by mistake,
>>> please advise the sender by reply email and delete the message. Thank you."
>>> **********************************************************************
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> --
>>> Pranith
>>
>> --
>> Pranith
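
The thread observes that posix_removexattr() refuses to remove the gfid and volume-id keys but leaves op_errno unset. A minimal sketch of such a guard with the error code filled in might look like the following. This is not GlusterFS code: is_protected_xattr and guarded_lremovexattr are hypothetical names, the two key strings are the xattr names mentioned in the thread, and EPERM is only an assumed choice of error code.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Key strings are the two xattr names from the thread; the macro names
 * mirror the quoted snippet but are redefined here for the sketch. */
#define GFID_XATTR_KEY      "trusted.gfid"
#define GF_XATTR_VOL_ID_KEY "trusted.glusterfs.volume-id"

/* Hypothetical helper: returns 1 if the key must never be removed. */
static int is_protected_xattr(const char *name)
{
    return strcmp(name, GFID_XATTR_KEY) == 0 ||
           strcmp(name, GF_XATTR_VOL_ID_KEY) == 0;
}

/* Hypothetical wrapper around lremovexattr(2).
 * Returns 0 on success, -1 on failure, and always fills *op_errno. */
static int guarded_lremovexattr(const char *path, const char *name,
                                int *op_errno)
{
    if (is_protected_xattr(name)) {
        /* Assumption: EPERM is used here purely for illustration. */
        *op_errno = EPERM;
        return -1;
    }
    if (lremovexattr(path, name) == -1) {
        *op_errno = errno;  /* propagate the kernel's error code */
        return -1;
    }
    *op_errno = 0;
    return 0;
}
```

With a guard like this, a request to remove trusted.gfid or trusted.glusterfs.volume-id fails before reaching the filesystem, and the caller sees a definite error code rather than whatever op_errno held beforehand.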