Ankireddypalle Reddy
2017-Jul-10 13:00 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
Thanks for the swift turn around. Will try this out and let you know.

Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Monday, July 10, 2017 8:31 AM
To: Sanoj Unnikrishnan
Cc: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

Ram,
      If you see it again, you can use this. I am going to send out a patch for the code path which can lead to removal of gfid/volume-id tomorrow.

On Mon, Jul 10, 2017 at 5:19 PM, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:

Please use the systemtap script (https://paste.fedoraproject.org/paste/EGDa0ErwX0LV3y-gBYpfNA) to check which process is invoking removexattr calls. It prints the pid, tid and arguments of all removexattr calls. I have checked for these fops at the protocol/client and posix translators.

To run the script:
1) Install systemtap and its dependencies.
2) Install glusterfs-debuginfo.
3) Change the translator paths in the systemtap script to the appropriate values for your system (change "/usr/lib64/glusterfs/3.12dev/xlator/protocol/client.so" and "/usr/lib64/glusterfs/3.12dev/xlator/storage/posix.so").
4) Run the script as follows:

    #stap -v fop_trace.stp

The output looks like the following; arguments will additionally be dumped if glusterfs-debuginfo is installed (I had not done that here):

    pid-958: 0 glusterfsd(3893):->posix_setxattr
    pid-958: 47 glusterfsd(3893):<-posix_setxattr
    pid-966: 0 glusterfsd(5033):->posix_setxattr
    pid-966: 57 glusterfsd(5033):<-posix_setxattr
    pid-1423: 0 glusterfs(1431):->client_setxattr
    pid-1423: 37 glusterfs(1431):<-client_setxattr
    pid-1423: 0 glusterfs(1431):->client_setxattr
    pid-1423: 41 glusterfs(1431):<-client_setxattr

Regards,
Sanoj
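As an aside on step 3: the translator paths depend on the installed GlusterFS version (the cluster in this thread runs 3.7.19, not 3.12dev), so it is worth locating them before editing the script. A minimal shell sketch; the paths shown in the comments are assumptions to confirm locally:

    # find the client and posix translator objects for the installed build
    find /usr/lib64/glusterfs -name client.so -path '*protocol*'
    find /usr/lib64/glusterfs -name posix.so -path '*storage*'
    # expected to resolve to something like:
    #   /usr/lib64/glusterfs/3.7.19/xlator/protocol/client.so
    #   /usr/lib64/glusterfs/3.7.19/xlator/storage/posix.so
    # after updating the script, run it as root:
    stap -v fop_trace.stp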
On Mon, Jul 10, 2017 at 2:56 PM, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:

@pranith, yes. We can get the pid on all removexattr calls and also print the backtrace of the glusterfsd process when the removexattr is triggered. I will write the script and reply back.

On Sat, Jul 8, 2017 at 7:06 AM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:

Ram,
      As per the code, self-heal was the only candidate which *can* do it. Could you check the logs of the self-heal daemon and the mount to see if there are any metadata heals on root?

+Sanoj

Sanoj,
      Is there any systemtap script we can use to detect which process is removing these xattrs?

On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy <areddy at commvault.com> wrote:

We lost the attributes on all the bricks on servers glusterfs2 and glusterfs3 again.

[root at glusterfs2 Log_Files]# gluster volume info

Volume Name: StoragePool
Type: Distributed-Disperse
Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
Status: Started
Number of Bricks: 20 x (2 + 1) = 60
Transport-type: tcp
Bricks:
Brick1: glusterfs1sds:/ws/disk1/ws_brick
Brick2: glusterfs2sds:/ws/disk1/ws_brick
Brick3: glusterfs3sds:/ws/disk1/ws_brick
Brick4: glusterfs1sds:/ws/disk2/ws_brick
Brick5: glusterfs2sds:/ws/disk2/ws_brick
Brick6: glusterfs3sds:/ws/disk2/ws_brick
Brick7: glusterfs1sds:/ws/disk3/ws_brick
Brick8: glusterfs2sds:/ws/disk3/ws_brick
Brick9: glusterfs3sds:/ws/disk3/ws_brick
Brick10: glusterfs1sds:/ws/disk4/ws_brick
Brick11: glusterfs2sds:/ws/disk4/ws_brick
Brick12: glusterfs3sds:/ws/disk4/ws_brick
Brick13: glusterfs1sds:/ws/disk5/ws_brick
Brick14: glusterfs2sds:/ws/disk5/ws_brick
Brick15: glusterfs3sds:/ws/disk5/ws_brick
Brick16: glusterfs1sds:/ws/disk6/ws_brick
Brick17: glusterfs2sds:/ws/disk6/ws_brick
Brick18: glusterfs3sds:/ws/disk6/ws_brick
Brick19: glusterfs1sds:/ws/disk7/ws_brick
Brick20: glusterfs2sds:/ws/disk7/ws_brick
Brick21: glusterfs3sds:/ws/disk7/ws_brick
Brick22: glusterfs1sds:/ws/disk8/ws_brick
Brick23: glusterfs2sds:/ws/disk8/ws_brick
Brick24: glusterfs3sds:/ws/disk8/ws_brick
Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
Options Reconfigured:
performance.readdir-ahead: on
diagnostics.client-log-level: INFO
auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Friday, July 07, 2017 12:15 PM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:

3.7.19

These are the only callers for removexattr, and only _posix_remove_xattr has the potential to do a removexattr, as posix_removexattr already makes sure that it is not gfid/volume-id. And surprise surprise, _posix_remove_xattr happens only from the healing code of afr/ec. And that can only happen if the source brick doesn't have the gfid, which doesn't seem to match the situation you explained.

# line  filename / context / line
1  1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
         ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
2  1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
         ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
3  6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
         sys_lremovexattr (path, "trusted.glusterfs.test");
4    80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
         op_ret = sys_lremovexattr (path, key); \
5  5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
         op_ret = sys_lremovexattr (filler->real_path, key);
6  5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
         op_ret = sys_lremovexattr (real_path, name);
7  6811  xlators/storage/posix/src/posix.c <<init>>
         sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");

So there are only two possibilities:
1) The source directory in ec/afr doesn't have the gfid.
2) Something else removed these xattrs.

What is your volume info? Maybe that will give more clues.

PS: sys_fremovexattr is called only from posix_fremovexattr(), so that doesn't seem to be the culprit either, as it also has checks to guard against gfid/volume-id removal.

Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Friday, July 07, 2017 11:54 AM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:

Pranith,
          Thanks for looking in to the issue. The bricks were mounted after the reboot. One more thing that I noticed: when the attributes were manually set while glusterd was up, the attributes were lost again as soon as the volume was started. I had to stop glusterd, set the attributes, and then start glusterd. After that the volume start succeeded.

Which version is this?
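The manual recovery Ram describes amounts to re-applying the two extended attributes on the brick root with setfattr while glusterd is stopped. A rough sketch for this volume, assuming only the brick-root attributes are missing (the volume-id value is this volume's ID with the dashes removed; the brick-root gfid is the fixed root gfid; repeat per affected brick path):

    # inspect the attributes on a brick that still has them (run as root)
    getfattr -d -m . -e hex /ws/disk1/ws_brick | grep -E 'trusted.gfid|volume-id'

    # on an affected brick, with glusterd stopped, re-apply both attributes
    setfattr -n trusted.glusterfs.volume-id \
             -v 0x149e976f4e21451cbf0ff5691208531f /ws/disk1/ws_brick
    setfattr -n trusted.gfid \
             -v 0x00000000000000000000000000000001 /ws/disk1/ws_brick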
Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Friday, July 07, 2017 11:46 AM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

Did anything special happen on these two bricks? It can't happen in the I/O path; posix_removexattr() has:

        if (!strcmp (GFID_XATTR_KEY, name)) {
                gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
                        "Remove xattr called on gfid for file %s", real_path);
                op_ret = -1;
                goto out;
        }
        if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
                gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
                        "Remove xattr called on volume-id for file %s",
                        real_path);
                op_ret = -1;
                goto out;
        }

I just found that op_errno is not set correctly, but the removal can't happen in the I/O path, so self-heal/rebalance are off the hook.

I also grepped for any removexattr of trusted.gfid from glusterd and didn't find any.

So one thing that used to happen was that sometimes when machines reboot, the brick mounts wouldn't happen, and this would lead to the absence of both trusted.gfid and volume-id. So at the moment this is my wild guess.

On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:

Hi,
       We faced an issue in production today. We had to stop the volume and reboot all the servers in the cluster. Once the servers rebooted, starting the volume failed because the following extended attributes were not present on all the bricks on 2 servers:
1) trusted.gfid
2) trusted.glusterfs.volume-id

We had to manually set these extended attributes to start the volume. Are there any such known issues?

Thanks and Regards,
Ram

_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

--
Pranith
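Before re-setting any attributes by hand, Pranith's "brick mounts didn't happen" theory is cheap to confirm or rule out on the rebooted servers; a small sketch, assuming the /ws/diskN directories are the intended mount points for the bricks listed in the volume info:

    # confirm every brick directory is a real mount and still carries the xattr
    for d in /ws/disk*; do
        if mountpoint -q "$d"; then
            getfattr -n trusted.glusterfs.volume-id -e hex "$d/ws_brick" 2>/dev/null \
                || echo "$d/ws_brick: mounted but volume-id xattr missing"
        else
            echo "$d: NOT mounted"
        fi
    done

If a brick path turns out not to be mounted, the missing xattrs are expected: the empty directory left behind on the root filesystem never had them.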
Pranith Kumar Karampuri
2017-Jul-13 08:13 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
Ram,
     I sent https://review.gluster.org/17765 to fix the possibility in bulk removexattr. But I am not sure if this is indeed the reason for this issue.

On Mon, Jul 10, 2017 at 6:30 PM, Ankireddypalle Reddy <areddy at commvault.com> wrote:

> Thanks for the swift turn around. Will try this out and let you know.
>
> Thanks and Regards,
> Ram
--
Pranith
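For anyone wanting to test the fix ahead of a release, a Gerrit change like 17765 can usually be fetched directly by its change ref and built locally; a rough sketch (the clone URL layout and the patchset number are assumptions to verify on the review page):

    # fetch and check out patchset 1 of change 17765; Gerrit refs follow
    # refs/changes/<last two digits>/<change number>/<patchset>
    git clone https://review.gluster.org/glusterfs
    cd glusterfs
    git fetch origin refs/changes/65/17765/1
    git checkout FETCH_HEAD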
Ankireddypalle Reddy
2017-Jul-13 13:55 UTC
[Gluster-users] [Gluster-devel] gfid and volume-id extended attributes lost
Thanks Pranith. We are waiting for a downtime on our production setup and will update you once we are able to apply this there.

Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Thursday, July 13, 2017 4:13 AM
To: Ankireddypalle Reddy
Cc: Sanoj Unnikrishnan; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

Ram,
     I sent https://review.gluster.org/17765 to fix the possibility in bulk removexattr. But I am not sure if this is indeed the reason for this issue.