Frank Ruehlemann
2018-Apr-23 13:22 UTC
[Gluster-users] Problems since 3.12.7: invisible files, strange rebalance size, setxattr failed during rebalance and broken unix rights
Hi,

after 2 years of running GlusterFS without major problems, we have been facing some strange errors lately.

After updating to 3.12.7, some users reported at least 4 broken directories containing invisible files. The files exist on the bricks and their names don't start with a dot, but they aren't visible in "ls". Clients can still interact with them by using the explicit path.
More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071

Since this update, "gluster volume rebalance $myvolume status" has also reported a rebalanced size of >16900 PB (petabytes!) for one of our 2 servers. The time looks right, but the size of the transferred files is absurd. That rebalance was run with 3.12.6 in March 2018. The last rebalance log file listed no errors and a realistic size at the end.

We started a new rebalance today during a downtime of our corresponding compute cluster, since these errors started to spread and this might help. The output of "gluster volume rebalance $myvolume status" doesn't list any errors so far and the numbers look realistic. But we're seeing some strange error reports (every few minutes) in journald:

[2018-04-23 12:31:24.942377] E [MSGID: 113001] [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix: setxattr failed on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop: key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1 [No such file or directory]

The rebalance log file lists no errors.

Has anybody seen similar error messages during a rebalance?

We also see some duplicated files. There are two copies on different bricks (we're running a distributed volume).
One copy looks like this:
$ ls -lah
-rwxr--r-- 2 $user $group 293 May 11 2017 config

The other one looks rather strange:
$ ls -lah
---------T 2 root $group 0 May 11 2017 config

Has anybody seen similar broken files?

We're using gluster 3.12 from the gluster.org repositories on a standard Debian 9 with XFS-formatted bricks.

Hopefully somebody has an answer on how to fix this.

At least somebody in the future might find this, since we didn't find anything while searching for these errors. If you're from the future: Good luck! (^_^)

So far,

-- 
Frank Rühlemann
IT-Systemtechnik

UNIVERSITÄT ZU LÜBECK
IT-Service-Center

Ratzeburger Allee 160
23562 Lübeck
Tel +49 451 3101 2034
Fax +49 451 3101 2004
ruehlemann at itsc.uni-luebeck.de
www.itsc.uni-luebeck.de
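For anyone who wants to tally these journald messages per brick or per xattr key, the log line quoted above can be split into its parts. This is only a minimal sketch: the function name, the regular expressions and the field names are made up for illustration, and it assumes that other lines follow the same layout as the single line quoted in this message.

```python
import re

# Layout taken from the one log line quoted in this thread; the group names
# are labels chosen for this sketch, not GlusterFS terminology.
LINE_RE = re.compile(
    r"\[(?P<timestamp>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<source>[^\]]+)\] (?P<domain>\S+): (?P<message>.*)"
)
# Backend path as it appears in the message:
# <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>
GFID_PATH_RE = re.compile(
    r"(?P<brick>/\S+)/\.glusterfs/[0-9a-f]{2}/[0-9a-f]{2}/(?P<gfid>[0-9a-f-]{36})"
)
KEY_RE = re.compile(r"key=(?P<key>\S+)")
ERROR_RE = re.compile(r"\[(?P<error>[^\]]+)\]\s*$")


def parse_brick_log_line(line):
    """Return a dict of fields for a matching log line, or None otherwise."""
    head = LINE_RE.match(line.strip())
    if head is None:
        return None
    record = head.groupdict()
    message = record["message"]
    # Each of these is optional; missing parts simply stay absent.
    for pattern in (GFID_PATH_RE, KEY_RE, ERROR_RE):
        found = pattern.search(message)
        if found:
            record.update(found.groupdict())
    return record


SAMPLE = (
    "[2018-04-23 12:31:24.942377] E [MSGID: 113001] "
    "[posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix: "
    "setxattr failed on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/"
    "e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop: "
    "key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1 "
    "[No such file or directory]"
)

if __name__ == "__main__":
    for name, value in parse_brick_log_line(SAMPLE).items():
        print(f"{name}: {value}")
```

Feeding every journald line through `parse_brick_log_line` and counting by `brick` or `key` would show whether the errors cluster on particular bricks or quota keys.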
Nithya Balachandran
2018-Apr-23 13:42 UTC
Hi,

What is the output of 'gluster volume info' for this volume?

Regards,
Nithya
Frank Ruehlemann
2018-Apr-23 14:06 UTC
Hi,

here it is.

# gluster volume info $myvolume

Volume Name: $myvolume
Type: Distribute
Volume ID: 0d210c70-e44f-46f1-862c-ef260514c9f1
Status: Started
Snapshot Count: 0
Number of Bricks: 23
Transport-type: tcp
Bricks:
Brick1: gluster02:/srv/glusterfs/bricks/DATA201/data
Brick2: gluster02:/srv/glusterfs/bricks/DATA202/data
Brick3: gluster02:/srv/glusterfs/bricks/DATA203/data
Brick4: gluster02:/srv/glusterfs/bricks/DATA204/data
Brick5: gluster02:/srv/glusterfs/bricks/DATA205/data
Brick6: gluster02:/srv/glusterfs/bricks/DATA206/data
Brick7: gluster02:/srv/glusterfs/bricks/DATA207/data
Brick8: gluster02:/srv/glusterfs/bricks/DATA208/data
Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data
Brick10: gluster01:/srv/glusterfs/bricks/DATA111/data
Brick11: gluster01:/srv/glusterfs/bricks/DATA112/data
Brick12: gluster01:/srv/glusterfs/bricks/DATA113/data
Brick13: gluster01:/srv/glusterfs/bricks/DATA114/data
Brick14: gluster02:/srv/glusterfs/bricks/DATA209/data
Brick15: gluster01:/srv/glusterfs/bricks/DATA101/data
Brick16: gluster01:/srv/glusterfs/bricks/DATA102/data
Brick17: gluster01:/srv/glusterfs/bricks/DATA103/data
Brick18: gluster01:/srv/glusterfs/bricks/DATA104/data
Brick19: gluster01:/srv/glusterfs/bricks/DATA105/data
Brick20: gluster01:/srv/glusterfs/bricks/DATA106/data
Brick21: gluster01:/srv/glusterfs/bricks/DATA107/data
Brick22: gluster01:/srv/glusterfs/bricks/DATA108/data
Brick23: gluster01:/srv/glusterfs/bricks/DATA109/data
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
auth.allow: $myipspace
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
nfs.disable: on
transport.address-family: inet
nfs.addr-namelookup: off
diagnostics.brick-sys-log-level: WARNING

Well, at least one thing got fixed by this reboot: "df -h" now returns a realistic size for the volume etc. This wasn't the case after our update to 3.12.7.

Best regards,

-- 
Frank Rühlemann
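The brick list in the `gluster volume info` output above can be summarised per server with a few lines of Python. This is a rough sketch, not an official tool: the function name is invented here, and it assumes the plain-text `BrickN: host:/path` layout shown in this message.

```python
from collections import Counter


def bricks_per_host(volume_info):
    """Count 'BrickN: host:/path' entries per host in `gluster volume info` text."""
    counts = Counter()
    for line in volume_info.splitlines():
        line = line.strip()
        if not line.startswith("Brick"):
            continue
        label, _, rest = line.partition(": ")
        if not label[len("Brick"):].isdigit():
            continue  # skips the "Bricks:" header line
        host, sep, path = rest.partition(":")
        if sep and path.startswith("/"):
            counts[host] += 1
    return counts


# Excerpt of the output posted in this thread (three of the 23 bricks).
SAMPLE = """\
Number of Bricks: 23
Bricks:
Brick1: gluster02:/srv/glusterfs/bricks/DATA201/data
Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data
Brick14: gluster02:/srv/glusterfs/bricks/DATA209/data
Options Reconfigured:
"""

if __name__ == "__main__":
    for host, count in sorted(bricks_per_host(SAMPLE).items()):
        print(host, count)
```

Run against the full list above, this counts 14 bricks on gluster01 (DATA101 to DATA114) and 9 on gluster02 (DATA201 to DATA209), which adds up to the reported 23.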
Nithya Balachandran
2018-Apr-23 16:21 UTC
Hi,

On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann at itsc.uni-luebeck.de> wrote:

> After updating to 3.12.7 some users reported at least 4 broken
> directories with some invisible files. The files are at the bricks and
> don't start with a dot, but aren't visible in "ls". Clients still can
> interact with them by using the explicit path.
> More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071

I will continue the analysis for this issue in the bug.

> And since this update gluster reported for the rebalance of >16900 PB
> (Petabyte!) of data for one of our 2 servers, when using "gluster volume
> rebalance $myvolume status". The time looks right, but the size of
> transferred files is absurd. The rebalance was with 3.12.6 in March 2018.
> The last rebalance log file listed no errors and a realistic size at the
> end.

This has been seen a few times and is because an incorrect value is stored in the node_state.info file. However, I don't know what causes this incorrect value to be stored. It is harmless and can be ignored.

> We started a new rebalance today during a downtime of our corresponding
> compute cluster, since these errors started to spread and this might
> help. The output of "gluster volume rebalance $myvolume status" doesn't
> list any errors so far and the numbers look like realistic values.
> But we're seeing some strange error reports (every few minutes) in
> journald:
> [2018-04-23 12:31:24.942377] E [MSGID: 113001]
> [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> setxattr failed on
> /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2
> while doing xattrop:
> key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> [No such file or directory]
> The rebalance log file lists no errors.
>
> Has anybody seen similar error messages during a rebalance?

Are any directories being deleted/renamed during the rebalance? If yes, this could be a valid message.

> And we see some files duplicated. There are two copies on different
> bricks (we're running a distributed volume).
> One copy looks like this:
> $ ls -lah
> -rwxr--r-- 2 $user $group 293 May 11 2017 config
>
> The other one looks rather strange:
> $ ls -lah
> ---------T 2 root $group 0 May 11 2017 config
>
> Has anybody seen similar broken files?

This is fine as long as you only see a single file from the mount point. The 'T' files are internal gluster files (called linkto files) and should be invisible from the mount point.

Regards,
Nithya
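The two `ls -lah` listings discussed above differ in a way that is easy to check mechanically: the linkto file is a zero-byte regular file whose only mode bit is the sticky bit. The following is a small sketch of that heuristic; the function name is made up for illustration, and the check rests only on the mode and size shown in this thread. As far as I know, linkto files on a brick also carry a `trusted.glusterfs.dht.linkto` extended attribute, which would be the stricter test, but that is not checked here.

```python
import stat


def looks_like_linkto(mode, size):
    """True for a zero-byte regular file whose only mode bit is the sticky bit."""
    return (
        stat.S_ISREG(mode)
        and stat.S_IMODE(mode) == stat.S_ISVTX
        and size == 0
    )


# Modes and sizes corresponding to the two listings quoted above.
linkto_mode = stat.S_IFREG | stat.S_ISVTX  # shown by ls as ---------T
data_mode = stat.S_IFREG | 0o744           # shown by ls as -rwxr--r--

if __name__ == "__main__":
    print(stat.filemode(linkto_mode), looks_like_linkto(linkto_mode, 0))
    print(stat.filemode(data_mode), looks_like_linkto(data_mode, 293))
```

On a brick (not on the mount point) the inputs would come from `os.lstat(path)`, passing `st_mode` and `st_size`.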
Frank Ruehlemann
2018-Apr-24 08:26 UTC
Hi,

thank you for your quick answer.

On Monday, 23.04.2018, at 21:51 +0530, Nithya Balachandran wrote:
> On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann at itsc.uni-luebeck.de> wrote:
>
> > After updating to 3.12.7 some users reported at least 4 broken
> > directories with some invisible files. The files are at the bricks and
> > don't start with a dot, but aren't visible in "ls". Clients still can
> > interact with them by using the explicit path.
> > More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071
>
> I will continue the analysis for this issue in the bug.

This would be very helpful. We saw your request for additional information and will provide it as soon as possible.

> > And since this update gluster reported for the rebalance of >16900 PB
> > (Petabyte!) of data for one of our 2 servers, when using "gluster volume
> > rebalance $myvolume status". The time looks right, but the size of
> > transferred files is absurd. The rebalance was with 3.12.6 in March 2018.
> > The last rebalance log file listed no errors and a realistic size at the
> > end.
>
> This has been seen a few times and is because an incorrect value is stored
> in the node_state.info file. However, I don't know what causes this
> incorrect value to be stored. It is harmless and can be ignored.

Ok. :)

> > We started a new rebalance today during a downtime of our corresponding
> > compute cluster, since these errors started to spread and this might
> > help. The output of "gluster volume rebalance $myvolume status" doesn't
> > list any errors so far and the numbers look like realistic values.
> > But we're seeing some strange error reports (every few minutes) in
> > journald:
> > [2018-04-23 12:31:24.942377] E [MSGID: 113001]
> > [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> > setxattr failed on
> > /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2
> > while doing xattrop:
> > key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> > [No such file or directory]
> > The rebalance log file lists no errors.
> >
> > Has anybody seen similar error messages during a rebalance?
>
> Are any directories being deleted/renamed during the rebalance? If yes,
> this could be a valid message.

No. We locked out all users and took down all clients that mount the volume before we started the rebalance, to ensure that no client interacts with it.
Over the last hours the messages continued on all bricks of this volume, occurring up to several times per minute, with some sporadic phases without them.

> > And we see some files duplicated. There are two copies on different
> > bricks (we're running a distributed volume).
> > One copy looks like this:
> > $ ls -lah
> > -rwxr--r-- 2 $user $group 293 May 11 2017 config
> >
> > The other one looks rather strange:
> > $ ls -lah
> > ---------T 2 root $group 0 May 11 2017 config
> >
> > Has anybody seen similar broken files?
>
> This is fine as long as you only see a single file from the mount point.
> The 'T' files are internal gluster files (called linkto files) and should
> be invisible from the mount point.
>
> Regards,
> Nithya

This is good to know. Yes, for all files we have looked at so far, only one file shows up.

Thanks for your message. It helped a lot.

-- 
Frank Rühlemann