thr3ads.net - Gluster users - [Gluster-users] Problems since 3.12.7: invisible files, strange rebalance size, setxattr failed during rebalance and broken unix rights [Apr 2018]


Frank Ruehlemann

2018-Apr-23 13:22 UTC

[Gluster-users] Problems since 3.12.7: invisible files, strange rebalance size, setxattr failed during rebalance and broken unix rights

Hi,

after two years of running GlusterFS without major problems, we have been
facing some strange errors lately.

After updating to 3.12.7, some users reported at least 4 broken
directories containing invisible files. The files exist on the bricks and
their names don't start with a dot, but they aren't visible in "ls".
Clients can still access them by using the explicit path.
More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071
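
A minimal Python sketch for narrowing down which entries are affected: it compares what the bricks hold for one directory with what a client mount lists for the same directory. The function names, the example paths and the rule used to skip Gluster's internal `---------T` pointer files (zero size, only the sticky bit set) are assumptions for illustration, not part of any Gluster tool. It has to run on a host that sees both the mount and the brick directories, so with bricks on two servers it would be run once per server.

```python
from __future__ import annotations

import os
import stat


def looks_like_linkto(st: os.stat_result) -> bool:
    """Heuristic for DHT pointer files ('---------T'): a zero-byte regular
    file whose only permission bit is the sticky bit (assumption)."""
    return (
        stat.S_ISREG(st.st_mode)
        and stat.S_IMODE(st.st_mode) == stat.S_ISVTX
        and st.st_size == 0
    )


def find_invisible_entries(mount_dir: str, brick_dirs: list[str]) -> set[str]:
    """Return names present in the same relative directory on at least one
    brick but missing from the directory listing seen through the mount."""
    visible = set(os.listdir(mount_dir))
    on_bricks: set[str] = set()
    for brick_dir in brick_dirs:
        if not os.path.isdir(brick_dir):
            continue  # this brick holds no copy of the directory
        for name in os.listdir(brick_dir):
            if name == ".glusterfs":
                continue  # internal metadata directory at the brick root
            if looks_like_linkto(os.lstat(os.path.join(brick_dir, name))):
                continue  # pointer files are expected to be hidden
            on_bricks.add(name)
    return on_bricks - visible
```

For example, `find_invisible_entries("/mnt/myvolume/some/dir", ["/srv/glusterfs/bricks/DATA112/data/some/dir"])` would return the names that exist on brick DATA112 but do not show up in `ls` on the (hypothetical) mount point `/mnt/myvolume`.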

Also, since this update, "gluster volume rebalance $myvolume status" has
reported that >16900 PB (petabytes!) of data were rebalanced on one of our
2 servers. The elapsed time looks right, but the size of the transferred
files is absurd. That rebalance ran with 3.12.6 in March 2018, and its log
file listed no errors and a realistic size at the end.

We started a new rebalance today during a downtime of the compute cluster
that uses this volume, since these errors have started to spread and a
rebalance might help. So far, the output of "gluster volume rebalance
$myvolume status" doesn't list any errors, and the numbers look realistic.
However, journald reports some strange errors every few minutes:
    [2018-04-23 12:31:24.942377] E [MSGID: 113001] [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix: setxattr failed on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop: key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1 [No such file or directory]
The rebalance log file lists no errors.
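
The message above can be taken apart mechanically, which helps when counting how often it occurs and on which bricks. The Python sketch below is written against this single sample line only, so the regular expression is an assumption about the format, not a specification of Gluster's log layout. It also checks that the handle path follows the `.glusterfs/<first two hex digits>/<next two hex digits>/<GFID>` layout visible in the sample, and extracts the GFID embedded in the quota `contri` key (as far as we understand, the GFID of the parent directory whose quota accounting was being updated; treat that as an assumption). All names in the code are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

GFID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Written against the single sample message above (assumed format).
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+setxattr failed\s+on\s+"
    r"(?P<path>\S+)\s+while doing xattrop:\s+key=(?P<key>\S+)\s+"
    r"\[(?P<err>[^\]]+)\]"
)
# Backend handle layout seen in the sample: <brick>/.glusterfs/e6/a8/e6a8ce50-...
HANDLE_RE = re.compile(
    rf"^(?P<brick>.+)/\.glusterfs/(?P<d1>[0-9a-f]{{2}})/(?P<d2>[0-9a-f]{{2}})/(?P<gfid>{GFID})$"
)
# Quota contribution key as seen in the sample: trusted.glusterfs.quota.<GFID>.contri[.<n>]
KEY_RE = re.compile(rf"^trusted\.glusterfs\.quota\.(?P<gfid>{GFID})\.contri(?:\.\d+)?$")


@dataclass
class XattropFailure:
    timestamp: str
    level: str
    msgid: int
    translator: str
    brick: str | None     # brick root, if the path is a well-formed handle
    gfid: str | None      # GFID of the handle the setxattr was attempted on
    key_gfid: str | None  # GFID embedded in the quota key
    error: str


def parse_xattrop_failure(line: str) -> XattropFailure | None:
    """Return the fields of a 'setxattr failed ... while doing xattrop'
    message, or None if the line does not look like one."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    brick = gfid = None
    h = HANDLE_RE.match(m["path"])
    # Accept the handle only if both directory levels match the GFID prefix.
    if h and h["d1"] == h["gfid"][0:2] and h["d2"] == h["gfid"][2:4]:
        brick, gfid = h["brick"], h["gfid"]
    k = KEY_RE.match(m["key"])
    return XattropFailure(
        timestamp=m["ts"],
        level=m["level"],
        msgid=int(m["msgid"]),
        translator=m["xlator"],
        brick=brick,
        gfid=gfid,
        key_gfid=k["gfid"] if k else None,
        error=m["err"],
    )
```

Lines could be fed to `parse_xattrop_failure()` from `journalctl` output, and the resulting `gfid` values counted per `brick` to see whether the failures cluster on particular files or spread evenly over all bricks.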

Has anybody seen similar error messages during a rebalance?

We also see some duplicated files: there are two copies on different
bricks (we're running a distributed volume).
One copy looks like this:
$ ls -lah
-rwxr--r--  2 $user $group  293 May 11  2017 config

The other one looks rather strange:
$ ls -lah
---------T  2 root    $group    0 May 11  2017 config

Has anybody seen similar broken files?

We're using Gluster 3.12 from the gluster.org repositories on a standard
Debian 9 system with XFS-formatted bricks.

Hopefully somebody knows how to fix this.

At the very least, somebody in the future might find this, since we didn't
find anything when searching for these errors. If you're from the future:
Good luck! (^_^)

So far,

-- 
Frank Rühlemann
   IT-Systemtechnik

UNIVERSITÄT ZU LÜBECK
    IT-Service-Center

    Ratzeburger Allee 160
    23562 Lübeck
    Tel +49 451 3101 2034
    Fax +49 451 3101 2004
    ruehlemann at itsc.uni-luebeck.de
    www.itsc.uni-luebeck.de

Nithya Balachandran

2018-Apr-23 13:42 UTC


Hi,

What is the output of 'gluster volume info' for this volume?


Regards,
Nithya


Frank Ruehlemann

2018-Apr-23 14:06 UTC


Hi,

here it is.

# gluster volume info $myvolume
 
Volume Name: $myvolume
Type: Distribute
Volume ID: 0d210c70-e44f-46f1-862c-ef260514c9f1
Status: Started
Snapshot Count: 0
Number of Bricks: 23
Transport-type: tcp
Bricks:
Brick1: gluster02:/srv/glusterfs/bricks/DATA201/data
Brick2: gluster02:/srv/glusterfs/bricks/DATA202/data
Brick3: gluster02:/srv/glusterfs/bricks/DATA203/data
Brick4: gluster02:/srv/glusterfs/bricks/DATA204/data
Brick5: gluster02:/srv/glusterfs/bricks/DATA205/data
Brick6: gluster02:/srv/glusterfs/bricks/DATA206/data
Brick7: gluster02:/srv/glusterfs/bricks/DATA207/data
Brick8: gluster02:/srv/glusterfs/bricks/DATA208/data
Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data
Brick10: gluster01:/srv/glusterfs/bricks/DATA111/data
Brick11: gluster01:/srv/glusterfs/bricks/DATA112/data
Brick12: gluster01:/srv/glusterfs/bricks/DATA113/data
Brick13: gluster01:/srv/glusterfs/bricks/DATA114/data
Brick14: gluster02:/srv/glusterfs/bricks/DATA209/data
Brick15: gluster01:/srv/glusterfs/bricks/DATA101/data
Brick16: gluster01:/srv/glusterfs/bricks/DATA102/data
Brick17: gluster01:/srv/glusterfs/bricks/DATA103/data
Brick18: gluster01:/srv/glusterfs/bricks/DATA104/data
Brick19: gluster01:/srv/glusterfs/bricks/DATA105/data
Brick20: gluster01:/srv/glusterfs/bricks/DATA106/data
Brick21: gluster01:/srv/glusterfs/bricks/DATA107/data
Brick22: gluster01:/srv/glusterfs/bricks/DATA108/data
Brick23: gluster01:/srv/glusterfs/bricks/DATA109/data
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
auth.allow: $myipspace
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
nfs.disable: on
transport.address-family: inet
nfs.addr-namelookup: off
diagnostics.brick-sys-log-level: WARNING
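
With 23 bricks spread unevenly over the two servers, a small parser makes it easier to map a brick named in a log message (such as DATA112) to its server and to count bricks per host. The Python sketch below assumes the plain-text layout shown above (`Key: value` lines, a `Bricks:` list and an `Options Reconfigured:` section); it is not an official parser, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# "BrickN: host:/path" lines, as in the output above.
BRICK_RE = re.compile(r"^Brick(\d+):\s+([^:\s]+):(\S+)$")


def parse_volume_info(text: str) -> dict:
    """Split plain-text `gluster volume info` output (layout assumed from the
    sample above) into header fields, bricks and reconfigured options."""
    fields: dict[str, str] = {}
    bricks: list[tuple[int, str, str]] = []
    options: dict[str, str] = {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "Bricks:":
            continue
        if line == "Options Reconfigured:":
            in_options = True
            continue
        m = BRICK_RE.match(line)
        if m:
            bricks.append((int(m[1]), m[2], m[3]))
            continue
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "Key: value" shape
            (options if in_options else fields)[key.strip()] = value.strip()
    return {"fields": fields, "bricks": bricks, "options": options}


def bricks_per_host(info: dict) -> Counter:
    """Count bricks per server."""
    return Counter(host for _, host, _ in info["bricks"])
```

Applied to the output above, `bricks_per_host()` should report 14 bricks on gluster01 and 9 on gluster02, and Brick11 resolves to gluster01:/srv/glusterfs/bricks/DATA112/data.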

Well, at least one thing got fixed by this reboot: "df -h" now returns a
realistic size for the volume, etc. This wasn't the case after our update
to 3.12.7.

Best Regards,

-- 
Frank Rühlemann



On Monday, 23.04.2018, 19:12 +0530, Nithya Balachandran wrote:
> Hi,
> 
> What is the output of 'gluster volume info' for this volume?
> 
> 
> Regards,
> Nithya

Nithya Balachandran

2018-Apr-23 16:21 UTC


Hi,

On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann at itsc.uni-luebeck.de> wrote:
> Hi,
>
> after 2 years running GlusterFS without bigger problems we're facing
> some strange errors lately.
>
> After updating to 3.12.7 some user reported at least 4 broken
> directories with some invisible files. The files are at the bricks and
> don't start with a dot, but aren't visible in "ls". Clients still can
> interact with them by using the explicit path.
> More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071

I will continue the analysis for this issue in the bug.
>
>
> And since this update gluster reported for the rebalance of >16900 PB
> (Petabyte!) of data for one of our 2 server, when using "gluster volume
> rebalance $myvolume status". The time looks right, but the size of
> transfered files is absurd. The rebalance was with 3.12.6 in March 2018.
> The last rebalance log file listed no errors and a realistic size at the
> end.
>
This has been seen a few times; it happens because an incorrect value is
stored in the node_state.info file. However, I don't know what causes this
incorrect value to be stored. It is harmless and can be ignored.

>
> We started a new rebalance today during a downtime of our corresponding
> compute cluster, since these errors started to spread and this might
> help. The output of "gluster volume rebalance $myvolume status" doesn't
> list any errors so far and the numbers look like realistic values.
> But we're seeing some strange errors (every few minutes) reports in the
> journald:
> "[2018-04-23 12:31:24.942377] E [MSGID: 113001]
> [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> setxattr failed
> on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/
> e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop:
> key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> [No such file or directory]"
> The rebalance log file lists no errors.
>
> Has anybody seen similar error messages during a rebalance?
>
Are any directories being deleted/renamed during the rebalance? If yes,
this could be a valid message.
>
> And we see some files dublicated. There are two copies on different
> bricks (we're running a distributed volume).
> One copy looks like this:
> $ ls -lah
> -rwxr--r--  2 $user $group  293 May 11  2017 config
>
> The other one looks rather strange:
> $ ls -lah
> ---------T  2 root    $group    0 May 11  2017 config
>
> Has anybody seen similar broken files?
>
This is fine as long as you only see a single file from the mount point.
The 'T' files are internal Gluster files (called linkto files) and should
be invisible from the mount point.
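
To check a suspicious file against this rule of thumb (a single file visible from the mount point, with any additional brick-side copies being linkto files), the copies of one relative path can be collected from the local bricks and split into data files and linkto files. This Python sketch treats a zero-byte regular file whose only mode bit is the sticky bit (`---------T`) as a linkto file; that heuristic, the helper names and any example paths are assumptions. It also tries to read the `trusted.glusterfs.dht.linkto` extended attribute, which DHT uses to record the subvolume holding the data; reading `trusted.*` attributes requires root and Linux, so the helper returns `None` when the value cannot be read.

```python
from __future__ import annotations

import os
import stat

# Extended attribute DHT sets on pointer files; reading trusted.* needs root.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_linkto(st: os.stat_result) -> bool:
    """Heuristic for '---------T' entries: zero bytes, sticky bit only."""
    return (
        stat.S_ISREG(st.st_mode)
        and stat.S_IMODE(st.st_mode) == stat.S_ISVTX
        and st.st_size == 0
    )


def linkto_target(path: str) -> str | None:
    """Return the subvolume name stored in the linkto xattr, or None if it
    is absent, unreadable, or the platform has no os.getxattr (non-Linux)."""
    getxattr = getattr(os, "getxattr", None)
    if getxattr is None:
        return None
    try:
        raw = getxattr(path, LINKTO_XATTR, follow_symlinks=False)
    except OSError:
        return None
    return raw.rstrip(b"\0").decode(errors="replace")


def classify_copies(rel_path: str, brick_roots: list[str]) -> dict[str, list[str]]:
    """Collect the brick-side copies of one file, split into data and linkto."""
    copies: dict[str, list[str]] = {"data": [], "linkto": []}
    for root in brick_roots:
        path = os.path.join(root, rel_path)
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            continue  # this brick has no entry for the path
        copies["linkto" if looks_like_linkto(st) else "data"].append(path)
    return copies


def has_single_data_copy(copies: dict[str, list[str]]) -> bool:
    """Expected state: one real copy; any other copies are linkto files."""
    return len(copies["data"]) == 1
```

For the two `config` copies from the original report, one would expect one entry under `data` (the 293-byte file) and one under `linkto` (the zero-byte `---------T` file). Since each server only sees its own bricks, the lists from gluster01 and gluster02 would need to be combined before applying `has_single_data_copy()`.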


Regards,
Nithya

Frank Ruehlemann

2018-Apr-24 08:26 UTC


Hi,

thank you for your quick answer.

On Monday, 23.04.2018, 21:51 +0530, Nithya Balachandran wrote:
> On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann at itsc.uni-luebeck.de>
> wrote:
> 
> > Hi,
> >
> > after 2 years running GlusterFS without bigger problems we're facing
> > some strange errors lately.
> >
> > After updating to 3.12.7 some user reported at least 4 broken
> > directories with some invisible files. The files are at the bricks and
> > don't start with a dot, but aren't visible in "ls". Clients still can
> > interact with them by using the explicit path.
> > More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071
> 
> 
> I will continue the analysis for this issue in the bug.
This would be very helpful. We saw your request for additional
information and will provide it as soon as possible.

> > And since this update gluster reported for the rebalance of >16900 PB
> > (Petabyte!) of data for one of our 2 server, when using "gluster volume
> > rebalance $myvolume status". The time looks right, but the size of
> > transfered files is absurd. The rebalance was with 3.12.6 in March 2018.
> > The last rebalance log file listed no errors and a realistic size at the
> > end.
> >
> 
> This has been seen a few times and is because an incorrect value is stored
> in the node_state.info file . However, I don't know what causes this
> incorrect value to be stored. It is harmless and can be ignored.
Ok. :)

> > We started a new rebalance today during a downtime of our corresponding
> > compute cluster, since these errors started to spread and this might
> > help. The output of "gluster volume rebalance $myvolume status" doesn't
> > list any errors so far and the numbers look like realistic values.
> > But we're seeing some strange errors (every few minutes) reports in the
> > journald:
> > "[2018-04-23 12:31:24.942377] E [MSGID: 113001]
> > [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> > setxattr failed
> > on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/
> > e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop:
> > key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> > [No such file or directory]"
> > The rebalance log file lists no errors.
> >
> > Has anybody seen similar error messages during a rebalance?
> >
> 
> Are any directories being deleted/renamed during the rebalance? If yes,
> this could be a valid message.
No. We locked out all users and took down all clients that mount the volume
before we started the rebalance, to ensure that no client interacts with it.
The messages continued over the last few hours, occurring on all bricks of
this volume up to several times per minute, with occasional quiet phases.

> > And we see some files dublicated. There are two copies on different
> > bricks (we're running a distributed volume).
> > One copy looks like this:
> > $ ls -lah
> > -rwxr--r--  2 $user $group  293 May 11  2017 config
> >
> > The other one looks rather strange:
> > $ ls -lah
> > ---------T  2 root    $group    0 May 11  2017 config
> >
> > Has anybody seen similar broken files?
> >
> 
> This is fine as long as you only see a single file from the mount point.
> The 'T' files are internal gluster files (called linkto files) and should
> be invisible from the mount point.
> 
> 
> Regards,
> Nithya
This is good to know. Yes, every file we have seen so far had only one
such linkto copy.

Thanks for your message. It helped a lot.

-- 
Frank Rühlemann
