Hari Gowtham
2018-Nov-26 14:50 UTC
[Gluster-users] Gluster 3.12.14: wrong quota in Distributed Dispersed Volume
Comments inline.

On Mon, Nov 26, 2018 at 7:25 PM Gudrun Mareike Amedick <g.amedick at uni-luebeck.de> wrote:
> Hi Hari,
>
> I'm sorry to bother you again, but I have a few questions concerning the script.
>
> Do I understand correctly that I have to execute it once per brick on each server?

Yes. On all the servers.

> It is a dispersed volume, so the file size on brick side and on client side can differ. Is that a problem?

The sizes on the bricks are aggregated by the quota daemon and then displayed on the client. If there was a problem with the aggregation (caused by missed updates on the bricks) then we see a different size reported; as it's a distributed file system, this is how it works.
To fix these missed updates, we need to find where the updates were missed. Then, on these missed directories, we need to set a dirty flag (on the directories of all bricks) and do a stat on the directory from the client. As the client will see the dirty flag in the xattr, it will try to fix the values that were accounted wrongly and update the right value.

> Is it a reasonable way of action if I first run "python quota_fsck.py --subdir $broken_dir $brickpath" to see if it reports something and if yes, run

The script can be run with the fix-issues argument in a single go, but we haven't tested the fix-issues side intensively. As the above command shows you where the problem is, we can explicitly set the dirty flag and then do a lookup to fix the issue. This will help you understand where the issue is.

> "python quota_fsck.py --subdir $broken_dir --fix-issues $mountpoint $brickpath" to correct them?

The fix-issues argument actually makes changes to the brick (changes related to quota only, which can be cleaned up with a quota restart). But as the restart does not crawl completely, we will come back to the script in case an abnormality is seen.

So you can run the script in two ways:
1) Without --fix-issues, and then see where the issue is.
And then set the dirty flag on that directory on all the bricks and do a stat from the client.
2) With --fix-issues: this should take care of both setting the dirty flag and doing the stat on it.

You can choose either of the above. Both have their own benefits:

Without --fix-issues you need to do more work, but it is scrutinized, so the changes you make to the backend are fine.

With --fix-issues (you just need to run it once with this argument), it changes the quota-related xattr values on the backend and fixes them itself. Looking at the size of your volume, if something goes wrong then we are left with useless quota values. Cleaning this up is where a restart of quota comes into play, and with your volume size the restart doesn't fix the whole system.

So both ways are fine.

> I'd run "du -h $mountpoint/broken_dir" from client side as a lookup. Is that sufficient?

Yep. Necessary only if you are running the script without --fix-issues.

> Will further action be required or should this be enough?
>
> Kind regards
>
> Gudrun
> Am Montag, den 26.11.2018, 17:26 +0530 schrieb Hari Gowtham:
> > Yes. In that case you can run the script and see what errors it is
> > throwing and then clean that directory up with setting dirty and then
> > doing a lookup.
> > Again, for such a huge size, it will consume a lot of resources.
> >
> > On Mon, Nov 26, 2018 at 3:56 PM Gudrun Mareike Amedick
> > <g.amedick at uni-luebeck.de> wrote:
> > > Hi,
> > >
> > > we have no notifications of OOM kills in /var/log/messages. So if I understood this correctly, the crawls finished but my attributes weren't set
> > > correctly? And this script should fix them?
> > > Thanks for your help so far
> > >
> > > Gudrun
> > > Am Donnerstag, den 22.11.2018, 13:03 +0530 schrieb Hari Gowtham:
> > > > On Wed, Nov 21, 2018 at 8:55 PM Gudrun Mareike Amedick
> > > > <g.amedick at uni-luebeck.de> wrote:
> > > > > Hi Hari,
> > > > >
> > > > > I disabled and re-enabled the quota and I saw the crawlers starting. However, this caused a pretty high load on my servers (200+) and this
> > > > > seems to have gotten them killed again. At least, I have no crawlers running, the quotas are not matching the output of du -h, and the
> > > > > crawler logs all contain this line:
> > > > The quota crawl is an intensive process as it has to crawl the entire
> > > > file system. The intensity varies based on the number of bricks, the
> > > > number of files, the depth of the filesystem, ongoing IO to the
> > > > filesystem and so on. Being a disperse volume it will have to talk to
> > > > all the bricks, and with the huge size the increase in CPU is expected.
> > > > >
> > > > > [2018-11-20 14:16:35.180467] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f0e3d6fe494]
> > > > > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x561eb7952d45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x561eb7952ba4] ) 0-:
> > > > > received signum (15), shutting down
> > > > This can mean that the file attributes are set and then it's stopped, or,
> > > > as you said, the process was killed while it still had attributes to be
> > > > set on a few sets of files.
> > > >
> > > > This message is common to all the shutdowns (the one triggered after
> > > > the job is finished and the one triggered to stop the process as well).
> > > > Can you check the /var/log/messages file for an "OOM" kill?
> > > > If you see those messages then the shutdown is because of the increase
> > > > in memory consumption, which is expected.
> > > > >
> > > > > I suspect this means my file attributes are not set correctly. Would the script you sent me fix that? And the script seems to be part of the
> > > > > Git GlusterFS 5.0 repo. We are running 3.12. Would it still work on 3.12 (or 4.1, since we'll be upgrading soon) or could it break things?
> > > > Quota is not actively developed because of its performance issues,
> > > > which need a major redesign. So the script holds true for newer
> > > > versions as well, because no changes have gone into that code.
> > > > The advantage of the script is that it can be run over a certain
> > > > directory which is faulty (it need not be the root; this reduces the
> > > > number of directories, the file depth and so on).
> > > > The crawl is necessary for the quota to work fine. The script can help
> > > > only if the xattrs were set by the crawl, which I think isn't the case
> > > > here. (To verify whether the xattrs are set on all the directories we
> > > > need to do a getxattr and see.) So we can't use the script.
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Gudrun Amedick
> > > > > Am Dienstag, den 20.11.2018, 16:59 +0530 schrieb Hari Gowtham:
> > > > > > reply inline.
> > > > > > On Tue, Nov 20, 2018 at 3:53 PM Gudrun Mareike Amedick
> > > > > > <g.amedick at uni-luebeck.de> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I think I know what happened. According to the logs, the crawlers received a signum(15). They seem to have died before having finished.
> > > > > > > Probably too much to do simultaneously. I have disabled and re-enabled quota and will set the quotas again with more time.
> > > > > > > Is there a way to restart a crawler that was killed too soon?
> > > > > > No. The disable and enable of quota starts a new crawl.
> > > > > >
> > > > > > > If I restart a server while a crawler is running, will the crawler be restarted, too? We'll need to do some hardware fixing on one of the
> > > > > > > servers soon and I need to know whether I have to check the crawlers first before shutting it down.
> > > > > > During the shutdown of the server the crawl will be killed. (The data
> > > > > > usage shown will be updated as per what has been crawled.)
> > > > > > The crawl won't be restarted on starting the server. Only quotad will
> > > > > > be restarted (which is not the same as the crawl).
> > > > > > For the crawl to happen you will have to restart the quota.
> > > > > >
> > > > > > > Thanks for the pointers
> > > > > > >
> > > > > > > Gudrun Amedick
> > > > > > > Am Dienstag, den 20.11.2018, 11:38 +0530 schrieb Hari Gowtham:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Can you check if the quota crawl finished? Without it having finished
> > > > > > > > the quota list will show incorrect values.
> > > > > > > > Looking at the under-accounting, it looks like the crawl is not yet
> > > > > > > > finished (it does take a lot of time as it has to crawl the whole
> > > > > > > > filesystem).
> > > > > > > >
> > > > > > > > If the crawl has finished and the usage is still showing wrong values
> > > > > > > > then there should be an accounting issue.
> > > > > > > > The easy way to fix this is to try restarting quota. This will not
> > > > > > > > cause any problems. The only downside is the limits won't hold true
> > > > > > > > while the quota is disabled, till it's enabled and the crawl finishes.
> > > > > > > > Or you can try using the quota fsck script
> > > > > > > > https://review.gluster.org/#/c/glusterfs/+/19179/ to fix your
> > > > > > > > accounting issue.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Hari.
> > > > > > > > On Mon, Nov 19, 2018 at 10:05 PM Frank Ruehlemann
> > > > > > > > <f.ruehlemann at uni-luebeck.de> wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > we're running a Distributed Dispersed volume with Gluster 3.12.14 at
> > > > > > > > > Debian 9.6 (Stretch).
> > > > > > > > >
> > > > > > > > > We migrated our data (>300TB) from a pure Distributed volume into this
> > > > > > > > > Dispersed volume with cp, followed by multiple rsyncs.
> > > > > > > > > After the migration was successful we enabled quotas again with "gluster
> > > > > > > > > volume quota $VOLUME enable", which finished successfully.
> > > > > > > > > And we set our required quotas with "gluster volume quota $VOLUME
> > > > > > > > > limit-usage $PATH $QUOTA", which finished without errors too.
> > > > > > > > >
> > > > > > > > > But our "gluster volume quota $VOLUME list" shows wrong values.
> > > > > > > > > For example:
> > > > > > > > > A directory with ~170TB of data shows only 40.8TB Used.
> > > > > > > > > When we sum up all quoted directories we're way under the ~310TB that
> > > > > > > > > "df -h /$volume" shows.
> > > > > > > > > And "df -h /$volume/$directory" shows wrong values for nearly all
> > > > > > > > > directories.
> > > > > > > > >
> > > > > > > > > All 72 8TB bricks and all quota daemons of the 6 servers are visible and
> > > > > > > > > online in "gluster volume status $VOLUME".
> > > > > > > > > In quotad.log I found multiple warnings like this:
> > > > > > > > >
> > > > > > > > > [2018-11-16 09:21:25.738901] W [dict.c:636:dict_unref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x1d58)
> > > > > > > > > [0x7f6844be7d58] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x2b92) [0x7f6844be8b92]
> > > > > > > > > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_unref+0xc0) [0x7f684b0db640] ) 0-dict: dict is NULL [Invalid argument]
> > > > > > > > >
> > > > > > > > > In some brick logs I found those:
> > > > > > > > >
> > > > > > > > > [2018-11-19 07:23:30.932327] I [MSGID: 120020] [quota.c:2198:quota_unlink_cbk] 0-$VOLUME-quota: quota context not set inode
> > > > > > > > > (gfid:f100f7a9-0779-4b4c-880f-c8b3b4bdc49d) [Invalid argument]
> > > > > > > > >
> > > > > > > > > and (replaced the volume name with "$VOLUME") those:
> > > > > > > > >
> > > > > > > > > The message "W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]" repeated 13
> > > > > > > > > times between [2018-11-19 15:28:54.089404] and [2018-11-19 15:30:12.792175]
> > > > > > > > > [2018-11-19 15:31:34.559348] W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]
> > > > > > > > >
> > > > > > > > > I already found that setting the flag "trusted.glusterfs.quota.dirty" might help, but I'm unsure about the consequences that will be
> > > > > > > > > triggered. And I'm unsure about the necessary version flag.
> > > > > > > > >
> > > > > > > > > Has anyone an idea how to fix this?
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > --
> > > > > > > > > Frank Rühlemann
> > > > > > > > > IT-Systemtechnik
> > > > > > > > >
> > > > > > > > > UNIVERSITÄT ZU LÜBECK
> > > > > > > > > IT-Service-Center
> > > > > > > > >
> > > > > > > > > Ratzeburger Allee 160
> > > > > > > > > 23562 Lübeck
> > > > > > > > > Tel +49 451 3101 2034
> > > > > > > > > Fax +49 451 3101 2004
> > > > > > > > > ruehlemann at itsc.uni-luebeck.de
> > > > > > > > > www.itsc.uni-luebeck.de
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Gluster-users mailing list
> > > > > > > > > Gluster-users at gluster.org
> > > > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > > --
> > > > Regards,
> > > > Hari Gowtham.

--
Regards,
Hari Gowtham.
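The manual procedure Hari describes above (check which directories carry no quota xattrs, mark them dirty on every brick, then stat them from a client) can be sketched as a small helper. Everything here is an assumption for illustration: the brick paths and mount point are placeholders, and the 0x3100 dirty value (ASCII '1' plus a NUL, the counterpart of the 0x3000/'0' clean value) should be verified against your quota version before use. To stay safe, the sketch only prints the setfattr/stat commands instead of executing them; the scan must run as root on each brick server, since trusted.* xattrs are only visible to root.

```python
#!/usr/bin/env python3
"""Sketch: list directories on a brick that carry no
trusted.glusterfs.quota.* xattrs at all (a sign the crawl never
reached them), and print the commands for the manual fix.
All paths are hypothetical placeholders."""
import os

BRICKS_ON_THIS_SERVER = ["/data/brick1/vol", "/data/brick2/vol"]  # placeholder
MOUNT = "/mnt/vol"                                                # client FUSE mount

def dirs_missing_quota_xattrs(brick_root):
    """Directories under brick_root without any trusted.glusterfs.quota.* xattr."""
    missing = []
    for path, dirnames, _ in os.walk(brick_root):
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]  # skip internals
        try:
            attrs = os.listxattr(path)
        except OSError:
            continue
        if not any(a.startswith("trusted.glusterfs.quota") for a in attrs):
            missing.append(os.path.relpath(path, brick_root))
    return missing

def fix_commands(relpath, bricks, mount):
    """Commands for the manual fix of one directory (printed, not executed).
    The setfattr lines must be run on the server that hosts each brick."""
    # 0x3100 is ASCII '1' + NUL: mark the directory's quota accounting dirty
    cmds = [f"setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 {b}/{relpath}"
            for b in bricks]
    # the lookup from the client triggers re-aggregation of the dirty directory
    cmds.append(f"stat {mount}/{relpath}")
    return cmds

if __name__ == "__main__":
    for brick in BRICKS_ON_THIS_SERVER:
        for rel in dirs_missing_quota_xattrs(brick):
            for cmd in fix_commands(rel, BRICKS_ON_THIS_SERVER, MOUNT):
                print(cmd)
```

As a non-root dry run the scan simply reports every directory (no trusted.* xattrs are visible), which is another reason to treat the output as a worklist to review rather than something to pipe into a shell.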
Gudrun Mareike Amedick
2018-Nov-26 16:37 UTC
[Gluster-users] Gluster 3.12.14: wrong quota in Distributed Dispersed Volume
Hi Hari,

I think I have indeed found a hint as to where the error is. As in, the script gives me an error. This is what happens:

# python /root/glusterscripts/quotas/quota_fsch.py --sub-dir $broken_dir $brick
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
mismatch
Traceback (most recent call last):
  File "/root/glusterscripts/quotas/quota_fsch.py", line 371, in <module>
    walktree(os.path.join(brick_path, sub_dir), hard_link_dict)
  File "/root/glusterscripts/quotas/quota_fsch.py", line 286, in walktree
    subtree_size = walktree(pathname, descendent_hardlinks)
  File "/root/glusterscripts/quotas/quota_fsch.py", line 325, in walktree
    verify_dir_xattr(t_dir, aggr_size[t_dir])
  File "/root/glusterscripts/quotas/quota_fsch.py", line 260, in verify_dir_xattr
    print_msg(QUOTA_SIZE_MISMATCH, path, xattr_dict, stbuf, dir_size)
  File "/root/glusterscripts/quotas/quota_fsch.py", line 60, in print_msg
    print '%24s %60s %12s %12s' % ("Size Mismatch",path , xattr_dict['contri_size'],
KeyError: 'contri_size'

This looks kind of wrong, so I ran the script with --full-logs.
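The KeyError at the bottom of that traceback comes from print_msg indexing xattr_dict['contri_size'] unconditionally, which blows up exactly on the directories of most interest: those whose quota contribution xattr was never written. A guarded variant would print a placeholder instead of crashing. This is a sketch, not the upstream code (the script itself is Python 2; the Python 3 spelling below is mine):

```python
def print_msg(msg_type, path, xattr_dict=None, stbuf=None, dir_size=None):
    """Defensive sketch of the failing print from quota_fsck's print_msg:
    fall back to 'n/a' when the quota xattrs (and hence 'contri_size')
    were never written."""
    xattr_dict = xattr_dict or {}
    if msg_type == "Size Mismatch":
        contri = xattr_dict.get("contri_size", "n/a")  # avoids the KeyError
        print("%24s %60s %12s %12s" % (msg_type, path, contri, dir_size))
```

A path printed with "n/a" here is precisely a candidate for the dirty-flag-plus-stat treatment discussed earlier in the thread.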
The result is longer and it contains this:

Verbose                   /$somefile_1
xattr_values: {'parents': {}}
posix.stat_result(st_mode=33188, st_ino=8120161795, st_dev=65034, st_nlink=2, st_uid=1052, st_gid=1032, st_size=512, st_atime=1538539390, st_mtime=1538213613, st_ctime=1538539392)
getfattr: Removing leading '/' from absolute path names
Verbose                   /$somefile_2
xattr_values: {'parents': {}}
posix.stat_result(st_mode=33188, st_ino=8263802208, st_dev=65034, st_nlink=2, st_uid=1052, st_gid=1032, st_size=46139430400, st_atime=1542640461, st_mtime=1542645844, st_ctime=1542709397)

This looks even more wrong, so I took a look at the file attributes:

# getfattr -e hex -d -m. --no-dereference /$somefile_1
getfattr: Removing leading '/' from absolute path names
# file: $somefile_1
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000002af87f5800
trusted.ec.version=0x0000000000234ba30000000000234ba3
trusted.gfid=0x270a5939c1fe40d5aa13d943209eedab
trusted.gfid2path.7bae7a7a6d9b6e99=0x36666433306232342d396536352d346339322d613030662d3533393662393131343830662f686f6d652d3138313131392e746172

# getfattr -e hex -d -m. --no-dereference /$somefile_2
getfattr: Removing leading '/' from absolute path names
# file: $somefile_2
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000006eb
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xcfc7641415ae46899b7cb1035491d706
trusted.gfid2path.13dac9b562af3c0d=0x36666433306232342d396536352d346339322d613030662d3533393662393131343830662f6d6166446966663575362e747874

So, no quota file attributes. This doesn't look good to me.

I also took a look at the attributes of $broken_dir and I think dirty is already set:

# getfattr -e hex -d -m. --no-dereference $broken_dir
getfattr: Removing leading '/' from absolute path names
# file: $broken_dir
trusted.ec.version=0x00000000000000180000000000000022
trusted.gfid=0xccc9615e9bc94b5fb27a1db54c66cd3c
trusted.glusterfs.dht=0x00000001000000006aaaaaa97ffffffd
trusted.glusterfs.quota.2631bcce-32bd-4e3e-9953-6412063a9fca.contri.3=0x000000000a64200000000000000000140000000000000012
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set.3=0x0000010000000000ffffffffffffffff
trusted.glusterfs.quota.size.3=0x000000000a64200000000000000000140000000000000012

Does that mean that the crawlers didn't finish their jobs?

Kind regards

Gudrun
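For what it's worth, the hex values in that getfattr output can be decoded by hand. The layout assumed below (quota.size.N and contri.N as three big-endian int64s: bytes used, file count, directory count; limit-set.N as hard and soft limit, with all-ones meaning unset; dirty as a one-byte ASCII flag) is worth double-checking against your quota version:

```python
import struct

def decode_size(hexval):
    """trusted.glusterfs.quota.size.N -> (bytes used, files, directories),
    assuming three big-endian signed 64-bit integers."""
    return struct.unpack(">qqq", bytes.fromhex(hexval[2:]))

def decode_limit(hexval):
    """trusted.glusterfs.quota.limit-set.N -> (hard limit, soft limit);
    -1 would mean the soft limit is unset (default percentage applies)."""
    return struct.unpack(">qq", bytes.fromhex(hexval[2:]))

# values from the $broken_dir output above
used, files, dirs = decode_size(
    "0x000000000a64200000000000000000140000000000000012")
print(used, files, dirs)     # 174333952 bytes (~166 MiB), 20 files, 18 dirs

hard, soft = decode_limit("0x0000010000000000ffffffffffffffff")
print(hard)                  # 1099511627776 = 1 TiB hard limit

print(bytes.fromhex("3000"))  # b'0\x00': the dirty byte reads '0'
```

Note that the dirty byte here decodes to ASCII '0', which (if the encoding assumption holds) would mean the flag in this listing is currently clear rather than set; the commonly cited set value is 0x3100, i.e. '1'.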