Mauro Tridici
2018-Sep-10 14:08 UTC
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Hi Hari,

thank you very much for your support.
I will do everything you suggested and I will contact you as soon as all the steps are completed.

Thank you,
Mauro

> On 10 Sep 2018, at 16:02, Hari Gowtham <hgowtham at redhat.com> wrote:
>
> Hi Mauro,
>
> I went through the log file you shared.
> I don't find any mismatch.
>
> This can be because of various reasons:
> 1) The accounting which was wrong is now fine. But, as per your comment
> above, if this is the case, then the crawl should still be happening,
> which is why it's not yet reflected (it will be reflected after a while).
> 2) The fix-issue part of the script might be wrong.
> 3) Or the final script that we use might be wrong.
>
> You can wait for a while (the time will vary based on the number of
> files) and then see if the accounting is fine.
> If it's not fine even after a while, then we will have to run the
> script (the 6th patch set has worked, so it can be reused) without "fix-issue".
> This will give us the mismatch in the log file, which I can read to let
> you know where the lookup has to be done.
> On Mon, Sep 10, 2018 at 4:58 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Dear Hari,
>>
>> the log files that I attached to my last mail were generated by running the quota-fsck script after deleting the files.
>> The quota-fsck script version that I used is the one at the following link: https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
>> I didn't edit the log files, but during the execution I forgot to redirect stderr and stdout to the same log file, sorry, mea culpa!
>>
>> Anyway, as you suggested, I ran the quota-fsck script again with the --fix-issues option.
>> At the end of the script execution, I launched the du command, but the problem is still there.
>>
>> [root@s02 auto]# df -hT /tier2/ASC/
>> Filesystem     Type            Size  Used  Avail  Use%  Mounted on
>> s02-stg:tier2  fuse.glusterfs  10T   2.6T  7.5T   26%   /tier2
>>
>> I'm sorry to bother you so much.
>> Last time I used the script everything went smoothly, but this time it seems to be more difficult.
>>
>> You can find the new log files attached.
>>
>> Thank you,
>> Mauro
>>
>>
>> On 10 Sep 2018, at 12:27, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> On Mon, Sep 10, 2018 at 3:13 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Dear Hari,
>>
>> I followed your suggestions but, unfortunately, nothing has changed.
>> I tried both running the quota-fsck script with the --fix-issues option and running the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100" command against the files and directories you mentioned (on each available brick).
>>
>>
>> There can be an issue with fix-issue in the script. As the directories
>> with the accounting mismatch are found, it's better to set the dirty xattr
>> and then do a du (this way it's easy and should resolve the issue). The
>> script can be used when we don't know where the issue is.
>>
>> The disk quota assigned to the /tier2/ASC directory seems to be partially used (about 2.6 TB used), but the "real and current" situation is the following (I deleted all the files in the primavera directory):
>>
>>
>> If the files are deleted, then the state of the log file from the script
>> is outdated. The folders I suggested are based on the old log file, so
>> setting the dirty xattr and then doing a lookup (du on that dir) might
>> not help.
>>
>>
>> [root@s03 qc]# du -hsc /tier2/ASC/*
>> 22G /tier2/ASC/orientgate
>> 26K /tier2/ASC/primavera
>> 22G total
>>
>> So I think the problem should be only in the "orientgate" or the "primavera" directory, right?
>> For this reason, to collect some fresh logs, I ran the check script again starting from the top-level directory "ASC", using the following bash script (named hari-20180910) based on the new version of quota_fsck (rel. 9):
>>
>> hari-20180910 script:
>>
>> #!/bin/bash
>>
>> #set -xv
>>
>> host=$(hostname)
>>
>> for i in {1..12}
>> do
>>     ./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
>> done
>>
>> You can find the log files generated by the script attached.
>>
>> SOME IMPORTANT NOTES:
>>
>> - the "primavera" directory no longer appears in the new log files
>>
>> Is there anything more that I can do?
>>
>> As some files were deleted, the accounting would have changed again.
>>
>> We need to look from the beginning, as the above suggestions may not be
>> true anymore.
>>
>> I find that the log files are edited. A few lines are missing. Can you
>> send the actual log file from running the script?
>> And I would recommend running the script after all the files are
>> deleted (or other major modifications are done),
>> so that we can fix it once at the end.
>>
>> If the fix-issue argument of the script doesn't work on the directory/
>> subdirectory where you find a mismatch, then you can send the whole
>> file.
>> I will check the log and let you know where you need to do the lookup.
>>
>>
>> Thank you very much for your patience.
>> Regards,
>> Mauro
>>
>>
>> On 10 Sep 2018, at 10:51, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi,
>>
>> Looking at the logs, I can see that the files:
>>
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
>> /primavera/cam
>>
>> have a mismatch.
>>
>> You can try setting dirty on these and then doing a du on them.
>>
>> A few corrections to my comments above:
>> The contri size in the xattr and the aggregated size have to be checked.
>>
>> On Mon, Sep 10, 2018 at 1:16 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Hi Hari,
>>
>> thank you very much for your help.
>> I will try the latest available version of the quota_fsck script and I will give you feedback as soon as possible.
>>
>> Thank you again for the detailed explanation.
>> Regards,
>> Mauro
>>
>> On 10 Sep 2018, at 09:17, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi Mauro,
>>
>> The problem might be somewhere else, so setting the xattr and
>> doing the lookup might not have fixed the issue.
>>
>> To resolve this we need to read the log file produced by the fsck
>> script. In this log file we need to look for the size reported by the
>> xattr (the value "SIZE:" in the log file) and the size reported by
>> stat on the file (the value after "st_size=").
>>
>>
>> The contri size in the xattr and the aggregated size have to be checked.
>>
>> These two should be the same. If they don't match, then we have to find
>> the topmost dir which has the mismatch.
>>
>>
>> The bottommost dir/file has to be found. Replace "top" with "bottom" in the
>> following places as well.
>>
>> On this topmost directory you have to set the dirty xattr and then
>> do a lookup.
>>
>> If there are two different directories without a common top directory,
>> then both of them have to undergo the above process.
>>
>> The fsck script should work fine. Can you try "--fix-issue" with
>> the latest script instead of the 6th patch used above?
>>
>>
>> --
>> Regards,
>> Hari Gowtham.
>>
>>
>> -------------------------
>> Mauro Tridici
>>
>> Fondazione CMCC
>> CMCC Supercomputing Center
>> c/o Complesso Ecotekne - Università del Salento -
>> Strada Prov.le Lecce - Monteroni sn
>> 73100 Lecce IT
>> http://www.cmcc.it
>>
>> mobile: (+39) 327 5630841
>> email: mauro.tridici at cmcc.it
>>
>>
>> --
>> Regards,
>> Hari Gowtham.
>
> --
> Regards,
> Hari Gowtham.
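The workflow Hari outlines in this thread (run the fsck script on every brick, keep only the bottom-most paths whose accounting mismatches, set the dirty xattr on them, then trigger a lookup with du through the mount) can be sketched as a few bash functions. This is a hedged sketch, not official Gluster tooling: the brick layout (/gluster/mnt1..12/brick), the /tier2 mount point, the quota_fsck_r9.py script name and the setfattr command come from the messages above, while the function names, the BRICKS/MOUNT_ROOT/FSCK variables and the one-path-per-line input format are hypothetical. Extracting the mismatched paths from the fsck log is deliberately left out, because the log format is not shown here.

```shell
#!/bin/bash
# Hedged sketch of the quota repair workflow discussed in this thread.
# Function names and BRICKS/MOUNT_ROOT/FSCK are hypothetical; the paths,
# script name and setfattr call are taken from the messages above.

BRICKS=(/gluster/mnt{1..12}/brick)   # local brick roots on this server
MOUNT_ROOT=/tier2                    # FUSE mount of the tier2 volume
FSCK=./quota_fsck_r9.py              # quota_fsck.py, patch set 9

# Step 1: run the fsck script on every brick, appending stdout AND stderr
# to one log (the original loop only redirected stdout).
run_fsck_all_bricks() {
  local subdir=$1 log=$2 brick
  for brick in "${BRICKS[@]}"; do
    "$FSCK" --full-logs --sub-dir "$subdir" "$brick" >>"$log" 2>&1
  done
}

# Step 2: reduce mismatched paths (one per line on stdin) to the bottom-most
# ones, i.e. entries with no other listed path underneath them.
bottom_most_paths() {
  LC_ALL=C sort -u | awk '
    { p[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        leaf = 1
        for (j = 1; j <= NR; j++)
          if (i != j && index(p[j], p[i] "/") == 1) { leaf = 0; break }
        if (leaf) print p[i]
      }
    }'
}

# Step 3: set the quota dirty xattr on a path (relative to the volume root)
# on every brick that holds it, then run du through the mount so that the
# client performs a lookup on it.
mark_dirty_and_lookup() {
  local rel=${1#/} brick
  for brick in "${BRICKS[@]}"; do
    if [ -e "$brick/$rel" ]; then
      setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "$brick/$rel"
    fi
  done
  du -hs "$MOUNT_ROOT/$rel"
}
```

For example, after run_fsck_all_bricks ASC "$(hostname).log" and collecting the mismatched paths into a hypothetical mismatches.txt (prefixed with ASC/, since the paths in Hari's list appear to be relative to the --sub-dir), one could run: bottom_most_paths < mismatches.txt | while read -r p; do mark_dirty_and_lookup "$p"; done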
Mauro Tridici
2018-Sep-10 14:32 UTC
Hi Hari,
good news for us!
A few seconds ago, I ran the gluster quota list command to save
the current quota status.
[root@s01 auto]# gluster volume quota tier2 list /ASC
Path    Hard-limit  Soft-limit   Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC    10.0TB      99%(9.9TB)   2.6TB   7.4TB      No                    No
At the same time, I was wondering how I could trigger a sort of directory
"scan" to refresh the quota value without waiting for the automatic
scan.
So I decided to run "du -hs /tier2/ASC" (without specifying each
individual brick path, as I usually do after running the quota-fsck script).
[root@s01 auto]# du -hs /tier2/ASC
22G /tier2/ASC
Now, magically, the quota value reflects the real disk usage reported
by the "du" command.
[root@s01 auto]# gluster volume quota tier2 list /ASC
Path    Hard-limit  Soft-limit   Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC    10.0TB      99%(9.9TB)   21.8GB  10.0TB     No                    No
Do you think I was just lucky, or is there a particular reason why
everything is now working?
Thank you,
Mauro
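The sequence in this message (read the quota usage, run du on the directory through the FUSE mount rather than on the individual bricks, then read the quota usage again) can be replayed with the small bash function below. It is a hedged sketch: the volume name tier2, the quota path /ASC and the /tier2 mount point come from this thread, the function name refresh_quota_view is hypothetical, and whether the du-triggered lookups are what corrected the accounting is exactly the open question asked above.

```shell
#!/bin/bash
# Hedged sketch: replay the "quota list, du via the mount, quota list" steps.
# refresh_quota_view is a hypothetical name; arguments mirror the thread.
refresh_quota_view() {
  local volume=$1 quota_path=$2 mount_root=$3
  echo "== before =="
  gluster volume quota "$volume" list "$quota_path"
  # du walks the directory tree through the client mount, stat-ing each
  # entry (unlike running du on the brick paths directly).
  du -hs "${mount_root}${quota_path}"
  echo "== after =="
  gluster volume quota "$volume" list "$quota_path"
}
```

With the values from this message the call would be: refresh_quota_view tier2 /ASC /tier2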