Mauro Tridici
2018-Sep-10 14:32 UTC
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Hi Hari,

good news for us!

A few seconds ago, I ran the gluster quota list command in order to save the current quota status.

[root@s01 auto]# gluster volume quota tier2 list /ASC
Path    Hard-limit    Soft-limit    Used     Available    Soft-limit exceeded?    Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC    10.0TB        99%(9.9TB)    2.6TB    7.4TB        No                      No

At the same time, I was asking myself how I could trigger a sort of directory "scan" in order to refresh the quota value without waiting for the automatic scan.
So, I decided to start a "du -hs /tier2/ASC" session (without specifying each single brick path, as I usually do after running the quota-fsck script).

[root@s01 auto]# du -hs /tier2/ASC
22G     /tier2/ASC

Now, magically, the quota value reflects the real disk space usage reported by the "du" command.

[root@s01 auto]# gluster volume quota tier2 list /ASC
Path    Hard-limit    Soft-limit    Used      Available    Soft-limit exceeded?    Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC    10.0TB        99%(9.9TB)    21.8GB    10.0TB       No                      No

Do you think that I was just lucky, or is there a particular reason why everything is now working?

Thank you,
Mauro

> On 10 Sep 2018, at 16:08, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
> Hi Hari,
>
> thank you very much for your support.
> I will do everything you suggested and I will contact you as soon as all the steps are completed.
>
> Thank you,
> Mauro
>
>> On 10 Sep 2018, at 16:02, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi Mauro,
>>
>> I went through the log file you have shared.
>> I don't find any mismatch.
>>
>> This can be because of various reasons:
>> 1) the accounting which was wrong is now fine. But, as per your comment above, if this is the case
>>    then the crawl should still be happening, which is why it is not yet reflected (it will be reflected after a while).
>> 2) the fix-issue part of the script might be wrong.
>> 3) or the final script that we use might be wrong.
>>
>> You can wait for a while (the time will vary based on the number of files) and then see if the accounting is fine.
>> If it's not fine even after a while, then we will have to run the
>> script (the 6th patch set has worked, so it can be reused) without "fix-issue".
>> This will give us the mismatch in the log file, which I can read and let
>> you know where the lookup has to be done.
>>
>> On Mon, Sep 10, 2018 at 4:58 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>> Dear Hari,
>>>
>>> the log files that I attached to my last mail were generated by running the quota-fsck script after deleting the files.
>>> The quota-fsck script version that I used is the one at the following link: https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
>>> I didn't edit the log files, but during the execution I forgot to redirect stderr and stdout to the same log file. Sorry, mea culpa!
>>>
>>> Anyway, as you suggested, I ran the quota-fsck script again with the --fix-issues option.
>>> At the end of the script execution, I launched the du command, but the problem is still there.
>>>
>>> [root@s02 auto]# df -hT /tier2/ASC/
>>> File system     Tipo            Dim.  Usati  Dispon.  Uso%  Montato su
>>> s02-stg:tier2   fuse.glusterfs  10T   2,6T   7,5T     26%   /tier2
>>>
>>> I'm sorry to bother you so much.
>>> Last time I used the script everything went smoothly, but this time it seems to be more difficult.
>>>
>>> Attached you can find the new log files.
>>>
>>> Thank you,
>>> Mauro
>>>
>>> On 10 Sep 2018, at 12:27, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> On Mon, Sep 10, 2018 at 3:13 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>> Dear Hari,
>>>
>>> I followed your suggestions but, unfortunately, nothing has changed.
>>> I tried both the quota-fsck script with the --fix-issues option and the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100" command against the files and directories you mentioned (on each available brick).
>>>
>>> There can be an issue with fix-issue in the script. As the directories
>>> with the accounting mismatch are found, it's better to set the dirty xattr
>>> and then do a du (this way is easy and should resolve the issue). The
>>> script can be used when we don't know where the issue is.
>>>
>>> The disk quota assigned to the /tier2/ASC directory seems to be partially used (about 2,6 TB used), but the "real and current" situation is the following one (I deleted all files in the primavera directory):
>>>
>>> If the files are deleted, then the state of the log file from the script
>>> is outdated. The folders I suggested are as per the old log file, so
>>> setting the dirty xattr and then doing a lookup (du on that dir) might
>>> not help.
>>>
>>> [root@s03 qc]# du -hsc /tier2/ASC/*
>>> 22G     /tier2/ASC/orientgate
>>> 26K     /tier2/ASC/primavera
>>> 22G     totale
>>>
>>> So, I think that the problem should be only in the "orientgate" or in the "primavera" directory, right?
>>> For this reason, in order to collect some fresh logs, I ran the check script again starting from the top-level directory "ASC", using the following bash script (named hari-20180910) based on the new version of quota_fsck (rel. 9):
>>>
>>> hari-20180910 script:
>>>
>>> #!/bin/bash
>>>
>>> #set -xv
>>>
>>> host=$(hostname)
>>>
>>> for i in {1..12}
>>> do
>>>     ./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
>>> done
>>>
>>> Attached you can find the log files generated by the script.
>>>
>>> SOME IMPORTANT NOTES:
>>>
>>> - in the new log files, the "primavera" directory is no longer present
>>>
>>> Is there something more that I can do?
>>>
>>> As there were files that were deleted, the accounting would have changed again.
>>>
>>> We need to look from the beginning, as the above suggestions may not be
>>> true anymore.
>>>
>>> I find that the log files are edited. A few lines are missing. Can you
>>> send the actual log file from running the script?
>>> And I would recommend you run the script after all the files are
>>> deleted (or other major modifications are done),
>>> so that we can fix once at the end.
>>>
>>> If the fix-issue argument of the script doesn't work on the directory/
>>> subdirectory where you find the mismatch, then you can send the whole
>>> file.
>>> I will check the log and let you know where you need to do the lookup.
>>>
>>> Thank you very much for your patience.
>>> Regards,
>>> Mauro
>>>
>>> On 10 Sep 2018, at 10:51, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> Looking at the logs, I can see that the following paths:
>>>
>>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
>>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
>>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
>>> /primavera/cam
>>>
>>> have a mismatch.
>>>
>>> You can try setting dirty for these and then do a du on them.
>>>
>>> A few corrections for my above comments.
>>> The contri size in the xattr and the aggregated size have to be checked.
>>>
>>> On Mon, Sep 10, 2018 at 1:16 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>> Hi Hari,
>>>
>>> thank you very much for your help.
>>> I will try to use the latest available version of the quota_fsck script and I will give you feedback as soon as possible.
>>>
>>> Thank you again for the detailed explanation.
>>> Regards,
>>> Mauro
>>>
>>> On 10 Sep 2018, at 09:17, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> Hi Mauro,
>>>
>>> The problem might be at some other place, so setting the xattr and
>>> doing the lookup might not have fixed the issue.
>>>
>>> To resolve this we need to read the log file reported by the fsck
>>> script. In this log file we need to look for the size reported by the
>>> xattr (the value "SIZE:" in the log file) and the size reported by the
>>> stat on the file (the value after "st_size=").
>>>
>>> The contri size in the xattr and the aggregated size have to be checked.
>>>
>>> These two should be the same. If they mismatch, then we have to find
>>> the top most dir which has the mismatch.
>>>
>>> Bottom most dir/file has to be found. Replace top with bottom in the
>>> following places as well.
>>>
>>> On this top most directory you have to do a set dirty xattr and then
>>> do a lookup.
>>>
>>> If there are two different directories without a common top directory,
>>> then both of these have to undergo the above process.
>>>
>>> The fsck script should work fine. Can you try the "--fix-issue" with
>>> the latest script instead of the 6th patch used above?
>>>
>>> --
>>> Regards,
>>> Hari Gowtham.

-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: mauro.tridici at cmcc.it
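
The manual procedure discussed in this thread (set the quota dirty xattr on the mismatching directory on every brick, then trigger a lookup with du from the client mount) could be sketched as a small shell helper. This is only a sketch under assumptions: the names run, mark_dirty_and_lookup and DRY_RUN are hypothetical and not part of Gluster or of quota_fsck.py, the setfattr and du commands are the ones quoted in the messages above, and by default the helper only prints the commands instead of running them.

```shell
#!/bin/bash
# Hypothetical helper, not part of Gluster: by default (DRY_RUN=1) it only
# prints the commands so they can be reviewed before touching any brick.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: mark_dirty_and_lookup <subdir> <client-mount> <brick>...
# Sets the quota dirty xattr (value taken from the thread) on <subdir> on
# every listed brick, then runs du on the client mount to force a lookup.
mark_dirty_and_lookup() {
    local subdir=$1 mount=$2
    shift 2
    local brick
    for brick in "$@"; do
        run setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "$brick/$subdir"
    done
    run du -hs "$mount/$subdir"
}
```

For example, "mark_dirty_and_lookup ASC/orientgate /tier2 /gluster/mnt1/brick /gluster/mnt2/brick" prints two setfattr lines followed by one du line. Setting DRY_RUN=0 would run them for real, which has to be done as root on the node that hosts the listed bricks.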
Hari Gowtham
2018-Sep-11 08:49 UTC
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Hi Mauro,

It was because the quota crawl takes some time and it was still working on it.
When we ran the fix-issues, it made changes to the backend and did a lookup.
It takes time for the whole thing to be reflected in the quota list command.
Earlier, it wasn't reflected because it was still crawling.
So this is the same as the first of the three possible reasons I mentioned earlier.
This is the expected behavior.

Regards,
Hari.

--
Regards,
Hari Gowtham.
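
Since the quota list output lags behind until the crawl finishes, one way to follow its progress is to print the "Used" column at regular intervals. The following is a minimal sketch, assuming the column layout shown in the quota list output earlier in this thread (Used is the fourth whitespace-separated field); quota_used and watch_quota_used are hypothetical helper names, not Gluster commands.

```shell
#!/bin/bash
# Read "gluster volume quota <vol> list <path>" output on stdin and print
# the Used column (4th field) of the row whose first field equals <path>.
quota_used() {
    local path=$1
    awk -v p="$path" '$1 == p { print $4 }'
}

# Hypothetical helper: print a UTC timestamp and the current Used value
# every <interval> seconds, <count> times, to see when the crawl settles.
# Usage: watch_quota_used <volume> <path> [interval] [count]
watch_quota_used() {
    local vol=$1 path=$2 interval=${3:-60} count=${4:-10}
    local n
    for ((n = 0; n < count; n++)); do
        printf '%s %s\n' "$(date -u +%H:%M:%S)" \
            "$(gluster volume quota "$vol" list "$path" | quota_used "$path")"
        sleep "$interval"
    done
}
```

With the values from this thread, "watch_quota_used tier2 /ASC 60 10" would print one line per minute for ten minutes. A Used value that stops changing between samples suggests, but does not prove, that the crawl has finished.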