Mauro Tridici
2018-Sep-10 14:08 UTC
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Hi Hari,

thank you very much for your support.
I will do everything you suggested and I will contact you as soon as all the steps are completed.

Thank you,
Mauro

> On 10 Sep 2018, at 16:02, Hari Gowtham <hgowtham at redhat.com> wrote:
>
> Hi Mauro,
>
> I went through the log file you have shared.
> I don't find any mismatch.
>
> This can be because of various reasons:
> 1) the accounting which was wrong is now fine. But as per your comment
> above, if this is the case, then the crawl should still be happening,
> which is why it is not yet reflected (it will be reflected after a while).
> 2) the fix-issue part of the script might be wrong.
> 3) or the final script that we use might be wrong.
>
> You can wait for a while (the time will vary with the number of files)
> and then see if the accounting is fine.
> If it's not fine even after a while, then we will have to run the
> script (the 6th patch set has worked, so it can be reused) without "fix-issue".
> This will give us the mismatch in the log file, which I can read and let
> you know where the lookup has to be done.
>
> On Mon, Sep 10, 2018 at 4:58 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Dear Hari,
>>
>> the log files that I attached to my last mail were generated by running the quota-fsck script after deleting the files.
>> The quota-fsck script version that I used is the one at the following link: https://review.gluster.org/#/c/19179/9..9/extras/quota/quota_fsck.py
>> I didn't edit the log files, but during the execution I forgot to redirect stderr and stdout to the same log file, sorry, mea culpa!
>>
>> Anyway, as you suggested, I ran the quota-fsck script again with the --fix-issues option.
>> At the end of the script execution, I launched the du command, but the problem is still there.
>>
>> [root at s02 auto]# df -hT /tier2/ASC/
>> File system    Tipo            Dim.  Usati  Dispon.  Uso%  Montato su
>> s02-stg:tier2  fuse.glusterfs  10T   2,6T   7,5T     26%   /tier2
>>
>> I'm sorry to bother you so much.
>> Last time I used the script everything went smoothly, but this time it seems to be more difficult.
>>
>> In attachment you can find the new log files.
>>
>> Thank you,
>> Mauro
>>
>>
>> On 10 Sep 2018, at 12:27, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> On Mon, Sep 10, 2018 at 3:13 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Dear Hari,
>>
>> I followed your suggestions but, unfortunately, nothing has changed.
>> I tried both the quota-fsck script with the --fix-issues option and the "setfattr -n trusted.glusterfs.quota.dirty -v 0x3100" command against the files and directories you mentioned (on each available brick).
>>
>>
>> There can be an issue with fix-issue in the script. As the directories
>> with accounting mismatch are found, it's better to set the dirty xattr
>> and then do a du (this way it's easy and has to resolve the issue). The
>> script can be used when we don't know where the issue is.
>>
>> Disk quota assigned to the /tier2/ASC directory seems to be partially used (about 2,6 TB used), but the "real and current" situation is the following one (I deleted all files in the primavera directory):
>>
>>
>> If the files are deleted, then the state of the log file from the script
>> is outdated. The folders I suggested are as per the old log file, so
>> setting the dirty xattr and then doing a lookup (du on that dir) might
>> not help.
>>
>>
>> [root at s03 qc]# du -hsc /tier2/ASC/*
>> 22G /tier2/ASC/orientgate
>> 26K /tier2/ASC/primavera
>> 22G totale
>>
>> So, I think that the problem should be only in the "orientgate" or in the "primavera" directory, right?
>> For this reason, in order to collect some fresh logs, I ran the check script again starting from the top-level directory "ASC", using the following bash script (named hari-20180910) based on the new version of quota_fsck (rel. 9):
>>
>> hari-20180910 script:
>>
>> #!/bin/bash
>>
>> #set -xv
>>
>> host=$(hostname)
>>
>> for i in {1..12}
>> do
>> ./quota_fsck_r9.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
>> done
>>
>> In attachment, you can find the log files generated by the script.
>>
>> SOME IMPORTANT NOTES:
>>
>> - in the new log files, the "primavera" directory is no longer present
>>
>> Is there something more that I can do?
>>
>> As there were files that were deleted, the accounting would have changed again.
>>
>> We need to look from the beginning, as the above suggestions may not be
>> true anymore.
>>
>> I find that the log files are edited. A few lines are missing. Can you
>> send the actual log file from running the script?
>> And I would recommend that you run the script after all the files are
>> deleted (or other major modifications are done),
>> so that we can fix once at the end.
>>
>> If the fix-issue argument of the script doesn't work on the directory/
>> subdirectory where you find the mismatch, then you can send the whole
>> file.
>> I will check the log and let you know where you need to do the lookup.
>>
>>
>> Thank you very much for your patience.
>> Regards,
>> Mauro
>>
>>
>> On 10 Sep 2018, at 10:51, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi,
>>
>> Looking at the logs, I can see that the following files have a mismatch:
>>
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB_Balkan_8km_1971-2005
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_ERA40_NMMB_Balkan_8km_1971-2000
>> /orientgate/ftp/climate/3_urban_adaptation_health/6_budapest_veszprem_hungary/RHMSS_CMCC-CM_NMMB-RCP8.5_Balkan_8km_2010-2100
>> /primavera/cam
>>
>> You can try setting dirty for these and then do a du on them.
>>
>> A few corrections for my above comments.
>> The contri size in the xattr and the aggregated size have to be checked.
>>
>> On Mon, Sep 10, 2018 at 1:16 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>
>> Hi Hari,
>>
>> thank you very much for your help.
>> I will try to use the latest available version of the quota_fsck script and I will give you feedback as soon as possible.
>>
>> Thank you again for the detailed explanation.
>> Regards,
>> Mauro
>>
>> On 10 Sep 2018, at 09:17, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi Mauro,
>>
>> The problem might be at some other place, so setting the xattr and
>> doing the lookup might not have fixed the issue.
>>
>> To resolve this we need to read the log file reported by the fsck
>> script. In this log file we need to look for the size reported by the
>> xattr (the value "SIZE:" in the log file) and the size reported by the
>> stat on the file (the value after "st_size=").
>>
>>
>> The contri size in the xattr and the aggregated size have to be checked.
>>
>> These two should be the same. If they mismatch, then we have to find
>> the top most dir which has the mismatch.
>>
>>
>> Bottom most dir/file has to be found. Replace top with bottom in the
>> following places as well.
>>
>> On this top most directory you have to do a set dirty xattr and then
>> do a lookup.
>>
>> If there are two different directories without a common top directory,
>> then both these have to undergo the above process.
>>
>> The fsck script should work fine. Can you try the "--fix-issue" with
>> the latest script instead of the 6th patch used above?
>>
>> --
>> Regards,
>> Hari Gowtham.
>
> --
> Regards,
> Hari Gowtham.

-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: mauro.tridici at cmcc.it
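The check Hari describes in the quoted thread (compare the size stored in the quota xattr with the aggregated size, find the bottom-most directory where the two disagree, then set the dirty xattr there and trigger a lookup with du) can be sketched roughly as follows. This is only an illustration under stated assumptions: the DirRecord layout, the function names, and the sample numbers are hypothetical and are not the actual quota_fsck.py log format or API. The setfattr arguments are the ones quoted in the thread; joining the relative path onto the brick root and the /tier2 mount point is an assumption based on the paths that appear above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DirRecord:
    """One directory as summarised from a fsck log (hypothetical layout)."""
    path: str             # path relative to the volume root, e.g. "/ASC/primavera"
    xattr_size: int       # size recorded in the quota xattr
    aggregated_size: int  # size obtained by adding up the children


def bottom_most_mismatches(records):
    """Return mismatching directories that have no mismatching descendant."""
    bad = [PurePosixPath(r.path) for r in records
           if r.xattr_size != r.aggregated_size]
    result = []
    for candidate in bad:
        # A deeper mismatching entry means the problem originates further down.
        if not any(candidate in other.parents for other in bad):
            result.append(str(candidate))
    return sorted(result)


def repair_commands(brick_root, mount_root, rel_path):
    """Build (but do not run) the two commands discussed in the thread.

    Assumption: the same relative path exists under the brick root and
    under the client mount point.
    """
    rel = rel_path.strip("/")
    brick_path = brick_root.rstrip("/") + "/" + rel
    mount_path = mount_root.rstrip("/") + "/" + rel
    return [
        # Mark the directory dirty on the brick (command quoted in the thread).
        ["setfattr", "-n", "trusted.glusterfs.quota.dirty",
         "-v", "0x3100", brick_path],
        # Trigger a lookup through the mount so the accounting is redone.
        ["du", "-hs", mount_path],
    ]


if __name__ == "__main__":
    # Made-up sizes, for illustration only.
    sample = [
        DirRecord("/ASC", 100, 70),
        DirRecord("/ASC/orientgate", 60, 60),
        DirRecord("/ASC/primavera", 40, 10),
        DirRecord("/ASC/primavera/cam", 40, 10),
    ]
    for path in bottom_most_mismatches(sample):
        for argv in repair_commands("/gluster/mnt1/brick", "/tier2", path):
            print(" ".join(argv))
```

The sketch only prints the commands, so nothing is changed on a brick unless someone deliberately runs them; as Hari notes, the setfattr step would have to be repeated on each brick that holds the directory.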
Mauro Tridici
2018-Sep-10 14:32 UTC
[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Hi Hari,

good news for us!

A few seconds ago, I ran the gluster quota list command in order to save the current quota status.

[root at s01 auto]# gluster volume quota tier2 list /ASC
Path   Hard-limit   Soft-limit    Used     Available   Soft-limit exceeded?   Hard-limit exceeded?
---------------------------------------------------------------------------------------------------
/ASC   10.0TB       99%(9.9TB)    2.6TB    7.4TB       No                     No

At the same time, I was asking myself how I could trigger a sort of directory "scan" in order to refresh the quota value without waiting for the automatic scan.
So, I decided to start a "du -hs /tier2/ASC" session (without specifying each single brick path, as I usually do after running the quota-fsck script).

[root at s01 auto]# du -hs /tier2/ASC
22G /tier2/ASC

Now, magically, the quota value reflects the real disk space usage reported by the "du" command.

[root at s01 auto]# gluster volume quota tier2 list /ASC
Path   Hard-limit   Soft-limit    Used     Available   Soft-limit exceeded?   Hard-limit exceeded?
---------------------------------------------------------------------------------------------------
/ASC   10.0TB       99%(9.9TB)    21.8GB   10.0TB      No                     No

Do you think that I was only lucky, or is there a particular reason why everything is now working?

Thank you,
Mauro
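The comparison made by hand in this message (the "Used" column of the quota list against the du total) can be sketched as a small check. This is a minimal illustration, not part of Gluster or of quota_fsck.py: the helper names parse_size and sizes_agree are hypothetical, the units are assumed to be powers of 1024, and the 5% tolerance is an arbitrary choice to absorb the rounding that du -h and the quota list apply.

```python
import re

# Assumption: both tools report sizes in powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
          "T": 1024**4, "P": 1024**5}

_SIZE_RE = re.compile(r"\s*([0-9]+(?:[.,][0-9]+)?)\s*([KMGTP]?)B?\s*",
                      re.IGNORECASE)


def parse_size(text):
    """Convert strings such as '21.8GB', '22G' or '2,6T' to a byte count."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    # Accept a decimal comma, as printed by du/df on an Italian locale.
    number = float(match.group(1).replace(",", "."))
    return number * _UNITS[match.group(2).upper()]


def sizes_agree(quota_used, du_used, tolerance=0.05):
    """True when the two sizes differ by at most `tolerance` (relative)."""
    quota_bytes = parse_size(quota_used)
    du_bytes = parse_size(du_used)
    return abs(quota_bytes - du_bytes) <= tolerance * max(quota_bytes, du_bytes)


if __name__ == "__main__":
    # Values taken from the two quota listings and the du output above.
    print("before du:", sizes_agree("2.6TB", "22G"))
    print("after du: ", sizes_agree("21.8GB", "22G"))
```

With the values from this message, the first comparison reports a disagreement and the second an agreement, which matches what the two quota listings show before and after the du run.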