I'm having trouble with a filesystem after a planned outage. I'm new to Lustre, but I'll do my best to explain what I'm seeing.

* I have a 1.6.7.2 MDS and 4 1.8.5 OSSs.
* When I brought the system up, it reported the used space incorrectly (df -h shows 166T total, 150T used and 7.1T free).
* I unmounted and ran e2fsck on all of the OSTs and the MDS without error, but when I remounted everything the error persisted.
* When I ran lfsck in read-only mode, it basically showed all of the files on the two OSTs of a single OSS as "not created".

Is there a way to save this data? Any help or suggestions would be greatly appreciated.

Thanks,
Ben Verduzco | SAIC
Systems Administrator | Surveillance and Reconnaissance Group (SRG)
phone: 858-826-3884 | pager: 877-619-3140
benjamin.p.verduzco at saic.com | saic.com <http://www.saic.com/>
On Apr 3, 2013, at 1:52 PM, "Verduzco, Benjamin P." <BENJAMIN.P.VERDUZCO at saic.com> wrote:
> * When I brought the system up, it reported the used space incorrectly (df -h shows 166T total, 150T used and 7.1T free)

When you say that the used space is being reported incorrectly, do you mean that the sum of the used and free space does not match the total space? Or do you have a reason to believe that 150 TB is not actually used? If it is the former, then that could be explained by the fact that ext file systems by default reserve 5% of the disk space for the root user.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
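As a rough check of the 5% explanation, the df figures from the original report can be plugged into a few lines of arithmetic. This is only a sketch: df -h rounds its output, so the result is approximate, and the variable names are my own.

```python
# Figures from the df -h output quoted above, in TB (rounded by df).
total_tb = 166.0
used_tb = 150.0
avail_tb = 7.1

# Space that is neither counted as used nor available to ordinary users.
unaccounted_tb = total_tb - used_tb - avail_tb
unaccounted_pct = 100.0 * unaccounted_tb / total_tb

print(f"unaccounted: {unaccounted_tb:.1f} TB ({unaccounted_pct:.1f}% of total)")
# prints: unaccounted: 8.9 TB (5.4% of total)
```

The gap comes out close to 5% of the total, which is consistent with a default root reserve on the OSTs.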
On 2013-04-16, at 10:49, "Verduzco, Benjamin P." <BENJAMIN.P.VERDUZCO at saic.com> wrote:
> I'm having trouble with a filesystem after a planned outage. I'm new to Lustre, but I'll do my best to explain what I'm seeing.
> * I have a 1.6.7.2 MDS and 4 1.8.5 OSSs.

In general, while it is possible to have mismatched client and server versions, it is never a good idea to run different Lustre versions on the servers.

Cheers, Andreas

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
That's correct, the used and free don't equal the total. While there is just about a 5% difference, I know the file system was reporting a more reasonable 158T used before the reboot.

Despite the file system size oddness, it seems to be working, so we'll keep it in service for now, but we're moving up our plans to replace our MDS and have consistent versions of both Lustre and the OS across all hosts.

Thanks,
Ben

-----Original Message-----
From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr at utk.edu]
Sent: Wednesday, April 17, 2013 5:07 PM
To: Verduzco, Benjamin P.
Cc: <Lustre-discuss at lists.lustre.org>
Subject: Re: [Lustre-discuss] OST corruption?

When you say that the used space is being reported incorrectly, do you mean that the sum of the used and free space does not match the total space? Or do you have a reason to believe that 150 TB is not actually used? If it is the former, then that could be explained by the fact that ext file systems by default reserve 5% of the disk space for the root user.
FWIW, your total should be used+free+reserved.
There is normally a % set aside for root only.
This is changeable too:
tune2fs -m 0 <device>

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Verduzco, Benjamin P.
> Sent: Monday, April 22, 2013 9:49 AM
> To: Mohr Jr, Richard Frank (Rick Mohr)
> Cc: Lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] OST corruption?
>
> That's correct, the used and free don't equal the total. While there
> is just about a 5% difference, I know the file system was reporting a more
> reasonable 158T used before the reboot.
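One way to see what reserve is currently set on a device is the superblock listing printed by `tune2fs -l <device>`, which includes `Block count` and `Reserved block count` fields. A minimal sketch that turns such a listing into a percentage is below; the function name is my own and the sample text is a made-up excerpt, not output from the systems in this thread.

```python
def reserved_percent(tune2fs_listing: str) -> float:
    """Return reserved blocks as a percentage of all blocks.

    Expects text in the "Field name:   value" layout of `tune2fs -l`.
    """
    fields = {}
    for line in tune2fs_listing.splitlines():
        # Split on the first colon only; lines without one are skipped.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    total = int(fields["Block count"])
    reserved = int(fields["Reserved block count"])
    return 100.0 * reserved / total


# Hypothetical excerpt for illustration only.
sample = """\
Block count:              1000000
Reserved block count:     50000
Free blocks:              200000
"""

print(f"{reserved_percent(sample):.1f}% reserved")
```

On a real OST the listing would be captured from the command itself and passed to the function as a string.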
Hello!

In reality there's no "used" or "reserved" in the statfs(2) output. All you get is total, free and avail. df and the like typically calculate used as total-free. Reserved would be free-avail.

Bye,
Oleg

On Apr 26, 2013, at 10:56 AM, Andrus, Brian Contractor wrote:
> FWIW, your total should be used+free+reserved
> There is normally a % set aside for root only
> This is changeable too.
> tune2fs -m 0 <device>
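The statfs(2) fields mentioned above are exposed in Python through `os.statvfs`, so the derivation can be sketched directly. This follows GNU df as I understand it, where used is total minus free and the root reserve is the difference between free and avail; the function names are my own.

```python
import os


def space_summary(blocks, bfree, bavail, frsize):
    """Derive df-style figures, in bytes, from statvfs block counts."""
    total = blocks * frsize
    free = bfree * frsize      # free blocks, including the root reserve
    avail = bavail * frsize    # free blocks usable by unprivileged users
    return {
        "total": total,
        "used": total - free,
        "avail": avail,
        "reserved": free - avail,
    }


def summarize_path(path):
    """Run statvfs on a mounted path and summarize it."""
    st = os.statvfs(path)
    return space_summary(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize)


if __name__ == "__main__":
    for name, value in summarize_path("/").items():
        print(f"{name:>8}: {value / 2**30:.1f} GiB")
```

With these definitions used + avail + reserved always equals total, which is why used + avail falls short of total whenever a reserve is set.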
Thanks Brian,

Rick mentioned the reserve as well, but tune2fs was run when the OSSs were put into service and I thought it was persistent. I ran it again yesterday on all OSTs and now used and free equal total.

Feeling a little dumb. Thanks all for your help.

Ben

-----Original Message-----
From: Andrus, Brian Contractor [mailto:bdandrus at nps.edu]
Sent: Friday, April 26, 2013 7:57 AM
To: Verduzco, Benjamin P.; Mohr Jr, Richard Frank (Rick Mohr)
Cc: Lustre-discuss at lists.lustre.org
Subject: RE: [Lustre-discuss] OST corruption?

FWIW, your total should be used+free+reserved
There is normally a % set aside for root only
This is changeable too.
tune2fs -m 0 <device>
On 2013/26/04 8:56 AM, "Andrus, Brian Contractor" <bdandrus at nps.edu> wrote:
>FWIW, your total should be used+free+reserved
>There is normally a % set aside for root only
>This is changeable too.
>tune2fs -m 0 <device>

If your filesystem ever gets totally full, it will have permanent filesystem fragmentation for any files written at this point. I agree that 5% of a 10TB+ filesystem is probably too much to reserve, but at least 1% is reasonable to avoid fragmentation.

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
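To put the percentages above in absolute terms, here is a small sketch of how much space each reserve setting holds back. The 10 TB size is taken from the "10TB+" figure in the message as an illustrative value, decimal units are assumed, and the helper name is my own.

```python
TB = 10**12
GB = 10**9


def reserve_bytes(fs_bytes: int, percent: float) -> float:
    """Bytes held back for root at a given reserve percentage."""
    return fs_bytes * percent / 100.0


for pct in (5, 1, 0):
    print(f"{pct}% of 10 TB = {reserve_bytes(10 * TB, pct) / GB:.0f} GB")
```

The percentage passed here corresponds to the value given to `tune2fs -m`.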