Hey,

we're running Lustre 1.8.5 on CentOS 5.3 (rocks). It seems that we have a corrupted file system, so I followed the steps given in the manual:
http://wiki.lustre.org/manual/LustreManual18_HTML/LustreRecovery.html#50651260_pgfId-1291230

Steps 1-4 ran smoothly. I shared the files created on the MDS over NFS.

$ ls -l ls /mnt/
total 14164076
-rw-r--r-- 1 root root 46471111680 Jun 6 17:51 mdsdb
-rw-r--r-- 1 root root 106496 Jun 6 16:49 mdsdb.mdshdr
drwxrwxrwx 6 root root 4096 Jun 6 18:01 osts

Step 5 fails on all OSTs:

$ e2fsck -v --mdsdb /mnt/mdsdb --ostdb /mnt/osts/ost2/ost2db /dev/sda3
e2fsck 1.41.10.sun2 (24-Feb-2010)
lustre-OST0002 lustre database creation, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Pass 6: Acquiring information for lfsck
/mnt/mdsdb:mdshdr
: Permission denied
failure to open database mdshdr: Input/output error
e2fsck: aborted

I changed the permissions to 666 and tried to create a link mdsdb:mdshdr that points to mdsdb.mdshdr (I figured this might be a bug, as the file named in the error has a ':' instead of a '.'). Neither worked.

Any ideas?

Cheers,
Arne

--
Arne Brutschy
Ph.D. Student                   Email  arne.brutschy(AT)ulb.ac.be
IRIDIA CP 194/6                 Web    iridia.ulb.ac.be/~abrutschy
Université Libre de Bruxelles   Tel    +32 2 650 2273
Avenue Franklin Roosevelt 50    Fax    +32 2 650 2715
1050 Bruxelles, Belgium         (Tel and Fax both IRIDIA secretary)
Arne Brutschy
2012-Jun-07 15:49 UTC
[Lustre-discuss] Problems running e2fsck on oss - incredibly slow!
Answering my own question: I needed to give write permissions not only to the ostdb directory and the mdsdb file, but also to the parent directory. It seems to work now.

On a side note, does anyone know why e2fsck is so incredibly slow? We have only two OSTs per system, each running on a 1TB RAID1. The check has now been running for over 6 hours (!)

Maybe there is another cause for the I/O problems I observed before starting to fix the corruption...

Cheers,
Arne

On Thu, 7 Jun 2012 11:13:37 +0200 Arne Brutschy <arne.brutschy at ulb.ac.be> wrote:
> [...]
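The fix described above (write access to the database files, the ostdb directory, and the parent directory) can be checked before starting a long e2fsck run. The following is a minimal sketch, not part of e2fsprogs or Lustre: `check_db_access` is a hypothetical helper, and the paths in the usage comment are the ones from this thread. It only tests whether the current user can write to each path and to its parent directory.

```shell
#!/bin/sh
# check_db_access is a hypothetical helper (an assumption for illustration).
# For each path given, report if the path exists but is not writable, and
# if its parent directory is missing or not writable by the current user.
# Returns 0 if everything is writable, 1 otherwise.
check_db_access() {
    status=0
    for path in "$@"; do
        parent=$(dirname "$path")
        if [ -e "$path" ] && [ ! -w "$path" ]; then
            echo "not writable: $path"
            status=1
        fi
        if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
            echo "parent not writable: $parent"
            status=1
        fi
    done
    return "$status"
}

# Usage with the paths from this thread (adjust to your own layout):
#   check_db_access /mnt/mdsdb /mnt/mdsdb.mdshdr /mnt/osts/ost2/ost2db
```

Note that this checks access as seen by the user running the script on the OSS, which is what matters when the databases live on an NFS mount.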
Andreas Dilger
2012-Jun-07 17:21 UTC
[Lustre-discuss] Problems running e2fsck on oss - incredibly slow!
On 2012-06-07, at 9:49, Arne Brutschy <arne.brutschy at ulb.ac.be> wrote:
> Answering to myself: I needed to give not only write permissions to the
> ostdb directory and the mdsdb, but also to the parent directory as
> well. It seems to work now.
>
> On a sidenote, does anyone know why the e2fsck is so incredibly slow?
> We have only two osts per system, each running on a 1TB RAID1. The
> check is now running since over 6 hours (!)

Note that it is possible to run lfsck while the filesystem is mounted. The lfsck code has always been very slow because of the external databases. That, and the difficulty in building and moving these databases, is the reason why we are working on an in-kernel lfsck replacement. The new lfsck will be available in 2.4.

> Maybe there another cause for the IO problems I observed before
> starting to fix corruption...
>
> Cheers,
> Arne
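As a rough illustration of running lfsck against a mounted filesystem, here is a sketch of a read-only invocation from a Lustre client, based on the lfsck usage in the Lustre 1.8 manual (`-n` to report without fixing, `-v` for verbose, `--mdsdb`, `--ostdb`). `run_lfsck_readonly` is a hypothetical wrapper, and the mount point `/mnt/lustre` and the `ost1db` path in the usage comment are assumptions; only `/mnt/mdsdb` and `/mnt/osts/ost2/ost2db` come from this thread. By default it only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# run_lfsck_readonly is a hypothetical wrapper (an assumption for illustration).
# Arguments: $1 = MDS database, $2 = Lustre client mount point,
#            remaining = one database per OST.
# Prints the lfsck command; set RUN=1 to actually execute it.
run_lfsck_readonly() {
    mdsdb=$1
    mountpoint=$2
    shift 2
    # -n: report inconsistencies without changing anything; -v: verbose.
    set -- lfsck -n -v --mdsdb "$mdsdb" --ostdb "$@" "$mountpoint"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Usage (paths partly assumed, see above):
#   run_lfsck_readonly /mnt/mdsdb /mnt/lustre \
#       /mnt/osts/ost1/ost1db /mnt/osts/ost2/ost2db
```

Starting with `-n` keeps the first pass non-destructive; whether to follow up with a repairing run is a decision for after the report has been reviewed.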