Robert Olson
2007-Nov-12 20:55 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Since I've got my shiny new PPC64-based Debian Etch installation going, I
decided to give Lustre another shot on my mac cluster (no cross-compilers
required). The kernel patch and build went fine, using vanilla 2.6.18.8.

I had some trouble with the Lustre build itself; the main problems were that
asm/segment.h doesn't exist on 64-bit powerpc, and that the
generic_find_next_le_bit patch did not apply. Apparently bitops.c is now
lib/find_next_bit.c instead of living under the arch directory. I added
generic_find_next_le_bit to find_next_bit.c and things seemed to build okay.

I was able to fire everything up, creating a merged MDT/MGS and an OST on one
machine:

  mkfs.lustre --reformat --fsname datafs --mdt --mgs /dev/md0
  mount -t lustre /dev/md0 /mnt/data/mdt
  mkfs.lustre --reformat --fsname datafs --ost --mgsnode=192.5.200.12@tcp /dev/sdc5
  mount -t lustre /dev/sdc5 /mnt/data/ost0

and mounting on a client:

  mount -t lustre 192.5.200.12@tcp:/datafs /tmp/lus

However, when I tried to run bonnie++ I soon got errors and hangs. The kernel
messages from the server machine are included below. The client is an NFS-root
netbooted machine, served from the same machine hosting the Lustre servers, if
that makes any difference; it runs the same kernel and Linux distribution.

Thanks for any help / advice.

--bob

Lustre: Added LNI 192.5.200.12@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: OBD class driver, info@clusterfs.com
        Lustre Version: 1.6.3
        Build Version: 1.6.3-19700101000000-PRISTINE-.scratch.lustre.linux-2.6.18.8-2.6.18.8
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: Binding irq 54 to CPU 0 with cmd: echo 1 > /proc/irq/54/smp_affinity
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: Enabling user_xattr
Lustre: datafs-MDT0000: new disk, initializing
Lustre: MDT datafs-MDT0000 now serving dev (datafs-MDT0000/7a7a4075-a2be-b14e-4c37-5d38acc1dbf0) with recovery enabled
Lustre: Server datafs-MDT0000 on device /dev/md0 has started
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: Filtering OBD driver; info@clusterfs.com
Lustre: datafs-OST0000: new disk, initializing
Lustre: OST datafs-OST0000 now serving dev (datafs-OST0000/89964d15-f57b-8247-433d-ba88b70ed98d) with recovery enabled
Lustre: Server datafs-OST0000 on device /dev/sdc5 has started
Lustre: datafs-OST0000: received MDS connection from 0@lo
Lustre: MDS datafs-MDT0000: datafs-OST0000_UUID now active, resetting orphans
LDISKFS-fs error (device sdc5): ldiskfs_ext_find_extent: bad header in inode #19431465: invalid eh_entries - magic f30a, entries 341, max 340(340), depth 0(0)
Andreas Dilger
2007-Nov-13 21:18 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
> Since I've got my shiny new PPC64-based Debian Etch installation
> going, I decided to give Lustre another shot on my mac cluster (no
> cross-compilers required).

As a starting point - we basically never test Lustre with a big-endian
server, so while it works in theory I would suggest starting with a
big-endian client and little-endian servers first, getting that working,
and then tackling the big-endian server separately (likely using something
like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
already has proper endian swabbing). You could also try without mballoc
and extents on the OSTs.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-13 21:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Hm, ok. (Unfortunately the big-endian server is where I'm going with this,
but it will be useful to get things going with the Intel hardware first and
make sure I know what I'm doing.)

I'd love to go to the newer kernel; I thought we were stuck at 2.6.18 with
the vanilla series patches - is that not the case?

thanks,
--bob

On Nov 13, 2007, at 3:18 PM, Andreas Dilger wrote:

> On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>> Since I've got my shiny new PPC64-based Debian Etch installation
>> going, I decided to give Lustre another shot on my mac cluster (no
>> cross-compilers required).
>
> As a starting point - we basically never test Lustre with a big-endian
> server, so while it works in theory I would suggest starting with a
> big-endian client and little-endian servers first, getting that working,
> and then tackling the big-endian server separately (likely using something
> like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
> already has proper endian swabbing). You could also try without mballoc
> and extents on the OSTs.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Software Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Andreas Dilger
2007-Nov-13 21:50 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 13, 2007 15:37 -0600, Robert Olson wrote:
> Hm, ok. (Unfortunately the big-endian server is where I'm going with this,
> but it will be useful to get things going with the Intel hardware first
> and make sure I know what I'm doing.)
>
> I'd love to go to the newer kernel; I thought we were stuck at 2.6.18 with
> the vanilla series patches - is that not the case?

There is a kernel patch series for 2.6.22 in Bugzilla, but it hasn't been
tested yet.

> On Nov 13, 2007, at 3:18 PM, Andreas Dilger wrote:
>
>> On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>>> Since I've got my shiny new PPC64-based Debian Etch installation
>>> going, I decided to give Lustre another shot on my mac cluster (no
>>> cross-compilers required).
>>
>> As a starting point - we basically never test Lustre with a big-endian
>> server, so while it works in theory I would suggest starting with a
>> big-endian client and little-endian servers first, getting that working,
>> and then tackling the big-endian server separately (likely using
>> something like 2.6.22 ext4 as the starting point for ldiskfs, since the
>> extent code already has proper endian swabbing). You could also try
>> without mballoc and extents on the OSTs.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Software Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-16 17:56 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
> As a starting point - we basically never test Lustre with a big-endian
> server, so while it works in theory I would suggest starting with a
> big-endian client and little-endian servers first, getting that working,

This works great - I'm running a pair of OSTs on some Intel boxes with
clients on the PPC64 nodes. Initial iozone measurements are making me happy:
fairly decent performance over gigabit ethernet through at least a couple of
switches (the servers are some older/slower machines that sit elsewhere in
the machine room from the cluster).

> and then tackling the big-endian server separately (likely using something
> like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
> already has proper endian swabbing). You could also try without mballoc
> and extents on the OSTs.

Are these changes that can be made by someone ignorant of the implementation
details of the Lustre code (config options, not applying some patches, etc.)?
I'd be happy to try things out but would need something of a roadmap to do so.

thanks,
--bob
Andreas Dilger
2007-Nov-16 23:06 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 16, 2007 11:56 -0600, Robert Olson wrote:
>> As a starting point - we basically never test Lustre with a big-endian
>> server, so while it works in theory I would suggest starting with a
>> big-endian client and little-endian servers first, getting that working,
>
> This works great - I'm running a pair of OSTs on some Intel boxes with
> clients on the PPC64 nodes. Initial iozone measurements are making me
> happy: fairly decent performance over gigabit ethernet through at least a
> couple of switches (the servers are some older/slower machines that sit
> elsewhere in the machine room from the cluster).

Good to hear.

>> and then tackling the big-endian server separately (likely using
>> something like 2.6.22 ext4 as the starting point for ldiskfs, since the
>> extent code already has proper endian swabbing). You could also try
>> without mballoc and extents on the OSTs.
>
> Are these changes that can be made by someone ignorant of the
> implementation details of the Lustre code (config options, not applying
> some patches, etc.)? I'd be happy to try things out but would need
> something of a roadmap to do so.

Well, you could start with the MDS on PPC, then try OSTs on PPC without
"-o mballoc,extents" mount options (you might need to pass
"-o nomballoc,noextents" to cancel out the former default options).

As for porting the ldiskfs patches to ext4... I don't think it is
necessarily a simple task, but likely not impossible. It should be pretty
clear which patches are already applied (extents, nlink, nanosecond), but
porting some of them (e.g. mballoc) would be very tricky (it is just about
done in the ext4 upstream repo), and at that point you can just mount
without mballoc...

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
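For reference, a minimal sketch of remounting an OST with those options
disabled might look like the following (the device and mount point are just
the ones from earlier in this thread, and the exact option names can vary by
Lustre version):

  # stop the OST, then remount its backing device with mballoc and extents disabled
  umount /mnt/data/ost0
  mount -t lustre -o nomballoc,noextents /dev/sdc5 /mnt/data/ost0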
Peter Avakian
2007-Nov-17 06:09 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
>On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>> Since I've got my shiny new PPC64-based Debian Etch installation
>> going, I decided to give Lustre another shot on my mac cluster (no
>> cross-compilers required).

>On 14 November 2007 01:18 Andreas Dilger wrote:
>As a starting point - we basically never test Lustre with a big-endian
>server, so while it works in theory I would suggest starting with a
>big-endian client and little-endian servers first, getting that working,
>and then tackling the big-endian server separately (likely using something
>like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
>already has proper endian swabbing). You could also try without mballoc
>and extents on the OSTs.

I started reading this thread quite recently, and I am not sure exactly how
you would like the little/big-endian case to be tested, but I thought you
might want to look at a simple Fortran program (below) that reflects the
read/write I/O pattern.

Using little-endian files, it reads and writes a lot faster than with
big-endian files (on Linux, of course). I get 500-550 MB/sec read and 200 to
375 MB/sec write with little-endian files (this was done on IA32/IA64
systems). The code was compiled with the following options using the Intel
compiler:

1) ifort -O3 -assume byterecl writer.f
2) ifort -O3 -convert big_endian -assume byterecl writer.f

#cat writer.f
      implicit none
      integer, parameter :: number_x = 2000, number_y = 2000,
     &                      number_z = 250
      integer i,j,k
      integer i_instant1, i_instant2, irate
      real*8 plane(number_x,number_y),cube(number_x,number_y,number_z)
      real*4 time,write_speed,read_speed
      character*80 fname,gname

!test Fibre Channel disks using 8GB binary IEEE files:
      fname = '/home/peter/test.ieee'
      gname = '/home/peter/test2.ieee'
      fname = '/home/peter/cmt2/test.ieee'
      gname = '/home/peter/cmt2/test2.ieee'
      fname = '/home/peter/cmt/test.ieee'
      gname = '/home/peter/cmt/test2.ieee'

!print *,'Reading file...'
      call system_clock(i_instant1,irate)
      open(1,file=fname,form='unformatted',
     & access='direct',recl=kind(plane)*number_x*number_y)
      do k = 1,number_z
         read(1,rec=k)cube(:,:,k)
      enddo
      close(unit=1)
      call system_clock(i_instant2)
      time = (i_instant2-i_instant1)/float(irate)
      read_speed = 8.*number_x*number_y*number_z/time/1.e6
      print *,'read:',read_speed

      call system_clock(i_instant1,irate)
!print *,'Writing file...'
      open(1,file=gname,form='unformatted',
!!!! & buffered='yes',blocksize=16384,
     & access='direct',recl=kind(plane)*number_x*number_y)
      do k = 1,number_z
         write(1,rec=k)cube(:,:,k)
      enddo
      close(unit=1)
      call system_clock(i_instant2)
      time = (i_instant2-i_instant1)/float(irate)
      write_speed = 8.*number_x*number_y*number_z/time/1.e6
      print *,'write:',write_speed
      end

setenv F_UFMTENDIAN big
 read:   338.5341
 write:   83.41084
 read:   369.0582
 write:   86.51231
 read:   369.6755
 ...
 write:   88.34927
 read:   368.6313

setenv F_UFMTENDIAN big:10,20
 read:   807.5425
 ...
 write:   753.9417
 read:   776.3146
 write:   776.1639

Regards,
-Peter
''Andreas Dilger''
2007-Nov-18 09:05 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 17, 2007 10:09 +0400, Peter Avakian wrote:
> I started reading this thread quite recently, and I am not sure exactly
> how you would like the little/big-endian case to be tested, but I thought
> you might want to look at a simple Fortran program (below) that reflects
> the read/write I/O pattern.
>
> Using little-endian files, it reads and writes a lot faster than with
> big-endian files (on Linux, of course). I get 500-550 MB/sec read and 200
> to 375 MB/sec write with little-endian files

This appears to just be doing endian conversion in the application data?
Lustre doesn't swap the endianness of the data, so that isn't a big
performance issue. It only needs to swab the requests and metadata if the
client and server are of different endianness, and not if they are the same
endianness.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-19 19:25 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
> Well, you could start with the MDS on PPC, then try OSTs on PPC without
> "-o mballoc,extents" mount options (you might need to pass
> "-o nomballoc,noextents" to cancel out the former default options).

OK, early indications are good here. I started out with the MDT on PPC and
the OST on Intel, ran iozone up to 1M files, and it finished without error
and with reasonable performance.

Now I'm running the MDT + 1 OST on Intel, plus 1 OST on PPC formatted with:

  mkfs.lustre --ost --fsname ppcfs --mgsnode=192.5.200.12@tcp --mountfsoptions=nomballoc,noextents /dev/sdc6

with iozone running in a directory set up via setstripe to use the PPC OST
(cool that you can do that). The job is still running, but no errors so far,
and intermediate results suggest we're seeing good performance; iostat is
reporting good numbers on the disk on the OST node.

So what I am wondering now is what I lose by turning off mballoc and extents.
My jobs don't do any sparse file writes or parallel writes to files - mostly
fairly small file access and the creation of some large files.

Thanks,
--bob

PS - interesting: a readwrite test is running now, driving the load avg on
the MDT/OST node up to over 5. I'm guessing a number of OST threads are
waiting on disk.
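For anyone reading along, pointing the test directory at the PPC OST was done
with lfs setstripe; a rough sketch of the sort of command involved (the mount
point and OST index here are only placeholders, and this is the 1.6-era
positional syntax - newer releases use -s/-i/-c options instead):

  # stripe size 0 = filesystem default, start at OST index 4, stripe across 1 OST
  lfs setstripe /mnt/ppcfs/iozone-ppc 0 4 1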
Robert Olson
2007-Nov-19 19:41 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Wup - I meant to say random read test, not readwrite. Though I'm also seeing
fairly high load averages (3-4) during the straight read and write tests as
well.

On Nov 19, 2007, at 1:25 PM, Robert Olson wrote:

> PS - interesting: a readwrite test is running now, driving the load avg on
> the MDT/OST node up to over 5. I'm guessing a number of OST threads are
> waiting on disk.
Andreas Dilger
2007-Nov-21 22:30 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
> OK, early indications are good here. I started out with the MDT on PPC and
> the OST on Intel, ran iozone up to 1M files, and it finished without error
> and with reasonable performance.

Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
filesystems after your tests?

> Now I'm running the MDT + 1 OST on Intel, plus 1 OST on PPC formatted with:
>
>   mkfs.lustre --ost --fsname ppcfs --mgsnode=192.5.200.12@tcp --mountfsoptions=nomballoc,noextents /dev/sdc6
>
> with iozone running in a directory set up via setstripe to use the PPC OST
> (cool that you can do that). The job is still running, but no errors so
> far, and intermediate results suggest we're seeing good performance;
> iostat is reporting good numbers on the disk on the OST node.
>
> So what I am wondering now is what I lose by turning off mballoc and
> extents. My jobs don't do any sparse file writes or parallel writes to
> files - mostly fairly small file access and the creation of some large
> files.

The extents,mballoc options are primarily aimed at improving performance
under high load by reducing CPU usage and getting better allocation. If you
have mostly small files then the performance difference won't be huge.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
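In case it saves anyone a step: the check is run against the unmounted
backing device of each target, roughly like this (the device name and mount
point are only examples; use whatever each OST/MDT actually sits on):

  umount <ost mount point>     # the target must not be mounted during the check
  e2fsck -fn /dev/sdc6         # -f forces a full check, -n answers "no" to every fix (read-only)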
Robert Olson
2007-Nov-21 22:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:

> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>> OK, early indications are good here. I started out with the MDT on PPC
>> and the OST on Intel, ran iozone up to 1M files, and it finished without
>> error and with reasonable performance.
>
> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
> filesystems after your tests?

What will that tell me?

> The extents,mballoc options are primarily aimed at improving performance
> under high load by reducing CPU usage and getting better allocation. If
> you have mostly small files then the performance difference won't be huge.

Is this one of the changes that improves write performance? I've noticed
write performance lagging reads. I've got the system running with 7 OSTs,
and I'm getting near-wire-rate reads from wide stripes to a single node with
large files, according to iozone.

--bob
Andreas Dilger
2007-Nov-21 22:55 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>> OK, early indications are good here. I started out with the MDT on PPC
>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>> without error and with reasonable performance.
>>
>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
>> filesystems after your tests?
>
> What will that tell me?

It will tell me if some endian bug is corrupting your ext3 filesystem
(and possibly if there are endian bugs in our e2fsprogs patches)...

>> The extents,mballoc options are primarily aimed at improving performance
>> under high load by reducing CPU usage and getting better allocation. If
>> you have mostly small files then the performance difference won't be
>> huge.
>
> Is this one of the changes that improves write performance? I've noticed
> write performance lagging reads.

Yes, mballoc+extents does improve write performance. You could do a test on
the x86_64 system to compare the difference.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
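One way to make that comparison would be to run the same iozone workload
against a directory striped to an OST mounted with the defaults and against
one mounted with nomballoc,noextents; a rough sketch (the file size, record
size, and paths are only placeholders):

  # sequential write/rewrite (-i 0) and read/reread (-i 1) of a 2 GB file in 1 MB records
  iozone -i 0 -i 1 -s 2g -r 1m -f /mnt/ppcfs/default-ost/testfile
  iozone -i 0 -i 1 -s 2g -r 1m -f /mnt/ppcfs/noextents-ost/testfile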
Robert Olson
2007-Nov-21 23:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:55 PM, Andreas Dilger wrote:

> On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
>> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>>> OK, early indications are good here. I started out with the MDT on PPC
>>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>>> without error and with reasonable performance.
>>>
>>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on
>>> the filesystems after your tests?
>>
>> What will that tell me?
>
> It will tell me if some endian bug is corrupting your ext3 filesystem
> (and possibly if there are endian bugs in our e2fsprogs patches)...

Ahh, that would be good to know :-) Do you mean running it on the underlying
OST, or on the filesystem as a whole?

Hm, dumb question: I don't see a source tarball of the Lustre e2fsprogs; is
there one other than in the src rpms at
ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/?

--bob
Andreas Dilger
2007-Nov-22 23:32 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007 17:37 -0600, Robert Olson wrote:
>> It will tell me if some endian bug is corrupting your ext3 filesystem
>> (and possibly if there are endian bugs in our e2fsprogs patches)...
>
> Ahh, that would be good to know :-)
>
> Do you mean running it on the underlying OST, or on the filesystem as a
> whole?

On the underlying OST or MDS that is on a PPC system.

> Hm, dumb question: I don't see a source tarball of the Lustre e2fsprogs;
> is there one other than in the src rpms at
> ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/?

We don't make a patched tarball, but you can use the .src.rpm and extract it
with cpio.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
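For example, something along these lines unpacks the source RPM on a Debian
box, assuming rpm2cpio is available (the exact filename depends on which
version you download from that FTP directory):

  # extract the spec file, tarball, and patches into the current directory
  rpm2cpio e2fsprogs-1.40.4.cfs1-*.src.rpm | cpio -idmv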
Robert Olson
2008-Jan-10 22:08 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:55 PM, Andreas Dilger wrote:

> On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
>> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>>> OK, early indications are good here. I started out with the MDT on PPC
>>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>>> without error and with reasonable performance.
>>>
>>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on
>>> the filesystems after your tests?
>>
>> What will that tell me?
>
> It will tell me if some endian bug is corrupting your ext3 filesystem
> (and possibly if there are endian bugs in our e2fsprogs patches)...

Finally getting around to trying this. On one of the OSTs:

root@bio-ppc-38:~# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/sda4
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (119027670, counted=117766441).
Fix? no

Free inodes count wrong (30269427, counted=30057352).
Fix? no

ppcfs-OST0001: ********** WARNING: Filesystem still has errors **********

ppcfs-OST0001: 13/30269440 files (261.5% non-contiguous), 2020192/121047862 blocks
Robert Olson
2008-Jan-10 22:38 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
The MDT and another OST I checked also had the same error:

root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (82818517, counted=82619128).
Fix? no

Free inodes count wrong (94797811, counted=93117818).
Fix? no

ppcfs-MDT0000: ********** WARNING: Filesystem still has errors **********

ppcfs-MDT0000: 13/94797824 files (61.5% non-contiguous), 11975451/94793968 blocks


root@bio-ppc-39:~# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/sda4
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (119027670, counted=118115465).
Fix? no

Free inodes count wrong (30269427, counted=30057292).
Fix? no

ppcfs-OST0004: ********** WARNING: Filesystem still has errors **********

ppcfs-OST0004: 13/30269440 files (261.5% non-contiguous), 2020192/121047862 blocks
Andreas Dilger
2008-Jan-10 23:26 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Jan 10, 2008 16:38 -0600, Robert Olson wrote:
> The MDT and another OST I checked also had the same error:
>
> root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
> e2fsck 1.40.4.cfs1 (31-Dec-2007)
> Warning: skipping journal recovery because doing a read-only filesystem check.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong (82818517, counted=82619128).
> Fix? no
>
> Free inodes count wrong (94797811, counted=93117818).
> Fix? no

This is pretty normal if you are checking a filesystem read-only. The
superblock summaries are not updated, to avoid lock contention.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
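A minimal sketch, if those summary counters ever need to be brought back in
line for real - the check can be run read-write on a stopped, unmounted
target (the MDT device here is just the one from the output above):

  e2fsck -fy /dev/md0    # -y answers "yes" to every fix; only run with the target unmounted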
Robert Olson
2008-Jan-10 23:48 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Oh excellent, that is great news. So it looks like we're not seeing
endian-based corruption then?

thanks,
--bob

On Jan 10, 2008, at 5:26 PM, Andreas Dilger wrote:

> On Jan 10, 2008 16:38 -0600, Robert Olson wrote:
>> The MDT and another OST I checked also had the same error:
>>
>> root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
>> e2fsck 1.40.4.cfs1 (31-Dec-2007)
>> Warning: skipping journal recovery because doing a read-only filesystem check.
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> Free blocks count wrong (82818517, counted=82619128).
>> Fix? no
>>
>> Free inodes count wrong (94797811, counted=93117818).
>> Fix? no
>
> This is pretty normal if you are checking a filesystem read-only. The
> superblock summaries are not updated, to avoid lock contention.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.