Robert Olson
2007-Nov-12 20:55 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Since I've got my shiny new PPC64-based Debian Etch installation going, I
decided to give Lustre another shot on my mac cluster (no cross-compilers
required). The kernel patch and build went fine, using vanilla 2.6.18.8.

I had some trouble with the Lustre build itself; the main problems were that
asm/segment.h doesn't exist on 64-bit powerpc, and that the
generic_find_next_le_bit patch did not apply. Apparently bitops.c is now
lib/find_next_bit.c instead of living under the arch directory. I added
generic_find_next_le_bit to find_next_bit.c and things seemed to build okay.

I was able to fire everything up, creating a merged MDT/MGS and an OST on one
machine:

  mkfs.lustre --reformat --fsname datafs --mdt --mgs /dev/md0
  mount -t lustre /dev/md0 /mnt/data/mdt
  mkfs.lustre --reformat --fsname datafs --ost --mgsnode=192.5.200.12@tcp /dev/sdc5
  mount -t lustre /dev/sdc5 /mnt/data/ost0

and mounting on a client:

  mount -t lustre 192.5.200.12@tcp:/datafs /tmp/lus

However, when I tried to run bonnie++ I soon got errors and hangs. The kernel
messages from the server machine are included below. The client is an NFS-root
netbooted machine, served from the same machine hosting the Lustre servers, if
that makes any difference; it runs the same kernel and Linux distribution.

Thanks for any help / advice.

--bob

Lustre: Added LNI 192.5.200.12@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: OBD class driver, info@clusterfs.com
        Lustre Version: 1.6.3
        Build Version: 1.6.3-19700101000000-PRISTINE-.scratch.lustre.linux-2.6.18.8-2.6.18.8
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: Binding irq 54 to CPU 0 with cmd: echo 1 > /proc/irq/54/smp_affinity
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on md0, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: Enabling user_xattr
Lustre: datafs-MDT0000: new disk, initializing
Lustre: MDT datafs-MDT0000 now serving dev (datafs-MDT0000/7a7a4075-a2be-b14e-4c37-5d38acc1dbf0) with recovery enabled
Lustre: Server datafs-MDT0000 on device /dev/md0 has started
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdc5, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: Filtering OBD driver; info@clusterfs.com
Lustre: datafs-OST0000: new disk, initializing
Lustre: OST datafs-OST0000 now serving dev (datafs-OST0000/89964d15-f57b-8247-433d-ba88b70ed98d) with recovery enabled
Lustre: Server datafs-OST0000 on device /dev/sdc5 has started
Lustre: datafs-OST0000: received MDS connection from 0@lo
Lustre: MDS datafs-MDT0000: datafs-OST0000_UUID now active, resetting orphans
LDISKFS-fs error (device sdc5): ldiskfs_ext_find_extent: bad header in inode #19431465: invalid eh_entries - magic f30a, entries 341, max 340(340), depth 0(0)
Andreas Dilger
2007-Nov-13 21:18 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
> Since I've got my shiny new PPC64-based Debian Etch installation
> going, I decided to give Lustre another shot on my mac cluster (no
> cross-compilers required).

As a starting point - we basically never test Lustre with a big-endian
server, so while it works in theory I would suggest starting with a
big-endian client and little-endian servers first, getting that working,
and then tackling the big-endian server separately (likely using something
like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
already has proper endian swabbing). You could also try without mballoc
and extents on the OSTs.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-13 21:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Hm, ok. (Unfortunately the big-endian server is where I'm going with this,
but it will be useful to get things going with the Intel hardware first and
make sure I know what I'm doing.)

I'd love to go to the newer kernel; I thought we were stuck at 2.6.18 with
the vanilla series patches - is that not the case?

thanks,
--bob

On Nov 13, 2007, at 3:18 PM, Andreas Dilger wrote:

> On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>> Since I've got my shiny new PPC64-based Debian Etch installation
>> going, I decided to give Lustre another shot on my mac cluster (no
>> cross-compilers required).
>
> As a starting point - we basically never test Lustre with a big-endian
> server, so while it works in theory I would suggest starting with a
> big-endian client and little-endian servers first, getting that working,
> and then tackling the big-endian server separately (likely using something
> like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
> already has proper endian swabbing). You could also try without mballoc
> and extents on the OSTs.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Software Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Andreas Dilger
2007-Nov-13 21:50 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 13, 2007 15:37 -0600, Robert Olson wrote:
> Hm, ok. (Unfortunately the big-endian server is where I'm going with this,
> but it will be useful to get things going with the Intel hardware first
> and make sure I know what I'm doing.)
>
> I'd love to go to the newer kernel; I thought we were stuck at 2.6.18 with
> the vanilla series patches - is that not the case?

There is a kernel patch series for 2.6.22 in Bugzilla, but it hasn't been
tested yet.

> On Nov 13, 2007, at 3:18 PM, Andreas Dilger wrote:
>
>> On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>>> Since I've got my shiny new PPC64-based Debian Etch installation
>>> going, I decided to give Lustre another shot on my mac cluster (no
>>> cross-compilers required).
>>
>> As a starting point - we basically never test Lustre with a big-endian
>> server, so while it works in theory I would suggest starting with a
>> big-endian client and little-endian servers first, getting that working,
>> and then tackling the big-endian server separately (likely using
>> something like 2.6.22 ext4 as the starting point for ldiskfs, since the
>> extent code already has proper endian swabbing). You could also try
>> without mballoc and extents on the OSTs.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Software Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-16 17:56 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
> As a starting point - we basically never test Lustre with a big-endian
> server, so while it works in theory I would suggest starting with a
> big-endian client and little-endian servers first, getting that working,

This works great - I'm running a pair of OSTs on some Intel boxes with
clients on the PPC64 nodes. Initial iozone measurements are making me happy:
fairly decent performance over gigabit ethernet through at least a couple of
switches (the servers are some older/slower machines that sit elsewhere in
the machine room from the cluster).

> and then tackling the big-endian server separately (likely using something
> like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
> already has proper endian swabbing). You could also try without mballoc
> and extents on the OSTs.

Are these changes that can be made by someone ignorant of the implementation
details of the Lustre code (config options, not applying some patches, etc.)?
I'd be happy to try things out but would need something of a roadmap to do so.

thanks,
--bob
Andreas Dilger
2007-Nov-16 23:06 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 16, 2007 11:56 -0600, Robert Olson wrote:
>> As a starting point - we basically never test Lustre with a big-endian
>> server, so while it works in theory I would suggest starting with a
>> big-endian client and little-endian servers first, getting that working,
>
> This works great - I'm running a pair of OSTs on some Intel boxes with
> clients on the PPC64 nodes. Initial iozone measurements are making me
> happy: fairly decent performance over gigabit ethernet through at least a
> couple of switches (the servers are some older/slower machines that sit
> elsewhere in the machine room from the cluster).

Good to hear.

>> and then tackling the big-endian server separately (likely using
>> something like 2.6.22 ext4 as the starting point for ldiskfs, since the
>> extent code already has proper endian swabbing). You could also try
>> without mballoc and extents on the OSTs.
>
> Are these changes that can be made by someone ignorant of the
> implementation details of the Lustre code (config options, not applying
> some patches, etc.)? I'd be happy to try things out but would need
> something of a roadmap to do so.

Well, you could start with the MDS on PPC, then try OSTs on PPC without
"-o mballoc,extents" mount options (you might need to pass
"-o nomballoc,noextents" to cancel out the former default options).

As for porting the ldiskfs patches to ext4... I don't think it is
necessarily a simple task, but likely not impossible. It should be pretty
clear which patches are already applied (extents, nlink, nanosecond), but
porting some of them (e.g. mballoc) would be very tricky (it is just about
done in the ext4 upstream repo), and at that point you can just mount
without mballoc...

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
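For reference, a minimal sketch of remounting an OST with those options
disabled might look like the following (the device and mount point are just
the ones from earlier in this thread, and the exact option names can vary by
Lustre version):

  # stop the OST, then remount its backing device with mballoc and extents disabled
  umount /mnt/data/ost0
  mount -t lustre -o nomballoc,noextents /dev/sdc5 /mnt/data/ost0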
Peter Avakian
2007-Nov-17 06:09 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
>On Nov 12, 2007 14:55 -0600, Robert Olson wrote:
>> Since I've got my shiny new PPC64-based Debian Etch installation
>> going, I decided to give Lustre another shot on my mac cluster (no
>> cross-compilers required).

>On 14 November 2007 01:18 Andreas Dilger wrote:
>As a starting point - we basically never test Lustre with a big-endian
>server, so while it works in theory I would suggest starting with a
>big-endian client and little-endian servers first, getting that working,
>and then tackling the big-endian server separately (likely using something
>like 2.6.22 ext4 as the starting point for ldiskfs, since the extent code
>already has proper endian swabbing). You could also try without mballoc
>and extents on the OSTs.

I started reading this thread quite recently, and I am not sure exactly how
you would like the little/big-endian case to be tested, but I thought you
might want to look at a simple Fortran program (below) that reflects the
read/write I/O pattern.

Using little-endian files, it reads and writes a lot faster than with
big-endian files (on Linux, of course). I get 500-550 MB/sec read and 200 to
375 MB/sec write with little-endian files (this was done on IA32/IA64
systems). The code was compiled with the following options using the Intel
compiler:

1) ifort -O3 -assume byterecl writer.f
2) ifort -O3 -convert big_endian -assume byterecl writer.f

#cat writer.f
      implicit none
      integer, parameter :: number_x = 2000, number_y = 2000,
     &                      number_z = 250
      integer i,j,k
      integer i_instant1, i_instant2, irate
      real*8 plane(number_x,number_y),cube(number_x,number_y,number_z)
      real*4 time,write_speed,read_speed
      character*80 fname,gname

!test Fibre Channel disks using 8GB binary IEEE files:
      fname = '/home/peter/test.ieee'
      gname = '/home/peter/test2.ieee'
      fname = '/home/peter/cmt2/test.ieee'
      gname = '/home/peter/cmt2/test2.ieee'
      fname = '/home/peter/cmt/test.ieee'
      gname = '/home/peter/cmt/test2.ieee'

!print *,'Reading file...'
      call system_clock(i_instant1,irate)
      open(1,file=fname,form='unformatted',
     & access='direct',recl=kind(plane)*number_x*number_y)
      do k = 1,number_z
         read(1,rec=k)cube(:,:,k)
      enddo
      close(unit=1)
      call system_clock(i_instant2)
      time = (i_instant2-i_instant1)/float(irate)
      read_speed = 8.*number_x*number_y*number_z/time/1.e6
      print *,'read:',read_speed

      call system_clock(i_instant1,irate)
!print *,'Writing file...'
      open(1,file=gname,form='unformatted',
!!!! & buffered='yes',blocksize=16384,
     & access='direct',recl=kind(plane)*number_x*number_y)
      do k = 1,number_z
         write(1,rec=k)cube(:,:,k)
      enddo
      close(unit=1)
      call system_clock(i_instant2)
      time = (i_instant2-i_instant1)/float(irate)
      write_speed = 8.*number_x*number_y*number_z/time/1.e6
      print *,'write:',write_speed
      end

setenv F_UFMTENDIAN big
 read:   338.5341
 write:   83.41084
 read:   369.0582
 write:   86.51231
 read:   369.6755
 ...
 write:   88.34927
 read:   368.6313

setenv F_UFMTENDIAN big:10,20
 read:   807.5425
 ...
 write:   753.9417
 read:   776.3146
 write:   776.1639

Regards,
-Peter
''Andreas Dilger''
2007-Nov-18 09:05 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 17, 2007 10:09 +0400, Peter Avakian wrote:
> I started reading this thread quite recently, and I am not sure exactly
> how you would like the little/big-endian case to be tested, but I thought
> you might want to look at a simple Fortran program (below) that reflects
> the read/write I/O pattern.
>
> Using little-endian files, it reads and writes a lot faster than with
> big-endian files (on Linux, of course). I get 500-550 MB/sec read and 200
> to 375 MB/sec write with little-endian files

This appears to just be doing endian conversion in the application data?
Lustre doesn't swap the endianness of the data, so that isn't a big
performance issue. It only needs to swab the requests and metadata if the
client and server are of different endianness, and not if they are the same
endianness.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Robert Olson
2007-Nov-19 19:25 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
> Well, you could start with the MDS on PPC, then try OSTs on PPC without
> "-o mballoc,extents" mount options (you might need to pass
> "-o nomballoc,noextents" to cancel out the former default options).

OK, early indications are good here. I started out with the MDT on PPC and
the OST on Intel, ran iozone up to 1M files, and it finished without error
and with reasonable performance.

Now I'm running the MDT + 1 OST on Intel, plus 1 OST on PPC formatted with:

  mkfs.lustre --ost --fsname ppcfs --mgsnode=192.5.200.12@tcp --mountfsoptions=nomballoc,noextents /dev/sdc6

with iozone running in a directory set up via setstripe to use the PPC OST
(cool that you can do that). The job is still running, but no errors so far,
and intermediate results suggest we're seeing good performance; iostat is
reporting good numbers on the disk on the OST node.

So what I am wondering now is what I lose by turning off mballoc and extents.
My jobs don't do any sparse file writes or parallel writes to files - mostly
fairly small file access and the creation of some large files.

Thanks,
--bob

PS - interesting: a readwrite test is running now, driving the load avg on
the MDT/OST node up to over 5. I'm guessing a number of OST threads are
waiting on disk.
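For anyone reading along, pointing the test directory at the PPC OST was done
with lfs setstripe; a rough sketch of the sort of command involved (the mount
point and OST index here are only placeholders, and this is the 1.6-era
positional syntax - newer releases use -s/-i/-c options instead):

  # stripe size 0 = filesystem default, start at OST index 4, stripe across 1 OST
  lfs setstripe /mnt/ppcfs/iozone-ppc 0 4 1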
Robert Olson
2007-Nov-19 19:41 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Wup - I meant to say random read test, not readwrite. Though I'm also seeing
fairly high load averages (3-4) during the straight read and write tests as
well.

On Nov 19, 2007, at 1:25 PM, Robert Olson wrote:

> PS - interesting: a readwrite test is running now, driving the load avg on
> the MDT/OST node up to over 5. I'm guessing a number of OST threads are
> waiting on disk.
Andreas Dilger
2007-Nov-21 22:30 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
> OK, early indications are good here. I started out with the MDT on PPC and
> the OST on Intel, ran iozone up to 1M files, and it finished without error
> and with reasonable performance.

Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
filesystems after your tests?

> Now I'm running the MDT + 1 OST on Intel, plus 1 OST on PPC formatted with:
>
>   mkfs.lustre --ost --fsname ppcfs --mgsnode=192.5.200.12@tcp --mountfsoptions=nomballoc,noextents /dev/sdc6
>
> with iozone running in a directory set up via setstripe to use the PPC OST
> (cool that you can do that). The job is still running, but no errors so
> far, and intermediate results suggest we're seeing good performance;
> iostat is reporting good numbers on the disk on the OST node.
>
> So what I am wondering now is what I lose by turning off mballoc and
> extents. My jobs don't do any sparse file writes or parallel writes to
> files - mostly fairly small file access and the creation of some large
> files.

The extents,mballoc options are primarily aimed at improving performance
under high load by reducing CPU usage and getting better allocation. If you
have mostly small files then the performance difference won't be huge.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
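In case it saves anyone a step: the check is run against the unmounted
backing device of each target, roughly like this (the device name and mount
point are only examples; use whatever each OST/MDT actually sits on):

  umount <ost mount point>     # the target must not be mounted during the check
  e2fsck -fn /dev/sdc6         # -f forces a full check, -n answers "no" to every fix (read-only)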
Robert Olson
2007-Nov-21 22:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:

> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>> OK, early indications are good here. I started out with the MDT on PPC
>> and the OST on Intel, ran iozone up to 1M files, and it finished without
>> error and with reasonable performance.
>
> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
> filesystems after your tests?

What will that tell me?

> The extents,mballoc options are primarily aimed at improving performance
> under high load by reducing CPU usage and getting better allocation. If
> you have mostly small files then the performance difference won't be huge.

Is this one of the changes that improves write performance? I've noticed
write performance lagging reads. I've got the system running with 7 OSTs,
and I'm getting near-wire-rate reads from wide stripes to a single node with
large files, according to iozone.

--bob
Andreas Dilger
2007-Nov-21 22:55 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>> OK, early indications are good here. I started out with the MDT on PPC
>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>> without error and with reasonable performance.
>>
>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on the
>> filesystems after your tests?
>
> What will that tell me?

It will tell me if some endian bug is corrupting your ext3 filesystem
(and possibly if there are endian bugs in our e2fsprogs patches)...

>> The extents,mballoc options are primarily aimed at improving performance
>> under high load by reducing CPU usage and getting better allocation. If
>> you have mostly small files then the performance difference won't be
>> huge.
>
> Is this one of the changes that improves write performance? I've noticed
> write performance lagging reads.

Yes, mballoc+extents does improve write performance. You could do a test on
the x86_64 system to compare the difference.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
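One way to make that comparison would be to run the same iozone workload
against a directory striped to an OST mounted with the defaults and against
one mounted with nomballoc,noextents; a rough sketch (the file size, record
size, and paths are only placeholders):

  # sequential write/rewrite (-i 0) and read/reread (-i 1) of a 2 GB file in 1 MB records
  iozone -i 0 -i 1 -s 2g -r 1m -f /mnt/ppcfs/default-ost/testfile
  iozone -i 0 -i 1 -s 2g -r 1m -f /mnt/ppcfs/noextents-ost/testfile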
Robert Olson
2007-Nov-21 23:37 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:55 PM, Andreas Dilger wrote:

> On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
>> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>>> OK, early indications are good here. I started out with the MDT on PPC
>>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>>> without error and with reasonable performance.
>>>
>>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on
>>> the filesystems after your tests?
>>
>> What will that tell me?
>
> It will tell me if some endian bug is corrupting your ext3 filesystem
> (and possibly if there are endian bugs in our e2fsprogs patches)...

Ahh, that would be good to know :-) Do you mean running it on the underlying
OST, or on the filesystem as a whole?

Hm, dumb question: I don't see a source tarball of the Lustre e2fsprogs; is
there one other than in the src rpms at
ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/?

--bob
Andreas Dilger
2007-Nov-22 23:32 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007 17:37 -0600, Robert Olson wrote:
>> It will tell me if some endian bug is corrupting your ext3 filesystem
>> (and possibly if there are endian bugs in our e2fsprogs patches)...
>
> Ahh, that would be good to know :-)
>
> Do you mean running it on the underlying OST, or on the filesystem as a
> whole?

On the underlying OST or MDS that is on a PPC system.

> Hm, dumb question: I don't see a source tarball of the Lustre e2fsprogs;
> is there one other than in the src rpms at
> ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/?

We don't make a patched tarball, but you can use the .src.rpm and extract it
with cpio.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
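For example, something along these lines unpacks the source RPM on a Debian
box, assuming rpm2cpio is available (the exact filename depends on which
version you download from that FTP directory):

  # extract the spec file, tarball, and patches into the current directory
  rpm2cpio e2fsprogs-1.40.4.cfs1-*.src.rpm | cpio -idmv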
Robert Olson
2008-Jan-10 22:08 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Nov 21, 2007, at 4:55 PM, Andreas Dilger wrote:

> On Nov 21, 2007 16:37 -0600, Robert Olson wrote:
>> On Nov 21, 2007, at 4:30 PM, Andreas Dilger wrote:
>>> On Nov 19, 2007 13:25 -0600, Robert Olson wrote:
>>>> OK, early indications are good here. I started out with the MDT on PPC
>>>> and the OST on Intel, ran iozone up to 1M files, and it finished
>>>> without error and with reasonable performance.
>>>
>>> Can you please run "e2fsck -fn" (from a Lustre-patched e2fsprogs) on
>>> the filesystems after your tests?
>>
>> What will that tell me?
>
> It will tell me if some endian bug is corrupting your ext3 filesystem
> (and possibly if there are endian bugs in our e2fsprogs patches)...

Finally getting around to trying this. On one of the OSTs:

root@bio-ppc-38:~# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/sda4
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (119027670, counted=117766441).
Fix? no

Free inodes count wrong (30269427, counted=30057352).
Fix? no

ppcfs-OST0001: ********** WARNING: Filesystem still has errors **********

ppcfs-OST0001: 13/30269440 files (261.5% non-contiguous), 2020192/121047862 blocks
Robert Olson
2008-Jan-10 22:38 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
The MDT and another OST I checked also had the same error:

root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (82818517, counted=82619128).
Fix? no

Free inodes count wrong (94797811, counted=93117818).
Fix? no

ppcfs-MDT0000: ********** WARNING: Filesystem still has errors **********

ppcfs-MDT0000: 13/94797824 files (61.5% non-contiguous), 11975451/94793968 blocks


root@bio-ppc-39:~# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/sda4
e2fsck 1.40.4.cfs1 (31-Dec-2007)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (119027670, counted=118115465).
Fix? no

Free inodes count wrong (30269427, counted=30057292).
Fix? no

ppcfs-OST0004: ********** WARNING: Filesystem still has errors **********

ppcfs-OST0004: 13/30269440 files (261.5% non-contiguous), 2020192/121047862 blocks
Andreas Dilger
2008-Jan-10 23:26 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
On Jan 10, 2008 16:38 -0600, Robert Olson wrote:
> The MDT and another OST I checked also had the same error:
>
> root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
> e2fsck 1.40.4.cfs1 (31-Dec-2007)
> Warning: skipping journal recovery because doing a read-only filesystem check.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong (82818517, counted=82619128).
> Fix? no
>
> Free inodes count wrong (94797811, counted=93117818).
> Fix? no

This is pretty normal if you are checking a filesystem read-only. The
superblock summaries are not updated, to avoid lock contention.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
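A minimal sketch, if those summary counters ever need to be brought back in
line for real - the check can be run read-write on a stopped, unmounted
target (the MDT device here is just the one from the output above):

  e2fsck -fy /dev/md0    # -y answers "yes" to every fix; only run with the target unmounted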
Robert Olson
2008-Jan-10 23:48 UTC
[Lustre-discuss] Problems & partial success on PPC64 (XServe G5) Debian cluster
Oh excellent, that is great news. So it looks like we're not seeing
endian-based corruption then?

thanks,
--bob

On Jan 10, 2008, at 5:26 PM, Andreas Dilger wrote:

> On Jan 10, 2008 16:38 -0600, Robert Olson wrote:
>> The MDT and another OST I checked also had the same error:
>>
>> root@bio-ppc-head-3:/scratch/lustre/e2fsprogs/e2fsprogs-1.40.4.cfs1# /scratch/olson/e2fsprogs/sbin/e2fsck -fn /dev/md0
>> e2fsck 1.40.4.cfs1 (31-Dec-2007)
>> Warning: skipping journal recovery because doing a read-only filesystem check.
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> Free blocks count wrong (82818517, counted=82619128).
>> Fix? no
>>
>> Free inodes count wrong (94797811, counted=93117818).
>> Fix? no
>
> This is pretty normal if you are checking a filesystem read-only. The
> superblock summaries are not updated, to avoid lock contention.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.