Greetings,

I've posted before but no one responded. I'm reposting because I'm
really dead in the water here until I can get this fixed.

The issue is that my OSTs don't survive a reboot of the OSS.

In the setup below I'm dealing with two OSSs: quad-core Intel Xeon
machines with 8 GB of memory and a dual-port QLogic Fibre Channel card.
They both run SLES 10.1 and Lustre 1.6.4.3. My two MDSs (similar,
though not exactly the same hardware) don't have the same problem,
though I'm only accessing a single MDT from them.

I've reproduced the problem with something as simple as running:

umount /mnt/lustre/ost/ost_oss01_lustre0102_01
tune2fs -O +mmp /dev/mapper/ost_oss01_lustre0102_01
mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

When I look at the partition table with parted I see that it's changed
from loop to gpt (as shown below). But the simplest case is:

oss01:/net/lmd01/space/lustre # mkfs.lustre --reformat --fsname i3_lfs3 --ost --failnode oss02 --mgsnode mds01 --mgsnode mds02 /dev/mapper/ost_oss01_lustre0102_01
oss01:/net/lmd01/space/lustre # reboot
# log in
oss01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
oss01:/net/lmd01/space/lustre # dumpe2fs -h /dev/mapper/ost_oss01_lustre0102_01 | grep feature
dumpe2fs 1.40.4.cfs1 (31-Dec-2007)
dumpe2fs: Bad magic number in super-block while trying to open /dev/mapper/ost_oss01_lustre0102_01

# another example: I re-run mkfs.lustre on the above device and mount
# it and 2 other OSTs on the second OSS
oss02:/net/lmd01/space/lustre # df | egrep 'File|ost'
Filesystem                           1K-blocks    Used  Available Use% Mounted on
/dev/mapper/ost_oss01_lustre0102_01 5768201600  469544 5474724244   1% /mnt/lustre/ost/ost_oss01_lustre0102_01
/dev/mapper/ost_oss01_lustre0102_02 5768201600  469540 5474724248   1% /mnt/lustre/ost/ost_oss01_lustre0102_02
/dev/mapper/ost_oss02_lustre0102_01 5768201600  479940 5474713848   1% /mnt/lustre/ost/ost_oss02_lustre0102_01

# I reboot the first machine, then
oss02:/net/lmd01/space/lustre # umount -t lustre -a
# then try to mount from the first machine and ...
oss01:/net/lmd01/space/lustre # cat a
mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount -t lustre /dev/mapper/ost_oss01_lustre0102_02 /mnt/lustre/ost/ost_oss01_lustre0102_02
mount -t lustre /dev/mapper/ost_oss02_lustre0102_01 /mnt/lustre/ost/ost_oss02_lustre0102_01
oss01:/net/lmd01/space/lustre # sh a
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
oss01:/net/lmd01/space/lustre # df
Filesystem                           1K-blocks    Used  Available Use% Mounted on
/dev/cciss/c0d0p3                     61022084 6398044   51524300  12% /
udev                                   4089220     312    4088908   1% /dev
/dev/cciss/c0d0p1                      1241220   48324    1129844   5% /boot
lmd01:/space                         470387232 8296256  438196704   2% /net/lmd01/space
/dev/mapper/ost_oss01_lustre0102_02 5768201600  469540 5474724248   1% /mnt/lustre/ost/ost_oss01_lustre0102_02
/dev/mapper/ost_oss02_lustre0102_01 5768201600  479940 5474713848   1% /mnt/lustre/ost/ost_oss02_lustre0102_01

# So the device was up just fine on one machine, I umounted them and
# tried on the other OSS, and the partition table has changed
oss01:/net/lmd01/space/lustre # /usr/local/sbin/parted /dev/mapper/ost_oss01_lustre0102_01
GNU Parted 1.8.8
Using /dev/mapper/ost_oss01_lustre0102_01
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Linux device-mapper (dm)
Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags

(parted) quit

# I can't just put another partition table back

(parted) mklabel
Warning: The existing disk label on /dev/mapper/ost_oss01_lustre0102_01 will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
New disk label type?  [gpt]? loop
(parted) p
Model: Linux device-mapper (dm)
Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: loop

Number  Start  End  Size  File system  Flags

(parted) mkpart
File system type?  [ext2]? ext3
Start? 0
End? 6001GB
(parted) p
Error: /dev/mapper/ost_oss01_lustre0102_01: unrecognised disk label
(parted) quit

# There is nothing unusual about the device; looking at multipath
oss01:/net/lmd01/space/lustre # multipath -l | grep ost_oss01_lustre0102_01
ost_oss01_lustre0102_01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast
oss02:/net/lmd01/space/lustre # multipath -l | grep ost_oss01_lustre0102_01
ost_oss01_lustre0102_01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast

Any suggestions would be deeply appreciated.

Thanks much,
JR Smith
jrs wrote:
> Greetings,
>
> I've posted before but no one responded. I'm reposting because I'm
> really dead in the water here until I can get this fixed.
>
> The issue is that my OSTs don't survive a reboot of the OSS.
>
> In the setup below I'm dealing with two OSSs: quad-core Intel Xeon
> machines with 8 GB of memory and a dual-port QLogic Fibre Channel card.
> They both run SLES 10.1 and Lustre 1.6.4.3. My two MDSs (similar,
> though not exactly the same hardware) don't have the same problem,
> though I'm only accessing a single MDT from them.
>
> I've reproduced the problem with something as simple as running:
>
> umount /mnt/lustre/ost/ost_oss01_lustre0102_01
> tune2fs -O +mmp /dev/mapper/ost_oss01_lustre0102_01
> mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
.....
> Any suggestions would be deeply appreciated.

It looks like something is really destroying your disks. If you try this
with ordinary ext3, does the filesystem survive a reboot?

Otherwise, you could try:
- mkfs.lustre as before.
# tunefs.lustre --print <device>
reboot
# tunefs.lustre --print <device>

Tunefs with --print is read-only, so if it doesn't work the second time,
you should be able to compare the results.
cliffw
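P.S. A concrete sketch of that comparison (the device path is borrowed
from earlier in the thread; the output file names are arbitrary, and
kept out of /tmp in case it is cleaned at boot):

tunefs.lustre --print /dev/mapper/ost_oss01_lustre0102_01 > /root/tunefs.before
reboot
# log back in, then:
tunefs.lustre --print /dev/mapper/ost_oss01_lustre0102_01 > /root/tunefs.after
diff /root/tunefs.before /root/tunefs.after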
I just made an ext3 filesystem, mounted it (on both OSSes - not at the
same time), unmounted, rebooted both servers, and it's still there. It
appears that this destruction of filesystems is a Lustre-only thing.

A difference between this and the Lustre filesystem, of course, is that
there is no device name created for the partition, e.g.,

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use*
brw------- 1 root root 253,  7 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use
brw------- 1 root root 253, 12 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1

while a Lustre OST uses the whole disk/volume:

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0304_02
brw------- 1 root root 253, 5 May 2 09:45 /dev/mapper/ost_oss01_lustre0304_02

In the mailing lists some time back someone talked about kpartx (though
I think that was in the context of having a consistent device name,
which I have no trouble with, since I'm explicitly naming the devices in
/etc/multipathd.conf).

Another issue that appears to be a bug, though it is probably not
related to my problem: when running mkfs.lustre with --failnode, mmp
should be set on the filesystem. However, looking at the output of
dumpe2fs, that doesn't appear to be the case:

oss01:/net/lmd01/space/lustre # dumpe2fs -h /dev/mapper/ost_oss01_lustre0304_02 | grep -A 1 feat
dumpe2fs 1.40.4.cfs1 (31-Dec-2007)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery extents sparse_super large_file
Filesystem flags:         signed directory hash

Of course, I can run tune2fs, but that, in the past, has induced the
disappearance of the filesystem as well.

thanks,
JR

Cliff White wrote:
> It looks like something is really destroying your disks. If you try this
> with ordinary ext3, does the filesystem survive a reboot?
>
> Otherwise, you could try:
> - mkfs.lustre as before.
> # tunefs.lustre --print <device>
> reboot
> # tunefs.lustre --print <device>
>
> Tunefs with --print is read-only, so if it doesn't work the second time,
> you should be able to compare the results.
> cliffw
On Fri, 2008-05-02 at 12:09 -0400, jrs wrote:
>
> A difference between this and the Lustre filesystem, of course, is that
> there is no device name created for the partition, e.g.,
>
> oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use*
> brw------- 1 root root 253,  7 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use
> brw------- 1 root root 253, 12 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1
>
> while a Lustre OST uses the whole disk/volume

So for your ext3 test you partitioned the disk and used a partition, and
for Lustre you used the whole disk? Why not do a more apples-to-apples
comparison and format the whole device with ext3, just like you would
with Lustre? There is no rule that you have to use partitions with ext3.

Also be sure you are using the exact same disk/device between your two
tests, to eliminate the possibility that this is related to only one
specific device.

You also mention partitions in your post. Be aware that if you are using
a whole-disk device (i.e. /dev/sda rather than /dev/sda1), you cannot
use any partitioning tools on that device, or you will overwrite the
beginning of your filesystem.

b.
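P.S. A sketch of that apples-to-apples test, reusing a device path from
earlier in the thread (the mount point is arbitrary, and note that
mkfs.ext3 destroys whatever is currently on the device):

mkfs.ext3 /dev/mapper/ost_oss01_lustre0102_01    # whole device, no partition table
mount /dev/mapper/ost_oss01_lustre0102_01 /mnt/test
umount /mnt/test
reboot
# after the reboot, see whether the filesystem is still recognized:
mount /dev/mapper/ost_oss01_lustre0102_01 /mnt/test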
On May 01, 2008  11:52 -0400, jrs wrote:
> oss01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
> mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ???????

> mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ????????

> # I can't just put another partition table back
>
> (parted) mklabel
> Warning: The existing disk label on /dev/mapper/ost_oss01_lustre0102_01 will be destroyed and all data on this disk will be lost. Do you want to continue?
> Yes/No? yes
> New disk label type?  [gpt]? loop
> (parted) p

What is a "loop" partition table?

> Model: Linux device-mapper (dm)
> Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB

Note that anything over 2TB (I think, maybe 4TB?) needs a GPT partition
table, or the size of the device is incorrect.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
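P.S. A quick, read-only way to see which side of that limit a device is
on (device path from the thread):

blockdev --getsize64 /dev/mapper/ost_oss01_lustre0102_01
# an msdos label stores 32-bit sector counts, so with 512-byte sectors
# it tops out at 2 TiB = 2199023255552 bytes; a device reporting more
# than that needs GPT (or no partition table at all)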
On Fri, May 02, 2008 at 07:34:25PM -0700, Andreas Dilger wrote:
> On May 01, 2008  11:52 -0400, jrs wrote:
>
> Note that anything over 2TB (I think, maybe 4TB?) needs a GPT partition
> table, or the size of the device is incorrect.

And I have a warning for people who need to use GPT tables - in our
experience the kernel silently ignores GPT. We ran into this some months
ago. Everything seemed to be correct, /proc/partitions was what we
expected, but then we noticed very odd data corruption. First we thought
we had introduced a bug into ldiskfs (you know, we usually need more
recent kernels than you support), but after adding a patch printing the
partition offsets, we realized the kernel had simply used a dos
partition table and wrapped around at 4TiB.

Oh well, I have wanted to investigate this further for a long time, but
since it can be easily worked around by specifying the "gpt" kernel
command line parameter, other issues have always had higher priority.

Bernd

PS: This gpt bug is in 2.6.20 and 2.6.22; I don't know if it is fixed in
more recent kernel versions.
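For reference, a sketch of that workaround on a GRUB-legacy system such
as SLES 10 (the kernel image name below is a placeholder; the root
device is taken from the df output earlier in the thread):

# /boot/grub/menu.lst - append "gpt" to the kernel line so the kernel
# probes for a GPT label before falling back to the dos table:
kernel /boot/vmlinuz-smp root=/dev/cciss/c0d0p3 gpt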
Hello Bernd,

could you give some details on the data corruption you have seen? We
have been using GPT tables for some time now, but the partitions have
been filled up only on NFS servers, not on Lustre OSSs. And on these NFS
volumes there were no problems attributed to the partition size.

This would also depend on the actual size limit: I understand GPT tables
are necessary for partitions > 2TB. The largest partition I have set up
so far was 3.2 TB, so if you are sure about your number of 4 TiB, we
might simply be on the lucky side.

And the workaround is specifying "gpt" on the kernel command line? For
once an easy solution ;-)

Regards,
Thomas

Bernd Schubert wrote:
> And I have a warning for people who need to use GPT tables - in our
> experience the kernel silently ignores GPT. We ran into this some
> months ago. Everything seemed to be correct, /proc/partitions was what
> we expected, but then we noticed very odd data corruption. First we
> thought we had introduced a bug into ldiskfs, but after adding a patch
> printing the partition offsets, we realized the kernel had simply used
> a dos partition table and wrapped around at 4TiB.
>
> PS: This gpt bug is in 2.6.20 and 2.6.22; I don't know if it is fixed
> in more recent kernel versions.

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

Gesellschaft für Schwerionenforschung mbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528
Geschäftsführer: Professor Dr. Horst Stöcker
Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
On May 05, 2008  11:57 -0400, jrs wrote:
> mds01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/mdt_mds01_lustre0102 /mnt/lustre/mdt
> mount.lustre: mount /dev/mapper/mdt_mds01_lustre0102 at /mnt/lustre/mdt failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
>
> Which produces this in /var/log/messages:
>
> May  5 09:35:41 mds01 kernel: VFS: Can't find ldiskfs filesystem on dev dm-1.
> May  5 09:35:41 mds01 multipathd: dm-1: umount map (uevent)
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1229:server_kernel_mount()) premount /dev/mapper/mdt_mds01_lustre0102:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1533:server_fill_super()) Unable to mount device /dev/mapper/mdt_mds01_lustre0102: -22
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-22)
>
> If I try to look at the partition table with parted I see:
>
> mds01:/net/oss02/space/parted-1.8.8 # /usr/local/sbin/parted /dev/mapper/mdt_mds01_lustre0102
> GNU Parted 1.8.8
> Using /dev/mapper/mdt_mds01_lustre0102
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Error: /dev/mapper/mdt_mds01_lustre0102: unrecognised disk label
> (parted)
>
> A good filesystem looks like:
>
> mds01:/net/oss02/space/parted-1.8.8 # /usr/local/sbin/parted /dev/mapper/ost_oss01_lustre0304_01
> GNU Parted 1.8.8
> Using /dev/mapper/ost_oss01_lustre0304_01
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Model: Unknown (unknown)
> Disk /dev/mapper/ost_oss01_lustre0304_01: 6001GB
> Sector size (logical/physical): 512B/512B
> Partition Table: loop
>
> Number  Start  End     Size    File system  Flags
>  1      0.00B  6001GB  6001GB  ext3
>
> NOTE: in another post someone commented on the loop partition type.
> I don't know what it is, but all my lustre partitions are of that
> type. The fact that a lustre person (I believe this individual was
> employed by Sun) was unfamiliar with it certainly is surprising.

I don't think that being employed by Sun makes everyone suddenly know
and understand everything :-). That other person was me, and while I've
even contributed a significant amount of code to parted in the past, I
just haven't used it in several years and am not familiar with the
"loop" partition type.

> Perhaps my version of parted has an issue (the one shipped with SLES
> returns a floating point exception):
> mds01:/net/oss02/space/parted-1.8.8 # parted /dev/mapper/mdt_mds01_lustre0102
> Floating point exception

Two things of note:
- there have been ongoing issues with parted and ldiskfs with large disk
  devices, and I tend to avoid parted and fdisk entirely for these
  reasons. I've been using LVM (DM) to manage my storage for some time
  now, if it is needed.
- we generally do NOT recommend using partitions of any kind for
  production Lustre filesystems, because of problems like this, and
  because in RAID setups partitions can hurt performance due to
  misaligned IO to the disk.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
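For what it's worth, a sketch of the LVM route (the volume group and LV
names below are made up; the mkfs.lustre options are copied from the
thread's earlier invocation):

pvcreate /dev/mapper/ost_oss01_lustre0102_01      # whole multipath device, no partition
vgcreate vg_lustre /dev/mapper/ost_oss01_lustre0102_01
lvcreate -l 100%FREE -n ost01 vg_lustre
mkfs.lustre --reformat --fsname i3_lfs3 --ost --failnode oss02 \
    --mgsnode mds01 --mgsnode mds02 /dev/vg_lustre/ost01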
Hello Thomas,

On Monday 05 May 2008 18:53:12 Thomas Roth wrote:
> Hello Bernd,
>
> could you give some details on the data corruption you have seen? We
> have been using GPT tables for some time now, but the partitions have
> been filled up only on NFS servers, not on Lustre OSSs. And on these
> NFS volumes there were no problems attributed to the partition size.

The type of filesystem is not important at all; this happens at the
low-level device layer.

> This would also depend on the actual size limit: I understand GPT
> tables are necessary for partitions > 2TB. The largest partition I
> have set up so far was 3.2 TB, so if you are sure about your number of
> 4 TiB, we might simply be on the lucky side.

I'm rather sure we only saw the problem with >4TiB.

> And the workaround is specifying "gpt" on the kernel command line? For
> once an easy solution ;-)

Yes. Usually the kernel seems to try the dos partition table first and
then tries other tables, but the first 512B of GPT seem to be compatible
with DOS, so the kernel thinks the dos table is fine. When you specify
gpt, it tries gpt first, and if that doesn't fit it switches to dos.

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
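A rough, read-only way to check which label the running kernel actually
parsed (the device name here is a placeholder) is to compare the size
the kernel exposes with the size recorded in the on-disk label:

grep sdb /proc/partitions          # size in 1 KiB blocks, as parsed at boot
parted -s /dev/sdb unit B print    # size according to the on-disk label
# if the kernel's figure is smaller and wraps around near 4 TiB, the GPT
# was mis-read as a dos table and the "gpt" boot parameter is needed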
On Monday 05 May 2008 19:21:43 Andreas Dilger wrote:
> > (parted) p
> > Model: Unknown (unknown)
> > Disk /dev/mapper/ost_oss01_lustre0304_01: 6001GB
> > Sector size (logical/physical): 512B/512B
> > Partition Table: loop
> >
> > Number  Start  End     Size    File system  Flags
> >  1      0.00B  6001GB  6001GB  ext3
> >
> > NOTE: in another post someone commented on the loop partition type.
> > I don't know what it is, but all my lustre partitions are of that
> > type. The fact that a lustre person (I believe this individual was
> > employed by Sun) was unfamiliar with it certainly is surprising.
>
> I don't think that being employed by Sun makes everyone suddenly know
> and understand everything :-). That other person was me, and while
> I've even contributed a significant amount of code to parted in the
> past, I just haven't used it in several years and am not familiar with
> the "loop" partition type.

You definitely know more about filesystems and partitions than I do, but
I'm sure this is a bug.

> > Perhaps my version of parted has an issue (the one shipped with SLES
> > returns a floating point exception):
> > mds01:/net/oss02/space/parted-1.8.8 # parted /dev/mapper/mdt_mds01_lustre0102
> > Floating point exception

Probably this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=259248

> Two things of note:
> - there have been ongoing issues with parted and ldiskfs with large
>   disk devices, and I tend to avoid parted and fdisk entirely for
>   these reasons. I've been using LVM (DM) to manage my storage for
>   some time now, if it is needed.
> - we generally do NOT recommend using partitions of any kind for
>   production Lustre filesystems, because of problems like this, and
>   because in RAID setups partitions can hurt performance due to
>   misaligned IO to the disk.

Well, I wish we didn't need to use partitions, but for some projects we
have to:
- ldiskfs is still limited to 8TiB
- linux-md RAID6 is not parallelized, so a single CPU becomes the limit
  while the 7 other CPUs are idling. Creating several RAID sets is then
  a kind of parallelization; see the mdadm sketch below.

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
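A sketch of that split (the device names are hypothetical): sixteen
disks become two 8-disk RAID6 sets, so two md threads share the parity
work instead of one, which also makes it easier to keep each array under
the 8 TiB ldiskfs limit.

mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[j-q]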
Well, things have changed again: as I'm trying to get back to something
that works, the problem has now hit one of the MDSs, as you can see
below. Below that I have the output of 'multipath -l'. I have dual-port
HBAs and multiple paths to the backend storage, so it looks a little
complex. I've modified the /etc/multipathd.conf file to give the logical
names you see, e.g., ost_lustre03-04_04_oss01_dm_7_mds01. Even though it
looks a little scary, remember that things work fine and can even
survive a random number of reboots before an OST disappears. Since the
last time I posted, I have had an MDT go away too.

Does anyone think that I might have better luck running Redhat?

I've looked through the /etc/init.d/* files but can't see anything that
might be destroying the partition.

Thanks,
John

$ cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1    2104483 cciss/c0d0p1
 104     2   69545385 cciss/c0d0p2
   8     0 5860157184 sda
   8    16 5860157184 sdb
   8    32 5860230912 sdc
   8    48 5860156250 sdd
   8    64 5860156250 sde
   8    80 5860156250 sdf
   8    96 5860156250 sdg
   8   112 5860156250 sdh
   8   128 5860156250 sdi
   8   144 5860157184 sdj
   8   160 5860157184 sdk
   8   176 5860230912 sdl
   8   192 5860156250 sdm
   8   193 5860156216 sdm1
   8   208 5860156250 sdn
   8   224 5860156250 sdo
   8   240 5860157184 sdp
  65     0 5860157184 sdq
  65    16 5860230912 sdr
  65    32 5860156250 sds
  65    33 5860156216 sds1
  65    48 5860156250 sdt
  65    64 5860156250 sdu
  65    80 5860157184 sdv
  65    96 5860157184 sdw
  65   112 5860230912 sdx
  65   128 5860156250 sdy
  65   129 5860156216 sdy1
  65   144 5860156250 sdz
  65   160 5860156250 sdaa
  65   176 5860157184 sdab
  65   192 5860157184 sdac
  65   208 5860230912 sdad
  65   224 5860156250 sdae
  65   225 5860156216 sdae1
  65   240 5860156250 sdaf
  66     0 5860156250 sdag
  66    16 5860157184 sdah
  66    32 5860157184 sdai
  66    48 5860230912 sdaj
  66    64 5860157184 sdak
  66    80 5860157184 sdal
  66    96 5860230912 sdam
  66   112 5860156250 sdan
  66   128 5860156250 sdao
  66   144 5860156250 sdap
  66   160 5860156250 sdaq
  66   176 5860156250 sdar
  66   192 5860156250 sdas
  66   208 5860157184 sdat
  66   224 5860157184 sdau
  66   240 5860230912 sdav
 253     0 5860156250 dm-0
 253     1 5860157184 dm-1
 253     2 5860157184 dm-2
 253     3 5860230912 dm-3
 253     4 5860156250 dm-4
 253     5 5860156250 dm-5
 253     6 5860157184 dm-6
 253     7 5860157184 dm-7
 253     8 5860230912 dm-8
 253     9 5860156250 dm-9
 253    10 5860156250 dm-10
 253    11 5860156250 dm-11
 253    12 5860156216 dm-12

$ multipath -l
ost_lustre03-04_04_oss01_dm_7_mds01 (36000402001fc308260c0ace100000000) dm-7 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:4 sdai 66:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:4 sdau 66:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:4 sdk  8:160  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:4 sdw  65:96  [active][undef]
ost_lustre01-02_04_oss01_dm_5_mds01 (36000402001fc14596ef496fd00000000) dm-5 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:1 sdaf 65:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:1 sdn  8:208  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:1 sdt  65:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdz  65:144 [active][undef]
ost_lustre03-04_02_oss01_dm_3_mds01 (36000402001fc308260c0af3700000000) dm-3 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:2 sdad 65:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:2 sdam 66:96  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:2 sdc  8:32   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:2 sdr  65:16  [active][undef]
ost_lustre01-02_02_oss01_dm_11_mds01 (36000402001fc14596ef497ee00000000) dm-11 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:5 sdap 66:144 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:5 sdas 66:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:5 sdf  8:80   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:5 sdi  8:128  [active][undef]
ost_lustre01-02_05_oss02_dm_0_mds01 (36000402001fc14596ef4970e00000000) dm-0 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:2 sdaa 65:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:2:2 sdag 66:0   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:2 sdo  8:224  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:2 sdu  65:64  [active][undef]
ost_lustre01-02_01_oss02_dm_10_mds01 (36000402001fc14596ef497dc00000000) dm-10 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:4 sdao 66:128 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:4 sdar 66:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:4 sde  8:64   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:4 sdh  8:112  [active][undef]
mdt_lustre03-04_00_dm_8_mds01 (36000402001fc308260c0ac9e00000000) dm-8 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:5 sdaj 66:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:5 sdav 66:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:5 sdl  8:176  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:5 sdx  65:112 [active][undef]
ost_lustre03-04_03_oss02_dm_6_mds01 (36000402001fc308260c0acc200000000) dm-6 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:3 sdah 66:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:3 sdat 66:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:3 sdj  8:144  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:3 sdv  65:80  [active][undef]
ost_lustre01-02_03_oss02_dm_4_mds01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:0 sdae 65:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:0 sdm  8:192  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:0 sds  65:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdy  65:128 [active][undef]
ost_lustre03-04_01_oss02_dm_2_mds01 (36000402001fc308260c0af1600000000) dm-2 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:1 sdac 65:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:1 sdal 66:80  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:1 sdb  8:16   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:1 sdq  65:0   [active][undef]
ost_lustre01-02_00_oss01_dm_9_mds01 (36000402001fc14596ef497cc00000000) dm-9 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:3 sdan 66:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:3 sdaq 66:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:3 sdd  8:48   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:3 sdg  8:96   [active][undef]
ost_lustre03-04_00_oss01_dm_1_mds01 (36000402001fc308260c0af5b00000000) dm-1 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:0 sdab 65:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:0 sdak 66:64  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:0 sda  8:0    [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:0 sdp  8:240  [active][undef]

Bernd Schubert wrote:
> On Mon, May 05, 2008 at 12:30:23PM -0400, jrs wrote:
>> I wonder if I'd have better luck, with the disappearing OST bug, if
>> I actually explicitly partitioned the device and then used, to take
>> the example above,
>>
>> /dev/mapper/ost_oss01_lustre0304_02-part1
>>
>> rather than the whole disk.
>
> What does /proc/partitions say?
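One read-only check worth running the next time a device disappears
(the device name is taken from the multipath output above): dump the
first two sectors and look for the GPT header signature, the ASCII
string "EFI PART" at the start of LBA 1.

dd if=/dev/mapper/ost_lustre03-04_00_oss01_dm_1_mds01 bs=512 count=2 2>/dev/null | hexdump -C | grep 'EFI PART'
# a hit means something wrote a GPT header over the start of the device;
# no output means the label damage came from something else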