Greetings,

I've posted before but no one responded. I'm reposting because I'm
really dead in the water here until I can get this fixed.

The issue is that my OSTs don't survive a reboot of the OSS.

In the setup below I'm dealing with two OSSs: quad-core Intel Xeon
machines with 8 GB of memory and a dual-port QLogic Fibre Channel card.
They both run SLES 10.1 and Lustre 1.6.4.3. My two MDSs (similar,
though not exactly the same hardware) don't have the same problem,
though I'm only accessing a single MDT from them.

I've reproduced the problem with something as simple as running:

umount /mnt/lustre/ost/ost_oss01_lustre0102_01
tune2fs -O +mmp /dev/mapper/ost_oss01_lustre0102_01
mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

When I look at the partition table with parted I see that it's changed
from loop to gpt (as shown below). But the simplest case is:

oss01:/net/lmd01/space/lustre # mkfs.lustre --reformat --fsname i3_lfs3 --ost --failnode oss02 --mgsnode mds01 --mgsnode mds02 /dev/mapper/ost_oss01_lustre0102_01
oss01:/net/lmd01/space/lustre # reboot
# log in
oss01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
oss01:/net/lmd01/space/lustre # dumpe2fs -h /dev/mapper/ost_oss01_lustre0102_01 | grep feature
dumpe2fs 1.40.4.cfs1 (31-Dec-2007)
dumpe2fs: Bad magic number in super-block while trying to open /dev/mapper/ost_oss01_lustre0102_01

# another example: I re-run mkfs.lustre on the above device and mount
# it and 2 other OSTs on the second OSS
oss02:/net/lmd01/space/lustre # df | egrep 'File|ost'
Filesystem                           1K-blocks    Used  Available Use% Mounted on
/dev/mapper/ost_oss01_lustre0102_01 5768201600  469544 5474724244   1% /mnt/lustre/ost/ost_oss01_lustre0102_01
/dev/mapper/ost_oss01_lustre0102_02 5768201600  469540 5474724248   1% /mnt/lustre/ost/ost_oss01_lustre0102_02
/dev/mapper/ost_oss02_lustre0102_01 5768201600  479940 5474713848   1% /mnt/lustre/ost/ost_oss02_lustre0102_01

# I reboot the first machine, then
oss02:/net/lmd01/space/lustre # umount -t lustre -a
# then try to mount from the first machine and ...
oss01:/net/lmd01/space/lustre # cat a
mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
mount -t lustre /dev/mapper/ost_oss01_lustre0102_02 /mnt/lustre/ost/ost_oss01_lustre0102_02
mount -t lustre /dev/mapper/ost_oss02_lustre0102_01 /mnt/lustre/ost/ost_oss02_lustre0102_01
oss01:/net/lmd01/space/lustre # sh a
mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
oss01:/net/lmd01/space/lustre # df
Filesystem                           1K-blocks    Used  Available Use% Mounted on
/dev/cciss/c0d0p3                     61022084 6398044   51524300  12% /
udev                                   4089220     312    4088908   1% /dev
/dev/cciss/c0d0p1                      1241220   48324    1129844   5% /boot
lmd01:/space                         470387232 8296256  438196704   2% /net/lmd01/space
/dev/mapper/ost_oss01_lustre0102_02 5768201600  469540 5474724248   1% /mnt/lustre/ost/ost_oss01_lustre0102_02
/dev/mapper/ost_oss02_lustre0102_01 5768201600  479940 5474713848   1% /mnt/lustre/ost/ost_oss02_lustre0102_01

# So the device was up just fine on one machine, I umounted them and
# tried on the other OSS, and the partition table has changed
oss01:/net/lmd01/space/lustre # /usr/local/sbin/parted /dev/mapper/ost_oss01_lustre0102_01
GNU Parted 1.8.8
Using /dev/mapper/ost_oss01_lustre0102_01
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Linux device-mapper (dm)
Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags

(parted) quit

# I can't just put another partition table back

(parted) mklabel
Warning: The existing disk label on /dev/mapper/ost_oss01_lustre0102_01 will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
New disk label type?  [gpt]? loop
(parted) p
Model: Linux device-mapper (dm)
Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: loop

Number  Start  End  Size  File system  Flags

(parted) mkpart
File system type?  [ext2]? ext3
Start? 0
End? 6001GB
(parted) p
Error: /dev/mapper/ost_oss01_lustre0102_01: unrecognised disk label
(parted) quit

# There is nothing unusual about the device; looking at multipath
oss01:/net/lmd01/space/lustre # multipath -l | grep ost_oss01_lustre0102_01
ost_oss01_lustre0102_01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast
oss02:/net/lmd01/space/lustre # multipath -l | grep ost_oss01_lustre0102_01
ost_oss01_lustre0102_01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast

Any suggestions would be deeply appreciated.

Thanks much,
JR Smith
jrs wrote:
> Greetings,
>
> I've posted before but no one responded. I'm reposting because I'm
> really dead in the water here until I can get this fixed.
>
> The issue is that my OSTs don't survive a reboot of the OSS.
>
> In the setup below I'm dealing with two OSSs: quad-core Intel Xeon
> machines with 8 GB of memory and a dual-port QLogic Fibre Channel card.
> They both run SLES 10.1 and Lustre 1.6.4.3. My two MDSs (similar,
> though not exactly the same hardware) don't have the same problem,
> though I'm only accessing a single MDT from them.
>
> I've reproduced the problem with something as simple as running:
>
> umount /mnt/lustre/ost/ost_oss01_lustre0102_01
> tune2fs -O +mmp /dev/mapper/ost_oss01_lustre0102_01
> mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
.....
> Any suggestions would be deeply appreciated.

It looks like something is really destroying your disks. If you try this
with ordinary ext3, does the filesystem survive a reboot?

Otherwise, you could try:
- mkfs.lustre as before.
# tunefs.lustre --print <device>
reboot
# tunefs.lustre --print <device>

Tunefs with --print is read-only, so if it doesn't work the second time,
you should be able to compare the results.
cliffw
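P.S. A concrete sketch of that comparison (the device path is borrowed
from earlier in the thread; the output file names are arbitrary, and
kept out of /tmp in case it is cleaned at boot):

tunefs.lustre --print /dev/mapper/ost_oss01_lustre0102_01 > /root/tunefs.before
reboot
# log back in, then:
tunefs.lustre --print /dev/mapper/ost_oss01_lustre0102_01 > /root/tunefs.after
diff /root/tunefs.before /root/tunefs.after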
I just made an ext3 filesystem, mounted it (on both OSSes - not at the
same time), unmounted, rebooted both servers, and it's still there. It
appears that this destruction of filesystems is a Lustre-only thing.

A difference between this and the Lustre filesystem, of course, is that
there is no device name created for the partition, e.g.,

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use*
brw------- 1 root root 253,  7 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use
brw------- 1 root root 253, 12 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1

while a Lustre OST uses the whole disk/volume:

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0304_02
brw------- 1 root root 253, 5 May 2 09:45 /dev/mapper/ost_oss01_lustre0304_02

In the mailing lists some time back someone talked about kpartx (though
I think that was in the context of having a consistent device name,
which I have no trouble with, since I'm explicitly naming the devices in
/etc/multipathd.conf).

Another issue that appears to be a bug, though it is probably not
related to my problem: when running mkfs.lustre with --failnode, mmp
should be set on the filesystem. However, looking at the output of
dumpe2fs, that doesn't appear to be the case:

oss01:/net/lmd01/space/lustre # dumpe2fs -h /dev/mapper/ost_oss01_lustre0304_02 | grep -A 1 feat
dumpe2fs 1.40.4.cfs1 (31-Dec-2007)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery extents sparse_super large_file
Filesystem flags:         signed directory hash

Of course, I can run tune2fs, but that, in the past, has induced the
disappearance of the filesystem as well.

thanks,
JR

Cliff White wrote:
> It looks like something is really destroying your disks. If you try this
> with ordinary ext3, does the filesystem survive a reboot?
>
> Otherwise, you could try:
> - mkfs.lustre as before.
> # tunefs.lustre --print <device>
> reboot
> # tunefs.lustre --print <device>
>
> Tunefs with --print is read-only, so if it doesn't work the second time,
> you should be able to compare the results.
> cliffw
On Fri, 2008-05-02 at 12:09 -0400, jrs wrote:
>
> A difference between this and the Lustre filesystem, of course, is that
> there is no device name created for the partition, e.g.,
>
> oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use*
> brw------- 1 root root 253,  7 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use
> brw------- 1 root root 253, 12 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1
>
> while a Lustre OST uses the whole disk/volume

So for your ext3 test you partitioned the disk and used a partition, and
for Lustre you used the whole disk? Why not do a more apples-to-apples
comparison and format the whole device with ext3, just like you would
with Lustre? There is no rule that you have to use partitions with ext3.

Also be sure you are using the exact same disk/device between your two
tests, to eliminate the possibility that this is related to only one
specific device.

You also mention partitions in your post. Be aware that if you are using
a whole-disk device (i.e. /dev/sda rather than /dev/sda1), you cannot
use any partitioning tools on that device, or you will overwrite the
beginning of your filesystem.

b.
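P.S. A sketch of that apples-to-apples test, reusing a device path from
earlier in the thread (the mount point is arbitrary, and note that
mkfs.ext3 destroys whatever is currently on the device):

mkfs.ext3 /dev/mapper/ost_oss01_lustre0102_01    # whole device, no partition table
mount /dev/mapper/ost_oss01_lustre0102_01 /mnt/test
umount /mnt/test
reboot
# after the reboot, see whether the filesystem is still recognized:
mount /dev/mapper/ost_oss01_lustre0102_01 /mnt/test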
On May 01, 2008  11:52 -0400, jrs wrote:
> oss01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 /mnt/lustre/ost/ost_oss01_lustre0102_01
> mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ???????

> mount.lustre: mount /dev/mapper/ost_oss01_lustre0102_01 at /mnt/lustre/ost/ost_oss01_lustre0102_01 failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ????????

> # I can't just put another partition table back
>
> (parted) mklabel
> Warning: The existing disk label on /dev/mapper/ost_oss01_lustre0102_01 will be destroyed and all data on this disk will be lost. Do you want to continue?
> Yes/No? yes
> New disk label type?  [gpt]? loop
> (parted) p

What is a "loop" partition table?

> Model: Linux device-mapper (dm)
> Disk /dev/mapper/ost_oss01_lustre0102_01: 6001GB

Note that anything over 2TB (I think, maybe 4TB?) needs a GPT partition
table, or the size of the device is incorrect.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
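P.S. A quick, read-only way to see which side of that limit a device is
on (device path from the thread):

blockdev --getsize64 /dev/mapper/ost_oss01_lustre0102_01
# an msdos label stores 32-bit sector counts, so with 512-byte sectors
# it tops out at 2 TiB = 2199023255552 bytes; a device reporting more
# than that needs GPT (or no partition table at all)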
On Fri, May 02, 2008 at 07:34:25PM -0700, Andreas Dilger wrote:
> On May 01, 2008  11:52 -0400, jrs wrote:
>
> Note that anything over 2TB (I think, maybe 4TB?) needs a GPT partition
> table, or the size of the device is incorrect.

And I have a warning for people who need to use GPT tables - in our
experience the kernel silently ignores GPT. We ran into this some months
ago. Everything seemed to be correct, /proc/partitions was what we
expected, but then we noticed very odd data corruption. First we thought
we had introduced a bug into ldiskfs (you know, we usually need more
recent kernels than you support), but after adding a patch printing the
partition offsets, we realized the kernel had simply used a dos
partition table and wrapped around at 4TiB.

Oh well, I have wanted to investigate this further for a long time, but
since it can be easily worked around by specifying the "gpt" kernel
command line parameter, other issues have always had higher priority.

Bernd

PS: This gpt bug is in 2.6.20 and 2.6.22; I don't know if it is fixed in
more recent kernel versions.
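For reference, a sketch of that workaround on a GRUB-legacy system such
as SLES 10 (the kernel image name below is a placeholder; the root
device is taken from the df output earlier in the thread):

# /boot/grub/menu.lst - append "gpt" to the kernel line so the kernel
# probes for a GPT label before falling back to the dos table:
kernel /boot/vmlinuz-smp root=/dev/cciss/c0d0p3 gpt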
Hello Bernd,

could you give some details on the data corruption you have seen? We
have been using GPT tables for some time now, but the partitions have
been filled up only on NFS servers, not on Lustre OSSs. And on these NFS
volumes there were no problems attributed to the partition size.

This would also depend on the actual size limit: I understand GPT tables
are necessary for partitions > 2TB. The largest partition I have set up
so far was 3.2 TB, so if you are sure about your number of 4 TiB, we
might simply be on the lucky side.

And the workaround is specifying "gpt" on the kernel command line? For
once an easy solution ;-)

Regards,
Thomas

Bernd Schubert wrote:
> And I have a warning for people who need to use GPT tables - in our
> experience the kernel silently ignores GPT. We ran into this some
> months ago. Everything seemed to be correct, /proc/partitions was what
> we expected, but then we noticed very odd data corruption. First we
> thought we had introduced a bug into ldiskfs, but after adding a patch
> printing the partition offsets, we realized the kernel had simply used
> a dos partition table and wrapped around at 4TiB.
>
> PS: This gpt bug is in 2.6.20 and 2.6.22; I don't know if it is fixed
> in more recent kernel versions.

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

Gesellschaft für Schwerionenforschung mbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528
Geschäftsführer: Professor Dr. Horst Stöcker
Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
On May 05, 2008  11:57 -0400, jrs wrote:
> mds01:/net/lmd01/space/lustre # mount -t lustre /dev/mapper/mdt_mds01_lustre0102 /mnt/lustre/mdt
> mount.lustre: mount /dev/mapper/mdt_mds01_lustre0102 at /mnt/lustre/mdt failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
>
> Which produces this in /var/log/messages:
>
> May  5 09:35:41 mds01 kernel: VFS: Can't find ldiskfs filesystem on dev dm-1.
> May  5 09:35:41 mds01 multipathd: dm-1: umount map (uevent)
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1229:server_kernel_mount()) premount /dev/mapper/mdt_mds01_lustre0102:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1533:server_fill_super()) Unable to mount device /dev/mapper/mdt_mds01_lustre0102: -22
> May  5 09:35:41 mds01 kernel: LustreError: 16215:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-22)
>
> If I try to look at the partition table with parted I see:
>
> mds01:/net/oss02/space/parted-1.8.8 # /usr/local/sbin/parted /dev/mapper/mdt_mds01_lustre0102
> GNU Parted 1.8.8
> Using /dev/mapper/mdt_mds01_lustre0102
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Error: /dev/mapper/mdt_mds01_lustre0102: unrecognised disk label
> (parted)
>
> A good filesystem looks like:
>
> mds01:/net/oss02/space/parted-1.8.8 # /usr/local/sbin/parted /dev/mapper/ost_oss01_lustre0304_01
> GNU Parted 1.8.8
> Using /dev/mapper/ost_oss01_lustre0304_01
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Model: Unknown (unknown)
> Disk /dev/mapper/ost_oss01_lustre0304_01: 6001GB
> Sector size (logical/physical): 512B/512B
> Partition Table: loop
>
> Number  Start  End     Size    File system  Flags
>  1      0.00B  6001GB  6001GB  ext3
>
> NOTE: in another post someone commented on the loop partition type.
> I don't know what it is, but all my lustre partitions are of that
> type. The fact that a lustre person (I believe this individual was
> employed by Sun) was unfamiliar with it certainly is surprising.

I don't think that being employed by Sun makes everyone suddenly know
and understand everything :-). That other person was me, and while I've
even contributed a significant amount of code to parted in the past, I
just haven't used it in several years and am not familiar with the
"loop" partition type.

> Perhaps my version of parted has an issue (the one shipped with SLES
> returns a floating point exception):
> mds01:/net/oss02/space/parted-1.8.8 # parted /dev/mapper/mdt_mds01_lustre0102
> Floating point exception

Two things of note:
- there have been ongoing issues with parted and ldiskfs with large disk
  devices, and I tend to avoid parted and fdisk entirely for these
  reasons. I've been using LVM (DM) to manage my storage for some time
  now, if it is needed.
- we generally do NOT recommend using partitions of any kind for
  production Lustre filesystems, because of problems like this, and
  because in RAID setups partitions can hurt performance due to
  misaligned IO to the disk.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
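For what it's worth, a sketch of the LVM route (the volume group and LV
names below are made up; the mkfs.lustre options are copied from the
thread's earlier invocation):

pvcreate /dev/mapper/ost_oss01_lustre0102_01      # whole multipath device, no partition
vgcreate vg_lustre /dev/mapper/ost_oss01_lustre0102_01
lvcreate -l 100%FREE -n ost01 vg_lustre
mkfs.lustre --reformat --fsname i3_lfs3 --ost --failnode oss02 \
    --mgsnode mds01 --mgsnode mds02 /dev/vg_lustre/ost01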
Hello Thomas,

On Monday 05 May 2008 18:53:12 Thomas Roth wrote:
> Hello Bernd,
>
> could you give some details on the data corruption you have seen? We
> have been using GPT tables for some time now, but the partitions have
> been filled up only on NFS servers, not on Lustre OSSs. And on these
> NFS volumes there were no problems attributed to the partition size.

The type of filesystem is not important at all; this happens at the
low-level device layer.

> This would also depend on the actual size limit: I understand GPT
> tables are necessary for partitions > 2TB. The largest partition I
> have set up so far was 3.2 TB, so if you are sure about your number of
> 4 TiB, we might simply be on the lucky side.

I'm rather sure we only saw the problem with >4TiB.

> And the workaround is specifying "gpt" on the kernel command line? For
> once an easy solution ;-)

Yes. Usually the kernel seems to try the dos partition table first and
then tries other tables, but the first 512B of GPT seem to be compatible
with DOS, so the kernel thinks the dos table is fine. When you specify
gpt, it tries gpt first, and if that doesn't fit it switches to dos.

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
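A rough, read-only way to check which label the running kernel actually
parsed (the device name here is a placeholder) is to compare the size
the kernel exposes with the size recorded in the on-disk label:

grep sdb /proc/partitions          # size in 1 KiB blocks, as parsed at boot
parted -s /dev/sdb unit B print    # size according to the on-disk label
# if the kernel's figure is smaller and wraps around near 4 TiB, the GPT
# was mis-read as a dos table and the "gpt" boot parameter is needed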
On Monday 05 May 2008 19:21:43 Andreas Dilger wrote:
> > (parted) p
> > Model: Unknown (unknown)
> > Disk /dev/mapper/ost_oss01_lustre0304_01: 6001GB
> > Sector size (logical/physical): 512B/512B
> > Partition Table: loop
> >
> > Number  Start  End     Size    File system  Flags
> >  1      0.00B  6001GB  6001GB  ext3
> >
> > NOTE: in another post someone commented on the loop partition type.
> > I don't know what it is, but all my lustre partitions are of that
> > type. The fact that a lustre person (I believe this individual was
> > employed by Sun) was unfamiliar with it certainly is surprising.
>
> I don't think that being employed by Sun makes everyone suddenly know
> and understand everything :-). That other person was me, and while
> I've even contributed a significant amount of code to parted in the
> past, I just haven't used it in several years and am not familiar with
> the "loop" partition type.

You definitely know more about filesystems and partitions than I do, but
I'm sure this is a bug.

> > Perhaps my version of parted has an issue (the one shipped with SLES
> > returns a floating point exception):
> > mds01:/net/oss02/space/parted-1.8.8 # parted /dev/mapper/mdt_mds01_lustre0102
> > Floating point exception

Probably this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=259248

> Two things of note:
> - there have been ongoing issues with parted and ldiskfs with large
>   disk devices, and I tend to avoid parted and fdisk entirely for
>   these reasons. I've been using LVM (DM) to manage my storage for
>   some time now, if it is needed.
> - we generally do NOT recommend using partitions of any kind for
>   production Lustre filesystems, because of problems like this, and
>   because in RAID setups partitions can hurt performance due to
>   misaligned IO to the disk.

Well, I wish we didn't need to use partitions, but for some projects we
have to:
- ldiskfs is still limited to 8TiB
- linux-md RAID6 is not parallelized, so a single CPU becomes the limit
  while the 7 other CPUs are idling. Creating several RAID sets is then
  a kind of parallelization; see the mdadm sketch below.

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
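A sketch of that split (the device names are hypothetical): sixteen
disks become two 8-disk RAID6 sets, so two md threads share the parity
work instead of one, which also makes it easier to keep each array under
the 8 TiB ldiskfs limit.

mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[j-q]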
Well, things have changed again: as I'm trying to get back to something
that works, the problem has now hit one of the MDSs, as you can see
below. Below that I have the output of 'multipath -l'. I have dual-port
HBAs and multiple paths to the backend storage, so it looks a little
complex. I've modified the /etc/multipathd.conf file to give the logical
names you see, e.g., ost_lustre03-04_04_oss01_dm_7_mds01. Even though it
looks a little scary, remember that things work fine and can even
survive a random number of reboots before an OST disappears. Since the
last time I posted, I have had an MDT go away too.

Does anyone think that I might have better luck running Redhat?

I've looked through the /etc/init.d/* files but can't see anything that
might be destroying the partition.

Thanks,
John

$ cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1    2104483 cciss/c0d0p1
 104     2   69545385 cciss/c0d0p2
   8     0 5860157184 sda
   8    16 5860157184 sdb
   8    32 5860230912 sdc
   8    48 5860156250 sdd
   8    64 5860156250 sde
   8    80 5860156250 sdf
   8    96 5860156250 sdg
   8   112 5860156250 sdh
   8   128 5860156250 sdi
   8   144 5860157184 sdj
   8   160 5860157184 sdk
   8   176 5860230912 sdl
   8   192 5860156250 sdm
   8   193 5860156216 sdm1
   8   208 5860156250 sdn
   8   224 5860156250 sdo
   8   240 5860157184 sdp
  65     0 5860157184 sdq
  65    16 5860230912 sdr
  65    32 5860156250 sds
  65    33 5860156216 sds1
  65    48 5860156250 sdt
  65    64 5860156250 sdu
  65    80 5860157184 sdv
  65    96 5860157184 sdw
  65   112 5860230912 sdx
  65   128 5860156250 sdy
  65   129 5860156216 sdy1
  65   144 5860156250 sdz
  65   160 5860156250 sdaa
  65   176 5860157184 sdab
  65   192 5860157184 sdac
  65   208 5860230912 sdad
  65   224 5860156250 sdae
  65   225 5860156216 sdae1
  65   240 5860156250 sdaf
  66     0 5860156250 sdag
  66    16 5860157184 sdah
  66    32 5860157184 sdai
  66    48 5860230912 sdaj
  66    64 5860157184 sdak
  66    80 5860157184 sdal
  66    96 5860230912 sdam
  66   112 5860156250 sdan
  66   128 5860156250 sdao
  66   144 5860156250 sdap
  66   160 5860156250 sdaq
  66   176 5860156250 sdar
  66   192 5860156250 sdas
  66   208 5860157184 sdat
  66   224 5860157184 sdau
  66   240 5860230912 sdav
 253     0 5860156250 dm-0
 253     1 5860157184 dm-1
 253     2 5860157184 dm-2
 253     3 5860230912 dm-3
 253     4 5860156250 dm-4
 253     5 5860156250 dm-5
 253     6 5860157184 dm-6
 253     7 5860157184 dm-7
 253     8 5860230912 dm-8
 253     9 5860156250 dm-9
 253    10 5860156250 dm-10
 253    11 5860156250 dm-11
 253    12 5860156216 dm-12

$ multipath -l
ost_lustre03-04_04_oss01_dm_7_mds01 (36000402001fc308260c0ace100000000) dm-7 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:4 sdai 66:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:4 sdau 66:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:4 sdk  8:160  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:4 sdw  65:96  [active][undef]
ost_lustre01-02_04_oss01_dm_5_mds01 (36000402001fc14596ef496fd00000000) dm-5 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:1 sdaf 65:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:1 sdn  8:208  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:1 sdt  65:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdz  65:144 [active][undef]
ost_lustre03-04_02_oss01_dm_3_mds01 (36000402001fc308260c0af3700000000) dm-3 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:2 sdad 65:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:2 sdam 66:96  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:2 sdc  8:32   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:2 sdr  65:16  [active][undef]
ost_lustre01-02_02_oss01_dm_11_mds01 (36000402001fc14596ef497ee00000000) dm-11 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:5 sdap 66:144 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:5 sdas 66:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:5 sdf  8:80   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:5 sdi  8:128  [active][undef]
ost_lustre01-02_05_oss02_dm_0_mds01 (36000402001fc14596ef4970e00000000) dm-0 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:2 sdaa 65:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:2:2 sdag 66:0   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:2 sdo  8:224  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:2 sdu  65:64  [active][undef]
ost_lustre01-02_01_oss02_dm_10_mds01 (36000402001fc14596ef497dc00000000) dm-10 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:4 sdao 66:128 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:4 sdar 66:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:4 sde  8:64   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:4 sdh  8:112  [active][undef]
mdt_lustre03-04_00_dm_8_mds01 (36000402001fc308260c0ac9e00000000) dm-8 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:5 sdaj 66:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:5 sdav 66:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:5 sdl  8:176  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:5 sdx  65:112 [active][undef]
ost_lustre03-04_03_oss02_dm_6_mds01 (36000402001fc308260c0acc200000000) dm-6 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:3 sdah 66:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:7:3 sdat 66:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:3:3 sdj  8:144  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:7:3 sdv  65:80  [active][undef]
ost_lustre01-02_03_oss02_dm_4_mds01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:0 sdae 65:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:4:0 sdm  8:192  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:6:0 sds  65:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdy  65:128 [active][undef]
ost_lustre03-04_01_oss02_dm_2_mds01 (36000402001fc308260c0af1600000000) dm-2 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:1 sdac 65:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:1 sdal 66:80  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:1 sdb  8:16   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:1 sdq  65:0   [active][undef]
ost_lustre01-02_00_oss01_dm_9_mds01 (36000402001fc14596ef497cc00000000) dm-9 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:5:3 sdan 66:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:6:3 sdaq 66:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:3 sdd  8:48   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:3 sdg  8:96   [active][undef]
ost_lustre03-04_00_oss01_dm_1_mds01 (36000402001fc308260c0af5b00000000) dm-1 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:0 sdab 65:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:4:0 sdak 66:64  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:0 sda  8:0    [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:5:0 sdp  8:240  [active][undef]

Bernd Schubert wrote:
> On Mon, May 05, 2008 at 12:30:23PM -0400, jrs wrote:
>> I wonder if I'd have better luck, with the disappearing OST bug, if
>> I actually explicitly partitioned the device and then used, to take
>> the example above,
>>
>> /dev/mapper/ost_oss01_lustre0304_02-part1
>>
>> rather than the whole disk.
>
> What does /proc/partitions say?
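One read-only check worth running the next time a device disappears
(the device name is taken from the multipath output above): dump the
first two sectors and look for the GPT header signature, the ASCII
string "EFI PART" at the start of LBA 1.

dd if=/dev/mapper/ost_lustre03-04_00_oss01_dm_1_mds01 bs=512 count=2 2>/dev/null | hexdump -C | grep 'EFI PART'
# a hit means something wrote a GPT header over the start of the device;
# no output means the label damage came from something else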