lustre
2010-Jun-16 09:23 UTC
[Lustre-discuss] SLES11, Lustre 1.8.2 with LVM and multipathing problems
Hello Folks,

we have one LUN on our MGS/MGT server. The LUN is available over two paths
(multipathing with the OS-embedded RDAC driver, SLES11):

snowball-mds2:/proc/fs # multipath -ll
3600a0b80005a7215000002034b952b00 dm-10 SUN,LCSM100_S
[size=419G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=6][active]
 \_ 6:0:1:1 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 6:0:0:1 sdb 8:16  [active][ghost]

We created an LVM logical volume on this LUN:

snowball-mds2:~ # lvscan
  ACTIVE   '/dev/mds2/mgs2' [418.68 GB] inherit

Everything works fine. Then we switched the controller on the storage to
simulate a path failover:

# multipath -ll
3600a0b80005a7215000002034b952b00 dm-10 SUN,LCSM100_S
[size=419G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=1][enabled]
 \_ 6:0:1:1 sdd 8:48  [active][ghost]
\_ round-robin 0 [prio=6][enabled]
 \_ 6:0:0:1 sdb 8:16  [active][ready]

After that the MDT device is unhealthy:

snowball-mds2:/proc/fs # cat /proc/fs/lustre/health_check
device tools-MDT0000 reported unhealthy
NOT HEALTHY

and we cannot remount the filesystem -> the filesystem is not writable.
We can see this in /var/log/messages, where there is a warning about this
filesystem being in read-only mode.

snowball-mds2:/proc/fs # tunefs.lustre --dryrun /dev/mds2/mgs2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x70
            (needs_index first_time update )
Persistent mount opts:
Parameters:

tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)

After a reboot everything works fine again.

Is there a problem with the LVM configuration? We found a document on
enabling multipathing under LVM2, but it does not help. Is Lustre 1.8.2
supported on LVM with multipathing?

We are concerned about the availability and consistency of the Lustre
filesystem, in particular the metadata, because the metadata is not
correctly available after a path failover of the metadata (MDT) device.
The path failover should be completely transparent to the LVM LUN used
for the MDT and to the Lustre filesystem on it; is this correct? We tested
the path-failover functionality with a simple ext3 filesystem on the same
device and could not see any problem. Also, I think it is not recommended
to configure the Lustre filesystem to remain writable when an error
occurs, isn't it?

Does anyone have experience with the above-mentioned configuration?
Are there any known bugs?

Thanks and regards
Matthias

Additional information:

snowball-mds2:/proc/fs # uname -a
Linux snowball-mds2 2.6.27.39-0.3_lustre.1.8.2-default #1 SMP 2009-11-23 12:57:38 +0100 x86_64 x86_64 x86_64 GNU/Linux
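One thing worth checking first is whether the LVM volume is really layered on the multipath map and not on one of the raw paths. A minimal check, assuming the volume group is called mds2 as in the lvscan output above (the column layout varies with the LVM2 version):

pvs -o pv_name,vg_name,dev_size      # the PV should be the dm-multipath map, not /dev/sdb or /dev/sdd
lvs -o lv_name,vg_name,devices mds2  # shows which underlying device the LV is allocated on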
Andreas Dilger
2010-Jun-16 14:56 UTC
[Lustre-discuss] SLES11, Lustre 1.8.2 with LVM and multipathing problems
On 2010-06-16, at 03:23, lustre wrote:
> we have one LUN on our MGS/MGT server.
> The LUN is available over two paths
> (multipathing with the OS-embedded RDAC driver, SLES11).
>
> Now we switched the controller on the storage to simulate a path failover:
> after that the MDT device is unhealthy:
> multipath -ll 3600a0b80005a7215000002034b952b00 dm-10 SUN,LCSM100_S
> and we cannot remount the filesystem -> the filesystem is not writable
> We can see this in /var/log/messages, where there is a warning about this
> filesystem being in read-only mode.

This is a problem below Lustre. It appears the multipath configuration
cannot handle a path failure without reporting IO errors to the filesystem.

Either the command you are using to test path failures is not doing what
you expect, or the multipath is not working correctly (e.g. your second
path is not actually working, or it is cabled incorrectly, or the
multipath driver is broken). You need to fix this, and then Lustre should
work.

I'm assuming that you configured the MDT device correctly to use the right
block device, and it isn't accidentally using the raw underlying device
and avoiding the multipath.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
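One common way to rule out the last point is to restrict LVM scanning to the multipath maps, so the volume group can never be activated on /dev/sdb or /dev/sdd directly. A minimal sketch of the relevant /etc/lvm/lvm.conf section; the exact filter regexes are an assumption and have to match how the multipath maps are named on your system:

devices {
    # accept only device-mapper multipath maps, reject everything else
    filter = [ "a|^/dev/mapper/.*|", "r|.*|" ]
}

After changing the filter, pvscan and pvs should show the physical volume only on the multipath device.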
Atul Vidwansa
2010-Jun-17 00:29 UTC
[Lustre-discuss] SLES11, Lustre 1.8.2 with LVM and multipathing problems
Hi Matthias,

As far as I know, RDAC also supports multipathing, so you should have only
one of RDAC or dm-multipath enabled on the MDS server. Moreover, your HBA
driver could be doing multipathing and failover as well.

I usually disable failover in the HBA driver and use RDAC only for
reliable multipathing. For example, if using QLogic FC HBAs, set
"options qla2xxx ql2xfailover=0" in /etc/modprobe.conf and check that all
shared LUNs are visible through both controllers and RDAC (ls -lR /proc/mpp).

Cheers,
-Atul

On 06/16/2010 07:23 PM, lustre wrote:
> Hello Folks,
>
> we have one LUN on our MGS/MGT server. The LUN is available over two
> paths (multipathing with the OS-embedded RDAC driver, SLES11).
> [...]
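For reference, a minimal sketch of the HBA-failover setting Atul describes, assuming QLogic qla2xxx HBAs and the classic /etc/modprobe.conf layout (other HBA drivers use different option names, so treat this as an illustration only):

# /etc/modprobe.conf (or a file under /etc/modprobe.d/ on newer systems)
options qla2xxx ql2xfailover=0

After reloading the driver or rebooting, each LUN should show up once per path as a /dev/sd* device, with only the RDAC/dm-multipath layer combining them; "ls -lR /proc/mpp" and "multipath -ll" can be used to confirm.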
lustre
2010-Jun-17 13:29 UTC
[Lustre-discuss] SLES11, Lustre 1.8.2 with LVM and multipathing problems
Hi Atul,

thanks for your reply. Now we use multipathing without LVM and it works
fine, so the problem comes from LVM.

Cheers,
Matthias

On 17.06.2010 02:29, Atul Vidwansa wrote:
> Hi Matthias,
>
> As far as I know, RDAC also supports multipathing, so you should have
> only one of RDAC or dm-multipath enabled on the MDS server.
> [...]
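For reference, using the multipath map directly for the MDT (no LVM layer in between) looks roughly like the sketch below; the filesystem name is taken from the tools-MDT0000 target seen in health_check, the mount point is hypothetical, and reformatting of course destroys the existing MDT contents:

# reformat the combined MGS/MDT directly on the dm-multipath map (destroys existing data)
mkfs.lustre --fsname=tools --mgs --mdt /dev/mapper/3600a0b80005a7215000002034b952b00

# mount it as a Lustre target
mkdir -p /mnt/mdt
mount -t lustre /dev/mapper/3600a0b80005a7215000002034b952b00 /mnt/mdt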
Ms. Megan Larko
2010-Jun-18 13:59 UTC
[Lustre-discuss] SLES11, Lustre 1.8.2 with LVM and multipathing problems
Hi List!

Matthias wrote:
> Hi Atul,
> thanks for your reply.
> Now we use multipathing without LVM and it works fine.
> So the problem comes from LVM.
>
> Cheers,
> Matthias

I am very interested in hearing more details about this. There is a
cluster which I wish to upgrade (or clean-install) to version 1.8.3 or
1.8.4 (currently 1.6.7), and I would like to use both LVM and
multipathing. The current set-up does use multipathing but not LVM. I was
hoping to introduce LVM into the new build to enable more efficient
snapshot backups and expandability on the MDT disks.

Do you know what the conflicts are between multipathing and LVM? Was the
case in point multipathing and LVM on MDT volumes, or something else?

I appreciate your kindness in sharing your experience.

Sincerely,
megan
SGI Federal
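For the snapshot use case mentioned above, an MDT backup via an LVM snapshot typically looks roughly like the sketch below. The volume group, LV, snapshot size and paths are all hypothetical examples; the snapshot only needs to be large enough to absorb the changes made while it exists:

# create a snapshot of the MDT logical volume
lvcreate -L 20G -s -n mdt-snap /dev/vg_mdt/mdt

# mount the snapshot read-only as ldiskfs and archive it
mkdir -p /mnt/mdt-snap
mount -t ldiskfs -o ro /dev/vg_mdt/mdt-snap /mnt/mdt-snap
tar czf /backup/mdt-$(date +%Y%m%d).tgz --sparse -C /mnt/mdt-snap .

# clean up
umount /mnt/mdt-snap
lvremove -f /dev/vg_mdt/mdt-snap

Keep in mind that a file-level MDT backup is only complete if the Lustre extended attributes are preserved as well (the Lustre manual describes doing this with getfattr/setfattr), which is why some sites prefer a device-level dd or dump of the snapshot instead.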