hi list

I am new to Lustre (1 week old) and to this list. I have some Dell PE1950
servers with MD1000 enclosures (Scientific Linux 5 == RHEL5 x86_64) running
Lustre 1.6.5 with Lustre-patched kernels.

On a first try (indeed it was the second), I managed to get a Lustre
filesystem up and running OK.

Each Dell server has four 1 Gb interfaces, and I want to make use of them
all, either by bonding them or by going multihomed (which is my first try).
I have configured my DNS with four IPs and corresponding names, and on the
Dell server I configured the four interfaces accordingly; all of them now
answer correctly on their respective IPs.

The problem: after looking through this list and reading the same lines of
the ops manual several times, it is still not clear to me how to change the
configuration. Here is what I have on the OSS:

[root@se003 ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias eth2 e1000
alias eth3 e1000
alias scsi_hostadapter megaraid_sas
options lnet networks=tcp(eth0,eth1,eth2,eth3)

After that, and with the OSTs unmounted, I ran
tunefs.lustre --writeconf /dev/sdb1 (on each one):

[root@se003 ~]# tunefs.lustre --writeconf /dev/sdb2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     LIPlstr-OST0001
Index:      1
Lustre FS:  LIPlstr
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.207@tcp

   Permanent disk data:
Target:     LIPlstr-OST0001
Index:      1
Lustre FS:  LIPlstr
Mount type: ldiskfs
Flags:      0x102
            (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.1.207@tcp

Writing CONFIGS/mountdata

I then mounted some of them again without problem, but:

lctl > list_nids
10.100.1.51@tcp

10.100.1.51 is the IP of eth1; the others do not show up. I suppose they
should. Or should I use the lctl network commands on the MDS/MDT (both on
one machine) to get all the routes somehow?

thanks in advance
cheers

Mario David
LIP-Lisbon system administrator
On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote:
> hi list
> I am new to Lustre (1 week old) and to this list. I have some Dell
> PE1950 servers with MD1000 enclosures (Scientific Linux 5 == RHEL5
> x86_64) running Lustre 1.6.5 with Lustre-patched kernels.
>
> On a first try (indeed it was the second), I managed to get a Lustre
> filesystem up and running OK.
>
> Each Dell server has four 1 Gb interfaces, and I want to make use of
> them all, either by bonding them or by going multihomed (which is my
> first try).

If what you want is to get the bandwidth of all 4 interfaces to the
Lustre servers then you really do want bonding.

Can you explain why you think you want multihoming vs. bonding? Maybe
I'm misunderstanding your goal.

b.
On Jul 7, 2008, at 3:13 AM, mdavid wrote:
> [root@se003 ~]# cat /etc/modprobe.conf
> alias eth0 bnx2
> alias eth1 bnx2
> alias eth2 e1000
> alias eth3 e1000
> alias scsi_hostadapter megaraid_sas
> options lnet networks=tcp(eth0,eth1,eth2,eth3)
>
> I then mounted some of them again without problem, but:
>
> lctl > list_nids
> 10.100.1.51@tcp
>
> 10.100.1.51 is the IP of eth1; the others do not show up. I suppose
> they should. Or should I use the lctl network commands on the MDS/MDT
> (both on one machine) to get all the routes somehow?

You will only have one NID per logical LNET network (tcp0, tcp1, o2ib0,
o2ib1, etc.). So in your setup you will only have one NID, for tcp0,
regardless of how many eth devices you include. If you did something
like this:

    tcp0(eth0),tcp1(eth1),tcp2(eth2),tcp3(eth3)

you would end up with 4 NIDs, but that isn't what you want. It sounds
like you have it configured correctly.

If you were to do bonding, you would use the standard Linux bonding
methods to bond eth0-eth3 -> bond0, so your lnet options line would look
like this:

    options lnet networks=tcp(bond0)

and you would still have one NID.

Bonding is great if all your ethernet devices are plugged into a single
switch that supports bonding. If your network interfaces are on
different ethernet switches for redundancy (or because you are lacking
in ports), you cannot use bonding, and must go multihomed. All of our
servers (500+) are multihomed.

-Marc

----
D. Marc Stearman
LC Lustre Administration Lead
marc@llnl.gov
925.423.9670
Pager: 1.888.203.0641
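[For reference, a minimal sketch of the /etc/modprobe.conf variants Marc
describes; only one "options lnet" line would be active at a time, and the
per-interface NIDs mentioned in the comments are illustrative, since only
the eth1 address 10.100.1.51 is known from this thread.]

    # Variant 1: one logical tcp network over all four interfaces
    # (the current setup) -- a single NID is published, e.g. 10.100.1.51@tcp
    options lnet networks=tcp(eth0,eth1,eth2,eth3)

    # Variant 2: four logical networks, one per interface -- four NIDs,
    # one per tcpN network, e.g. <eth0-ip>@tcp0, <eth1-ip>@tcp1, and so on
    options lnet networks=tcp0(eth0),tcp1(eth1),tcp2(eth2),tcp3(eth3)

    # Variant 3: Linux bonding -- one NID on the bonded interface
    options lnet networks=tcp(bond0)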
hi

D. Marc Stearman wrote:
> On Jul 7, 2008, at 3:13 AM, mdavid wrote:
>> [root@se003 ~]# cat /etc/modprobe.conf
>> alias eth0 bnx2
>> alias eth1 bnx2
>> alias eth2 e1000
>> alias eth3 e1000
>> alias scsi_hostadapter megaraid_sas
>> options lnet networks=tcp(eth0,eth1,eth2,eth3)
>>
>> I then mounted some of them again without problem, but:
>>
>> lctl > list_nids
>> 10.100.1.51@tcp
>>
>> 10.100.1.51 is the IP of eth1; the others do not show up. I suppose
>> they should. Or should I use the lctl network commands on the MDS/MDT
>> (both on one machine) to get all the routes somehow?
>
> You will only have one NID per logical LNET network (tcp0, tcp1, o2ib0,
> o2ib1, etc.). So in your setup you will only have one NID, for tcp0,
> regardless of how many eth devices you include. If you did something
> like this:
>
> tcp0(eth0),tcp1(eth1),tcp2(eth2),tcp3(eth3)
>
> you would end up with 4 NIDs, but that isn't what you want. It sounds
> like you have it configured correctly.
>
> If you were to do bonding, you would use the standard Linux bonding
> methods to bond eth0-eth3 -> bond0, so your lnet options line would
> look like this:
>
> options lnet networks=tcp(bond0)
>
> and you would still have one NID.
>
> Bonding is great if all your ethernet devices are plugged into a single
> switch that supports bonding. If your network interfaces are on
> different ethernet switches for redundancy (or because you are lacking
> in ports), you cannot use bonding, and must go multihomed.
>

we decided to go for bonding, everyone on the same switch

thanks

Mario David

> All of our servers (500+) are multihomed.
>
> -Marc
>
> ----
> D. Marc Stearman
> LC Lustre Administration Lead
> marc@llnl.gov
> 925.423.9670
> Pager: 1.888.203.0641
>
hi Brian

I was "misled" by what it says in the ops manual, chapter 12.1:

    Lustre can use multiple NICs without bonding. There is a difference
    in performance when Lustre uses multiple NICs versus when it uses
    bonding NICs.

though here it says "multiple NICs", not multihomed configurations.

Anyway, I still don't know how to configure "multiple NICs", both from
the point of view of the OS and of Lustre. Note that all the ethX
interfaces are in the same LAN and connected to the same card in the
switch.

If on the Lustre OSTs I put

    options lnet networks=tcp(eth0,eth1,eth2,eth3)

how is each ethX configured? In principle I would have a single IP for
the server.

cheers

Mario David

On Jul 8, 1:25 pm, "Brian J. Murrell" <Brian.Murr...@Sun.COM> wrote:
> On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote:
> > hi list
> > I am new to Lustre (1 week old) and to this list. I have some Dell
> > PE1950 servers with MD1000 enclosures (Scientific Linux 5 == RHEL5
> > x86_64) running Lustre 1.6.5 with Lustre-patched kernels.
>
> > On a first try (indeed it was the second), I managed to get a Lustre
> > filesystem up and running OK.
>
> > Each Dell server has four 1 Gb interfaces, and I want to make use of
> > them all, either by bonding them or by going multihomed (which is my
> > first try).
>
> If what you want is to get the bandwidth of all 4 interfaces to the
> Lustre servers then you really do want bonding.
>
> Can you explain why you think you want multihoming vs. bonding? Maybe
> I'm misunderstanding your goal.
>
> b.
On Wed, 2008-07-09 at 05:07 -0700, mdavid wrote:
>
> Anyway, I still don't know how to configure "multiple NICs", both from
> the point of view of the OS and of Lustre. Note that all the ethX
> interfaces are in the same LAN and connected to the same card in the
> switch.

Even at the OS level, that's a very odd, if not outright broken,
configuration. I can't imagine a valid use case for having multiple
interfaces (each with their own address) on the same subnet.

> If on the Lustre OSTs I put
> options lnet networks=tcp(eth0,eth1,eth2,eth3)

I really think that to achieve what you are trying to do, you want to
bond them.

> In principle I would have a single IP for the server

Then you need Linux bonding.

b.
Hi Mario,

Lustre will, if not instructed otherwise, bind to all available NICs on
the system. I've used Lustre extensively with LACP aggregate groups, and
it performs quite well.

Configuring multiple NICs from the same host into the same VLAN is
something of a nonsensical configuration unless you're running some kind
of bizarre failover scenario, but if they're all going to the same
switch, that's an impossibility. This kind of configuration would also
make ordinary TCP/IP routing somewhat funky.

Use NIC bonding, and configure your switch as appropriate to do
likewise. Cisco, Foundry, Extreme, Juniper, Alcatel, Netgear and a
number of others all support LACP in their L3 edge switches, and it's a
standard feature of any core switch.

Once you've set up the switch and the OS, instruct Lustre to use the
bond by putting "options lnet networks=tcp(bond0)" in your
/etc/modprobe.conf and it will take care of the rest.

cheers,
Klaus

On 7/9/08 5:07 AM, "mdavid" <david@lip.pt> did etch on stone tablets:

> hi Brian
> I was "misled" by what it says in the ops manual, chapter 12.1:
>
>     Lustre can use multiple NICs without bonding. There is a difference
>     in performance when Lustre uses multiple NICs versus when it uses
>     bonding NICs.
>
> though here it says "multiple NICs", not multihomed configurations.
>
> Anyway, I still don't know how to configure "multiple NICs", both from
> the point of view of the OS and of Lustre. Note that all the ethX
> interfaces are in the same LAN and connected to the same card in the
> switch.
>
> If on the Lustre OSTs I put
>
>     options lnet networks=tcp(eth0,eth1,eth2,eth3)
>
> how is each ethX configured? In principle I would have a single IP for
> the server.
>
> cheers
>
> Mario David
>
> On Jul 8, 1:25 pm, "Brian J. Murrell" <Brian.Murr...@Sun.COM> wrote:
>> On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote:
>>> hi list
>>> I am new to Lustre (1 week old) and to this list. I have some Dell
>>> PE1950 servers with MD1000 enclosures (Scientific Linux 5 == RHEL5
>>> x86_64) running Lustre 1.6.5 with Lustre-patched kernels.
>>
>>> On a first try (indeed it was the second), I managed to get a Lustre
>>> filesystem up and running OK.
>>
>>> Each Dell server has four 1 Gb interfaces, and I want to make use of
>>> them all, either by bonding them or by going multihomed (which is my
>>> first try).
>>
>> If what you want is to get the bandwidth of all 4 interfaces to the
>> Lustre servers then you really do want bonding.
>>
>> Can you explain why you think you want multihoming vs. bonding? Maybe
>> I'm misunderstanding your goal.
>>
>> b.
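[A minimal sketch of the OS side of the bonding setup Klaus and Brian
recommend, as it might look on Scientific Linux 5 / RHEL5. The bond IP
address, netmask, and bonding options are placeholders and should be adapted
to the local network and checked against the Linux bonding documentation and
the switch configuration.]

    # /etc/modprobe.conf (excerpt)
    alias bond0 bonding
    # mode=802.3ad is LACP; miimon=100 enables link monitoring
    options bond0 mode=802.3ad miimon=100
    options lnet networks=tcp(bond0)

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.100.1.51       # placeholder address
    NETMASK=255.255.255.0    # placeholder netmask
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1-eth3)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

After restarting the network, the state of the bond can be checked with
"cat /proc/net/bonding/bond0".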
Hi all,

I am using lustre-1.6.4.3 on RHEL 5.1 with kernel 2.6.18, and I have run
into the Lustre statahead bug in this version. I am trying to put the
following line into the Lustre init script so that the value is set
permanently; in which file should I write it?

    echo 0 > /proc/fs/lustre/llite/*/statahead_max

I have already tested this: putting it in /etc/rc.d/rc.local did not
work. I assume I have to do this operation after loading the Lustre
modules but before mounting Lustre.

Could you advise me?

Regards,
Satoshi
Hi Klaus,

thanks for the answer, we are doing just what you describe.

cheers

Mario

Klaus Steden wrote:
> Hi Mario,
>
> Lustre will, if not instructed otherwise, bind to all available NICs on
> the system. I've used Lustre extensively with LACP aggregate groups,
> and it performs quite well.
>
> Configuring multiple NICs from the same host into the same VLAN is
> something of a nonsensical configuration unless you're running some
> kind of bizarre failover scenario, but if they're all going to the same
> switch, that's an impossibility. This kind of configuration would also
> make ordinary TCP/IP routing somewhat funky.
>
> Use NIC bonding, and configure your switch as appropriate to do
> likewise. Cisco, Foundry, Extreme, Juniper, Alcatel, Netgear and a
> number of others all support LACP in their L3 edge switches, and it's a
> standard feature of any core switch.
>
> Once you've set up the switch and the OS, instruct Lustre to use the
> bond by putting "options lnet networks=tcp(bond0)" in your
> /etc/modprobe.conf and it will take care of the rest.
>
> cheers,
> Klaus
>
> On 7/9/08 5:07 AM, "mdavid" <david@lip.pt> did etch on stone tablets:
>
>> hi Brian
>> I was "misled" by what it says in the ops manual, chapter 12.1:
>>
>>     Lustre can use multiple NICs without bonding. There is a
>>     difference in performance when Lustre uses multiple NICs versus
>>     when it uses bonding NICs.
>>
>> though here it says "multiple NICs", not multihomed configurations.
>>
>> Anyway, I still don't know how to configure "multiple NICs", both from
>> the point of view of the OS and of Lustre. Note that all the ethX
>> interfaces are in the same LAN and connected to the same card in the
>> switch.
>>
>> If on the Lustre OSTs I put
>>
>>     options lnet networks=tcp(eth0,eth1,eth2,eth3)
>>
>> how is each ethX configured? In principle I would have a single IP for
>> the server.
>>
>> cheers
>>
>> Mario David
>>
>> On Jul 8, 1:25 pm, "Brian J. Murrell" <Brian.Murr...@Sun.COM> wrote:
>>> On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote:
>>>> hi list
>>>> I am new to Lustre (1 week old) and to this list. I have some Dell
>>>> PE1950 servers with MD1000 enclosures (Scientific Linux 5 == RHEL5
>>>> x86_64) running Lustre 1.6.5 with Lustre-patched kernels.
>>>>
>>>> On a first try (indeed it was the second), I managed to get a Lustre
>>>> filesystem up and running OK.
>>>>
>>>> Each Dell server has four 1 Gb interfaces, and I want to make use of
>>>> them all, either by bonding them or by going multihomed (which is my
>>>> first try).
>>>
>>> If what you want is to get the bandwidth of all 4 interfaces to the
>>> Lustre servers then you really do want bonding.
>>>
>>> Can you explain why you think you want multihoming vs. bonding? Maybe
>>> I'm misunderstanding your goal.
>>>
>>> b.
On Jul 11, 2008 11:27 +0900, Satoshi Isono wrote:
> I am using lustre-1.6.4.3 on RHEL 5.1 with kernel 2.6.18, and I have
> run into the Lustre statahead bug in this version. I am trying to put
> the following line into the Lustre init script so that the value is set
> permanently; in which file should I write it?
>
> echo 0 > /proc/fs/lustre/llite/*/statahead_max
>
> I have already tested this: putting it in /etc/rc.d/rc.local did not
> work. I assume I have to do this operation after loading the Lustre
> modules but before mounting Lustre.

The rc.local script is likely run BEFORE Lustre is mounted. The above
parameter needs to be set AFTER Lustre is mounted. You can add a
permanent configuration parameter with:

    lctl conf_param lustre.llite.statahead_max=0

Note that this is permanent and you will need to reset this parameter
later with a new setting to re-enable the statahead:

    lctl conf_param lustre.llite.statahead_max=32

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
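[As a supplement to Andreas's answer, a small sketch of how the temporary
and permanent settings might be applied and verified. The leading "lustre"
in the conf_param name is the filesystem name and should be replaced with
the actual fsname; the loop is used because the /proc glob can match more
than one mounted filesystem.]

    # Temporary, run on each client AFTER Lustre is mounted
    # (lost again at unmount):
    for f in /proc/fs/lustre/llite/*/statahead_max; do
        echo 0 > "$f"
    done

    # Permanent, run once on the MGS ("lustre" is the fsname here):
    lctl conf_param lustre.llite.statahead_max=0

    # Verify on a client:
    cat /proc/fs/lustre/llite/*/statahead_max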
Andreas,

Thanks for the useful information. Do I only need to execute this command
once, on a single client? And it then takes effect on all clients, right?

Also, I would like to get more information about this statahead function.
Where can I find that?

I realize this problem is a bug (Bugzilla: #15406, #15169, #15175). Would
a better solution be to upgrade to Lustre 1.6.5.1? What do you think?

Regards,
Satoshi

At 08/07/12 06:19, Andreas Dilger wrote:
> On Jul 11, 2008 11:27 +0900, Satoshi Isono wrote:
>> I am using lustre-1.6.4.3 on RHEL 5.1 with kernel 2.6.18, and I have
>> run into the Lustre statahead bug in this version. I am trying to put
>> the following line into the Lustre init script so that the value is
>> set permanently; in which file should I write it?
>>
>> echo 0 > /proc/fs/lustre/llite/*/statahead_max
>>
>> I have already tested this: putting it in /etc/rc.d/rc.local did not
>> work. I assume I have to do this operation after loading the Lustre
>> modules but before mounting Lustre.
>
> The rc.local script is likely run BEFORE Lustre is mounted. The above
> parameter needs to be set AFTER Lustre is mounted. You can add a
> permanent configuration parameter with:
>
>     lctl conf_param lustre.llite.statahead_max=0
>
> Note that this is permanent and you will need to reset this parameter
> later with a new setting to re-enable the statahead:
>
>     lctl conf_param lustre.llite.statahead_max=32
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
On Jul 12, 2008 14:07 +0900, Satoshi Isono wrote:
> Thanks for the useful information. Do I only need to execute this
> command once, on a single client? And it then takes effect on all
> clients, right?

This command should be run once on the MGS node (usually the same as the
MDS node). It will take effect for already-mounted clients very soon, and
for all new mounts at mount time.

> Also, I would like to get more information about this statahead
> function. Where can I find that?
>
> I realize this problem is a bug (Bugzilla: #15406, #15169, #15175).
> Would a better solution be to upgrade to Lustre 1.6.5.1? What do you
> think?

Yes, that is another possibility.

> At 08/07/12 06:19, Andreas Dilger wrote:
>> On Jul 11, 2008 11:27 +0900, Satoshi Isono wrote:
>>> I am using lustre-1.6.4.3 on RHEL 5.1 with kernel 2.6.18, and I have
>>> run into the Lustre statahead bug in this version. I am trying to
>>> put the following line into the Lustre init script so that the value
>>> is set permanently; in which file should I write it?
>>>
>>> echo 0 > /proc/fs/lustre/llite/*/statahead_max
>>>
>>> I have already tested this: putting it in /etc/rc.d/rc.local did not
>>> work. I assume I have to do this operation after loading the Lustre
>>> modules but before mounting Lustre.
>>
>> The rc.local script is likely run BEFORE Lustre is mounted. The above
>> parameter needs to be set AFTER Lustre is mounted. You can add a
>> permanent configuration parameter with:
>>
>>     lctl conf_param lustre.llite.statahead_max=0
>>
>> Note that this is permanent and you will need to reset this parameter
>> later with a new setting to re-enable the statahead:
>>
>>     lctl conf_param lustre.llite.statahead_max=32

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.