I have a Lustre test environment and I'm currently testing network
failover. Failover works fine on subnet 1, but when I turn off subnet 1
on the Lustre servers, the clients can't recover onto subnet 2.
Here is the configuration. All the servers and clients are on the same two
subnets.
I tried mounting the Lustre file systems with this command, but the
failover to network 2 still failed:

mount -t lustre -o flock \
    10.244.1.120@tcp0:10.244.1.121@tcp0:10.244.2.120@tcp1:10.244.2.121@tcp1:/webfs \
    /imatrix
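I also wondered whether the NID syntax matters here. If I'm reading the
Lustre manual right, colons separate distinct failover nodes while
commas group multiple NIDs that belong to the same node, so grouping
each server's two NIDs would look like this (untested sketch):

# Sketch only: comma-join the NIDs of the same server, colon-separate
# the two failover servers
mount -t lustre -o flock \
    10.244.1.120@tcp0,10.244.2.120@tcp1:10.244.1.121@tcp0,10.244.2.121@tcp1:/webfs \
    /imatrix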
Any ideas?
Ed
Network
-----------
Subnet1 - 10.244.1.0/24
Subnet2 - 10.244.2.0/24
Server1 - 10.244.1.120, 10.244.2.120
Server2 - 10.244.1.121, 10.244.2.121
Server3 - 10.244.1.100, 10.244.2.100
Client1 - 10.244.1.101, 10.244.2.101
Client2 - 10.244.1.102, 10.244.2.102
Client3 - 10.244.1.122, 10.244.2.122
Client4 - 10.244.1.123, 10.244.2.123
Client5 - 10.244.1.250, 10.244.2.250
Lustre Configuration
-------------------------
Server1 - mgs webmdt webost1 mailost2
Server2 - mailmdt mailost1 webost2
Server3 - devmdt devost1
# MGS node on server1
tunefs.lustre --erase-param --failnode=10.244.1.121@tcp0 \
    --writeconf /dev/mapper/lustremgs
# MDT node on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
    --writeconf /dev/mapper/webmdt
# MDT node on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 \
    --writeconf /dev/mapper/mailmdt
# MDT node on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --writeconf /dev/mapper/devmdt
# OST nodes on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
    --param="failover.mode=failout" --writeconf /dev/mapper/webost1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
    --param="failover.mode=failout" --writeconf /dev/mapper/mailost2
# OST nodes on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 \
    --param="failover.mode=failout" --writeconf /dev/mapper/webost2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 \
    --param="failover.mode=failout" --writeconf /dev/mapper/mailost1
# OST node on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
    --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
    --param="failover.mode=failout" --writeconf /dev/mapper/devost1
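One thing I'm unsure about: all the --failnode and --mgsnode entries
above only list the tcp0 NIDs. If I understand tunefs.lustre correctly,
both options accept a comma-separated NID list for the same host, so
registering the subnet-2 addresses as well would look something like
this (untested sketch, using webost1 as the example):

# Sketch only: advertise both subnets' NIDs for the MGS and the
# failover partner
tunefs.lustre --erase-param \
    --mgsnode=10.244.1.120@tcp0,10.244.2.120@tcp1 \
    --mgsnode=10.244.1.121@tcp0,10.244.2.121@tcp1 \
    --failnode=10.244.1.121@tcp0,10.244.2.121@tcp1 \
    --param="failover.mode=failout" --writeconf /dev/mapper/webost1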
LNET entry in modprobe.d/lustre.conf
Server1 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server2 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server3 - options lnet networks=tcp0(eth0),tcp1(eth1)
Five Clients
Client1 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client2 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client3 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client4 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client5 - options lnet networks=tcp0(eth0),tcp1(eth1)
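To confirm that LNET actually brings up both networks from these
settings, each node can be checked with lctl (a standard Lustre
command):

# Run on any node after the lustre modules load; it should print one
# NID per configured network, e.g. 10.244.1.120@tcp and
# 10.244.2.120@tcp1 on Server1
lctl list_nids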
Mount Commands
----------------------
#Mounts on server1
mount -t lustre -o abort_recov /dev/mapper/lustremgs /lustremgs
mount -t lustre -o abort_recov /dev/mapper/webmdt /webmst
mount -t lustre -o abort_recov /dev/mapper/webost1 /webost1
mount -t lustre -o abort_recov /dev/mapper/mailost2 /mailost2
#Mounts on server2
mount -t lustre -o abort_recov /dev/mapper/webost2 /webost2
mount -t lustre -o abort_recov /dev/mapper/mailmdt /mailmst
mount -t lustre -o abort_recov /dev/mapper/mailost1 /mailost1
#Mounts on server3
mount -t lustre -o abort_recov /dev/mapper/devmdt /homemst
mount -t lustre -o abort_recov /dev/mapper/devost1 /homeost1
#Client Mounts
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/webfs /imatrix
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/mailfs /var/qmail
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/devfs /home
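Is there a good way to confirm that the clients can still reach the
servers over subnet 2 once subnet 1 is down? I've been checking with
lctl ping (a standard lctl command):

# From a client with subnet 1 disabled: a reply listing the peer's NIDs
# means LNET can reach the server on tcp1; an error would point at the
# network/LNET layer rather than the failover NID configuration
lctl ping 10.244.2.120@tcp1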