Hello all,

I am running Lustre 1.6.4 on Red Hat Advanced Server 4 Update 4. As everyone knows, the MGS/MDS server (our MGS and MDS run on the same server) is the central server, and the cluster is unusable if the MGS/MDS is dead. So I set up heartbeat between the two MGS/MDS servers, and I use DRBD (network RAID 1) as the backend storage to keep their data in sync.

To make the setup easier, I use a virtual IP (floating IP) shared by the two MGS servers (assume mgs0: 10.0.0.1, mgs1: 10.0.0.2, VIP: 10.0.0.3, oss0: 10.0.0.11, oss1: 10.0.0.12, etc.). mgs0 is the primary MGS and mgs1 is the secondary MGS. While mgs0 is in use, the VIP 10.0.0.3 is an alias address on mgs0; when mgs0 dies, heartbeat moves the VIP 10.0.0.3 to mgs1. This way MGS high availability is transparent to all the OSS servers and clients: they only know the MGS as 10.0.0.3, so I run mkfs with --mgsnode=10.0.0.3 on the OSSes and mount the cluster with the VIP's NID on the clients.

Now the problem: after I add the VIP to mgs0 and run mkfs.lustre with --mgsnode=10.0.0.3, the OST can't find the MGS when I mount the OST. I get the same error when mounting the cluster on a client, but everything works when I use 10.0.0.1 instead:

client# mount -t lustre 10.0.0.3@tcp0:/mycfs /cfs/client/
mount.lustre: mount 10.0.0.3@tcp0:/mycfs at /cfs/client failed: Cannot send after transport endpoint shutdown

--
Best regards,
Felix New
On Tue, Mar 11, 2008 at 09:54:44AM +0800, Felix New wrote:
> to make the setup easily, i use a virtual ip(fload ip) between two
> mgs servers(assume mgs0: 10.0.0.1,mgs1:10.0.0.2, vip: 10.0.0.3,
> oss0:10.0.0.11, oss1:10.0.0.12, etc.). mgs0 as the primary mgs, and
> mgs1 as sencondary mgs. when use mgs0, the vip 10.0.0.3 is a alias
> addr on server0, and when mgs0 dead, the vip 10.0.0.3 will move to
> mgs1(controled by heartbeat). so the mgs hight avalibal is transparent
> to all the oss servers, clients, and they can only know the mgs

Lustre servers are stateful. You don't need to use a virtual IP, since the protocol itself takes care of reconnection and recovery.

> 10.0.0.3, and mkfs with param --mgsnode=10.0.0.3 on oss and mount
> cluster with nid of vip on client.

You just have to list the NIDs of all the MGS nodes at mkfs time (i.e. "--mgsnode=10.0.0.1 --mgsnode=10.0.0.2" in your case). See section "4.2.2.1 Failover" of the Lustre manual:
http://manual.lustre.org/manual/LustreManual16_HTML/LustreInstallation.html#50491441_pgfId-1286309

> now the problem is: when i add the vip to the mgs0, and mkfs.lustre
> with --mgsnode=10.0.0.3, but the ost can't found the mgs when mount
> the ost.the same error whe mount the cluster,but but when use
> 10.0.0.1.
> client# mount -t lustre 10.0.0.3@tcp0:/mycfs /cfs/client/
> client# mount.lustre: mount 10.0.0.3@tcp0:/mycfs at /cfs/client
> failed: Cannot send after transport endpoint shutdown

Then you should be able to mount the Lustre filesystem on the client side with the following command:

client# mount -t lustre 10.0.0.1:10.0.0.2:/mycfs /cfs/client/

Commas separate NIDs of the same host (a server can have multiple network interfaces, and the other nodes then pick the NID that matches their own network), whereas NIDs of different nodes must be delimited by colons (see the Lustre manual).

Cheers,
Johann
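Johann's two rules (repeat --mgsnode once per MGS node at mkfs time; in the mount source, put colons between different MGS nodes and commas between NIDs of the same node) can be sketched as a small shell helper. This is not from the thread: the function name mgs_spec and the OST device /dev/sdb are hypothetical, and the script only echoes the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build the MGS part of a
# Lustre 1.6 mount source. Argument 1 is the fsname; every further
# argument describes ONE MGS node. If a node has several NIDs, the
# caller joins them with commas inside that single argument.
mgs_spec() {
  fsname=$1
  shift
  spec=""
  for node in "$@"; do
    # Different nodes are separated by colons; ${spec:+...} avoids a
    # leading colon before the first node.
    spec="${spec:+$spec:}$node"
  done
  printf '%s:/%s\n' "$spec" "$fsname"
}

# Dry run for the setup in this thread (mgs0 = 10.0.0.1, mgs1 = 10.0.0.2).
# /dev/sdb is a placeholder OST device; the commands are only printed.
echo mkfs.lustre --fsname=mycfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/sdb
echo mount -t lustre "$(mgs_spec mycfs 10.0.0.1 10.0.0.2)" /cfs/client/
```

Passing "10.0.0.1@tcp0,192.168.0.1@o2ib" as a single argument would describe one MGS node reachable on two (hypothetical) networks, which the helper keeps comma-joined while still separating it from the next node with a colon.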
Felix New wrote:
> now the problem is: when i add the vip to the mgs0, and mkfs.lustre
> with --mgsnode=10.0.0.3, but the ost can't found the mgs when mount
> the ost.the same error whe mount the cluster,but but when use
> 10.0.0.1.
> client# mount -t lustre 10.0.0.3@tcp0:/mycfs /cfs/client/
> client# mount.lustre: mount 10.0.0.3@tcp0:/mycfs at /cfs/client
> failed: Cannot send after transport endpoint shutdown

LNET does NOT support virtual IPs. This should be mentioned in the manual. Johann Lombardi has provided an excellent explanation of the proper configuration.

cliffw