Hello,

on a setup with o2ib and Ethernet configured on both the Lustre servers and the clients, I'd expect that unplugging the InfiniBand cable on one of the OSSes would make the client switch over to Ethernet and continue I/O. Unfortunately this doesn't happen: the client I/O stalls and continues only after the IB cable is plugged back in.

Is there anything wrong with the setup? It uses pairwise failover servers, so maybe that's part of the problem? Is the order of the failnode arguments correct?

Here's what we have (sorry for the many details...):

MGS/MDT are mounted on the same node:

  Target: MGS
  Index: unassigned
  Lustre FS: lustre
  Mount type: ldiskfs
  Flags: 0x174 (MGS needs_index first_time update writeconf )
  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
  Parameters:
    failover.node=10.3.0.227@o2ib,192.168.130.227@tcp,10.3.0.226@o2ib,192.168.130.226@tcp
    mgsnode=10.3.0.227@o2ib,192.168.130.227@tcp,10.3.0.226@o2ib,192.168.130.226@tcp

  Target: lustre-MDT0000
  Index: 0
  Lustre FS: lustre
  Mount type: ldiskfs
  Flags: 0x1 (MDT )
  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
  Parameters:
    mgsnode=10.3.0.226@o2ib,192.168.130.226@tcp,10.3.0.227@o2ib,192.168.130.227@tcp
    failover.node=10.3.0.227@o2ib,192.168.130.227@tcp
    mdt.group_upcall=/usr/sbin/l_getgroups

OST: the parameters were rewritten with tunefs.lustre:

  tunefs.lustre --ost --erase-param --mgsnode=10.3.0.226@o2ib0,192.168.130.226@tcp0:10.3.0.227@o2ib0,192.168.130.227@tcp0 --failnode=10.3.0.229@o2ib0,192.168.130.229@tcp0 --writeconf /dev/mpath/ost100

The client notices the failed OST path:

  # lfs check servers
  lustre-MDT0000-mdc-ffff810007107000 active.
  error: check 'lustre-OST0000-osc-ffff810007107000': Connection timed out (110)

but it tries to connect to the failover OSS partner instead of trying the other network:

  netptune121: LustreError: 11-0: an error occurred while communicating with 10.3.0.229@o2ib. The ost_connect operation failed with -19
  doss2: LustreError: 137-5: UUID 'lustre-OST0000_UUID' is not available for connect (no target)

Thanks in advance for any hint...

Best regards,
Erich
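The --mgsnode and --failnode values above pack several NIDs into one string. The following minimal Python sketch splits such a string. It assumes the usual Lustre convention that a colon separates different server nodes and a comma separates the NIDs (one per network) of a single node. It handles only that colon-separated form as used in the tunefs.lustre command, not the all-comma failover.node parameter shown for the MGS. The function names are hypothetical and are not part of the Lustre tools.

```python
from collections import defaultdict


def parse_nid(nid):
    """Split 'address@network' into (address, network)."""
    address, sep, network = nid.partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    return address, network


def parse_node_list(spec):
    """Parse 'nidA1,nidA2:nidB1,nidB2' into one {network: address} dict per node.

    Assumption: ':' separates server nodes and ',' separates the NIDs of one node,
    with at most one NID per network on each node.
    """
    nodes = []
    for node_spec in spec.split(":"):
        nids = {}
        for nid in node_spec.split(","):
            address, network = parse_nid(nid.strip())
            if network in nids:
                raise ValueError(f"two NIDs on {network} for one node: {node_spec!r}")
            nids[network] = address
        nodes.append(nids)
    return nodes


def addresses_by_network(nodes):
    """Regroup the nodes so each network maps to its candidate addresses, in node order."""
    by_net = defaultdict(list)
    for nids in nodes:
        for network, address in nids.items():
            by_net[network].append(address)
    return dict(by_net)


# Values taken from the tunefs.lustre command above.
mgs_nodes = parse_node_list(
    "10.3.0.226@o2ib0,192.168.130.226@tcp0:10.3.0.227@o2ib0,192.168.130.227@tcp0")
fail_nodes = parse_node_list("10.3.0.229@o2ib0,192.168.130.229@tcp0")

print(addresses_by_network(mgs_nodes))
print(fail_nodes)
```

The per-network view makes it easy to see that each network (o2ib0, tcp0) has its own ordered list of candidate servers. That per-network split is what matters for the failover behaviour discussed in this thread.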
Erich Focht wrote:
> Hello,
>
> on a setup with o2ib and ethernet configured on both, lustre servers and
> clients I'd expect that unplugging the infiniband cable on one of the
> OSSes would lead the client to switch over to ethernet and continue I/O.

No, unfortunately that's not how multiple interfaces work with LNET. When multiple interfaces are present at connection setup, we pick the 'best' route. Once we establish a connection, we expect that connection to continue: connections do not fail over to another interface, even when multiple interfaces are present.

> Unfortunately this doesn't happen, the client I/O stalls and continues
> only after the IB cable is plugged back.

Yup, that's expected behaviour.

> Is there anything wrong with the setup? It's with pairwise failover servers,
> so maybe that's part of the problem? Is the order of failnode arguments
> correct?

The setup appears to be correct. All the failnode does is complicate the situation slightly, because the failnode is tried first instead of the connection just failing right away. You have a list of failover connections for each network type, and LNET will try the failover only on the common network. So a tcp connection would first retry the tcp address, and, as you show, the IB side retries on the IB failnode.

cliffw
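The behaviour described in the reply above can be captured in a toy Python model. This is a hypothetical simulation written for illustration, not Lustre or LNET code. Several things are assumptions: the network preference order (o2ib before tcp), the class and function names, the single retry pass, and the rule that a connection stays pinned to the network chosen at setup and retries only the failover NID on that same network. With the primary OSS's IB link down, the model reproduces what the logs show. The first attempt times out (ETIMEDOUT, 110 on Linux), and the retry goes to the partner's o2ib NID, which answers ENODEV (19 on Linux) because lustre-OST0000 is not mounted there.

```python
import errno
from dataclasses import dataclass, field

# Assumption: when client and server share several networks, o2ib is preferred.
NETWORK_PREFERENCE = ["o2ib", "tcp"]


@dataclass
class Server:
    """Hypothetical OSS model: NIDs per network, working links, mounted targets."""
    nids: dict                               # network -> address
    links_up: set                            # networks whose link currently works
    targets: set = field(default_factory=set)

    def connect(self, network, target):
        """Return 0 on success or a negative errno, like the values in the logs."""
        if network not in self.links_up:
            return -errno.ETIMEDOUT          # nothing answers over a dead link
        if target not in self.targets:
            return -errno.ENODEV             # reachable, but the target is not mounted here
        return 0


def pick_network(client_nets, server):
    """At connection setup, choose the 'best' network both sides share."""
    for net in NETWORK_PREFERENCE:
        if net in client_nets and net in server.nids:
            return net
    raise RuntimeError("no common network")


def connect_with_failover(client_nets, servers, target):
    """Try the primary, then each failover partner, all on the network chosen at setup.

    Simplification: a single pass over the list; the other network is never tried.
    """
    network = pick_network(client_nets, servers[0])
    attempts = []
    for server in servers:
        rc = server.connect(network, target)
        attempts.append((f"{server.nids[network]}@{network}", rc))
        if rc == 0:
            break
    return attempts


client_nets = {"o2ib", "tcp"}
# Hypothetical placeholders: the primary OSS's addresses are not shown in the thread.
primary = Server({"o2ib": "oss-primary-ib", "tcp": "oss-primary-eth"},
                 links_up={"tcp"},           # IB cable unplugged
                 targets={"lustre-OST0000"})
partner = Server({"o2ib": "10.3.0.229", "tcp": "192.168.130.229"},
                 links_up={"o2ib", "tcp"})   # healthy, but OST0000 not mounted here

attempts = connect_with_failover(client_nets, [primary, partner], "lustre-OST0000")
for nid, rc in attempts:
    print(nid, errno.errorcode[-rc] if rc else "OK")
```

In this model a working path (the primary OSS over tcp) exists but is never attempted, which matches the stall that lasted until the IB cable was plugged back in.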
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Hi Cliff,

thanks for the answer. I guess I had the wrong picture of how this works. I will set up the system to fail over when a link breaks.

Regards,
Erich

On Tuesday 08 April 2008, Cliff White wrote:
> No, unfortunately that's not how multiple interfaces work with LNET.
> When multiple interfaces are present at connection setup we pick the
> 'best' route.
> Once we establish a connection, we expect that connection to continue.
> Connections do not fail over if multiple interfaces are present.