Greetings!

I am working with Lustre-2.1.2 on RHEL 6.2. First I configured it using the standard defaults over TCP/IP. Everything worked very nicely using a real, static --mgsnode=a.b.c.x value, which was the actual IP of the MGS/MDS system1 node.

I am now trying to integrate it with Pacemaker-1.1.7. I believe I have most of the set-up completed, with one particular exception: the "lctl ping" command cannot ping the Pacemaker IP alias (say a.b.c.d). The generic ping command in RHEL 6.2 can reach that interface without trouble. The Pacemaker alias IP (for failover of the combined MGS/MDS node, with Fibre Channel multipath storage shared between both MGS/MDS-configured machines) works in and of itself; I tested it with an apache service, and Pacemaker fails the MGS/MDS over from system1 to system2 correctly. But once the service is on system2, my Lustre file system stops because it cannot reach the alias IP address.

I did configure the Lustre OSTs to use --mgsnode=a.b.c.d (a.b.c.d representing my Pacemaker IP alias), and tunefs.lustre confirms the alias IP address. However, the alias IP does not appear in LNET (lctl list_nids), and "lctl ping a.b.c.d" fails.

Should this IP alias go into the LNET database? If yes, how? What steps should I take to get a successful "lctl ping a.b.c.d"?

Thanks for reading!
Cheers,
megan
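P.S. For reference, LNET on these nodes is set up the usual way, via the lnet module option (eth0 is just a stand-in for whichever interface carries the Lustre traffic here; the NID that LNET derives from it is the interface's static address, not the Pacemaker alias):

    # /etc/modprobe.d/lustre.conf
    options lnet networks=tcp0(eth0)

    # NIDs currently known to LNET on system1
    lctl list_nids
    a.b.c.x@tcp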
Megan,

LNET pings aren't the same as TCP/IP or UDP pings. An LNET ping ("lctl ping") needs to reach an active LNET instance on the target address. I don't think you can bind LNET to a Pacemaker virtual IP, but I'll let someone smarter than me on this list confirm or correct me. In any event, an LNET ping and a UDP ping are completely separate animals.

--Jeff

Sent from my iPhone
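P.S. Concretely, the two commands exercise completely different stacks (a.b.c.d is the virtual IP as above, and the @tcp suffix assumes the default tcp LNET network):

    # ICMP ping: answered by the kernel IP stack, so it succeeds as
    # long as the alias is up on some node
    ping -c 1 a.b.c.d

    # LNET ping: only answered if some LNET instance owns the NID
    # a.b.c.d@tcp -- which is why it fails in your setup
    lctl ping a.b.c.d@tcp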
On Fri, Nov 02, 2012 at 12:04:02AM -0400, Ms. Megan Larko wrote:
> ......
> What steps should I take to generate a successful "lctl ping a.b.c.d"?

There must be an LNet instance running over SOCKLND on a.b.c.d.

- Isaac
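P.S. One way to test that condition on whichever node currently holds the alias (a sketch I have not verified under Pacemaker; it assumes the alias is added before the LNET modules load, since socklnd picks up interface addresses at startup, and that no Lustre targets are mounted):

    # Bring the alias up first, then (re)start LNET
    ip addr add a.b.c.d/24 dev eth0 label eth0:0
    lustre_rmmod              # unload Lustre/LNET modules
    modprobe lnet
    lctl network up

    # If socklnd derived a NID from the alias it will appear here;
    # if only a.b.c.x@tcp shows up, the alias is still invisible to LNET
    lctl list_nids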
Greetings,

My present solution for corosync/pacemaker control of my Lustre file system availability was to write a Linux Standard Base (LSB) Sys V init script for my ib0 service, so that I could use a corosync primitive to control the IB network (and therefore the MGS). Since I did not know how to make the corosync alias IP visible to LNET for the successful lctl ping that Lustre OSS nodes need in order to communicate with the MGS/MDS, I chose instead to point at the real InfiniBand ib0 IP and have corosync keep that network address on whichever system is serving the fibre channel multipath MGS/MDT disk. In this way the OST disks have one and only one mgsnode (no failover entry needed, because the ib0 address itself fails over).

This has been successful in my TCP test (an LSB-compliant service for eth1). I plan on implementing it this week when the IB hardware comes in.

Thanks for your help. I appreciate it.
Cheers,
megan
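P.S. In case it helps anyone else, a minimal sketch of the idea (the script and resource names are my own, and details such as the interface name will differ per site):

    #!/bin/sh
    # /etc/init.d/ib0 -- minimal LSB wrapper that brings the ib0
    # interface up/down so Pacemaker can move the MGS address with it
    case "$1" in
      start)   ifup ib0 ;;
      stop)    ifdown ib0 ;;
      restart) ifdown ib0; ifup ib0 ;;
      status)  # LSB status codes: 0 = running, 3 = stopped
               ip link show ib0 2>/dev/null | grep -q 'state UP' && exit 0
               exit 3 ;;
      *) echo "Usage: $0 {start|stop|restart|status}" >&2; exit 2 ;;
    esac

The matching Pacemaker resource is then a one-liner in the crm shell:

    crm configure primitive p_ib0 lsb:ib0 op monitor interval=30s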