hi there,

i have an issue with failover of the MGS device within my cluster.

i'm building a simple lustre environment: just one lustre file system (testfs).

i have a two-node cluster for my MGS/MDT; this is an active/passive config with the MGS and MDT on different devices and mounted separately (not co-located).

i have a two-node cluster for my OSTs, in an active/active config: the first OST is on node one and the second OST is on node two.

with the above, heartbeat is happy to mount the OSTs and the MDT on either node of their clusters. the MGS, however, is not. i get the following message when it tries to mount on the alternative node:

---8<---
mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
---8<---

i've noticed that if i consolidate and put the MGS and MDT on the same device/mountpoint on the MDS cluster nodes, all is well and the file system mounts on the alternative node perfectly.

any ideas?

i have made sure i created the file systems with --failnode and --mgsnode= for each MDS server, but no joy.

i can see a previous post to lustre-discuss from someone with a similar, if not the same, issue:

http://lists.lustre.org/pipermail/lustre-discuss/2008-September/008634.html

cheers
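p.s. for reference, the heartbeat resources on the MDS pair boil down to two lustre Filesystem mounts; in haresources style that would look roughly like this (node and device names as in my config; the mdt mountpoint and the exact syntax are from memory, so treat this as illustrative only):

# /etc/ha.d/haresources on the MDS pair (mdt mountpoint is illustrative)
lustremds1 Filesystem::/dev/sdb::/lustre/testfs/mgs::lustre
lustremds1 Filesystem::/dev/sdc::/lustre/testfs/mdt::lustre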
neil rutter wrote:
> i have an issue with failover of the MGS device within my cluster.
> [...]
> i have made sure i created the file systems with --failnode and --mgsnode=
> for each MDS server, but no joy.

Sounds like a problem with your mkfs commands. Please send the full "mkfs" commands for the MGS and MDT LUNs, and the IP addresses for the primary and secondary nodes ("lctl list_nids").

Kevin
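PS: if it is easier than digging through shell history, something like this on each MDS node will show both the NIDs and the parameters each target was actually formatted with (device names are just examples):

lctl list_nids
tunefs.lustre --dryrun /dev/sdb
tunefs.lustre --dryrun /dev/sdc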
On Tue, 2008-11-25 at 20:45 +0000, neil rutter wrote:
> hi there,

Hi,

> i have a two-node cluster for my MGS/MDT; this is an active/passive
> config with the MGS and MDT on different devices and mounted separately
> (not co-located).
>
> i have a two-node cluster for my OSTs, in an active/active config: the
> first OST is on node one and the second OST is on node two.

So you in fact have 4 nodes as your Lustre servers, yes? What is your shared storage technology? How are the two OSSes accessing the same two OSTs, and how are the two MDSes accessing the single MDT and MGT?

> with the above, heartbeat is happy to mount the OSTs and the MDT on
> either node of their clusters. the MGS, however, is not. i get the
> following message when it tries to mount on the alternative node:
>
> ---8<---
> mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
> ---8<---

Is /dev/sdb actually accessible on the alternative node? What does "cat /proc/partitions" say on that node?

What does dmesg tell you after you try to mount /dev/sdb and it fails?

b.
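p.s. concretely, something along these lines on the alternative node, right after the failed mount attempt (a rough sketch):

cat /proc/partitions    # is sdb visible on this node at all?
dmesg | tail -n 30      # the real reason for "Invalid argument" is usually logged here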
kevin, brian,

thanks to you both. i had an error in my mkfs.lustre commands; once it was pointed out and corrected, the issue was fixed.

Brian J. Murrell wrote:
> Is /dev/sdb actually accessible on the alternative node? What does
> "cat /proc/partitions" say on that node?
>
> What does dmesg tell you after you try to mount /dev/sdb and it fails?
On Tue, 2008-11-25 at 23:34 +0000, neil rutter wrote:
> kevin, brian,
>
> thanks to you both. i had an error in my mkfs.lustre commands; once it
> was pointed out and corrected, the issue was fixed.

Can you share the error here, so that future searches for this problem turn up the solution as well?

There's nothing more frustrating than finding the same problem you are having in a mailing list archive with no solution. Actually, there is something more frustrating: seeing that the problem was solved, but with no details on how.

Thanx,
b.
Brian J. Murrell wrote:
> Can you share the error here, so that future searches for this problem
> turn up the solution as well?

Come on, if we wanted to make it that easy, it would be in the manual ;-)

I gave Neil the correct mkfs commands:

mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb

mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
    --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Kevin
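PS: for completeness, with both --mgsnode entries on disk the MDT can reach the MGS on either node, and clients would likewise list both MGS NIDs when mounting, something like this (the mount point is just an example):

mount -t lustre 192.168.123.20@tcp0:192.168.123.21@tcp0:/bananafs /mnt/bananafs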
Kevin Van Maren wrote:
> I gave Neil the correct mkfs commands:
>
> mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
>     --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Brian,

Here are the original, incorrect, mkfs commands:

mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb

mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
    --mgsnode=lustremds1 --mdt /dev/sdc

Kevin
On Wed, 2008-11-26 at 10:22 -0700, Kevin Van Maren wrote:
> Brian,

Hi Kevin,

> Here are the original, incorrect, mkfs commands:
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
>     --mgsnode=lustremds1 --mdt /dev/sdc

So to be clear, was his failure that he only specified the one --mgsnode, or that his hostname specifications did not resolve properly to the IP addresses used in the subsequent, working commands? Or both?

b.
Kevin Van Maren wrote:
> Here are the original, incorrect, mkfs commands:
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
>     --mgsnode=lustremds1 --mdt /dev/sdc

guys,

actually, the original commands i ran did have the = signs for the --failnode=<nodename> arguments; i gave kevin the wrong bash history to analyse when he asked for the commands from the host :)

having said that, though, it was the NIDs that were missing; adding them, as in kevin's corrected commands, fixed the issues i was having.

cheers
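p.s. for anyone finding this thread later: a quick way to check the failover case, before handing the resources back to heartbeat, is to mount the targets by hand on the standby MDS node, roughly (mountpoints are from my config, the mdt one from memory):

mount -t lustre /dev/sdb /lustre/testfs/mgs
mount -t lustre /dev/sdc /lustre/testfs/mdt
umount /lustre/testfs/mdt
umount /lustre/testfs/mgs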
Kevin Van Maren <Kevin.Vanmaren@...> writes:

> mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb

Is --failnode evaluated for the MGS? We seem to do fine without it, as any client requires explicit configuration of the MGS failnode anyway. Or is it possible to override this configuration with the value set on the MGS?

> mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
>     --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Regards,
Daniel.