Michael Kluge
2011-Dec-19 07:41 UTC
[Lustre-discuss] Client behind Router can't mount with failover mgs
Hi list,

our mgs server (Lustre 1.6.7) failed and we mounted it on the failover node. Our clients (1.6.7) on the same IB network are still functional. We have exported the fs via a Lustre/10GE router to another cluster with a patchless 1.8.5. The router works, we can ping around and get the usual protocol errors. But mounting the fs from the failover node does not work on these clients. Is this expected or is this supposed to work?

Regards, Michael

--
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW: http://www.tu-dresden.de/zih
Colin Faber
2011-Dec-19 16:12 UTC
[Lustre-discuss] Client behind Router can't mount with failover mgs
Hi,

On 12/19/2011 12:41 AM, Michael Kluge wrote:
> Hi list,
>
> our mgs server (Lustre 1.6.7) failed and we mounted it on the failover node. Our clients (1.6.7) on the same IB network are still functional.

Ok.. Well, aside from the fact that 1.6.7 is long since deprecated, what else isn't functional after failover?

> We have exported the fs via a Lustre/10GE router to another cluster with a patchless 1.8.5. The router works, we can ping around and get the usual protocol errors. But mounting the fs from the failover node does not work on these clients. Is this expected or is this supposed to work?

Sorry, what are you actually trying to do here???

-cf
Michael Kluge
2011-Dec-20 08:42 UTC
[Lustre-discuss] Client behind Router can't mount with failover mgs
Hi Colin,

> > our mgs server (Lustre 1.6.7) failed and we mounted it on the failover
> > node. Our clients (1.6.7) on the same IB network are still functional.
>
> Ok.. Well aside from the fact that 1.6.7 is long since deprecated, what
> else isn't functional after failover?

Nothing. Everything is fine. Just the 1.8.5 clients behind an IB<->10GE router can't mount anymore.

> > We have exported the fs via a Lustre/10GE router to another cluster
> > with a patchless 1.8.5. The router works, we can ping around and get
> > the usual protocol errors. But mounting the fs from the failover node
> > does not work on these clients. Is this expected or is this supposed
> > to work?
>
> Sorry, what are you actually trying to do here???

We have a (pretty old) SDR IB based cluster with ~700 nodes and 10 Lustre servers. We use an IB<->10GE router to attach this Lustre FS to another cluster. This works pretty well, but only when the MGS is mounted on the primary node, not when the MGS is mounted on the failover node. I just want to know if this is expected behaviour or not.

Regards, Michael
Cliff White
2011-Dec-20 16:19 UTC
[Lustre-discuss] Client behind Router can't mount with failover mgs
It sounds like your failover NIDs are not properly set up, or the clients cannot route to those NIDs. Check the syslogs on the failing clients; that should tell you more.

cliffw

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
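As a rough illustration of the first point above (failover NIDs not properly set up): a Lustre client can be given more than one MGS NID in its mount source, separated by colons, so that it tries the failover node when the primary does not answer. The sketch below only builds the strings involved and does not talk to Lustre. The NIDs, the file system name `scratch`, and the mount point are hypothetical placeholders, not values from this thread, and the helper names are made up for the example.

```python
def build_mount_source(mgs_nids, fsname):
    """Join primary and failover MGS NIDs with ':' and append ':/<fsname>'."""
    if not mgs_nids:
        raise ValueError("at least one MGS NID is required")
    return ":".join(mgs_nids) + ":/" + fsname


def reachability_checks(mgs_nids):
    """Return one 'lctl ping <nid>' command line per MGS NID.

    The commands are meant to be run by hand on a client behind the
    router; this function only formats them.
    """
    return ["lctl ping " + nid for nid in mgs_nids]


if __name__ == "__main__":
    # Hypothetical NIDs: primary MGS first, failover MGS second.
    mgs_nids = ["10.0.0.1@o2ib", "10.0.0.2@o2ib"]

    source = build_mount_source(mgs_nids, "scratch")
    print("mount -t lustre " + source + " /mnt/scratch")

    for cmd in reachability_checks(mgs_nids):
        print(cmd)
```

If the `lctl ping` against the failover NID fails from a routed client while the one against the primary NID succeeds, that would point at the routing side; if both succeed but the mount source lists only the primary NID, that would point at the mount configuration.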