thr3ads.net - Lustre devel - [Lustre-devel] lnet NAT friendliness [May 2010]

Ken Hornstein

2010-May-04 14:19 UTC

[Lustre-devel] lnet NAT friendliness

In my work with the MacOS X client, I did some work from home.  While
that had the added "benefit" of exposing the issues associated with the
lack of attribute caching from the MacOS X client, I noticed something
else: lnet is unfortunately rather NAT-unfriendly.

Obviously putting your servers behind a NAT is extremely challenging, but
I was operating in the not-so-uncommon situation where a client was behind
a NAT and the servers all had publicly routable IP addresses.  Note that
I am aware that by default Lustre requires connections from reserved ports;
I worked around that issue (until I discovered the way to turn off that
check via a configuration knob).

Specifically, I can connect to the MGS okay, but after that initial
connection I get the following error from lnet_parse() on the client
(okay, I reconstructed this from memory, but I think it is reasonably
close)

src server.addr@tcp: bad dest nid 1.2.3.4@tcp (should have been sent direct)

Where "1.2.3.4@tcp" is the external address of my NAT box at home.  It
is worth noting that there are no other known networking issues with
this setup; if I put this machine on the external-facing network, I can
mount the Lustre filesystem in question fine.

Obviously the problem here is that a message is being sent to my home
box, but instead of using the "internal" IP address as the destination NID,
the server is using the external address (the one it obviously is getting
from the TCP socket).

I haven't yet had a chance to play with this more, but it makes me wonder
if anyone else has tried out Lustre from behind a NAT (with 2.0-based
Lustre, obviously), and if they did, did it work for you?  I am perfectly
willing to believe this is an issue with the Mac client, but from looking
at the code it doesn't feel like it would be.

Also ... it seems like it would be easy to add a configuration knob that
would let you bypass this particular check, and that might make it work.
Anyone have any thoughts about that?

--Ken

Liang Zhen

2010-May-05 11:55 UTC

Ken,

LNet requires the destination address of a message to be the same as the
address of the LNet NI (unless it's a router), so I'm afraid it's not easy
to make it tunable.

I would suggest running a Lustre (LNet) router on the gateway (if your
gateway is Linux...)

Regards
Liang

Ken Hornstein

2010-May-05 12:38 UTC

>LNet requires the destination address of a message to be the same as the
>address of the LNet NI (unless it's a router), so I'm afraid it's not easy
>to make it tunable.
>
>I would suggest running a Lustre (LNet) router on the gateway (if your
>gateway is Linux...)

Well, that's not really feasible, because a) many times we don't
control the gateway (think sitting in Starbucks; and while some people
would say that they don't want to use Lustre from Starbucks, I would
say, "Why not?"; with Kerberos authentication, I think it would be
perfectly reasonable), and b) even if you control the gateway, that
doesn't really scale, because while that might work for one person,
I don't see how you would do it for more than one person (how would
you configure the routing back if more than one person is using the
same NAT address range?).

I admit I have no love for NAT and I would prefer it if we were living
in a world where the end-to-end principle worked everywhere, but that
battle was lost years ago.

So, I did a little more work on this last night.  And I respectfully
disagree it would be hard to make those things tunable.  In fact, I
got Lustre working fine with a few simple client-only changes.

I ran into two issues.  First, in lib-move.c:lnet_parse(), the variable
for_me is set if the network interface nid matches the destination nid.
I simply set for_me to 1 all of the time, and that solved that problem.
That's a one-line change, and it would be easy to make that tunable.

However, that exposed another problem.  lnet_nid2ni_locked() would then
fail because it would try to look up the network interface associated
with the "external" address (I don't remember exactly who called this
right now, but I suppose it doesn't matter).  You have to be more
careful here, because if you just make lnet_nid2ni_locked() return the
first match you end up returning the loopback interface and that makes
other things unhappy.  What I settled on was just matching the first
interface that had the same network and type.
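
As a rough illustration of the two changes described above (this is not the actual patch, which does not appear in this thread), the following C sketch models them with simplified stand-ins.  The nid_t layout, the NET_LO and NET_TCP values, the nat_client switch, and the function names message_is_for_me() and find_ni() are all assumptions made for the example; the real logic lives in lnet_parse() and lnet_nid2ni_locked().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an LNet NID: network in the high 32 bits,
 * address in the low 32 bits.  All names here are hypothetical. */
typedef uint64_t nid_t;

#define MK_NID(net, addr) (((nid_t)(net) << 32) | (nid_t)(uint32_t)(addr))
#define NID_NET(nid)      ((uint32_t)((nid) >> 32))

#define NET_LO   1u   /* hypothetical id for the loopback network */
#define NET_TCP  2u   /* hypothetical id for the tcp network */

struct ni {
    nid_t nid;        /* NID configured on this interface */
};

/* Hypothetical tunable: off by default, as suggested in the message. */
int nat_client = 0;

/* Change 1: normally a message is "for me" only when its destination NID
 * equals the receiving interface's NID; in NAT-client mode accept it anyway. */
int message_is_for_me(nid_t ni_nid, nid_t dest_nid)
{
    return ni_nid == dest_nid || nat_client;
}

/* Change 2: look up the interface for a NID.  An exact match always wins.
 * In NAT-client mode, fall back to the first interface on the same network,
 * which skips loopback because its network differs. */
const struct ni *find_ni(const struct ni *nis, size_t count, nid_t nid)
{
    size_t i;

    assert(nis != NULL || count == 0);

    for (i = 0; i < count; i++)
        if (nis[i].nid == nid)
            return &nis[i];

    if (nat_client)
        for (i = 0; i < count; i++)
            if (NID_NET(nis[i].nid) == NID_NET(nid))
                return &nis[i];

    return NULL;
}
```

With nat_client left at 0 both functions behave strictly, so a message addressed to the NAT's external address is rejected; setting it to 1 accepts the message and resolves it to the single tcp interface.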

I made these two changes, and Lustre worked fine.  Okay, it wasn't
exactly an exhaustive test: I cd'd to the filesystem and ran "ls -l".
As noted before, the Lustre MacOS X client doesn't do any caching yet
so issues with AST notifications may come up.  But I think this shows
that making Lustre clients work from behind a NAT is not a set of huge
changes, at least for the simple case.

Are these changes appropriate for the general case?  No, and I wouldn't
suggest otherwise.  But I think that for the case of a client with a single
network interface behind a NAT, these changes are reasonable; if
they are made tunable and default to being off, I can't see how it
would be harmful, and it might actually help some people.  I admit that
it occurs to me that I don't know what would happen if more than one
Lustre client was behind the same NAT in this situation.

--Ken

Oleg Drokin

2010-May-05 15:26 UTC

Hello!

On May 5, 2010, at 8:38 AM, Ken Hornstein wrote:
>> LNet requires the destination address of a message to be the same as the
>> address of the LNet NI (unless it's a router), so I'm afraid it's not easy
>> to make it tunable.
>>
>> I would suggest running a Lustre (LNet) router on the gateway (if your
>> gateway is Linux...)
>
> Well, that's not really feasible, because a) many times we don't
> control the gateway (think sitting in Starbucks; and while some people
> would say that they don't want to use Lustre from Starbucks, I would
> say, "Why not?"; with Kerberos authentication, I think it would be
> perfectly reasonable), and b) even if you control the gateway, that
> doesn't really scale, because while that might work for one person,
> I don't see how you would do it for more than one person (how would
> you configure the routing back if more than one person is using the
> same NAT address range?).
>
> I admit I have no love for NAT and I would prefer it if we were living
> in a world where the end-to-end principle worked everywhere, but that
> battle was lost years ago.

I would think using a VPN from outside into your Lustre-supplying LAN should
be enough to work around this problem somewhat easily with no code changes.

Also provides an encrypted secure channel as a bonus.

Bye,
    Oleg

Ken Hornstein

2010-May-05 15:31 UTC

>I would think using a VPN from outside into your Lustre-supplying LAN should
>be enough to work around this problem somewhat easily with no code changes.

Sigh.  So, the official Oracle position in terms of LNet-NAT
compatibility is to basically give up?  If that's the answer, then I'll
shut up.  But really, do I have to justify this, or explain how VPNs
aren't always an option?

--Ken

Nicolas Williams

2010-May-05 15:48 UTC

On Wed, May 05, 2010 at 11:31:39AM -0400, Ken Hornstein wrote:
> >I would think using a VPN from outside into your Lustre-supplying LAN should
> >be enough to work around this problem somewhat easily with no code changes.

There's another option: make the gateway an LNet router.

> Sigh.  So, the official Oracle position in terms of LNet-NAT
> compatibility is to basically give up?  If that's the answer, then I'll
> shut up.  But really, do I have to justify this, or explain how VPNs
> aren't always an option?

I wouldn't say that's our "official" position.  For starters, you could
file an RFE.  You could also contribute a fix.  But it won't be simple
to fix.

Lustre is layered above LNet, and LNet is layered above "LNDs", with
each type of LND driving LNet over some type of network (IB, TCP/IP,
...).  LNet has no concept of connections.  Therefore the state of TCP
connections created by socklnd (the name of the TCP/IP LND) is
completely irrelevant to LNet.  Which means that when some server has to
send a message to a client... the server might have to establish a TCP
connection (or three) with the client, which means... that the server
must know how to connect to the client, and that is completely firewall-
unfriendly.  Note too that LNet has no idea about the state of the
services layered above it, so the socklnd cannot know if a particular
peer will be needing to send messages, so as to proactively maintain TCP
connections open with them so as to be able to receive those messages --
it can only assume.

The very statelessness of LNet makes NAT- and firewall-friendliness a
difficult proposition.

The fix, if it's at all possible, would require that clients' socklnds
try to keep TCP connections open at all times to all nodes that the
client has spoken to in the past.  That's pretty heavy-weight.  Consider
too that a server is usually also a client: socklnd shouldn't behave
that way in all cases, just in the cases of pure clients behind NATs.
The fix might also require changes to timeout handling, and/or maybe
even to LNet itself (to at least have a notion of peer node reachability
event notification, or something of the sort).

Nico
--

Ken Hornstein

2010-May-05 16:13 UTC

>> >I would think using a VPN from outside into your Lustre-supplying LAN should
>> >be enough to work around this problem somewhat easily with no code changes.
>
>There's another option: make the gateway an LNet router.

Did you see my previous message about this?  That simply isn't an option
in many cases.

>> Sigh.  So, the official Oracle position in terms of LNet-NAT
>> compatibility is to basically give up?  If that's the answer, then I'll
>> shut up.  But really, do I have to justify this, or explain how VPNs
>> aren't always an option?
>
>I wouldn't say that's our "official" position.  For starters, you could
>file an RFE.  You could also contribute a fix.  But it won't be simple
>to fix.

Did you see my original message about this?  A simple fix (which I will
fully admit I only did an extremely brief amount of testing on) was
only six lines of changes.  Sure, it's not appropriate as general
changes to LNet, but I think making it configurable would be perfectly
reasonable.  But I wrote the code, so I will fully admit that I'm biased
about it.

>Lustre is layered above LNet, and LNet is layered above "LNDs", with
>each type of LND driving LNet over some type of network (IB, TCP/IP,
>...).  LNet has no concept of connections.
>[...]

I understand all of that.  Sure, it's easy to come up with cases where
this will fail.  But ... it looks like there are a few small changes
that can be made that will make it work in some circumstances, as
opposed to the current situation (where it will never work).  Maybe I'm
wrong and further testing will reveal that this is a lot more
complicated to make it work in even the simple case, but it seems a
shame to not even investigate further.  But it seems the feedback I'm
getting from the people at Oracle is, "Meh, don't bother".

>The fix, if it's at all possible, would require that clients' socklnds
>try to keep TCP connections open at all times to all nodes that the
>client has spoken to in the past.  That's pretty heavy-weight.

Actually, I will freely confess to not being the LNet expert ... but
are socklnd TCP connections closed now when clients are idle?  With the
pinger running (which is a requirement, from what I understand), it seems
like you'd have a TCP connection going all of the time between all clients
and servers.  The pinger sends a packet every 20-25 seconds, right?

--Ken

Nicolas Williams

2010-May-05 16:32 UTC

On Wed, May 05, 2010 at 12:13:56PM -0400, Ken Hornstein wrote:
> >> >I would think using a VPN from outside into your Lustre-supplying LAN should
> >> >be enough to work around this problem somewhat easily with no code changes.
> >
> >There's another option: make the gateway an LNet router.
>
> Did you see my previous message about this?  That simply isn't an option
> in many cases.

Yes, I did, but I was just adding a workaround that might work for
others (it might not -- haven't tested it).

> >I wouldn't say that's our "official" position.  For starters, you could
> >file an RFE.  You could also contribute a fix.  But it won't be simple
> >to fix.
>
> Did you see my original message about this?  A simple fix (which I will
> fully admit I only did an extremely brief amount of testing on) was
> only six lines of changes.  Sure, it's not appropriate as general
> changes to LNet, but I think making it configurable would be perfectly
> reasonable.  But I wrote the code, so I will fully admit that I'm biased
> about it.

I did see that.  I hadn't followed it in detail, but just now I looked
at the code you mentioned, and, on a pure client I think that makes
sense.  See below.

> [...]  But it seems the feedback I'm
> getting from the people at Oracle is, "Meh, don't bother".

Well, we (or our customers) might have no use for it at this time; or
perhaps it's just NAT hatred running in our veins (just kidding, though
I suspect most people who've come in contact with NAT love/hate it).
Doesn't mean we wouldn't take patches, or that we'd never have a use for
it.  But the first priority is to make sure that the fix, if you'll
contribute one, is sufficiently robust.  See below.

> >The fix, if it's at all possible, would require that clients' socklnds
> >try to keep TCP connections open at all times to all nodes that the
> >client has spoken to in the past.  That's pretty heavy-weight.
>
> Actually, I will freely confess to not being the LNet expert ... but
> are socklnd TCP connections closed now when clients are idle?  With the
> pinger running (which is a requirement, from what I understand), it seems
> like you'd have a TCP connection going all of the time between all clients
> and servers.  The pinger sends a packet every 20-25 seconds, right?

Perhaps my "that's pretty heavy-weight" comment was off the mark.
However, I know very little about socklnd, and the key is to make sure
it proactively re-connects in the face of timeouts so that servers can
always send messages to the NATted clients.

Nico
--

Andreas Dilger

2010-May-06 06:02 UTC

On 2010-05-05, at 08:38, Ken Hornstein wrote:
> So, I did a little more work on this last night.  And I respectfully
> disagree it would be hard to make those things tunable.  In fact, I
> got Lustre working fine with a few simple client-only changes.
>
> I ran into two issues.  First, in lib-move.c:lnet_parse(), the variable
> for_me is set if the network interface nid matches the destination nid.
> I simply set for_me to 1 all of the time, and that solved that problem.
> That's a one-line change, and it would be easy to make that tunable.

The problem with setting "for_me = 1" all the time is that this would
apparently break LNET routers completely because they would always think that
the incoming message is for them, rather than something to be passed on to
another peer (i.e. the "if (!the_lnet.ln_routing)" case).

It seems that if the "extra" error checks in the "if (!for_me)" code were
instead moved earlier and set "for_me = 1" it might be OK:

        if (LNET_NIDNET(dest_nid) == LNET_NIDNET(ni->ni_nid)) {
                /* should have gone direct */
                for_me = 1;
        } else if (lnet_islocalnid(dest_nid)) {
                /* dest is another local NI; sender should have used
                 * this node's NID on its own network */
                for_me = 1;
        }
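
To see how a relaxed check of this shape could coexist with routing, here is a small self-contained model in C.  It is only a sketch: struct node, enum verdict, classify() and the MK_NID/NID_NET macros are hypothetical stand-ins, not the real lnet_parse() code, and the network numbers used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NID layout: network in the high 32 bits, address in the low. */
typedef uint64_t nid_t;

#define MK_NID(net, addr) (((nid_t)(net) << 32) | (nid_t)(uint32_t)(addr))
#define NID_NET(nid)      ((uint32_t)((nid) >> 32))

enum verdict { DELIVER_LOCAL, FORWARD, DROP };

struct node {
    const nid_t *local_nids;  /* NIDs of every local interface */
    size_t       n_local;
    int          routing;     /* non-zero when the node acts as a router */
};

/* Return 1 if nid belongs to any interface of this node. */
static int is_local_nid(const struct node *node, nid_t nid)
{
    size_t i;

    for (i = 0; i < node->n_local; i++)
        if (node->local_nids[i] == nid)
            return 1;
    return 0;
}

/* Decide what to do with a message that arrived on the interface ni_nid. */
enum verdict classify(const struct node *node, nid_t ni_nid, nid_t dest_nid)
{
    int for_me;

    assert(node != NULL);

    for_me = (ni_nid == dest_nid);
    if (!for_me) {
        if (NID_NET(dest_nid) == NID_NET(ni_nid))
            for_me = 1;   /* relaxed: same network as the receiving NI */
        else if (is_local_nid(node, dest_nid))
            for_me = 1;   /* destination is another local interface */
    }

    if (for_me)
        return DELIVER_LOCAL;

    /* Only messages for other networks reach the routing decision. */
    return node->routing ? FORWARD : DROP;
}
```

In this model a NATted client accepts a message addressed to the external address on its own network, while a router still forwards traffic destined for a different network, which is the behaviour a blanket "for_me = 1" would lose.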

There still remains the issue with server-client reconnection, which will fail
utterly for a NAT address, but as you wrote in another email, the pinger should
keep the TCP connection open by virtue of sending messages often enough, or
re-establish the connection if it fails.  There exists some possibility that the
client could be evicted if the connection was lost at the time a lock callback
was sent and the server couldn't re-establish the connection, but if you don't
require 100% robustness (which you can't from Starbucks' WiFi anyway) then that
is probably an acceptable outcome.

That said, take this answer with a pile of salt, I'm not an LNET expert
at all and I'm just poking around here as you are.  I trust Liang and
Isaac with the LNET code totally, and if they tell me this is fundamentally
broken, then I'll believe them.  It may be that Liang was referring to
the server-client reconnection issue when he wrote that it couldn't be
done easily, but I'll let him clarify in his own words.

Cheers, Andreas
Just some guy poking in LNET

Liang Zhen

2010-May-06 09:31 UTC

Ken, Andreas,

Thanks for diving into the code :).
As Andreas said, these changes may easily break the rules for routers (or
multiple-interface settings in the future), so we have to be very
careful. Also, we may need more changes inside the LNDs; I believe we have
more checking there.

More interestingly, I think you are using the internal address to start LNet
on the client, but the servers are using the external address to talk back to
your client (as you said, there is a message like "bad dst nid 1.2.3.4@tcp",
which is the external address). It's supposed to break somewhere,
because the socklnd connection should use the source address in the message
header, which is internal (the client should never know about the external
address), but obviously it didn't, so I guess we probably have a
loophole in socklnd that allows this to happen. I will dig into the code later.

Anyway, you've already hacked it out and it works fine, so although it needs
more investigation, I tend to agree it's possible for us to make this tunable
and bypass those checks, at least for LNet + socklnd, if you don't
really care about server-client reconnection (Andreas, yes, that's what I
meant) and believe supporting one client with a single NI behind NAT is
an important use case even with limitations.

Thanks
Liang

Ken Hornstein

2010-May-06 14:35 UTC

>Thanks for diving into the code :).
>As Andreas said, these changes may easily break the rules for routers (or
>multiple-interface settings in the future), so we have to be very
>careful. Also, we may need more changes inside the LNDs; I believe we have
>more checking there.

Right, that's why I would only advocate turning off those checks
conditionally on standalone clients.

>More interestingly, I think you are using the internal address to start LNet
>on the client, but the servers are using the external address to talk back to
>your client (as you said, there is a message like "bad dst nid 1.2.3.4@tcp",
>which is the external address). It's supposed to break somewhere,
>because the socklnd connection should use the source address in the message
>header, which is internal (the client should never know about the external
>address), but obviously it didn't, so I guess we probably have a
>loophole in socklnd that allows this to happen. I will dig into the code later.

From what I can tell (again, not being the LNet expert), the server
knows what the real address of the remote connection is (the external
address), but it associates a particular TCP connection with a NID
(which has the internal address), and it uses that TCP connection when
it wants to talk to the NID.  So it all ends up working out, even though
in theory it shouldn't.

>Anyway, you've already hacked it out and it works fine, so although it needs
>more investigation, I tend to agree it's possible for us to make this tunable
>and bypass those checks, at least for LNet + socklnd, if you don't
>really care about server-client reconnection (Andreas, yes, that's what I
>meant) and believe supporting one client with a single NI behind NAT is
>an important use case even with limitations.

Actually, now that I think more about it, more than one client behind a
NAT might actually work fine.  What will probably fail is two clients
behind two different NATs but having the same internal address.
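
A toy model of that reasoning, as a sketch only: if the server keys its connections by the NID a client announces rather than by the socket's source address, two clients behind one NAT get separate entries, while two clients with the same internal address behind different NATs collide.  The conn_table structure and the conn_table_init(), conn_register() and conn_lookup() functions below are hypothetical and are not taken from socklnd; how the real code handles such a duplicate is not established in this thread, so the sketch simply reports a conflict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 8

/* One entry per connected peer, keyed by the NID the peer announced. */
struct peer_conn {
    uint64_t nid;        /* NID from the message header (internal address) */
    uint32_t sock_ip;    /* source address seen on the TCP socket */
    uint16_t sock_port;  /* source port seen on the TCP socket */
    int      in_use;
};

struct conn_table {
    struct peer_conn slot[MAX_PEERS];
};

/* Mark every slot as free. */
void conn_table_init(struct conn_table *t)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < MAX_PEERS; i++)
        t->slot[i].in_use = 0;
}

/* Find the connection bound to a NID, or NULL if there is none. */
const struct peer_conn *conn_lookup(const struct conn_table *t, uint64_t nid)
{
    size_t i;

    for (i = 0; i < MAX_PEERS; i++)
        if (t->slot[i].in_use && t->slot[i].nid == nid)
            return &t->slot[i];
    return NULL;
}

/* Bind a NID to a socket endpoint.  Returns 0 on success, or -1 if the NID
 * is already bound to a different endpoint or the table is full. */
int conn_register(struct conn_table *t, uint64_t nid,
                  uint32_t sock_ip, uint16_t sock_port)
{
    const struct peer_conn *old = conn_lookup(t, nid);
    size_t i;

    if (old != NULL)
        return (old->sock_ip == sock_ip && old->sock_port == sock_port) ? 0 : -1;

    for (i = 0; i < MAX_PEERS; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].nid = nid;
            t->slot[i].sock_ip = sock_ip;
            t->slot[i].sock_port = sock_port;
            t->slot[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}
```

Registering 192.168.1.5 and 192.168.1.6 through the same external address succeeds because the NIDs differ; registering 192.168.1.5 a second time from a different external address is the failing case.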

Let me code up a cleaner version of this patch (and make it adjustable)
and see how that works out.

--Ken
