In my work on the MacOS X client, I did some work from home. While that had the added "benefit" of exposing the issues caused by the MacOS X client's lack of attribute caching, I noticed something else: lnet is unfortunately rather NAT-unfriendly.

Obviously putting your servers behind a NAT is extremely challenging, but I was in the not-so-uncommon situation where a client was behind a NAT and the servers all had publicly routable IP addresses. Note that I am aware that by default Lustre requires connections from reserved ports; I worked around that issue (until I discovered the configuration knob that turns off that check).

Specifically, I can connect to the MGS okay, but after that initial connection I get the following error from lnet_parse() on the client (okay, I reconstructed this from memory, but I think it is reasonably close):

    src server.addr@tcp: bad dest nid 1.2.3.4@tcp (should have been sent direct)

where "1.2.3.4@tcp" is the external address of my NAT box at home. It is worth noting that there are no other known networking issues with this setup; if I put this machine on the external-facing network, I can mount the Lustre filesystem in question fine.

Obviously the problem here is that a message is being sent to my home box, but instead of using the "internal" IP address as the destination NID, the server is using the external address (the one it is obviously getting from the TCP socket).

I haven't yet had a chance to play with this more, but it makes me wonder whether anyone else has tried Lustre from behind a NAT (with 2.0-based Lustre, obviously), and if so, did it work for you? I am perfectly willing to believe this is an issue with the Mac client, but from looking at the code it doesn't feel like it would be.

Also... it seems like it would be easy to add a configuration knob that would let you bypass this particular check, and that might make it work. Anyone have any thoughts about that?

--Ken
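To make the failure concrete, here is a toy model in C (not the real LNet code) of the destination check as it is described later in this thread: a node that is not a router accepts a message only when the destination NID equals its own NI's NID, and a different NID on the same network is rejected with the "should have been sent direct" error. The `toy_nid` struct, `classify_dest()`, `TOY_IP()` and the addresses are assumptions for illustration; real LNet packs the network and address into a single 64-bit `lnet_nid_t`, and the order of its checks may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified NID: a network number plus an IPv4 address. */
struct toy_nid {
    uint32_t net;   /* e.g. 0 for "tcp" */
    uint32_t addr;  /* IPv4 address in host byte order */
};

enum dest_verdict {
    DEST_FOR_ME,          /* deliver locally */
    DEST_SENT_DIRECT_ERR, /* "bad dest nid ... (should have been sent direct)" */
    DEST_NOT_ROUTING_ERR  /* would need forwarding, but this node is no router */
};

#define TOY_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

static bool nid_eq(struct toy_nid x, struct toy_nid y)
{
    return x.net == y.net && x.addr == y.addr;
}

/* Toy version of the destination check, assuming routing is disabled. */
static enum dest_verdict classify_dest(struct toy_nid dest, struct toy_nid my_ni)
{
    if (nid_eq(dest, my_ni))
        return DEST_FOR_ME;
    if (dest.net == my_ni.net)
        return DEST_SENT_DIRECT_ERR;  /* same network, different address */
    return DEST_NOT_ROUTING_ERR;      /* other network, and we don't route */
}
```

With the client's NI configured as 192.168.1.10@tcp, a message addressed to the NAT's 1.2.3.4@tcp lands in the second case, which matches the error above.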
Ken,

LNet requires the destination address of a message to be the same as the address of the LNet NI (unless it's a router); I'm afraid it's not easy to make it tunable.

I would suggest running a Lustre (LNet) router on the gateway (if your gateway is Linux...).

Regards
Liang

_______________________________________________
Lustre-devel mailing list
Lustre-devel@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-devel
> LNet requires the destination address of a message to be the same as the
> address of the LNet NI (unless it's a router); I'm afraid it's not easy
> to make it tunable.
>
> I would suggest running a Lustre (LNet) router on the gateway (if your
> gateway is Linux...)

Well, that's not really feasible, because a) many times we don't control the gateway (think sitting in Starbucks; while some people would say that they don't want to use Lustre from Starbucks, I would say, "Why not?" With Kerberos authentication, I think it would be perfectly reasonable), and b) even if you control the gateway, that doesn't really scale: while it might work for one person, I don't see how you would do it for more than one person (how would you configure the routing back if more than one person is using the same NAT address range?). I admit I have no love for NAT, and I would prefer to be living in a world where the end-to-end principle worked everywhere, but that battle was lost years ago.

So, I did a little more work on this last night, and I respectfully disagree that it would be hard to make those checks tunable. In fact, I got Lustre working fine with a few simple client-only changes.

I ran into two issues. First, in lib-move.c:lnet_parse(), the variable for_me is set if the network interface NID matches the destination NID. I simply set for_me to 1 all the time, and that solved that problem. That's a one-line change, and it would be easy to make it tunable.

However, that exposed another problem: lnet_nid2ni_locked() would then fail, because it would try to look up the network interface associated with the "external" address (I don't remember exactly which caller this was right now, but I suppose it doesn't matter). You have to be more careful here, because if you just make lnet_nid2ni_locked() return the first match, you end up returning the loopback interface, and that makes other things unhappy.
What I settled on was simply matching the first interface that had the same network and type. With these two changes, Lustre worked fine. Okay, it wasn't exactly an exhaustive test: I cd'd to the filesystem and ran "ls -l". As noted before, the Lustre MacOS X client doesn't do any caching yet, so issues with AST notifications may come up. But I think this shows that making Lustre clients work from behind a NAT does not require a set of huge changes, at least for the simple case.

Are these changes appropriate for the general case? No, and I wouldn't suggest otherwise. But for the case of a client with a single network interface behind a NAT, I think these changes are reasonable; if they are made tunable and default to off, I can't see how they would be harmful, and they might actually help some people. I admit it occurs to me that I don't know what would happen if more than one Lustre client were behind the same NAT in this situation.

--Ken
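Ken's two client-side changes can be sketched against toy types as follows. This is a hedged illustration, not his patch and not the real LNet structures: `nat_client_mode` is a hypothetical tunable (off by default, as he proposes), and `toy_for_me()` / `toy_nid2ni()` merely stand in for the logic in lnet_parse() and lnet_nid2ni_locked(). Trying an exact match before the relaxed one is an assumption added here so that correctly addressed messages behave exactly as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the real LNet structures. */
enum toy_lnd { TOY_LOLND, TOY_SOCKLND };

struct toy_nid {
    enum toy_lnd type;  /* which LND: loopback or TCP */
    uint32_t netnum;    /* network number, e.g. 0 for "tcp" */
    uint32_t addr;      /* IPv4 address in host byte order */
};

struct toy_ni {
    struct toy_nid nid;
};

/* Hypothetical tunable, off by default so nothing changes unless asked. */
static bool nat_client_mode = false;

static bool same_net(struct toy_nid a, struct toy_nid b)
{
    return a.type == b.type && a.netnum == b.netnum;
}

/* Change 1: with the tunable on, treat every incoming message as ours. */
static bool toy_for_me(struct toy_nid dest, const struct toy_ni *rx_ni)
{
    if (nat_client_mode)
        return true;
    return same_net(dest, rx_ni->nid) && dest.addr == rx_ni->nid.addr;
}

/* Change 2: find the local NI for a NID. Exact match first; with the tunable
 * on, fall back to the first NI with the same LND type and network number.
 * Accepting "any" NI instead would, per the thread, pick the loopback NI. */
static const struct toy_ni *toy_nid2ni(const struct toy_ni *nis, size_t n,
                                       struct toy_nid nid)
{
    for (size_t i = 0; i < n; i++)
        if (same_net(nis[i].nid, nid) && nis[i].nid.addr == nid.addr)
            return &nis[i];
    if (!nat_client_mode)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (same_net(nis[i].nid, nid))
            return &nis[i];
    return NULL;
}
```

With a loopback NI listed first and the TCP NI second, a lookup for the NAT's external NID fails while the tunable is off and resolves to the TCP NI (never loopback) once it is on.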
Hello!

On May 5, 2010, at 8:38 AM, Ken Hornstein wrote:
> Well, that's not really feasible, because a) many times we don't
> control the gateway (think sitting in Starbucks; while some people
> would say that they don't want to use Lustre from Starbucks, I would
> say, "Why not?" With Kerberos authentication, I think it would be
> perfectly reasonable), and b) even if you control the gateway, that
> doesn't really scale: while it might work for one person, I don't see
> how you would do it for more than one person (how would you configure
> the routing back if more than one person is using the same NAT
> address range?).
> I admit I have no love for NAT, and I would prefer to be living in a
> world where the end-to-end principle worked everywhere, but that
> battle was lost years ago.

I would think using a VPN from outside into your Lustre-supplying LAN should be enough to work around this problem somewhat easily, with no code changes. It also provides an encrypted, secure channel as a bonus.

Bye,
    Oleg
> I would think using a VPN from outside into your Lustre-supplying LAN should
> be enough to work around this problem somewhat easily, with no code changes.

Sigh. So, the official Oracle position on LNet-NAT compatibility is basically to give up? If that's the answer, then I'll shut up. But really, do I have to justify this, or explain why VPNs aren't always an option?

--Ken
On Wed, May 05, 2010 at 11:31:39AM -0400, Ken Hornstein wrote:
> > I would think using a VPN from outside into your Lustre-supplying LAN should
> > be enough to work around this problem somewhat easily, with no code changes.

There's another option: make the gateway an LNet router.

> Sigh. So, the official Oracle position on LNet-NAT compatibility is
> basically to give up? If that's the answer, then I'll shut up. But
> really, do I have to justify this, or explain why VPNs aren't always
> an option?

I wouldn't say that's our "official" position. For starters, you could file an RFE. You could also contribute a fix. But it won't be simple to fix.

Lustre is layered above LNet, and LNet is layered above "LNDs", with each type of LND driving LNet over some type of network (IB, TCP/IP, ...). LNet has no concept of connections. Therefore the state of the TCP connections created by socklnd (the name of the TCP/IP LND) is completely irrelevant to LNet. This means that when a server has to send a message to a client, it might have to establish a TCP connection (or three) with the client, which means the server must know how to connect to the client, and that is completely firewall-unfriendly.

Note too that LNet has no idea about the state of the services layered above it, so socklnd cannot know whether a particular peer will need to send messages, and hence whether to proactively keep TCP connections open to it so as to be able to receive those messages; it can only assume. The very statelessness of LNet makes NAT- and firewall-friendliness a difficult proposition.

The fix, if it's at all possible, would require that clients' socklnds try to keep TCP connections open at all times to all nodes the client has spoken to in the past. That's pretty heavyweight. Consider too that a server is usually also a client: socklnd shouldn't behave that way in all cases, just in the case of pure clients behind NATs.
The fix might also require changes to timeout handling, and/or maybe even to LNet itself (to at least have a notion of peer-node reachability event notification, or something of the sort).

Nico
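Nico's point about connection direction can be sketched as a toy send path on the server. The types and names here are hypothetical, not socklnd's: if a connection to the peer's NID is already open, the server reuses it; otherwise it must connect to the address inside the NID, which for a NATted client is a private address it cannot reach. Treating RFC 1918 addresses as unreachable is an assumption standing in for "the connect attempt fails".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packed NID: network in the high 32 bits, IPv4 address low. */
static uint64_t make_nid(uint32_t net, uint32_t addr)
{
    return ((uint64_t)net << 32) | addr;
}

struct toy_peer {
    uint64_t nid;      /* NID the client announced (its internal address) */
    bool     has_conn; /* a TCP connection opened by the client is still up */
};

enum send_result { SEND_OK_EXISTING, SEND_OK_NEW_CONN, SEND_FAILED };

/* Assumption: a server on the public Internet cannot reach RFC 1918 space. */
static bool is_private_ipv4(uint32_t a)
{
    return (a >> 24) == 10u ||                  /* 10.0.0.0/8 */
           (a >> 20) == ((172u << 4) | 1u) ||   /* 172.16.0.0/12 */
           (a >> 16) == ((192u << 8) | 168u);   /* 192.168.0.0/16 */
}

static enum send_result server_send(struct toy_peer *peer)
{
    if (peer->has_conn)
        return SEND_OK_EXISTING;    /* reuse the client-initiated connection */
    uint32_t addr = (uint32_t)(peer->nid & 0xffffffffu);
    if (is_private_ipv4(addr))
        return SEND_FAILED;         /* connecting back through the NAT fails */
    peer->has_conn = true;          /* model a successful connect() */
    return SEND_OK_NEW_CONN;
}
```

The only path that works for the NATted peer is the first one, which is why the burden falls on the client to keep (or re-establish) the connection.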
>> > I would think using a VPN from outside into your Lustre-supplying LAN should
>> > be enough to work around this problem somewhat easily, with no code changes.
>
> There's another option: make the gateway an LNet router.

Did you see my previous message about this? That simply isn't an option in many cases.

>> Sigh. So, the official Oracle position on LNet-NAT compatibility is
>> basically to give up? If that's the answer, then I'll shut up. But
>> really, do I have to justify this, or explain why VPNs aren't always
>> an option?
>
> I wouldn't say that's our "official" position. For starters, you could
> file an RFE. You could also contribute a fix. But it won't be simple
> to fix.

Did you see my original message about this? A simple fix (which, I fully admit, I tested only very briefly) was only six lines of changes. Sure, it's not appropriate as a general change to LNet, but I think making it configurable would be perfectly reasonable. But I wrote the code, so I fully admit that I'm biased about it.

> Lustre is layered above LNet, and LNet is layered above "LNDs", with
> each type of LND driving LNet over some type of network (IB, TCP/IP,
> ...). LNet has no concept of connections.
> [...]

I understand all of that. Sure, it's easy to come up with cases where this will fail. But... it looks like there are a few small changes that would make it work in some circumstances, as opposed to the current situation (where it will never work). Maybe I'm wrong and further testing will reveal that making it work even in the simple case is a lot more complicated, but it seems a shame not to even investigate further. But the feedback I'm getting from the people at Oracle seems to be, "Meh, don't bother."

> The fix, if it's at all possible, would require that clients' socklnds
> try to keep TCP connections open at all times to all nodes that the
> client has spoken to in the past.
> That's pretty heavyweight.

Actually, I will freely confess to not being the LNet expert... but are socklnd TCP connections closed now when clients are idle? With the pinger running (which is a requirement, from what I understand), it seems like you'd have a TCP connection open all the time between all clients and servers. The pinger sends a packet every 20-25 seconds, right?

--Ken
On Wed, May 05, 2010 at 12:13:56PM -0400, Ken Hornstein wrote:
> >> > I would think using a VPN from outside into your Lustre-supplying LAN should
> >> > be enough to work around this problem somewhat easily, with no code changes.
> >
> > There's another option: make the gateway an LNet router.
>
> Did you see my previous message about this? That simply isn't an option
> in many cases.

Yes, I did, but I was just adding a workaround that might work for others (it might not; I haven't tested it).

> > I wouldn't say that's our "official" position. For starters, you could
> > file an RFE. You could also contribute a fix. But it won't be simple
> > to fix.
>
> Did you see my original message about this? A simple fix (which, I fully
> admit, I tested only very briefly) was only six lines of changes. Sure,
> it's not appropriate as a general change to LNet, but I think making it
> configurable would be perfectly reasonable. But I wrote the code, so I
> fully admit that I'm biased about it.

I did see that. I hadn't followed it in detail, but just now I looked at the code you mentioned, and on a pure client I think it makes sense. See below.

> [...]. But the feedback I'm getting from the people at Oracle seems
> to be, "Meh, don't bother."

Well, we (or our customers) might have no use for it at this time; or perhaps it's just NAT hatred running in our veins (just kidding, though I suspect most people who've come in contact with NAT have a love/hate relationship with it). That doesn't mean we wouldn't take patches, or that we'd never have a use for it. But the first priority is to make sure that the fix, if you contribute one, is sufficiently robust. See below.

> > The fix, if it's at all possible, would require that clients' socklnds
> > try to keep TCP connections open at all times to all nodes that the
> > client has spoken to in the past. That's pretty heavyweight.
>
> Actually, I will freely confess to not being the LNet expert...
> but are socklnd TCP connections closed now when clients are idle? With the
> pinger running (which is a requirement, from what I understand), it seems
> like you'd have a TCP connection open all the time between all clients
> and servers. The pinger sends a packet every 20-25 seconds, right?

Perhaps my "that's pretty heavyweight" comment was off the mark. However, I know very little about socklnd, and the key is to make sure it proactively reconnects in the face of timeouts, so that servers can always send messages to the NATted clients.

Nico
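One generic way to meet the "proactively reconnect" requirement on a pure client is capped exponential backoff: after a connection drops, the client retries almost immediately, then at growing intervals, never waiting longer than a cap chosen to sit well inside the server's timeouts. This is a common technique offered as a sketch, not something taken from socklnd; `next_reconnect_delay()` and the numbers are hypothetical.

```c
#include <assert.h>

/* Hypothetical client-side policy for a NATted pure client: after losing a
 * connection to a server, reconnect on our own initiative (the server cannot
 * connect back through the NAT). Returns the next delay in seconds.
 * prev_delay_s == 0 means "first attempt after the drop". */
static unsigned next_reconnect_delay(unsigned prev_delay_s, unsigned cap_s)
{
    assert(cap_s >= 1);            /* the cap must allow at least one second */
    if (prev_delay_s == 0)
        return 1;                  /* retry almost immediately */
    if (prev_delay_s >= cap_s / 2)
        return cap_s;              /* doubling would reach or pass the cap */
    return prev_delay_s * 2;       /* still below the cap: double it */
}
```

With a 10-second cap the delays run 1, 2, 4, 8, 10, 10, ... seconds; checking `prev_delay_s >= cap_s / 2` before doubling also rules out integer overflow.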
On 2010-05-05, at 08:38, Ken Hornstein wrote:
> So, I did a little more work on this last night, and I respectfully
> disagree that it would be hard to make those checks tunable. In fact, I
> got Lustre working fine with a few simple client-only changes.
>
> I ran into two issues. First, in lib-move.c:lnet_parse(), the variable
> for_me is set if the network interface NID matches the destination NID.
> I simply set for_me to 1 all the time, and that solved that problem.
> That's a one-line change, and it would be easy to make it tunable.

The problem with setting "for_me = 1" all the time is that this would apparently break LNET routers completely, because they would always think that the incoming message is for them, rather than something to be passed on to another peer (i.e. the "if (!the_lnet.ln_routing)" case).

It seems that if the "extra" error checks in the "if (!for_me)" code were instead moved earlier and set "for_me = 1", it might be OK:

    if (LNET_NIDNET(dest_nid) == LNET_NIDNET(ni->ni_nid)) {
            /* should have gone direct */
            for_me = 1;
    } else if (lnet_islocalnid(dest_nid)) {
            /* dest is another local NI; sender should have used
             * this node's NID on its own network */
            for_me = 1;
    }

There still remains the issue of server-to-client reconnection, which will fail utterly for a NAT address, but as you wrote in another email, the pinger should keep the TCP connection open by virtue of sending messages often enough, or re-establish the connection if it fails. There is some possibility that the client could be evicted if the connection was lost at the time a lock callback was sent and the server couldn't re-establish the connection, but if you don't require 100% robustness (which you can't get from Starbucks Wi-Fi anyway), then that is probably an acceptable outcome.

That said, take this answer with a pile of salt; I'm not an LNET expert at all, and I'm just poking around here as you are.
I trust Liang and Isaac with the LNET code totally, and if they tell me this is fundamentally broken, then I'll believe them. It may be that Liang was referring to the server-client reconnection issue when he wrote that it couldn't be done easily, but I'll let him clarify in his own words.

Cheers, Andreas
Just some guy poking in LNET
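Andreas's suggested reordering can be modeled with toy types as well (hypothetical names; `toy_islocalnid()` stands in for lnet_islocalnid(), and none of this is the real lnet_parse()). The model shows the property he is after: the two former error cases now mean "deliver locally", while a destination on some other network still reaches the routing decision, so a router keeps forwarding. As in his snippet, the relaxation here is unconditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified NID: network number plus IPv4 address. */
struct toy_nid { uint32_t net; uint32_t addr; };

enum verdict { V_DELIVER_LOCALLY, V_FORWARD, V_DROP_NOT_ROUTER };

static bool nid_eq(struct toy_nid a, struct toy_nid b)
{
    return a.net == b.net && a.addr == b.addr;
}

/* Stand-in for lnet_islocalnid(): is dest one of this node's NIDs? */
static bool toy_islocalnid(struct toy_nid dest, const struct toy_nid *local,
                           size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (nid_eq(dest, local[i]))
            return true;
    return false;
}

/* rx_ni is the NI the message arrived on; routing mirrors ln_routing. */
static enum verdict toy_parse(struct toy_nid dest, struct toy_nid rx_ni,
                              const struct toy_nid *local, size_t n,
                              bool routing)
{
    bool for_me = nid_eq(dest, rx_ni);

    if (!for_me) {
        if (dest.net == rx_ni.net)
            for_me = true;  /* "should have gone direct", e.g. NAT address */
        else if (toy_islocalnid(dest, local, n))
            for_me = true;  /* another of this node's own NIs */
    }
    if (for_me)
        return V_DELIVER_LOCALLY;
    return routing ? V_FORWARD : V_DROP_NOT_ROUTER;
}
```

A client on net 0 now accepts a message addressed to its NAT's external NID, while a router with NIs on nets 0 and 1 still forwards a message for another peer on net 1. The model cannot show what relaxing the same-network case might hide on a router; Ken's later reply suggests enabling it only on standalone clients.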
Ken, Andreas,

Thanks for diving into the code :). As Andreas said, these changes could easily break the rules for routers (or for multiple-interface setups in the future), so we have to be very careful. Also, we may need more changes inside the LNDs; I believe there are more checks there.

More interestingly, I think you are using the internal address to start LNet on the client, but the servers are using the external address to talk back to your client (as you said, there is a message like "bad dst nid 1.2.3.4@tcp", which is the external address). It's supposed to break somewhere, because the socklnd connection should use the source address in the message header, which is internal (the client should never know about the external address). Obviously it didn't, so I guess we probably have a loophole in socklnd that lets this happen at all; I will dig into the code later.

Anyway, you've already hacked it up and it works fine, so although it needs more investigation, I tend to agree it's possible for us to make this tunable and bypass those checks, at least for LNet + socklnd, if you don't really care about server-client reconnection (Andreas, yes, that's what I meant) and believe that supporting one client with a single NI behind a NAT is an important use case, even with limitations.

Thanks
Liang
> Thanks for diving into the code :). As Andreas said, these changes could
> easily break the rules for routers (or for multiple-interface setups in
> the future), so we have to be very careful. Also, we may need more
> changes inside the LNDs; I believe there are more checks there.

Right, that's why I would only advocate turning off those checks conditionally, on standalone clients.

> More interestingly, I think you are using the internal address to start
> LNet on the client, but the servers are using the external address to
> talk back to your client (as you said, there is a message like "bad dst
> nid 1.2.3.4@tcp", which is the external address). It's supposed to break
> somewhere, because the socklnd connection should use the source address
> in the message header, which is internal (the client should never know
> about the external address). Obviously it didn't, so I guess we probably
> have a loophole in socklnd that lets this happen at all; I will dig into
> the code later.

From what I can tell (again, not being the LNet expert), the server knows the real address of the remote connection (the external address), but it associates a particular TCP connection with a NID (which has the internal address), and it uses that TCP connection when it wants to talk to the NID. So it all ends up working out, even though in theory it shouldn't.

> Anyway, you've already hacked it up and it works fine, so although it
> needs more investigation, I tend to agree it's possible for us to make
> this tunable and bypass those checks, at least for LNet + socklnd, if
> you don't really care about server-client reconnection (Andreas, yes,
> that's what I meant) and believe that supporting one client with a
> single NI behind a NAT is an important use case, even with limitations.

Actually, now that I think more about it, more than one client behind a NAT might work fine. What will probably fail is two clients behind two different NATs that have the same internal address.
Let me code up a cleaner version of this patch (and make it adjustable) and see how that works out.

--Ken
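Ken's explanation of why replies still reach the NATted client can be modeled as a server-side table that files each TCP connection under the NID the client announced, not under the socket's source address. The names below (`toy_conn`, `table_add()`, `table_lookup()`) are hypothetical, and this is not socklnd's actual data structure; it only illustrates his reasoning, including the failure case he predicts: two clients behind different NATs announcing the same internal NID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed NID: network in the high 32 bits, IPv4 address low. */
static uint64_t make_nid(uint32_t net, uint32_t addr)
{
    return ((uint64_t)net << 32) | addr;
}

struct toy_conn {
    uint64_t peer_nid;  /* NID announced by the client (internal address) */
    uint32_t sock_addr; /* source address the server sees (the NAT's address) */
    uint16_t sock_port; /* source port chosen by the NAT */
};

#define TOY_MAX_CONNS 8

struct toy_peer_table {
    struct toy_conn conns[TOY_MAX_CONNS];
    size_t n;
};

enum add_result { ADD_OK, ADD_NID_CLASH, ADD_FULL };

/* File a connection under its announced NID. Several connections per NID
 * from the same public address are fine (a client may open more than one);
 * the same NID arriving from a different public address is a clash. */
static enum add_result table_add(struct toy_peer_table *t, struct toy_conn c)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->conns[i].peer_nid == c.peer_nid &&
            t->conns[i].sock_addr != c.sock_addr)
            return ADD_NID_CLASH;
    if (t->n == TOY_MAX_CONNS)
        return ADD_FULL;
    t->conns[t->n++] = c;
    return ADD_OK;
}

/* Sends are looked up by NID, so the external address never has to match. */
static const struct toy_conn *table_lookup(const struct toy_peer_table *t,
                                           uint64_t nid)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->conns[i].peer_nid == nid)
            return &t->conns[i];
    return NULL;
}
```

Two clients behind the same NAT (say 192.168.1.10 and 192.168.1.11, both seen as 1.2.3.4) coexist because their NIDs differ; a third client behind a different NAT that also calls itself 192.168.1.10@tcp collides with the first.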