Uwe Kleine-König
2017-Feb-04 22:28 UTC
[syslinux] tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)
Hello Ron, On Sat, Feb 04, 2017 at 03:46:46PM +1030, Ron wrote:> > [...] > That would seem to be a pretty good summation of how we're failing to > converge here ...I mixed too many things that IMHO improve the code but actually only care about one of those. So I suggest we restart the discussion with focusing on that one thing only. Let me try that: Currently tftpd when requested to bind to an address X does in pseudo code and simplified: if X looks like an ipv6 address: family = AF_INET6 elif X looks like an ipv4 address: family = AF_INET else: family = AF_UNSPEC addrinfo = getaddrinfo(X, NULL, { .ai_family = family, .ai_flags = AI_CANONNAME | AI_ADDRCONFIG }) bind(fd, addrinfo) (where bind() works on both AF_INET and AF_INET6 if getaddrinfo returns both). This does the right thing most of the time. There are cases however where the behaviour is wrong or at least undesirable: a) if X = 0.0.0.0 and no interface (but lo) has an ipv4 address, getaddrinfo returns an error and tftpd fails to start with cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known b) if X is an hostname that resolves to an ipv4 and an ipv6 address and the machine currently has no interface (but lo) with an ipv6 address, tftpd only binds to the ipv4 address. In case a) my expectation is that tftpd binds to 0.0.0.0 anyhow. I don't think it is necessary to show a scenario where this is sensible because I expect a command to do what was requested unless that's impossible. Nevertheless there are situations where this might make sense: a1) On a mobile machine without network access during boot. tftpd might later be used when an interface is up (e.g. it is plugged into a network later or a virtual machine is booted once it is needed.) a2) On a machine where you want to boot quickly and so drop unneeded prerequisites. So you can start tftpd in parallel to bringing the network up without the need to serialize these two. Additionally to the refusal to start binding on 0.0.0.0 the error message is not understandable to me. In case b) my expectation is that tftpd fails with something like: cannot bind to IPv6 address $ipv6_address a) can be fixed by just dropping AI_ADDRCONFIG from the call to getaddrinfo. Also for b) this improves the situation, from cannot resolve local IPv6 bind address: $X (...); using IPv4 only to cannot bind to local IPv6 socket,IPv6 disabled: ... So in this case the error message at least matches the actual problem. This convinces me it's the right thing to drop AI_ADDRCONFIG for tftpd as AFAICT there is no down side. Best regards Uwe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://www.zytor.com/pipermail/syslinux/attachments/20170204/7d5c9583/attachment.sig>
Ron
2017-Feb-05 06:35 UTC
[syslinux] tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)
On Sat, Feb 04, 2017 at 11:28:08PM +0100, Uwe Kleine-K?nig wrote:> On Sat, Feb 04, 2017 at 03:46:46PM +1030, Ron wrote: > > > [...] > > That would seem to be a pretty good summation of how we're failing to > > converge here ... > > I mixed too many things that IMHO improve the code but actually only > care about one of those. So I suggest we restart the discussion with > focusing on that one thing only.Just repeating the same things, while ignoring the options I've shown you that do properly fix the problem(s) you're claiming to care about, isn't actually advancing this toward a workable solution in any way. My previous replies to you were already focussed on the part of your patch that removed AI_ADDRCONFIG, and why it was not needed at best, and harmful at worst. I can read the actual code, and understand how gai works, and I'm pretty sure Mike understood all of that too when he first reported this bug. I'd already long ago checked that there wasn't a real bug being triggered somewhere here, and that the code itself really was working as expected, and you haven't indicated anything to the contrary here. In the subset of cases where gai is used to resolve a string into a (set of) network address(es), it is not wrong to tell it that it's useless to return any address (family) which can't possibly work with the current machine configuration/state. That doesn't become less wrong if your expectation or interpretation of how it should work is different to reality and the specification of how it is defined to work. If you explicitly say "bind to 0.0.0.0", you're saying you want to service global IPv4 requests. If you have no global IPv4 interfaces, that should fail and warn the admin of a problem, not silently ignore that what they explicitly requested isn't going to work. If what you meant to request was "bind to whatever *is* there, I don't care what", then just don't pass an explicit address. If you're allergic to that possibly also binding to IPv6 addresses, then pass the -4 flag too. We don't need to disable the sanity check for users who do configure an explicit address for you to get the behaviour you say you want from the current code, without any change to it at all. If what you care about is "faster boot", then the answer to that isn't "speculatively start things that will fail (or just be useless) if they lose a race", it's to actually not waste time and resources doing that until their known prerequisites have been satisfied. Where it is expected that the available interfaces and/or configured addresses might change dynamically long after boot, then whatever you have doing that needs to notify, or reload, or (re)start, (or stop), anything you are or want to be (not) running, based on that new state of the system. If you want that, propose a patch to add a hook. Otherwise, what you say you want is already possible without needing the kludge you did send. The only bit I'm still struggling to understand, is why you are still pushing this patch hard instead of using what is already available that has exactly the behaviour you say you desire, or looking at a more complete and working solution for dynamic interfaces in general. The options here and the actions of the code don't look very complicated to me, so you don't need to "simplify" it on my behalf. I'm just not seeing you show any new problem that isn't contrived and that doesn't already have a good and/or already working solution which doesn't depend on needing this patch. What problem isn't satisfied by the options I've shown above and earlier? Ron
Uwe Kleine-König
2017-Feb-05 20:56 UTC
[syslinux] tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)
Hello Ron, On Sun, Feb 05, 2017 at 05:05:16PM +1030, Ron wrote:> On Sat, Feb 04, 2017 at 11:28:08PM +0100, Uwe Kleine-K?nig wrote: > > On Sat, Feb 04, 2017 at 03:46:46PM +1030, Ron wrote: > > > > [...] > > > That would seem to be a pretty good summation of how we're failing to > > > converge here ... > > > > I mixed too many things that IMHO improve the code but actually only > > care about one of those. So I suggest we restart the discussion with > > focusing on that one thing only. > > Just repeating the same things, while ignoring the options I've shown > you that do properly fix the problem(s) you're claiming to care about, > isn't actually advancing this toward a workable solution in any way.Note I repeated less than before. I hope this will simplify the discussion and stop both of us arguing about stuff that doesn't matter much.> My previous replies to you were already focussed on the part of your > patch that removed AI_ADDRCONFIG, and why it was not needed at best, > and harmful at worst.Yes, you told me in the situations you care about the modification doesn't help you. I seem to care about different situations where the patch is beneficial. So if the patch doesn't make things worse for you (and all others out there) and it improves the situation for me, IMHO we should apply the patch. I didn't understand yet, in which situations it is harmful as you claim above. The best claim matching this is: It papers over other problems. Is it that why you want to keep AI_ADDRCONFIG? I don't understand though what this buys for you. Consider you have a network problem on the machine tftpd is supposed to run at. The result is that eth0 doesn't have an ipv4 address. You notice that because a tftp-client tries to contact the tftpd server and doesn't get an answer. So what to do next? I assume your next step is logging into the tftpd-machine and check if tftpd is running. (If instead you try ping $tftpdmachine, or check the network config of the tftpd-machine, it doesn't actually help you that tftpd isn't running as it is already obvious then that the network configuration is at fault and not tftpd.) You see it doesn't run, check the log and see cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known . Is it obvious now what the problem is? Maybe yes if the network is still broken. But if not, it's harder to understand. If instead tftpd would have started successfully, you see this after login and the IMHO obvious next step is to check the network config. If you ask me, that's not any harder to debug.> I can read the actual code, and understand how gai works, and I'm pretty > sure Mike understood all of that too when he first reported this bug.I'm not sure about Mike, but that doesn't matter here.> I'd already long ago checked that there wasn't a real bug being triggered > somewhere here, and that the code itself really was working as expected, > and you haven't indicated anything to the contrary here.I did. I showed two examples where the use of AI_ADDRCONFIG breaks more than necessary or expected.> In the subset of cases where gai is used to resolve a string into a > (set of) network address(es), it is not wrong to tell it that it's > useless to return any address (family) which can't possibly work with > the current machine configuration/state.Using AI_ADDRCONFIG suppresses options that actually work. After all I can bind to 0.0.0.0 even if no interface but lo has an ipv4 address. After that I can connect to 127.0.0.1:69 and talk to tftpd. Or I can hotplug a network interface, configure that and talk to tftpd from a remote machine. I understood that none of these is a scenario you depend on. But with your refusal to see this patch as useful you assume that the above is not a use case for anybody.> That doesn't become less wrong if your expectation or interpretation of > how it should work is different to reality and the specification of how > it is defined to work. If you explicitly say "bind to 0.0.0.0", you're > saying you want to service global IPv4 requests.The semantic of binding to 0.0.0.0 is not only "serve on global ipv4 addresses that currently exist". The semantic also includes: "serve on lo" and "serve on interfaces that come up after the socket is bound".> If you have no global IPv4 interfaces, that should fail and warn the > admin of a problem, not silently ignore that what they explicitly > requested isn't going to work.The admin has a problem if a server that is supposed to serve files via tftp to the world doesn't have a ipv4 address. That alone is a problem big enough to notice. It doesn't help the admin that tftpd isn't running or there is a cryptic error message in the log. If the network interface is ready only later in the boot process everything works as expected as soon as the interface is up. You say it's bad in this situation that the admin doesn't notice there is a problem. I wonder if it's not a valid approach to claim that there isn't a real problem. After all the world can access the files now.> If what you meant to request was "bind to whatever *is* there, I don't > care what", then just don't pass an explicit address. > > If you're allergic to that possibly also binding to IPv6 addresses, > then pass the -4 flag too.Do you consider it obvious that tftpd -a 0.0.0.0 is semantically different to tftpd -4 ? I don't. Sure, I can change my setup accordingly now that I know. But this feels much more like a kluge than making the two commands above behave identically. And sure, we can even document this behaviour and claim it to be a feature. But that's not intuitive and I bet people will stumble about this in the future and wonder.> We don't need to disable the sanity check for users who do configure > an explicit address for you to get the behaviour you say you want > from the current code, without any change to it at all.I failed to understand this. If someone uses: tftpd -a 1.2.3.4 my patch doesn't make tftpd magically work. The difference is that tftpd in the "no ipv4 address available" case with my patch says: failed to bind to 1.2.3.4 instead of cannot resolve local IPv4 bind address: 1.2.3.4 without the patch. I say this is an improvement. Maybe I didn't understand you correctly here.> If what you care about is "faster boot", then the answer to that isn't > "speculatively start things that will fail (or just be useless) if they > lose a race", it's to actually not waste time and resources doing that > until their known prerequisites have been satisfied.tftpd when started with -a 0.0.0.0 with eth0 coming up later isn't failing or useless. It does exactly what it is supposed to do when AI_ADDRCONFIG is dropped. That is, tftpd binds to 0.0.0.0.> Where it is expected that the available interfaces and/or configured > addresses might change dynamically long after boot, then whatever you > have doing that needs to notify, or reload, or (re)start, (or stop), > anything you are or want to be (not) running, based on that new state > of the system.If you enlarge the problem I want to solve, my patch isn't suitable any more. That is applicable to all improvements and so it's not sensible to use it against an improvement. So it also wasn't helpful to retitle the Debian bug to 'please add a restart hook for hotplugged interfaces' from 'tftpd-hpa fails to start properly if network is unavailable'. If handling hotplugging is your wish, I suggest opening a separate bug for that without highjacking this one. Making a software behave better in some situations without making it worse in any other is a good enough reason to consider the change. That is still true even if after patch application there are still some situations that could be improved.> If you want that, propose a patch to add a hook. Otherwise, what you > say you want is already possible without needing the kludge you did > send. > > The only bit I'm still struggling to understand, is why you are still > pushing this patch hard instead of using what is already available that > has exactly the behaviour you say you desire, or looking at a more > complete and working solution for dynamic interfaces in general.I do this because tftpd doesn't behave as I expect. I struggled because I didn't understood what the error message want to tell me. And I assume that others struggle in the same way in this situation. And then there is that idea of open source, where you can actually improve the stuff you work with ... So my suggested patch might eventually save others spending time understanding tftpd-hpa as it doesn't fail for them with strange error messages.> The options here and the actions of the code don't look very complicated > to me, so you don't need to "simplify" it on my behalf. I'm just not > seeing you show any new problem that isn't contrived and that doesn't > already have a good and/or already working solution which doesn't depend > on needing this patch. > > What problem isn't satisfied by the options I've shown above and earlier?It makes the behaviour of tftpd match what I expect. And I believe others expected (or will expect) the same. I think it doesn't make sense to continue discussing between the two of us. If this mail isn't enough for you to see my point I suggest we try to find a few other people (ideally hpa among them) to give their opinion. I added the people that commented the Debian bug so far explicitly to Cc:. Maybe you can speak up. (If you're missing context, reading https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771441#161 should be enough.) Best regards Uwe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://www.zytor.com/pipermail/syslinux/attachments/20170205/8139f98b/attachment.sig>
Possibly Parallel Threads
- [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)