Uwe Kleine-König
2017-Feb-03 09:19 UTC
[syslinux] Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
On Fri, Feb 03, 2017 at 03:40:32PM +1030, Ron wrote:> On Thu, Feb 02, 2017 at 02:22:47PM +0100, Uwe Kleine-K?nig wrote: > > On Thu, Feb 02, 2017 at 04:08:49PM +1030, Ron via Syslinux wrote: > > > > > The use of AI_PASSIVE here is a placebo. That flag has no effect unless > > > address was NULL, and if that was true, neither of the hunks here would > > > actually be executed in the first place. > > > > Right. Today it only has an effect if the first argument to getaddrinfo > > is NULL. The intension (IIUC) is that it should be used when you plan to > > feed the result to bind (opposed to connect). > > What do you mean by "Today"? Both SuSv4 and the Linux man page are > unequivocal about the _only_ use of that flag being to special case > a NULL address (meaning 'this machine') to return either the wildcard > address or the LOOPBACK address.I had in mind that at some point in the future (say with ipv8 or 802.11t-2042) the flag might mean more. I'd say the intension is to use AI_PASSIVE if you plan to listen on this address, so it seemed right to use it. But I'm willing to restrict the discussion to the removing of AI_ADDRCONFIG.> > > Using AI_CANONNAME here should be harmless at worst. So the only actual > > > change is to drop AI_ADDRCONFIG - the flag which limits getaddrinfo to > > > returning only the address families that are actually supported by the > > > configured interfaces on the system. And ordinarily that would seem to > > > be a fairly uncontroversially Good Thing to do, for both connecting and > > > listening sockets. > > > > The downside of using AI_ADDRCONFIG is that it makes binding to 0.0.0.0 > > (or ::) fail when no interface is up yet. > > That seems to be where we disagree. I don't see it as a 'downside' that > if you explicitly say "I want to bind to IPv4 addresses" (or IPv6), and > you don't actually have any, that this should fail early and loudly to > warn you about either misconfiguration, or some other more serious > failure, occurring.If I want to bind to 0.0.0.0 and no interface (but lo which might be good enough for me) is up this might be intensionally. This way I might be able to speed up system boot because I don't have to wait until all interfaces are up before I start the daemon. Of course this doesn't help if I configure tftp to listen on an explicit address, but that's a different problem that's out of scope of my patch.> If you passed a name instead of a numeric address, you'd only get the > address families the machine actually supported. If you pass a numeric > address in a particular family, you get a sanity check that it's valid. > > If you (personally) don't care about that, just don't pass an explicit > address. > > The downsides of not using AI_ADDRCONFIG can't be remedied so easily.IMHO AI_ADDRCONFIG is at best an optimisation for programs that use a socket to connect(2). It's not that sensible for sockets to listen. Consider I do: tftpd -a tftpd.mycompany.com:tftpd and tftpd.mycompany.com resolves to both an ipv4 and an ipv6 address. If the server has an ipv4 problem it just starts to listen on the ipv6 address with AI_ADDRCONFIG. Does this sound wrong only for me?> > If we can agree in principle I can rework the patch to make one change > > per patch: > > I'm far less concerned about the format of the patch, than the details > of what it's actually hoping to achieve. > > If all of the reasons for doing this are just different ways of saying > "if we do less sanity checking, then misconfiguration and broken tools > won't annoy me as often" - that's not very compelling.I tried hard to show good reasons that this is not the motivation for this change. I don't say "drop all sanity checking", I'm only saying "don't refuse to work as good as you can".> Doubly not so when there already is a way you can configure this for > your own use which does already bypass them. Simply disabling that > for everyone and every configuration isn't a good answer. > > > > So unless upstream sees this differently, I still think we'd need to see > > > some stronger rationale for why that isn't a Good Thing in this particular > > > case than just "Dropping that flag hides a real bug in NetworkManager". > > > > This is not the (only) reason for me. This is mostly only how it showed > > up for some people, but still there are more IMHO good reasons to fix > > it: > > > > - inconsistent behaviour when no interface is up: -a 0.0.0.0 fails, > > -a :: fails, not passing -a doesn't fail and makes tftpd bind to all > > interfaces. > > I don't see how that is 'inconsistent'? If you ask for an explicit > address (family) you get a sanity check that (support for it) is > available. If you say you don't care, then tftpd doesn't either.If I don't pass -a that's not "I don't care" but "bind to 0.0.0.0 and ::". To make it more explicit: - If the admin requests 0.0.0.0 this is denied because there is no ipv4 address on any interface. - If the admin requests :: this is denied because there is no ipv6 address on any interface. - If the admin requests :: and 0.0.0.0 he gets what he wants even if all interfaces have neither an ipv4 nor an ipv6 address.> > - The "no interface is up" also happens with ifupdown with no auto > > interface is used (only hotplug) > > Not defaulting to auto (and systemd not respecting it for a while) were > both bugs that broke lots of services on lots of people's machine.Still there are valid use cases of using hotplug. I don't see why tftpd shouldn't try to cooperate in these cases without restricting users that have different setups.> And I already explained to you in #d-d that the established 'solution' > for that, for services which don't monitor netlink events, is to add a > hook which restarts them when interfaces appear or disappear, if you > really want them to bind to interfaces which might genuinely be expected > to be hotplugged.This is a more complicated solution for a more complicated problem. I want to discuss: "tftpd fails to bind to 0.0.0.0 in some situations even though it could do as requested." Also listening on 0.0.0.0 includes listening on lo. For listening sockets it's ridiculous to special case lo. IMHO it's even wrong that in the case where there is no address on any interface but lo >>> import socket >>> socket.getaddrinfo("127.0.0.1", "tftp", socket.AF_INET, flags=socket.AI_ADDRCONFIG) fails. So IMHO AI_ADDRCONFIG is just band aid that might be used for clients(!) that fail to use all return values from getaddrinfo.> Because if you're not actually using wildcard addresses, then this will > _still_ fail, even with your patch, if that interface isn't already up.Right, this is entirely ok and everything else would be wrong. (I didn't test, but I think also in this case the error message is improved from "Cannot resolve 8.8.8.8" to "Cannot bind to 8.8.8.8" which IMHO makes more sense.)> > - The "no interface is up" also happens if your laptop has no network > > connection during boot > > This doesn't need a link up state to work, it just needs a local > interface to be brought up with the needed address (family) assigned > to it.This is not usual for the common laptop. If it's booted in a train, then suspend and resumed in the office tftpd isn't running.> If your only address is assigned by DHCP or something similar, and you > aren't blocking waiting on that - then as above lots of services are > going to fail to start if your boot sequence blindly tries to start > them anyway.On my machine tftp is the only service having this problem. Pointing to others that don't behave cooperative isn't an excuse to not cooperate.> That isn't the fault of, or a bug in, those services. Your system is > just misconfigured for what you actually want it to do.For me it's the other way round. If I request an application to bind to 0.0.0.0 it should try to do this and not be smart with me. Even if the application thinks I did something wrong, the application should only complain if I request something impossible.> > - It's more robust to try what the admin requested. It is possible even > > if no interface is up to bind to 0.0.0.0. So I suggest to do that and > > not try to know it better than the admin. > > See above. Your patch is *taking away* the ability of the admin to > have the choice to specify exactly what behaviour they want ...I fail to follow. Can you please remember me, what the failure is I introduce?> The behaviour that you want is already supported. You just need to > explicitly "request" that, rather than redefine the historical > default to now mean that as well, taking away any other option that > some other admin might want to request.I'm not talking about the default configuration. I just want to make tftpd able to bind to 0.0.0.0 when requested to do so.> > - The error message > > cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known > > is misleading. > > Well, you can file that bug against glibc :) We're just reporting > what getaddrinfo and gai_strerror returned ...getaddrinfo and gai_strerror are fine. ftpd uses them wrongly and so the error doesn't fit to what tftpd should actually do.> > > Because it could hide or introduce real problems in other cases too, and > > > if the bug in NM is fixed, then the only reason I'm so far aware of for > > > you proposing this patch (based on the discussion on #d-d) also goes away > > > too ... > > > > See above. Which problems do you see introduced by my patch? > > You're assuming that the only configuration anyone would want is to bind > to any address, without any checking of what address (families) are > available, even if the user explicitly specifies them - and that silently > ignoring any error with being able to do that is a Good Thing.No, you're talking about that other problem ("What should be the default binding address of tftpd?"). I'm talking about "tftpd fails to bind to 0.0.0.0 in some situations even though it could do as requested."> If you're running a toy service on your laptop, that might be ok.Great, so we can agree on a use case where my patch makes sense. Great. For me this is good enough to apply the patch given there are no disadvantages to other use cases.> If you're really using your laptop as a 'portable' server for some > special use case, you probably don't want it binding to random wifi > hotspots wherever you may go with it.Right. In this case I have to make something different. I can do so even if tftpd behaves fine when requested to bind to 0.0.0.0. Binding to 0.0.0.0 might not be part of the solution for the portable server scenario, but being able to do so doesn't restrict me here. Great!> If you're running a Real Server, then silently ignoring real problems > is pretty much the opposite of what you'd want happening in most cases.What is the problem here? I have a Real Server and tftpd is supposed to be started on 0.0.0.0. Currently it fails to do so if the networking setup is broken. So after repairing the network configuration I have to restart tftpd. This is cheap and ok. With my suggested patch I have to repair the network only and after that tftpd starts serving requests as its configured to do. That's even better, isn't it? Even if not, it doesn't make the situation worse here. So lets agree that there are some situations where the patch is good, and in all other situations it doesn't hurt. That's a good enough justification to apply the patch if you ask me.> The lessons of: > https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00 > aren't entirely inapplicable here.I admit I didn't read that completely. The abstract says that the statement "Be liberal in what you accept, and conservative in what you send" might have negative consequences to long term maintenance. This might be true if you expand your application to handle all sort of broken requests. I don't see this fit my patch, as it is not about better guessing what the admin requested if he articulates something incomprehensial. It's just about doing what the admin clearly requested.> > IMHO "we don't do it right because it might paper over other problems" > > is a poor reason for not patching. ("I don't need seatbelt or a helmet > > because if my head gets hurt there is a different problem.") > > I don't think defining "do it right" as "I'd rather you disable checking > in the software for everyone than configure it locally to do what I want" > is very helpful here. Especially not if the only reason you want that > is because NM has a bug, or you don't want to configure it to properly > support using this service with hotplugged interfaces.I think you're talking about the default configuration thing again. If not I cannot follow.> By your analogy, what you're saying with this patch is "I want you to > remove the seatbelt completely, even though I already have the option > to choose not to wear it myself".That's wrong. You currently don't have the option to not wear the seatbelt because -a 0.0.0.0 fails.> > > Assuming that at some point the NM bug will be fixed, why would we still > > > want to make this change in this code? > > > > See above. FTR: The NM bug is already reported upstream and a first > > (though not fixing) patch was suggested by them. > > Cool, though it would be nice if there was a (possibly RC?) debian bug > also tracking that. > > I think if you want to change the behaviour of this program, you're also > going to have to get your proposed patch past upstream before I apply it. > When I first adopted this package, the first thing I did was get all of > the patches we were carrying upstreamed, and I'm not keen to diverge > from them again over something like this.Agreed. That's why I posted the patch to the upstream mailing list. Assuming we're still not in agreement about this patch, it would be great to get a third opinion. Best regards Uwe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://www.zytor.com/pipermail/syslinux/attachments/20170203/7cb11726/attachment.sig>
Ron
2017-Feb-04 05:16 UTC
[syslinux] Bug#771441: tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
On Fri, Feb 03, 2017 at 10:19:06AM +0100, Uwe Kleine-K?nig wrote:> On Fri, Feb 03, 2017 at 03:40:32PM +1030, Ron wrote: > > On Thu, Feb 02, 2017 at 02:22:47PM +0100, Uwe Kleine-K?nig wrote: > > > On Thu, Feb 02, 2017 at 04:08:49PM +1030, Ron via Syslinux wrote: > > > > > > > The use of AI_PASSIVE here is a placebo. That flag has no effect unless > > > > address was NULL, and if that was true, neither of the hunks here would > > > > actually be executed in the first place. > > > > > > Right. Today it only has an effect if the first argument to getaddrinfo > > > is NULL. The intension (IIUC) is that it should be used when you plan to > > > feed the result to bind (opposed to connect). > > > > What do you mean by "Today"? Both SuSv4 and the Linux man page are > > unequivocal about the _only_ use of that flag being to special case > > a NULL address (meaning 'this machine') to return either the wildcard > > address or the LOOPBACK address. > > I had in mind that at some point in the future (say with ipv8 or > 802.11t-2042) the flag might mean more.... That would seem to be a pretty good summation of how we're failing to converge here ... Brainstorming imaginary problems to fit your proposed solution, especially when you don't clearly say exactly what *your real* use case was here, doesn't make that solution more compelling or less ill-advised. If your real problem (aside from the NM bug which is now being tracked here: https://bugs.debian.org/854078) is that you don't want to bind to a specific address, just anything that appears at any time, then there is no bug effecting you, you can already configure tftpd that way. If it's instead that you do want to configure it to only use a subset of the available addresses, and some of those addresses might genuinely be hotplugged, long after boot, independent of the NM bug - then there's a well known, long established, solution to that too. Which I've mentioned several times now. Whatever brings those interfaces up needs a hook to restart the services that you want bound to them, for each of the services that doesn't do that itself by monitoring netlink events. It's the "one simple trick" that solves all the problems you've expounded here, also solves the problem you admit that your patch doesn't address, and doesn't have the unfortunate side effects your patch does. If:> I'm only saying "don't refuse to work as good as you can".Then why would you insist this crazy patch, which just crudely kludges over a limited subset of the issues with hotplugged interfaces, leaving others still broken - is better than one well-trodden solution which fits all cases? I don't use NM, so if someone who does wants us to add such a hook for it to this package, they'll need to send a tested patch to do that. I'll happily include it (and then close this clone of the bug) if they do. Bonus points if you also send one for ifupdown, but so far nobody has reported wanting to use this with it and genuinely hotplugged interfaces. Please, let's focus on good solutions to the real problems rather than straining to find good problems to fit a partial kludge. This bug log is already a maze of twisty little misconceptions - and the aim is to dig our way *out* of that, not to find new ratholes to get lost in. What doesn't work if the bug in NM is fixed, and it has a hook to notify this service of real dynamic interface changes when they occur? Ron
Uwe Kleine-König
2017-Feb-04 22:28 UTC
[syslinux] tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)
Hello Ron, On Sat, Feb 04, 2017 at 03:46:46PM +1030, Ron wrote:> > [...] > That would seem to be a pretty good summation of how we're failing to > converge here ...I mixed too many things that IMHO improve the code but actually only care about one of those. So I suggest we restart the discussion with focusing on that one thing only. Let me try that: Currently tftpd when requested to bind to an address X does in pseudo code and simplified: if X looks like an ipv6 address: family = AF_INET6 elif X looks like an ipv4 address: family = AF_INET else: family = AF_UNSPEC addrinfo = getaddrinfo(X, NULL, { .ai_family = family, .ai_flags = AI_CANONNAME | AI_ADDRCONFIG }) bind(fd, addrinfo) (where bind() works on both AF_INET and AF_INET6 if getaddrinfo returns both). This does the right thing most of the time. There are cases however where the behaviour is wrong or at least undesirable: a) if X = 0.0.0.0 and no interface (but lo) has an ipv4 address, getaddrinfo returns an error and tftpd fails to start with cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known b) if X is an hostname that resolves to an ipv4 and an ipv6 address and the machine currently has no interface (but lo) with an ipv6 address, tftpd only binds to the ipv4 address. In case a) my expectation is that tftpd binds to 0.0.0.0 anyhow. I don't think it is necessary to show a scenario where this is sensible because I expect a command to do what was requested unless that's impossible. Nevertheless there are situations where this might make sense: a1) On a mobile machine without network access during boot. tftpd might later be used when an interface is up (e.g. it is plugged into a network later or a virtual machine is booted once it is needed.) a2) On a machine where you want to boot quickly and so drop unneeded prerequisites. So you can start tftpd in parallel to bringing the network up without the need to serialize these two. Additionally to the refusal to start binding on 0.0.0.0 the error message is not understandable to me. In case b) my expectation is that tftpd fails with something like: cannot bind to IPv6 address $ipv6_address a) can be fixed by just dropping AI_ADDRCONFIG from the call to getaddrinfo. Also for b) this improves the situation, from cannot resolve local IPv6 bind address: $X (...); using IPv4 only to cannot bind to local IPv6 socket,IPv6 disabled: ... So in this case the error message at least matches the actual problem. This convinces me it's the right thing to drop AI_ADDRCONFIG for tftpd as AFAICT there is no down side. Best regards Uwe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://www.zytor.com/pipermail/syslinux/attachments/20170204/7d5c9583/attachment.sig>
Maybe Matching Threads
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind
- tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)
- Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind