Hi,

I'm trying to debug a crashing server, and I'm starting to suspect the TFTP server as one of the possible culprits.
The box is a quad-core Xeon with 4GB of memory serving as an Asterisk PBX plus some additional services (reporting, IS integration, etc.). Among other things, it also serves as the provisioning box for over a hundred SPA504 phones. The phones check their configuration files every 60 s, which means an average of a bit over one TFTP request per second.

A few times, the box stopped replying and had to be rebooted. Checking various logs, especially the atop files, showed that minutes before the box became unreachable, the number of in.tftpd processes had grown to ~4000. The pcap files showed that the phones were sending file requests but not getting any replies, so they kept retrying every 5 s (raising the TFTP request rate to ~100 files every 5 s).

We tried to simulate the problem on a test virtual guest and successfully reproduced it:

- the VM was set up as a TFTP server (tftp started as an xinetd service, details below)
- on the host, we blocked all outgoing TFTP traffic using iptables.

Each TFTP request from some other machine then spawns a hanging tftp process (running the client 1000 times spawns 1000 in.tftpd processes).

I see various problems here:

- the phones behave stupidly, retrying every 5 s over and over. This seems to be fixed by newer firmware (after 5 retries, it sleeps for a minute), but the newer firmware has some other problems, which aren't important here.

- I'm not sure whether the root cause lies somewhere in the network stack, but other applications seem to communicate without problems; only TFTP stops replying.

Anyway, I wonder whether forking so many processes hanging on select() is correct behaviour.
My xinetd configuration looks like this:

service tftp
{
        disable         = no
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot -v -v -v -c
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
        instances       = 500
}

From this I'd understand that no more than 500 in.tftpd processes should be spawned. I guess tftp is forking on its own, right? Is it possible that it blocks somewhere, unable to send replies, but keeps forking on (repeated) requests?

Does somebody have an idea where the problem could be, or how I should proceed with debugging?

The box is x86_64 CentOS; I tried both the 0.49 and 5.2 versions of tftp.

I'd be very grateful for any help.

thanks in advance

with best regards

nik

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------
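The reproduction described above can be scripted so it is easy to repeat. Below is a minimal Python sketch, assuming the same test setup as in the original message (outgoing TFTP replies blocked on the server VM with iptables). It builds a plain RFC 1350 read request (opcode 1, filename, NUL, mode, NUL) and sends each one from a fresh UDP socket, so every request arrives from a new source port, roughly like many phones asking at once. The function names, target address and file name are hypothetical placeholders; the file should exist under /tftpboot, and the script should only be pointed at a test VM.

```python
import socket


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request (RFC 1350): opcode 1, filename, NUL, mode, NUL."""
    return (1).to_bytes(2, "big") + filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"


def send_rrqs(server: str, filename: str, count: int, port: int = 69) -> None:
    """Send `count` RRQs, each from a new ephemeral source port.

    No replies are awaited: in the test setup they are blocked on the server
    anyway, which is exactly the condition that leaves in.tftpd processes behind.
    """
    packet = build_rrq(filename)
    for _ in range(count):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(packet, (server, port))
```

For example, calling `send_rrqs("192.0.2.10", "spa504.cfg", 1000)` and then counting in.tftpd processes on the VM (e.g. with ps) should show whether every request leaves a process behind, without needing 1000 runs of a real tftp client.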
On Tue, Nov 13, 2012 at 01:49:46PM +0100, Nikola Ciprich wrote:
> [...]
> We were trying to simulate the problem on test virtual guest and successfully
> reproduced it:
> [...]
> then trying tftp request from some other machine spawns hanging tftp process
> (running client 1000 times spawns 1000 in.tftpd processes)

Respect! Nice test setup!

> [...]
> Does somebody have an idea on where the problem could be,
> or how should I proceed with debugging?

I would run the TFTP server stand-alone, i.e. not as an inetd process.

> [...]

Regards
Geert Stappers
-- 
> And is there a policy on top-posting vs. bottom-posting?
Yes.
On 11/13/2012 04:49 AM, Nikola Ciprich wrote:
>
> Which I'd undetstand that no more then 500 in.tftpd processes
> should be spawned. I guess tftp is forking on it's own, right? Is
> it possible it could block somewhere unable to send replis, but
> forking on (repeated) requests?
>

This comes up every so often. The way to fix this would be for in.tftpd to keep track of its clients in a hash database so it can filter out repeated RRQs or WRQs. However, because of the way the protocol is designed, repeated RRQs or WRQs cannot always be detected, so even that is not always going to work.

Either way, it is unlikely I'm going to have any significant time to spend on tftp-hpa any time in the near future.

	-hpa
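The client-tracking idea from the reply above can be sketched in Python. This is a hypothetical illustration, not tftp-hpa code: the class name `RequestFilter`, the key (client address, client port, opcode, filename), the 30-second window and the `finished()` hook are all assumptions. A listener would call `is_repeat()` for each incoming RRQ/WRQ, fork a handler only when it returns False, and call `finished()` when that handler exits.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical key: (client IP, client UDP port, TFTP opcode, filename).
Key = Tuple[str, int, int, str]


class RequestFilter:
    """Remember recent RRQ/WRQ packets so exact repeats can be ignored.

    Illustrative sketch only; not taken from tftp-hpa.
    """

    def __init__(self, window: float = 30.0) -> None:
        self.window = window  # seconds an entry lives if finished() is never called
        self._seen: Dict[Key, float] = {}

    @staticmethod
    def _key(client: Tuple[str, int], opcode: int, filename: str) -> Key:
        return (client[0], client[1], opcode, filename)

    def is_repeat(self, client: Tuple[str, int], opcode: int, filename: str,
                  now: Optional[float] = None) -> bool:
        """Return True if the same request was already seen within the window."""
        now = time.monotonic() if now is None else now
        # Expire stale entries so the table cannot grow without bound.
        for stale in [k for k, t in self._seen.items() if now - t > self.window]:
            del self._seen[stale]
        key = self._key(client, opcode, filename)
        if key in self._seen:
            return True  # probably a retransmission: do not fork another handler
        self._seen[key] = now
        return False

    def finished(self, client: Tuple[str, int], opcode: int, filename: str) -> None:
        """Forget a request once the handler serving it has exited."""
        self._seen.pop(self._key(client, opcode, filename), None)
```

The sketch also shows why this cannot always work: a retransmitted RRQ and a genuinely new request for the same file from the same address and port look identical, and a client that retries from a new source port looks like a brand-new client. The linear expiry scan is only meant to keep the sketch short.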