Tony and Robyn Lewis
2005-Nov-16 23:34 UTC
[netflow-tools] "Fatal error - exiting immediately"
Have just come across softflowd, and am trying to get it working.
When I run it, I get a fatal error:
me at mymachine:~/softflowd-0.9.7$ sudo ./softflowd -i eth0 -nlocalhost:2055 -d -D
softflowd v0.9.7 starting data collection
Exporting flows to [127.0.0.1]:2055
Fatal error - exiting immediately
Exiting immediately on user request
I had a quick poke in the code, but nothing jumped out. I can see where
the error is reported, but it looks like it's generated in the big
for(;;) loop.
I'm using libpcap0.8 headers, though I tried it with libpcap0.7 too.
The following tail end of strace output might yield clues:
----------------------------------------------------------------
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(2055),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 5
unlink("/var/run/softflowd.ctl") = -1 ENOENT (No such file or
directory)
bind(5, {sa_family=AF_FILE, path="/var/run/softflowd.ctl"}, 25) = 0
listen(5, 64) = 0
write(2, "softflowd v0.9.7 starting data c"..., 41softflowd v0.9.7
starting data collection) = 41
write(2, "\n", 1
) = 1
write(2, "Exporting flows to [127.0.0.1]:2"..., 35Exporting flows to
[127.0.0.1]:2055) = 35
write(2, "\n", 1
) = 1
gettimeofday({1132183833, 268447}, NULL) = 0
gettimeofday({1132183833, 268509}, NULL) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}, {fd=5,
events=POLLIN|POLLERR|POLLHUP}], 2, -1) = 1
recvfrom(3,
"\0\220\32\240\0\320\0\2\245\254\222C\210d\21\0\10\242\5"..., 160,
MSG_TRUNC, {sa_family=AF_PACKET, proto=0x8864, if2,
pkttype=PACKET_OUTGOING, addr(6)={1, 0002a5ac9243}, [20]) = 1462
ioctl(3, SIOCGSTAMP, 0xbf9ea5f8) = 0
write(2, "Fatal error - exiting immediatel"..., 33Fatal error - exiting
immediately) = 33
write(2, "\n", 1
) = 1
write(2, "Exiting immediately on user requ"..., 35Exiting immediately on
user request) = 35
write(2, "\n", 1
) = 1
close(3) = 0
close(4) = 0
unlink("/var/run/softflowd.pid") = -1 ENOENT (No such file or
directory)
unlink("/var/run/softflowd.ctl") = 0
exit_group(0) = ?
----------------------------------------------------------------
My eye is drawn to the lines:
recvfrom(3,
"\0\220\32\240\0\320\0\2\245\254\222C\210d\21\0\10\242\5"..., 160,
MSG_TRUNC, {sa_family=AF_PACKET, proto=0x8864, if2,
pkttype=PACKET_OUTGOING, addr(6)={1, 0002a5ac9243}, [20]) = 1462
ioctl(3, SIOCGSTAMP, 0xbf9ea5f8) = 0
write(2, "Fatal error - exiting immediatel"..., 33Fatal error - exiting
immediately) = 33
But other than that, before I go flinging code hither and yon, am I
missing anything glaringly obvious?
I'm running debian testing, on a PIII jobbie.
Tony Lewis
On Thu, 17 Nov 2005, Tony and Robyn Lewis wrote:

> Have just come across softflowd, and am trying to get it working.
>
> When I run it, I get a fatal error:
>
> me at mymachine:~/softflowd-0.9.7$ sudo ./softflowd -i eth0
> -nlocalhost:2055 -d -D
> softflowd v0.9.7 starting data collection
> Exporting flows to [127.0.0.1]:2055
> Fatal error - exiting immediately

That indicates a malloc() failure usually. Are you running with tight
ulimits?

> Exiting immediately on user request
>
> I had a quick poke in the code, but nothing jumped out. I can see where
> the error is reported, but it looks like it's generated in the big
> for(;;) loop.
>
> I'm using libpcap0.8 headers, though I tried it with libpcap0.7 too.
>
> The following tail end of strace output might yield clues:

Do you have ltrace or any tools that can trace system library calls?
That would probably be more useful.

If you want to dive into the code, search for return(PP_MALLOC_FAIL)
statements and insert some logging around them so you can see exactly
which allocation failed.

-d
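One way to answer the "tight ulimits" question from inside a process is to ask the kernel directly. The following is only a sketch, assuming a POSIX system that provides getrlimit() with RLIMIT_DATA and RLIMIT_AS; the function names print_limit and report_memory_limits are made up for illustration and are not part of softflowd.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: print the soft value of one resource limit.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
static int
print_limit(const char *name, int resource)
{
	struct rlimit rl;

	assert(name != NULL);
	if (getrlimit(resource, &rl) == -1) {
		perror(name);
		return (-1);
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("%-12s soft: unlimited\n", name);
	else
		printf("%-12s soft: %llu bytes\n", name,
		    (unsigned long long)rl.rlim_cur);
	return (0);
}

/* Report the two limits most likely to make malloc() return NULL. */
int
report_memory_limits(void)
{
	int r = 0;

	r |= print_limit("RLIMIT_DATA", RLIMIT_DATA);
	r |= print_limit("RLIMIT_AS", RLIMIT_AS);
	return (r == 0 ? 0 : -1);
}
```

Calling report_memory_limits() early in a test program run under the same conditions as the daemon would show whether either limit is small enough to explain a failed allocation.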
On Thu, 17 Nov 2005, Damien Miller wrote:

> If you want to dive into the code, search for return(PP_MALLOC_FAIL)
> statements and insert some logging around them so you can see exactly
> which allocation failed.

Like the attached patch does...

-d
-------------- next part --------------
Index: softflowd.c
===================================================================
RCS file: /var/cvs/softflowd/softflowd.c,v
retrieving revision 1.84
diff -u -p -r1.84 softflowd.c
--- softflowd.c	1 Oct 2005 00:14:21 -0000	1.84
+++ softflowd.c	17 Nov 2005 00:34:30 -0000
@@ -533,8 +533,11 @@ process_packet(struct FLOWTRACK *ft, con
 	/* If a matching flow does not exist, create and insert one */
 	if ((flow = FLOW_FIND(FLOWS, &ft->flows, &tmp)) == NULL) {
 		/* Allocate and fill in the flow */
-		if ((flow = malloc(sizeof(*flow))) == NULL)
+		if ((flow = malloc(sizeof(*flow))) == NULL) {
+			logit(LOG_ERR, "process_packet: flow malloc(%u) fail",
+			    sizeof(*flow));
 			return (PP_MALLOC_FAIL);
+		}
 		memcpy(flow, &tmp, sizeof(*flow));
 		memcpy(&flow->flow_start, received_time,
 		    sizeof(flow->flow_start));
@@ -542,8 +545,11 @@ process_packet(struct FLOWTRACK *ft, con
 		FLOW_INSERT(FLOWS, &ft->flows, flow);
 
 		/* Allocate and fill in the associated expiry event */
-		if ((flow->expiry = malloc(sizeof(*flow->expiry))) == NULL)
+		if ((flow->expiry = malloc(sizeof(*flow->expiry))) == NULL) {
+			logit(LOG_ERR, "process_packet: expiry malloc(%u) fail",
+			    sizeof(*flow->expiry));
 			return (PP_MALLOC_FAIL);
+		}
 		flow->expiry->flow = flow;
 		/* Must be non-zero (0 means expire immediately) */
 		flow->expiry->expires_at = 1;
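The same "log which allocation failed" idea can also be written as a small wrapper, so every call site reports its own failure. This is a sketch under assumptions, not softflowd code: malloc_logged is a hypothetical name, and it writes to stderr with fprintf() rather than softflowd's logit(). It uses the C99 %zu conversion for size_t; the %u in the patch above matches on 32-bit x86, but size_t is not unsigned int on every platform.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of softflowd: allocate `len` bytes and,
 * on failure, report which allocation failed and how large it was.
 * Returns the new block, or NULL exactly as malloc() would.
 */
static void *
malloc_logged(size_t len, const char *what)
{
	void *p;

	assert(what != NULL);
	if ((p = malloc(len)) == NULL)
		fprintf(stderr, "%s: malloc(%zu) failed\n", what, len);
	return (p);
}
```

A call site would then read something like `flow = malloc_logged(sizeof(*flow), "process_packet: flow")`, with the existing NULL check and return left in place.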
Tony and Robyn Lewis
2005-Nov-17 00:51 UTC
[netflow-tools] "Fatal error - exiting immediately"
Damien Miller wrote:

> On Thu, 17 Nov 2005, Tony and Robyn Lewis wrote:
>
>> Have just come across softflowd, and am trying to get it working.
>>
>> When I run it, I get a fatal error:
>>
>> me at mymachine:~/softflowd-0.9.7$ sudo ./softflowd -i eth0
>> -nlocalhost:2055 -d -D
>> softflowd v0.9.7 starting data collection
>> Exporting flows to [127.0.0.1]:2055
>> Fatal error - exiting immediately
>
> That indicates a malloc() failure usually. Are you running with tight
> ulimits?
>
>> Exiting immediately on user request
>>
>> I had a quick poke in the code, but nothing jumped out. I can see where
>> the error is reported, but it looks like it's generated in the big
>> for(;;) loop.
>>
>> I'm using libpcap0.8 headers, though I tried it with libpcap0.7 too.
>>
>> The following tail end of strace output might yield clues:
>
> Do you have ltrace or any tools that can trace system library calls?
> That would probably be more useful.

Nice, hadn't stumbled across ltrace. Relevant bits;

--------------------------------------------------------------------
vfprintf(0xb7ee7de0, "%s v%s starting data collection", 0xbfb2fa88softflowd v0.9.7 starting data collection) = 41
fputc('\n', 0xb7ee7de0
) = 10
vfprintf(0xb7ee7de0, "Exporting flows to [%s]:%s", 0xbfb2fa88Exporting flows to [127.0.0.1]:2055) = 35
fputc('\n', 0xb7ee7de0
) = 10
gettimeofday(0xbfb2faf8, NULL) = 0
pcap_fileno(0x8052058, 0, 0xbfb32e00, 0xbfb32d00, 0) = 3
gettimeofday(0xbfb2fa78, NULL) = 0
poll(0xbfb32ff0, 2, -1, -1, 0) = 1
pcap_dispatch(0x8052058, 8192, 0x804c0a0, 0xbfb33000, 0) = 1
vfprintf(0xb7ee7de0, "Fatal error - exiting immediatel"..., 0xbfb2fa88Fatal error - exiting immediately) = 33
fputc('\n', 0xb7ee7de0
) = 10
vfprintf(0xb7ee7de0, "Exiting immediately on user requ"..., 0xbfb2fa88Exiting immediately on user request) = 35
fputc('\n', 0xb7ee7de0
) = 10
pcap_close(0x8052058, 0x804f984, 0x804c0a0, 0xbfb33000, 0) = 489
close(4) = 0
unlink("/var/run/softflowd.pid") = -1
unlink("/var/run/softflowd.ctl") = 0
+++ exited (status 0) +++
--------------------------------------------------------------------

Tony
Tony and Robyn Lewis
2005-Nov-17 00:51 UTC
[netflow-tools] "Fatal error - exiting immediately"
Damien Miller wrote:

> On Thu, 17 Nov 2005, Damien Miller wrote:
>
>> If you want to dive into the code, search for return(PP_MALLOC_FAIL)
>> statements and insert some logging around them so you can see exactly
>> which allocation failed.
>
> Like the attached patch does...

Applied, compiled, ran - no diff to output:

me at mymachine:~/softflowd-0.9.7$ sudo ./softflowd -i eth0 -n127.0.0.1:2055 -d -D
softflowd v0.9.7 starting data collection
Exporting flows to [127.0.0.1]:2055
Fatal error - exiting immediately
Exiting immediately on user request

Tony
On Thu, 17 Nov 2005, Tony and Robyn Lewis wrote:

> Applied, compiled, ran - no diff to output:

Try this, it shouldn't matter unless your compiler is buggy.

What version of gcc are you using? something from the 4.x series perhaps?

-d
-------------- next part --------------
Index: softflowd.c
===================================================================
RCS file: /var/cvs/softflowd/softflowd.c,v
retrieving revision 1.84
diff -u -p -r1.84 softflowd.c
--- softflowd.c	1 Oct 2005 00:14:21 -0000	1.84
+++ softflowd.c	17 Nov 2005 00:55:45 -0000
@@ -1749,6 +1755,7 @@ main(int argc, char **argv)
 	/* Main processing loop */
 	gettimeofday(&flowtrack.system_boot_time, NULL);
 	stop_collection_flag = 0;
+	memset(cb_ctxt, '\0', sizeof(cb_ctxt));
 	cb_ctxt.ft = &flowtrack;
 	cb_ctxt.linktype = linktype;
 	cb_ctxt.want_v6 = target.dialect->v6_capable || always_v6;
Tony and Robyn Lewis
2005-Nov-17 02:15 UTC
[netflow-tools] "Fatal error - exiting immediately"
Damien Miller wrote:

> Try this, it shouldn't matter unless your compiler is buggy.

me at mymachine:~/softflowd-0.9.7$ patch -p0 <../patchfile2
patching file softflowd.c
Hunk #1 succeeded at 1752 (offset -3 lines).
me at mymachine:~/softflowd-0.9.7$ make
gcc -g -O2 -DFLOW_SPLAY -DEXPIRY_RB -I. -c -o softflowd.o softflowd.c
softflowd.c: In function 'accept_control':
softflowd.c:1097: warning: pointer targets in passing argument 1 of 'fgets' differ in signedness
softflowd.c:1101: warning: pointer targets in passing argument 1 of '__builtin_strchr' differ in signedness
softflowd.c:1101: warning: pointer targets in assignment differ in signedness
softflowd.c:1109: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
softflowd.c:1109: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
softflowd.c:1109: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
... snip ...
softflowd.c:1160: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
softflowd.c:1160: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
softflowd.c: In function 'main':
softflowd.c:1749: error: incompatible type for argument 1 of 'memset'
make: *** [softflowd.o] Error 1

> What version of gcc are you using? something from the 4.x series perhaps?

me at mymachine:~/softflowd-0.9.7$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.0 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.2 (Debian 4.0.2-2)

Aha! I reconfigured and remade with gcc-3.3 and she seems to be working.

Is GCC 4.0 buggy?

Tony
On Thu, 17 Nov 2005, Tony and Robyn Lewis wrote:

> Damien Miller wrote:
>
>> Try this, it shouldn't matter unless your compiler is buggy.
>
> softflowd.c:1749: error: incompatible type for argument 1 of 'memset'

oops, bad patch. Try this one (revert the last with
"patch -R < softflowd-zero.diff" first). Or, don't worry about it, since
you have found the real problem :)

> Aha! I reconfigured and remade with gcc-3.3 and she seems to be working.
>
> Is GCC 4.0 buggy?

It is very buggy. There have been a spate of bugs reported to OpenSSH
that have been traced to gcc-4.x miscompilations, though Linux/x86
hasn't been as bad as more obscure platforms. I would avoid it for quite
a few versions yet...

-d
-------------- next part --------------
Index: softflowd.c
===================================================================
RCS file: /var/cvs/softflowd/softflowd.c,v
retrieving revision 1.84
diff -u -p -r1.84 softflowd.c
--- softflowd.c	1 Oct 2005 00:14:21 -0000	1.84
+++ softflowd.c	17 Nov 2005 00:55:45 -0000
@@ -1749,6 +1755,7 @@ main(int argc, char **argv)
 	/* Main processing loop */
 	gettimeofday(&flowtrack.system_boot_time, NULL);
 	stop_collection_flag = 0;
+	memset(&cb_ctxt, '\0', sizeof(cb_ctxt));
 	cb_ctxt.ft = &flowtrack;
 	cb_ctxt.linktype = linktype;
 	cb_ctxt.want_v6 = target.dialect->v6_capable || always_v6;
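The difference between the two memset patches is only the `&`: memset() takes a pointer to the memory to clear, so passing the struct itself is a type error, which is what the "incompatible type for argument 1" message reports. A minimal sketch of the zero-then-fill pattern follows, using a hypothetical stand-in struct (cb_ctxt_example and init_ctxt are invented names; the real softflowd context has its own definition).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for softflowd's callback context. */
struct cb_ctxt_example {
	void *ft;
	int linktype;
	int want_v6;
};

/*
 * Zero the whole struct, then fill in selected fields, so any field
 * that is not set explicitly starts out as 0 rather than stack garbage.
 */
static void
init_ctxt(struct cb_ctxt_example *ctxt, void *ft, int linktype)
{
	assert(ctxt != NULL);
	/* memset() needs a pointer; passing the struct by value will not compile. */
	memset(ctxt, 0, sizeof(*ctxt));
	ctxt->ft = ft;
	ctxt->linktype = linktype;
}
```

For a struct declared on the stack, as in the patch, the call is `init_ctxt(&ctxt, ...)`, mirroring the `memset(&cb_ctxt, ...)` form in the corrected diff.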
Damien Miller wrote:

>> Is GCC 4.0 buggy?
>
> It is very buggy. There have been a spate of bugs reported to OpenSSH
> that have been traced to gcc-4.x miscompilations, though Linux/x86
> hasn't been as bad as more obscure platforms. I would avoid it for quite
> a few versions yet...

Damien,

I've also found that softflowd dies instantly when compiled with GCC
4.0.2 under Solaris 9/Sparc - I hadn't found time to verify conclusively
that GCC was responsible or I would have reported the issue earlier.

# softflowd -D -i bge3
softflowd v0.9.7 starting data collection
ADD FLOW seq:1 [68.60.146.124]:27030 <> [161.73.46.160]:1782 proto:6
...<snip>...
ADD FLOW seq:115 [64.228.88.220]:47134 <> [161.73.46.160]:2033 proto:6
Fatal error - exiting immediately
Exiting immediately on internal error

truss doesn't throw up anything interesting (that I can see), and I
don't yet have access to ltrace on the affected system.

Robin

--
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
rbreathe at brookes.ac.uk Tel: +44 1865 483685 Fax: +44 1865 483073
Robin Breathe wrote:

> Damien Miller wrote:
>
>>> Is GCC 4.0 buggy?
>>
>> It is very buggy. There have been a spate of bugs reported to OpenSSH
>> that have been traced to gcc-4.x miscompilations, though Linux/x86
>> hasn't been as bad as more obscure platforms. I would avoid it for quite
>> a few versions yet...
>
> Damien,
>
> I've also found that softflowd dies instantly when compiled with GCC
> 4.0.2 under Solaris 9/Sparc - I hadn't found time to verify conclusively
> that GCC was responsible or I would have reported the issue earlier.

Robin, I don't know if you picked it up, but I sorted this out by
compiling with GCC 3.3.

I hadn't heard that GCC 4.0 was bad. In fact, most of Debian (though
not the kernel) is compiled with GCC 4.0 I think. It's the default
GCC. Dunno how they get away with it.

If it's compiler badness, there's probably not a lot you can do, other
than maybe raise bugs on GCC, because it seems fairly reproducible.

Tony
Tony Lewis wrote:

>> I've also found that softflowd dies instantly when compiled with GCC
>> 4.0.2 under Solaris 9/Sparc - I hadn't found time to verify conclusively
>> that GCC was responsible or I would have reported the issue earlier.
>
> Robin, I don't know if you picked it up, but I sorted this out by
> compiling with GCC 3.3.

I was already aware of this - I was hit after OpenPKG upgraded to GCC 4.0
as the default compiler in their 2.5 release. I had already been running
for some time with GCC 3.4.2, so it was fairly easy to see what the
likely culprit was (I've just carried on using my customised
OpenPKG-2.4/GCC-3.4.2 RPMs).

> I hadn't heard that GCC 4.0 was bad. In fact, most of Debian (though
> not the kernel) is compiled with GCC 4.0 I think. It's the default
> GCC. Dunno how they get away with it.

Indeed. Mac OS X.4 is also compiled with GCC 4.0. I'm not sure whether
"bad" is an apt description. "Different", sure. But I'm not an expert on
compiler internals, so... :)

> If it's compiler badness, there's probably not a lot you can do, other
> than maybe raise bugs on GCC, because it seems fairly reproducible.

Quite, I was waiting to report this problem until I could track down
exactly what was going on, and whether the problem was limited to
Solaris/Sparc with GCC 4.0.

Robin

--
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
rbreathe at brookes.ac.uk Tel: +44 1865 483685 Fax: +44 1865 483073