Hi all,

We recently moved our CVS repository from a 4.6-STABLE machine to a brand new 4.8 install on another, identical machine. The server runs cvs in 'pserver' mode for remote access by various Windows/Solaris/Linux/FreeBSD clients.

Pretty soon we noticed that the cvs server process was occasionally crashing on sig11 (i.e. a segfault). The only evidence for this was in the message log; the cvs operations always completed normally on the client side. This *never* happened on the old server, so I figured it had to be a hardware problem on the new machine or some issue with 4.8. It was probably happening about once in every 100 runs of the cvs server.

I compiled a debug version of cvs from the 4.8 sources and was able to get a few cores, once I figured out how to make it actually dump core. I've attached the log of a gdb session on one of these. All the cores I have show the process crashing in the same place, where it's clearly trying to follow a NULL pointer.

I've since copied the cvs binary from the 4.6 machine across to the new server. We've run with this for the past two weeks and had exactly zero problems with it.

Given that all the cores are the same, and that the only thing we've seen fail on this machine is the 4.8 cvs code, this smells like a cvs bug to me. I've no idea whether it's in our local extensions or the base cvs code. Should I be sending the bug report to FreeBSD.org or cvshome.org?

Is there anyone on here familiar with the internals of cvs who wants to take a look at this? I can provide any additional configuration details, or do more grovelling in the core dumps, on request...

Cheers,

Scott
-------------- next part --------------
Script started on Wed Jul 23 11:14:55 2003
pukeko# gdb `which cvs.debug` cvs.debug.81697.core
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf
Core was generated by `cvs.debug'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libgnuregex.so.2...done.
Reading symbols from /usr/lib/libmd.so.2...done.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/lib/libz.so.2...done.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  buf_shutdown (buf=0x0)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/buffer.c:1208
1208        if (buf->shutdown)
(gdb) where
#0  buf_shutdown (buf=0x0)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/buffer.c:1208
#1  0x8087e2b in server_cleanup (sig=0)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/server.c:4892
#2  0x805ec67 in error_exit ()
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/error.c:71
#3  0x805ef27 in error (status=1, errnum=0, message=0x80ab4b9 "received %s signal")
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/error.c:212
#4  0x806daae in main_cleanup (sig=13)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/main.c:395
#5  0x80926e4 in strip_trailing_slashes ()
#6  0xbfbfffac in ?? ()
#7  0x804d85a in buf_send_output (buf=0x80c1040)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/buffer.c:287
#8  0x804d900 in buf_flush (buf=0x80c1040, block=1)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/buffer.c:352
#9  0x8087eb7 in server_cleanup (sig=0)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/server.c:5007
#10 0x80883e2 in server (argc=1, argv=0xbfbffc88)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/server.c:5234
#11 0x806e636 in main (argc=1, argv=0xbfbffc88)
    at /usr/src/gnu/usr.bin/cvs/cvs/../../../../contrib/cvs/src/main.c:1028
#12 0x804a67a in _start ()
(gdb) list
1203
1204    int
1205    buf_shutdown (buf)
1206        struct buffer *buf;
1207    {
1208        if (buf->shutdown)
1209            return (*buf->shutdown) (buf);
1210        return 0;
1211    }
1212
(gdb) quit
pukeko# ^Dexit

Script done on Wed Jul 23 11:15:28 2003
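Reading the backtrace from the bottom up suggests a re-entrancy problem: server_cleanup() (frame #9) was flushing output to the client when the write apparently raised SIGPIPE (main_cleanup(sig=13), frame #4); the handler's call to error() went through error_exit() into server_cleanup() a second time (frame #1), which then passed a NULL buffer to buf_shutdown(). A broken pipe during shutdown would also fit the clients never noticing anything wrong. The C sketch that follows illustrates two generic defences against that pattern: a re-entrancy guard on the cleanup routine, and a NULL check before the buffer is dereferenced. It is a hypothetical model, not cvs code and not the patch in PR bin/54854: struct buffer is cut down to its shutdown hook, the signal is simulated by a direct call, and cleanup_sketch(), flush_hits_sigpipe(), net_buf and run_cleanup_demo() are invented names.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for cvs's struct buffer. */
struct buffer {
    int (*shutdown)(struct buffer *);
};

/* Like buf_shutdown() in buffer.c, but returns quietly when handed a
   buffer that an earlier cleanup pass has already released. */
static int buf_shutdown_checked(struct buffer *buf)
{
    if (buf == NULL)
        return 0;
    if (buf->shutdown)
        return (*buf->shutdown)(buf);
    return 0;
}

static int shutdown_calls;                    /* counts hook invocations */
static struct buffer *net_buf;                /* hypothetical global buffer */
static volatile sig_atomic_t cleanup_running; /* re-entrancy guard */

static int count_shutdown(struct buffer *buf)
{
    (void)buf;
    shutdown_calls++;
    return 0;
}

static void cleanup_sketch(void);

/* Models buf_flush() writing to a client that has gone away: SIGPIPE
   fires and the handler calls the cleanup routine a second time. */
static void flush_hits_sigpipe(void)
{
    cleanup_sketch();
}

static void cleanup_sketch(void)
{
    if (cleanup_running)        /* nested call from the signal path */
        return;
    cleanup_running = 1;

    buf_shutdown_checked(net_buf);
    free(net_buf);
    net_buf = NULL;             /* later passes see NULL, not freed memory */

    flush_hits_sigpipe();       /* re-enters cleanup_sketch() once */
}

/* Runs one cleanup and returns how many times the shutdown hook ran
   (or -1 if the allocation failed). */
int run_cleanup_demo(void)
{
    shutdown_calls = 0;
    cleanup_running = 0;
    net_buf = malloc(sizeof *net_buf);
    if (net_buf == NULL)
        return -1;
    net_buf->shutdown = count_shutdown;
    cleanup_sketch();
    assert(net_buf == NULL);    /* the first pass released the buffer */
    return shutdown_calls;
}
```

run_cleanup_demo() returns 1: the shutdown hook runs once, and the nested call made from the simulated SIGPIPE path returns at the guard instead of touching the released buffer. The NULL check in buf_shutdown_checked() is a second line of defence for any path that reaches it after the buffer has gone.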
On Wed, 23 Jul 2003, Scott Mitchell wrote:

> We recently moved our CVS repository from a 4.6-STABLE machine to a brand
> new 4.8 install, on another identical machine. The server runs cvs in
> 'pserver' mode, for remote access by various Windows/Solaris/Linux/FreeBSD
> clients.
>
> We pretty soon noticed that the cvs server process was occasionally crashing
> on sig11 (ie. a segfault).

Does the new machine have ECC memory? If not, it could be something as simple as bad RAM in the new system.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
Scott Mitchell <scott+freebsd@fishballoon.org> writes:

> Hi all,
>
> We recently moved our CVS repository from a 4.6-STABLE machine to a brand
> new 4.8 install, on another identical machine. The server runs cvs in
> 'pserver' mode, for remote access by various Windows/Solaris/Linux/FreeBSD
> clients.
>
> We pretty soon noticed that the cvs server process was occasionally crashing
> on sig11 (ie. a segfault). The only evidence for this was in the message
                             ^^^^^^^^^^^^^^
> log, the cvs operations always completed normally on the client side.

Ehmm, I reported that I saw this as well on my cvs server. This evening my eye caught the log file of one of the client boxes:

Tue Jul 29 10:13:37 MEST 2003
Jul 29 11:04:43 sos login: ROOT LOGIN (toor) ON ttyv0
pid 745 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 11:26:52 sos /kernel: pid 745 (jikes), uid 122: exited on signal 11 (core dumped)
pid 1592 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 11:41:54 sos /kernel: pid 1592 (jikes), uid 122: exited on signal 11 (core dumped)
pid 3238 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 11:59:09 sos /kernel: pid 3238 (jikes), uid 122: exited on signal 11 (core dumped)
pid 5320 (java), uid 122: exited on signal 6 (core dumped)
Jul 29 12:21:29 sos /kernel: pid 5320 (java), uid 122: exited on signal 6 (core dumped)
pid 8125 (jikes), uid 122: exited on signal 10 (core dumped)
Jul 29 12:37:35 sos /kernel: pid 8125 (jikes), uid 122: exited on signal 10 (core dumped)
pid 16042 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 13:00:46 sos /kernel: pid 16042 (jikes), uid 122: exited on signal 11 (core dumped)
pid 22650 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 17:20:27 sos /kernel: pid 22650 (jikes), uid 122: exited on signal 11 (core dumped)
pid 26072 (java), uid 122: exited on signal 6 (core dumped)
Jul 29 19:54:17 sos /kernel: pid 26072 (java), uid 122: exited on signal 6 (core dumped)
pid 28073 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 20:02:08 sos /kernel: pid 28073 (jikes), uid 122: exited on signal 11 (core dumped)
pid 31158 (jikes), uid 122: exited on signal 11 (core dumped)
Jul 29 20:52:01 sos /kernel: pid 31158 (jikes), uid 122: exited on signal 11 (core dumped)

I am troubled by the "The only evidence for this" part: the developer on this client box, like you (and me) with the cvs pserver, swears everything is OK for him from the application's point of view. Could this be a "problem at exit" or something like that?

Just my thoughts.

Regards,
Arno
On Wed, Jul 23, 2003 at 11:46:31AM +0100, Scott Mitchell wrote:
[ Stuff about the cvs server in 4.8-R crashing every so often. A long
  discussion about memory testing followed... ]

To wrap this thread up somewhat, a patch supplied by Ruslan Ermilov (ru@) has made the problem go away. See PR bin/54854 for details.

I'm now attempting to get this fix committed back to the official cvs, and thus eventually back into FreeBSD, so I'm not condemned to running an old version forever, or to remembering to reapply the patch at every upgrade :-(

Thanks to all who replied. At least I know a bit more about memory testing than I did when this all started :-)

Cheers,

Scott

-- 
==========================================================================
Scott Mitchell           | PGP Key ID | "Eagles may soar, but weasels
Cambridge, England       | 0x54B171B9 |  don't get sucked into jet engines"
scott at fishballoon.org | 0xAA775B8B |         -- Anon