There was a "request" for Tor-related problem reports a while ago. I couldn't find the message again, but I believe it was posted here.

Last week I installed:

FreeBSD tor.fabiankeil.de 6.1-RELEASE-p2 FreeBSD 6.1-RELEASE-p2 #0: Fri Jun 23 20:06:57 CEST 2006 fk@fabiankeil.de:/usr/obj/usr/src/sys/BIGSLEEP i386

At the moment it is only acting as a Tor node <http://serifos.eecs.harvard.edu/cgi-bin/desc.pl?q=zwiebelsuppe>. tor-devel (maintainer CC'd) is running jailed in a Geli image; ntpd, named, cron and sshd are running in the host system, and that's about it. There is no mail or web server and nearly no traffic besides that caused by Tor.

I started Tor on Friday night and have had to reset the box three times since then. The server suddenly stops responding and the logs stop as well, so I assume it either panics or hangs.

I only have remote access. A serial console is available, but it becomes unresponsive as well. I haven't configured DDB yet, so maybe that is to be expected?

cron collects some stats every five minutes. A few minutes before a hang this morning the load was:

last pid:  7996;  load averages:  0.40,  0.37,  0.36   up 0+18:38:25  05:55:02
83 processes:  2 running, 66 sleeping, 15 waiting
CPU states: 21.3% user,  0.0% nice, 17.8% system, 20.2% interrupt, 40.7% idle
Mem: 100M Active, 157M Inact, 102M Wired, 12K Cache, 60M Buf, 134M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   11 root        1 171   52     0K     8K RUN    857:30 53.61% idle
   12 root        1 -44 -163     0K     8K WAIT    45:22  6.54% swi1: net
   23 root        1 -68 -187     0K     8K WAIT    14:48  2.83% irq12: fxp0 fxp1
 7973 root        1  96    0  2264K  1544K RUN      0:00  0.51% top
   13 root        1 -32 -151     0K     8K WAIT     5:49  0.10% swi4: clock sio
   33 root        1 171   52     0K     8K pgzero   0:02  0.10% pagezero
    3 root        1  -8    0     0K     8K -        0:16  0.05% g_up
 1586 _tor       14  20    0    99M 97912K kserel 188:36  0.00% tor
   15 root        1 -16    0     0K     8K -        1:01  0.00% yarrow
 1443 root        1  -8    0     0K     8K geli:w   0:49  0.00% g_eli[0] md0
    4 root        1  -8    0     0K     8K -        0:21  0.00% g_down
   35 root        1  20    0     0K     8K syncer   0:17  0.00% syncer
 1439 root        1  -8    0     0K     8K mdwait   0:13  0.00% md0
   24 root        1 -64 -183     0K     8K WAIT     0:08  0.00% irq14: ata0
    2 root        1  -8    0     0K     8K -        0:07  0.00% g_event
   42 root        1 -16    0     0K     8K -        0:06  0.00% schedcpu
  453 root        1  96    0  2920K  1752K select   0:05  0.00% ntpd
  256 _pflogd     1 -58    0  1548K  1216K bpf      0:05  0.00% pflog

pfctl -si:

Status: Enabled for 0 days 18:37:52           Debug: Urgent

Hostid: 0x1ec3da6b

Interface Stats for fxp0              IPv4             IPv6
  Bytes In                     25077859159                0
  Bytes Out                    27498863362                0
  Packets In
    Passed                        36192760                0
    Blocked                          32213                0
  Packets Out
    Passed                        36871432                0
    Blocked                            265                0

State Table                          Total             Rate
  current entries                     5290
  searches                        73567507         1096.8/s
  inserts                           600068            8.9/s
  removals                          594778            8.9/s
Counters
  match                             752600           11.2/s
  bad-offset                             0            0.0/s
  fragment                             102            0.0/s
  short                                  0            0.0/s
  normalize                              2            0.0/s
  memory                                68            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     12655            0.2/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              2            0.0/s
  synproxy

Today's traffic graph:
<http://www.fabiankeil.de/blog-surrogat/2006/06/27/tor.fabiankeil.de-dritter-ausfall-24-stunden-durchsatz-statistik-595x337.png>
(The hang around 14:00 happened while I was logged in doing a buildworld.)

At the moment I'm building RELENG_6 with DDB to see whether it changes anything and whether I can get a core dump. So far the problem seems to be similar to <http://www.freebsd.org/cgi/query-pr.cgi?pr=95180> (closed) and <http://freebsd.rambler.ru/bsdmail/freebsd-questions_2006/msg08692.html>.

Is anyone on this list running a Tor node on FreeBSD 6.1-RELEASE or later with similar or higher load?

Fabian
-- 
http://www.fabiankeil.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20060627/e442b268/signature.pgp
On Tue, 27 Jun 2006, Fabian Keil wrote:

> There was a "request" for Tor related problem reports a while ago, I
> couldn't find the message again, but I believe it was posted here.

I'm very interested in tracking down this problem, but have had a lot of trouble getting reliable reports of problems, i.e., ones where I could get any debugging information. I had a similar conversation along these lines yesterday with Roger (the Tor author) here at the WEIS conference. If this is easily reproducible, I would like you to do the following:

- Compile in options DDB, options KDB, options BREAK_TO_DEBUGGER, options WITNESS, options WITNESS_SKIPSPIN, options INVARIANTS and options INVARIANT_SUPPORT.
- Make sure you have debugging symbols for the kernel.
- Turn on core dumps.

The above debugging options will have a significant performance impact, and may or may not affect the probability of the race or deadlock being exercised.

The first question is:

- Are there any warnings on the console from WITNESS or other debugging options? If so, please copy/paste them into an e-mail for me.

- Does a panic occur? If so, the output of the following commands would be very useful:

    show pcpu
    show allpcpu
    ps
    show locks
    show alllocks
    show lockedvnods
    trace

  Then walk the list of processes shown by 'show alllocks', and run trace on each pid.

- Does a hang occur? If so, use a serial break to get into DDB, and see the above.

In both of the last two cases, attempt to get a core dump.

Robert N M Watson
Computer Laboratory
University of Cambridge
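Robert's list of kernel options corresponds roughly to the following additions to a kernel configuration file. This is a sketch, not a tested configuration: it assumes a copy of the existing BIGSLEEP configuration is extended (lines the file already contains should not be repeated), and that debugging symbols are requested with `makeoptions DEBUG=-g`, as in the stock GENERIC configuration.

```
# Debugging additions for a copy of the existing kernel configuration
# (assumption: based on BIGSLEEP; skip any line the file already contains).
makeoptions     DEBUG=-g            # build the kernel with debugging symbols

options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger backend
options         BREAK_TO_DEBUGGER   # a BREAK on the serial console enters the debugger
options         WITNESS             # check lock ordering, report lock order reversals
options         WITNESS_SKIPSPIN    # don't run WITNESS on spin mutexes
options         INVARIANTS          # enable extra run-time consistency checks
options         INVARIANT_SUPPORT   # support code needed by INVARIANTS
```

For the core dumps, a dump device also has to be configured, for example with the `dumpdev` variable in /etc/rc.conf pointing at a swap partition; savecore(8) then copies the dump to /var/crash on the next boot.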
--- Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> There was a "request" for Tor related problem reports
> a while ago, I couldn't find the message again, but I
> believe it was posted here.

> Is anyone on this list running a Tor node on FreeBSD 6.1-RELEASE
> or later with similar or higher load?

I am still hitting the same issue, Fabian. I had that PR closed as "works for me" after insignificant testing. I am still crashing (as before), but with 6.1-RELEASE maybe only once every week or two instead of every couple of hours. The PR really should be reopened. A couple of other folks have emailed me offline with similar issues (and have also spoken with me about it on IRC).

I am still 99% sure this is NOT A TOR ISSUE! I have spoken with many Tor users on other platforms, and with the Tor developers themselves, and none of them see this. I can also recreate this crash without running Tor at all, just by generating a heavy load with Freenet and I2P.

My gut feeling is still that this is a network code regression from 5.x to 6.x, introduced with the stack rewrite. I am at a loss as to how to troubleshoot this any further (as noted in the PR and my earlier email). I truly hope somebody (e.g. a developer) can shed some light on this issue or troubleshoot it.
Robert Watson <rwatson@FreeBSD.org> wrote:

> - Are there any warnings on the console from WITNESS or other
> debugging options?

I just got:

Jun 28 23:01:19 tor kernel: lock order reversal:
Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1053
Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ /usr/src/sys/vm/vm_map.c:2390
Jun 28 23:01:20 tor kernel: KDB: stack backtrace:
Jun 28 23:01:20 tor kernel: kdb_backtrace(0,ffffffff,c0711af0,c0713440,c06db624) at kdb_backtrace+0x29
Jun 28 23:01:20 tor kernel: witness_checkorder(c1043144,9,c06b90a8,956) at witness_checkorder+0x578
Jun 28 23:01:20 tor kernel: _mtx_lock_flags(c1043144,0,c06b90a8,956) at _mtx_lock_flags+0x5b
Jun 28 23:01:20 tor kernel: _vm_map_lock(c10430c0,c06b90a8,956) at _vm_map_lock+0x26
Jun 28 23:01:20 tor kernel: vm_map_remove(c10430c0,c3bc6000,c3bc8000,d6f55b30,c0623361) at vm_map_remove+0x1f
Jun 28 23:01:20 tor kernel: kmem_free(c10430c0,c3bc6000,2000,d6f55b48,c062524f) at kmem_free+0x25
Jun 28 23:01:20 tor kernel: page_free(c3bc6000,2000,22,2000,d6f55b60) at page_free+0x29
Jun 28 23:01:20 tor kernel: uma_large_free(c3ba5140) at uma_large_free+0x7b
Jun 28 23:01:20 tor kernel: free(c3bc6000,c06d8980,c3bc6000,c4830000,1400) at free+0xc5
Jun 28 23:01:20 tor kernel: kqueue_expand(c3795000,c06d8a40,500,0) at kqueue_expand+0xd7
Jun 28 23:01:20 tor kernel: kqueue_register(c3795000,d6f55bf4,c3a8f480,1,0) at kqueue_register+0x1b8
Jun 28 23:01:20 tor kernel: kern_kevent(c3a8f480,3,19,200,d6f55cc8) at kern_kevent+0xc9
Jun 28 23:01:20 tor kernel: kevent(c3a8f480,d6f55d04,6,2,212) at kevent+0x55
Jun 28 23:01:20 tor kernel: syscall(2824003b,80e003b,bfbf003b,cb87000,80d5020) at syscall+0x22f
Jun 28 23:01:20 tor kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f
Jun 28 23:01:20 tor kernel: --- syscall (363, FreeBSD ELF32, kevent), eip = 0x282cc4af, esp = 0xbfbfe9fc, ebp = 0xbfbfea48 ---

Looks similar to <http://sources.zabbadoz.net/freebsd/lor.html#185>.
Fabian
-- 
http://www.fabiankeil.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20060628/e126183d/signature.pgp
On Wed, 28 Jun 2006, Fabian Keil wrote:

> Robert Watson <rwatson@FreeBSD.org> wrote:
>
>> - Are there any warnings on the console from WITNESS or other
>> debugging options?
>
> I just got:
>
> Jun 28 23:01:19 tor kernel: lock order reversal:
> Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1053
> Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ /usr/src/sys/vm/vm_map.c:2390
>
> Looks similar to <http://sources.zabbadoz.net/freebsd/lor.html#185>.

Could you run "vmstat -z", "netstat -m", and "vmstat -m", please?

Robert N M Watson
Computer Laboratory
University of Cambridge
Robert Watson <rwatson@FreeBSD.org> wrote:

> - Does the hang occur? If so, use a serial break to get into DDB,
> see the above.

I previously had the serial console misconfigured, and I'm still not sure whether the settings are correct now.

So far I have put "BOOT_COMCONSOLE_SPEED=57600" in /etc/make.conf, "options CONSPEED=57600" in the kernel configuration and "console=comconsole" in /boot/loader.conf. Kernel and boot blocks were recompiled and reinstalled. /boot.config contains the line "-D -h -S57600" (setting the speed through make.conf didn't work).

The boot process now starts with:

PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin
Booting from local disk...

 1 Linux
 2 FreeBSD
 3 FreeBSD

Default: 2

/boot.config: -D
Consoles: internal video/keyboard  serial port
BIOS drive C: is disk0
BIOS 639kB/523200kB available memory

FreeBSD/i386 bootstrap loader, Revision 1.1
[...]

After manually entering the debugger through debug.kdb.enter, I could use DDB and everything seemed to be working. However, today I got another hang and couldn't enter the debugger by sending a BREAK. It is the same BREAK ssh sends with ~B, right?

Even after rebooting, sending a BREAK didn't drop into the debugger, so either I'm sending the wrong BREAK or my console settings are still messed up. Any ideas?

Fabian
-- 
http://www.fabiankeil.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20060702/4264660d/signature.pgp
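The serial console settings Fabian describes in his last message can be collected into one fragment per file, as below. This only restates the values he gives (57600 bps); the comments are a hedged reading of what each setting does, not a verified working setup, and whether a BREAK reaches DDB additionally depends on BREAK_TO_DEBUGGER being compiled in, as in Robert's list of options.

```
# /etc/make.conf: serial speed compiled into the boot blocks and loader
# (only takes effect after rebuilding and reinstalling them)
BOOT_COMCONSOLE_SPEED=57600

# kernel configuration file: console speed used by the kernel
options         CONSPEED=57600

# /boot/loader.conf: make the loader (and, through it, the kernel)
# use the serial console
console="comconsole"

# /boot.config (the file itself holds only the single line below)
-D -h -S57600
```

In /boot.config, -D selects the dual console configuration, -h makes the serial port the primary console, and -S57600 sets its speed; the "/boot.config: -D" and "Consoles: internal video/keyboard  serial port" lines in the boot output above suggest these flags were at least read.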