Hi,
I work for a small company that makes radar systems for research
organisations and we use FreeBSD on the PCs for data acquisition and
processing. We have recently shifted to FreeBSD6/amd64 and one machine in
particular is exhibiting a strange problem.
The acquisition process is a Tcl interpreter with a largish chunk of C code
which talks to the hardware (via RS485 and a custom PCI card). Once the
system is set up it streams data back via the PCI card and runs it through
various data processors (e.g. dump raw data to disk, FFT, winds, etc.).
The actual forking of processes is handled in Tcl and the C code only gets
involved to write the data out (to an FD the Tcl layer keeps).
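For illustration only, the kind of write path meant here might look like
the sketch below. This is an assumption about the general shape, not the
actual Recorder code; write_all is a hypothetical name. It just pushes a
buffer to an already-open FD, retrying on short writes and EINTR.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the real Recorder code): write all len bytes
 * of buf to fd. Returns 0 on success, -1 with errno set on failure.
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;		/* real error, errno is set */
		}
		p += n;				/* short write: advance and loop */
		len -= (size_t)n;
	}
	return 0;
}
```

The point is only that the C layer never forks or opens the FD itself; it
writes to whatever descriptor the Tcl layer handed it.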
The problem is that every now and then the process gets stuck and becomes
unkillable just after forking, i.e.:
eureka:~>ps -axwwwwl | grep Reco
19999 881 1 12 -8 -5 21716 15984 piperd I<s ?? 128:50.79 /usr/home/radar/skiymet/libexec/Recorder /usr/home/radar/skiymet/libexec/acquisition/sks.tcl /usr/home/radar/skiymet/etc/ud3
19999 80154 881 12 92 -5 21716 16 user m D<L ?? 0:00.00 /usr/home/radar/skiymet/libexec/Recorder /usr/home/radar/skiymet/libexec/acquisition/sks.tcl /usr/home/radar/skiymet/etc/ud3
19999 96464 96343 0 96 0 388 280 - R+ p2 0:00.00 grep Reco
Attaching to the original process works fine..
eureka:~>gdb $GSHOME/libexec/Recorder
...
(gdb) attach 881
...
(gdb) bt
#0 0x00000008009c395c in read () from /lib/libc.so.6
#1 0x000000080072f77f in TclpCreateProcess () from /usr/local/lib/libtcl84.so.1
#2 0x0000000800717d25 in TclCreatePipeline () from /usr/local/lib/libtcl84.so.1
#3 0x00000008007186d0 in Tcl_OpenCommandChannel () from /usr/local/lib/libtcl84.so.1
#4 0x0000000800704af8 in Tcl_ExecObjCmd () from /usr/local/lib/libtcl84.so.1
...
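As far as I understand it, that read() in TclpCreateProcess is the parent
half of the usual fork/exec handshake: the parent opens a pipe, marks the
write end close-on-exec, forks, and then reads from the pipe. EOF means the
exec worked; any data means it failed. If the child wedges before it gets
to exec, the parent sits in read() indefinitely, which would match the
piperd wait channel in the ps output. The sketch below is my own minimal C
illustration of that pattern, not Tcl's actual source; spawn_checked is a
made-up name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Tcl's code): fork and exec argv[0], reporting
 * an exec failure to the parent through a close-on-exec pipe.
 * Returns the child's pid on success, or -1 with errno set.
 */
pid_t
spawn_checked(char *const argv[])
{
	int errpipe[2];
	int child_errno = 0;
	int saved;
	pid_t pid;
	ssize_t n;

	if (pipe(errpipe) == -1)
		return -1;
	/* The write end closes by itself if the exec succeeds. */
	if (fcntl(errpipe[1], F_SETFD, FD_CLOEXEC) == -1 ||
	    (pid = fork()) == -1) {
		saved = errno;
		close(errpipe[0]);
		close(errpipe[1]);
		errno = saved;
		return -1;
	}

	if (pid == 0) {
		/* Child: only reaches the write() if exec failed. */
		close(errpipe[0]);
		execvp(argv[0], argv);
		child_errno = errno;
		if (write(errpipe[1], &child_errno, sizeof(child_errno)) == -1) {
			/* nothing more we can do here */
		}
		_exit(127);
	}

	/* Parent: block until the child execs (EOF) or reports an error. */
	close(errpipe[1]);
	do {
		n = read(errpipe[0], &child_errno, sizeof(child_errno));
	} while (n == -1 && errno == EINTR);
	saved = errno;
	close(errpipe[0]);

	if (n == 0)
		return pid;		/* EOF: exec succeeded */

	if (n == -1)
		child_errno = saved;	/* the read itself failed */
	else if (n != (ssize_t)sizeof(child_errno))
		child_errno = EIO;	/* truncated report */

	/* Reap the failed child so it does not linger as a zombie. */
	while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
		;
	errno = child_errno;
	return -1;
}
```

Note that the parent's read() here has no timeout, so it depends entirely
on the child making progress after the fork.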
Attaching to the newly forked process, however, fails..
(gdb) attach 80154
Attaching to program: /usr/home/radar/skiymet/libexec/Recorder, process 80154
ptrace: Resource temporarily unavailable.
The original is killable..
eureka:~>kill 881
eureka:~>kill 881
881: No such process
But the new one is not..
eureka:~>kill 80154
eureka:~>kill 80154
eureka:~>kill -9 80154
eureka:~>kill -9 80154
I can fstat the new process and it shows a slew of open FDs (presumably
inherited from the old process), but I can't ktrace it..
eureka:~>ktrace -f 80154.ktr -p 80154
ktrace: 80154.ktr: Operation not permitted
eureka:~>sudo ktrace -f 80154.ktr -p 80154
ktrace: 80154.ktr: Operation not permitted
Or get a memory map..
eureka:~>dd if=/proc/80154/map bs=64k
dd: /proc/80154/map: Resource temporarily unavailable
0+0 records in
0+0 records out
0 bytes transferred in 0.000096 secs (0 bytes/sec)
Unfortunately the machine is at a very remote location and I have not
been able to replicate the problem locally (and I can't run, say, memtest
remotely either).
The custom PCI card has a driver which may be the cause of the problems
but it does not appear to be involved from what I can see.
Does anyone have any suggestions? The version of FreeBSD is a little
after 6.0-RELEASE but not much.
--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C