Hello!

I was trying to build OpenOffice using all of my 4 CPUs. To be able to do other work on the machine comfortably, I ran the build under nice, and assigned real-time priority to the two Xorg processes.

The build started at about 23:10 last night, and hung at 23:46. The procstat output for the make's process group is:

      PID  PPID  PGID   SID  TSID THR LOGIN WCHAN EMUL          COMM
     8371  2425  8371  2425  2425   1 mi    wait  FreeBSD ELF64 make
    12254  8371  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
    12255 12254  8371  2425  2425   1 mi    pause FreeBSD ELF64 tcsh
    12262 12255  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
    33010 12262  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
    33011 33010  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
    33012 33011  8371  2425  2425   1 mi    wait  FreeBSD ELF64 dmake
    37126 33012  8371  2425  2425   1 mi    -     FreeBSD ELF64 dmake

The last line worries me greatly... According to "procstat -t", there is only one thread there:

      PID    TID COMM  TDNAME CPU PRI STATE WCHAN
    37126 100724 dmake -        1 193 sleep -

And trying to "ktrace -p 37126" returns (even to root, even in /tmp):

    ktrace: ktrace.out: Operation not permitted

There are no problems ktrace-ing 33012, but nothing comes from there, as that process simply waits for its child. I guess, the child -- 37126 was (v)forked to launch a compiler or some such and remains stuck in between (v)fork and exec somewhere...

The OS is: FreeBSD 7.0-STABLE/amd64 from Sat Jul 26, 2008 and the box is otherwise perfectly functional. The scheduling-related options are set as such:

    options SCHED_4BSD                   # 4BSD scheduler
    options _KPOSIX_PRIORITY_SCHEDULING  # POSIX P1003_1B real-time extensions

Let me know, what else I can do to help fix this bug -- I'm going to reboot the machine tonight... Should I switch to SCHED_ULE as a work-around?

Thanks! Yours,

-mi
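[Editorial sketch, not part of the original message.] One way to pick the suspicious entry out of a procstat listing like the one in this post is to filter on the WCHAN column. The snippet assumes the default column layout shown in the post (WCHAN is the eighth whitespace-separated field), and `no_wchan` is a hypothetical helper name. The sample rows are copied from the post; on the live system the real procstat output would be piped in instead. Note that an empty wait channel alone does not prove a hang, since a running process shows the same.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of rows whose WCHAN column is "-".
# Assumes the layout from the post:
#   PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM
no_wchan() {
    # NR > 1 skips the header row; field 8 is WCHAN.
    awk 'NR > 1 && $8 == "-" { print $1 }'
}

# Two sample rows plus header, copied from the post.
sample='PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM
33012 33011 8371 2425 2425 1 mi wait FreeBSD ELF64 dmake
37126 33012 8371 2425 2425 1 mi - FreeBSD ELF64 dmake'

stuck=$(printf '%s\n' "$sample" | no_wchan)
echo "$stuck"
```

Any PID reported this way is only a candidate for closer inspection, for example with "procstat -t".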
Paul B. Mahol wrote:
>> Let me know, what else I can do to help fix this bug -- I'm going to
>> reboot the machine tonight... Should I switch to SCHED_ULE as a
>> work-around?
>
> SCHED_4BSD is suboptimal for 4 CPUs, and it is replaced with SCHED_ULE
> on 7 STABLE.

Thanks, Paul, for the explanation. I've heard of some problems still lingering in ULE and figured I'd stay with the BSD scheduler for a while. I guess it is time to switch.

However, what I'm seeing on my system today is evidence of the scheduler having a bug, rather than merely being suboptimal.

If the old scheduler is still maintained and supposed to work (however sub-optimally) in 4-CPU configurations, I'd expect either inquiries for more debug information or assurances that the bug is known and being worked on. In the July 26th sys/conf/NOTES file, SCHED_4BSD is still the default option....

Yours,

-mi
On 9/23/08, Mikhail Teterin <mi+mill@aldan.algebra.com> wrote:
> Hello!
>
> I was trying to build OpenOffice using all of my 4 CPUs. To be able to
> do other work on the machine comfortably, I ran the build under nice,
> and assigned real-time priority to the two Xorg processes.
> The build started at about 23:10 last night, and hung at 23:46. The
> procstat output for the make's process group is:
>
>   PID  PPID  PGID   SID  TSID THR LOGIN WCHAN EMUL          COMM
>  8371  2425  8371  2425  2425   1 mi    wait  FreeBSD ELF64 make
> 12254  8371  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
> 12255 12254  8371  2425  2425   1 mi    pause FreeBSD ELF64 tcsh
> 12262 12255  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
> 33010 12262  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
> 33011 33010  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
> 33012 33011  8371  2425  2425   1 mi    wait  FreeBSD ELF64 dmake
> 37126 33012  8371  2425  2425   1 mi    -     FreeBSD ELF64 dmake
>
> The last line worries me greatly... According to "procstat -t", there is
> only one thread there:
>
>   PID    TID COMM  TDNAME CPU PRI STATE WCHAN
> 37126 100724 dmake -        1 193 sleep -
>
> And trying to "ktrace -p 37126" returns (even to root, even in /tmp):
>
> ktrace: ktrace.out: Operation not permitted
>
> There are no problems ktrace-ing 33012, but nothing comes from there, as
> that process simply waits for its child. I guess, the child -- 37126 was
> (v)forked to launch a compiler or some such and remains stuck in between
> (v)fork and exec somewhere...
>
> The OS is: FreeBSD 7.0-STABLE/amd64 from Sat Jul 26, 2008 and the box is
> otherwise perfectly functional. The scheduling-related options are set
> as such:
>
> options SCHED_4BSD                   # 4BSD scheduler
> options _KPOSIX_PRIORITY_SCHEDULING  # POSIX P1003_1B real-time extensions
>
> Let me know, what else I can do to help fix this bug -- I'm going to
> reboot the machine tonight... Should I switch to SCHED_ULE as a
> work-around?

SCHED_4BSD is suboptimal for 4 CPUs, and it is replaced with SCHED_ULE on 7 STABLE.
On Tue, 23 Sep 2008, Mikhail Teterin wrote:
> 37126 33012 8371 2425 2425 1 mi - FreeBSD ELF64 dmake
>
>   PID    TID COMM  TDNAME CPU PRI STATE WCHAN
> 37126 100724 dmake -        1 193 sleep -
>
> There are no problems ktrace-ing 33012, but nothing comes from there, as
> that process simply waits for its child. I guess, the child -- 37126 was
> (v)forked to launch a compiler or some such and remains stuck in between
> (v)fork and exec somewhere...

(lots of details elided)

Yes, there's a period during exec where attaching debuggers isn't allowed, so if something gets wedged or otherwise lost there, ktrace isn't much use. On the other hand, if it's stuck there, then there are no syscalls going on anyway.

Could you try procstat -kk on the process, does that shed any light? Another alternative, if you have DDB compiled in, is to break to the debugger and do a stack trace, or to use gdb on /dev/mem if you have a kernel.symbols. This may help us understand more about what is going on.

Robert N M Watson
Computer Laboratory
University of Cambridge
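[Editorial sketch, not part of the original message.] The first and last of the suggestions above might translate into commands roughly as follows. This is only a sketch: `diag_plan` is a hypothetical helper that prints the steps instead of running them, the PID is the one from the report, and the /boot/kernel/kernel.symbols path is an assumption about where the symbols file lives on the affected box. It also assumes that kgdb rather than plain gdb is used for the /dev/mem step. The DDB route is left out because it is interactive on the console.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the diagnostic steps for one PID.
diag_plan() {
    pid=$1
    # Kernel stack of each thread in the process, with function offsets.
    echo "procstat -kk $pid"
    # Inspect the live kernel through /dev/mem; the symbols path is an
    # assumption and may differ on the affected machine.
    echo "kgdb /boot/kernel/kernel.symbols /dev/mem"
    # Inside kgdb: switch to the process, then print its backtrace.
    echo "  (kgdb) proc $pid"
    echo "  (kgdb) bt"
}

plan=$(diag_plan 37126)
echo "$plan"
```

Printing the plan first makes it easy to review the commands before running them as root on a machine that is already misbehaving.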