Xin LI/李鑫
2003-Dec-17 16:41 UTC
[Backtrace] 4.9 and 5.1-RELEASE occasionly panic on RAM > 4GB without PAE (long)
Hello, One of my friend is operating servers on a big ICP in China. They found that if they run FreeBSD 4.9 or FreeBSD 5.1-RELEASE on their server, which has more than 4GB RAM installed, then they will occasionaly have the server panic'ed, or even rebooted without any reason. The server hardware is: P4-Xeon-2.4G*2, 4G RAM(DDR266), 146G*2 on asr(4) RAID-1. The kernel is not quite different from GENERIC, except that we have AUTO_EOI1 enabled. When PAE is enabled, then the kernel will not recognize the asr(4) device, and thus would refuse to boot from it, so we have it disabled. This problem afffects 4.9-RELEASE and 5.1-RELEASE. As the ICP is a conservative user, they will not run -STABLE, nor -CURRENT. I have turned the debugging symbols on their server and grabbed the following backtraces. Also, I have found how to reproduce this problem on a machine running with > 4GB memory without a PAE configuration, which, in my opinion, should not cause a kernel panic because it's just a resource hungry program and all it will do should, at most, hang the system, or even being killed directly by the operating system. ----------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <unistd.h> int main() { char* s; int p; int d; d = 4096; for(;d>0;d--) { p=fork(); if(p==0) { s=malloc(1024*1024); if(s==NULL) printf("Pid %d malloc(1024*1024) failed\n",getpid()); else memset(s,0,1024*1024); while(1); } else if(p==-1) printf("Fork() %d failed\n",d); else continue; } return 0; } ----------------------------------------------------------------------- Unfortuantelly, this will cause panic after a short while. We got these backtrace: gdb /usr/src/sys/compile/DELPHIJ/kernel.debug -k vmcore.2 GNU gdb 4.18 (FreeBSD) Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf SMP 4 cpus IdlePTD at phsyical address 0x003af000 initial pcb at physical address 0x00313fe0 panicstr: pmap_new_proc: u_map allocation failed panic messages: --- panic: pmap_new_proc: u_map allocation failed mp_lock = 02000001; cpuid = 2; lapic.id = 01000000 boot() called on cpu#2 syncing disks... 3 done Uptime: 2h44m38s dumping to dev #da/0x20001, offset 263168 dump 3967 3966 3965.................. ..................................... ..................................... ...............10 9 8 7 6 5 4 3 2 1 0 --- #0 dumpsys () at ../../kern/kern_shutdown.c:487 487 if (dumping++) { (kgdb) bt #0 dumpsys () at ../../kern/kern_shutdown.c:487 #1 0xc0185beb in boot (howto=256) at ../../kern/kern_shutdown.c:316 #2 0xc018605d in panic (fmt=0xc02e15e0 "pmap_new_proc: u_map allocation failed") at ../../kern/kern_shutdown.c:595 #3 0xc028d575 in pmap_new_proc (p=0xffb5d1e0) at ../../i386/i386/pmap.c:919 #4 0xc0239944 in vm_fork (p1=0xfbfe2ba0, p2=0xffb5d1e0, flags=20) at ../../vm/vm_glue.c:240 #5 0xc017fa85 in fork1 (p1=0xfbfe2ba0, flags=20, procp=0xfc0fef24) at ../.. /kern/kern_fork.c:517 #6 0xc017f1fa in fork (p=0xfbfe2ba0, uap=0xfc0fef80) at ../../kern/kern_fork.c:100 #7 0xc02916e1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077937296, tf_esi = -1077937304, tf_ebp = -1077937392, tf_isp = -66064428, tf_ebx = 1, tf_edx = 0, tf_ecx = -1077937066, tf_eax = 2, tf_trapno = 12, tf_err = 2, tf_eip = 1208488192, tf_cs = 31, tf_eflags = 514, tf_esp = -1077937420, tf_ss = 47}) at ../../i386/i386/trap.c:1175 #8 0xc027e09b in Xint0x80_syscall () #9 0x8048506 in ?? () (kgdb) up #1 0xc0185beb in boot (howto=256) at ../../kern/kern_shutdown.c:316 316 dumpsys(); (kgdb) #2 0xc018605d in panic (fmt=0xc02e15e0 "pmap_new_proc: u_map allocation failed") at ../../kern/kern_shutdown.c:595 595 boot(bootopt); (kgdb) #3 0xc028d575 in pmap_new_proc (p=0xffb5d1e0) at ../../i386/i386/pmap.c:919 919 panic("pmap_new_proc: u_map allocation failed"); (kgdb) l 914 915 /* get a kernel virtual address for the UPAGES for this proc */ 916 if ((up = (vm_offset_t) p->p_addr) == 0) { 917 up = kmem_alloc_nofault(kernel_map, UPAGES * PAGE_SIZE); 918 if (up == 0) 919 panic("pmap_new_proc: u_map allocation failed"); 920 p->p_addr = (struct user *) up; 921 } 922 923 for(i = 0; i < UPAGES; i++) { (kgdb) p up $1 = 0 (kgdb) p {struct proc *}p $2 = (struct proc *) 0xffb5d040 (kgdb) p {struct proc}p $3 = {p_procq = {tqe_next = 0xffb5d040, tqe_prev = 0x0}, p_list = {le_next 0xffb5d380, le_prev = 0xc032ac38}, p_cred = 0xd2898620, p_fd = 0xd289e900, p_stats = 0x0, p_limit = 0xd275e700, p_upages_obj 0xffbf9e60, p_procsig = 0xd289dac0, p_flag = 6, p_stat = 1 '\001', p_pad1 = "\000\000", p_pid = 5832, p_hash = {le_next = 0xfd8641a0, le_prev = 0xd2198b20}, p_pglist = {le_next = 0xffb5d380, le_prev = 0xfbfe2bdc}, p_pptr = 0xfbfe2ba0, p_sibling = {le_next 0xffb5d380, le_prev = 0xfbfe2bf0}, p_children = {lh_first = 0x0}, p_ithandle = {callout = 0x0}, p_oppid = 0, p_dupfd = 0, p_vmspace 0xffb239c0, p_estcpu = 7, p_cpticks = 0, p_pctcpu = 0, p_wchan = 0x0, p_wmesg = 0x0, p_swtime = 0, p_slptime = 0, p_realtimer = {it_interval {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_runtime = 0, p_uu = 0, p_su = 0, p_iu = 0, p_uticks = 0, p_sticks = 0, p_iticks = 0, p_traceflag = 0, p_tracep = 0x0, p_siglist = { __bits = {0, 0, 0, 0}}, p_textvp = 0xfc0310c0, p_lock = 0 '\000', p_oncpu = 0 '\000', p_lastcpu = 0 '\000', p_rqindex = 0 '\000', p_locks = 0, p_simple_locks = 0, p_stops = 0, p_stype = 0, p_step = 0 '\000', p_pfsflags = 0 '\000', p_pad3 = "\000", p_retval = {0, 0}, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_oldsigmask = {__bits {0, 0, 0, 0}}, p_sig = 0, p_code = 0, p_klist = { slh_first = 0x0}, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_priority = 50 '2', p_usrpri = 50 '2', p_nice = 0 '\000', p_comm = "fork_and_malloc_", p_pgrp = 0xd2740f20, p_sysent = 0xc02f13e0, p_rtprio = {type = 1, prio = 0}, p_prison = 0x0, p_args = 0xd2770b40, p_addr = 0x0, p_md {md_regs = 0x0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0, p_nthreads = 0, p_aioinfo = 0x0, p_wakeup = 0, p_peers = 0x0, p_leader 0xffb5d1e0, p_asleep = {as_priority = 0, as_timo = 0}, p_emuldata = 0x0, p_fdtol = 0x0} (kgdb) ----------------------------------------------------------------------- On another machine, we have tested the following CGI script. When having heavy load, it will cause panic, too: ----------------------------------------------------------------------- #!/bin/sh printf "content-type:text/html\n\n" printf "<table bordercolor=a0a0a0 border=1 cellpadding=1 cellspacing=1>\n" printf "<tr>\n" printf "<td colspan=8 align=center>`/sbin/ifconfig|grep broadcast|head -n 1|perl -pi -e 's/ +/\t/g'|cut -f3|perl -pi -e 's/\n//g'`</td>\n" printf "</tr>\n" printf "<tr>\n" printf "<td align=center width=100>Load</td>\n" printf "<td align=center width=100>Connections</td>\n" printf "<td align=center width=100>Processes</td>\n" printf "<td align=center width=100>Open Files</td>\n" printf "<td align=center width=300>Processor(User%% System%% Free%%)</td>\n" printf "<td align=center width=100>Active Memory</td>\n" printf "<td align=center width=100>Swap</td>\n" printf "</tr>\n" printf "<tr>\n" printf "<td align=right>`w w|head -n 1|perl -pi -e 's/^.*, //g'|perl -pi -e 's/\n/ /g'`</td>\n" printf "<td align=right>`netstat -an|wc|cut -c1-8|perl -pi -e 's/ //g'|perl -pi -e 's/\n/ /g'`</td>\n" printf "<td align=right>`ps ax|wc|cut -c1-8|perl -pi -e 's/ //g'|perl -pi -e 's/\n/ /g'`</td>\n" printf "<td align=right>`fstat|wc|cut -c1-8|perl -pi -e 's/ //g'|perl -pi -e 's/\n/ /g'`</td>\n" printf "<td align=right>`vmstat|tail -n 1|perl -pi -e 's/ +/\t/g'|cut -f18-|perl -pi -e 's/\t/%% /g'|perl -pi -e 's/\n/%%/g'`</td>\n" printf "<td align=right>`top -n|head -n 4|tail -n 1|perl -pi -e 's/ +/\t/g'|cut -f2|perl -pi -e 's/\n/ /g'`</td>\n" printf "<td align=right>`top -n|head -n 5|tail -n 1|perl -pi -e 's/ +/\t/g'|cut -f4|perl -pi -e 's/\n/ /g'`</td>\n" printf "</tr>\n" printf "</table>\n" ----------------------------------------------------------------------- To simulate the situation, we have used ab: /usr/local/apache/bin/ab -n 16384000 -c 1024 And we got this: ----------------------------------------------------------------------- gdb /usr/src/sys/compile/DELPHIJ/kernel.debug -k vmcore.1 GNU gdb 4.18 (FreeBSD) Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf SMP 4 cpus IdlePTD at phsyical address 0x003af000 initial pcb at physical address 0x00313fe0 panicstr: pmap_new_proc: u_map allocation failed panic messages: --- panic: pmap_new_proc: u_map allocation failed mp_lock = 01000001; cpuid = 1; lapic.id = 06000000 boot() called on cpu#1 syncing disks... 1 done Uptime: 1d0h36m40s dumping to dev #da/0x20001, offset 263168 dump 3967 3966 3965.................. ..................................... ..................................... ...............10 9 8 7 6 5 4 3 2 1 0 --- #0 dumpsys () at ../../kern/kern_shutdown.c:487 487 if (dumping++) { (kgdb) bt #0 dumpsys () at ../../kern/kern_shutdown.c:487 #1 0xc0185beb in boot (howto=256) at ../../kern/kern_shutdown.c:316 #2 0xc018605d in panic (fmt=0xc02e15e0 "pmap_new_proc: u_map allocation failed") at ../../kern/kern_shutdown.c:595 #3 0xc028d575 in pmap_new_proc (p=0xffa7f680) at ../../i386/i386/pmap.c:919 #4 0xc0239944 in vm_fork (p1=0xfed67780, p2=0xffa7f680, flags=20) at ../../vm/vm_glue.c:240 #5 0xc017fa85 in fork1 (p1=0xfed67780, flags=20, procp=0xfed6ef24) at ../.. /kern/kern_fork.c:517 #6 0xc017f1fa in fork (p=0xfed67780, uap=0xfed6ef80) at ../../kern/kern_fork.c:100 #7 0xc02916e1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134960572, tf_esi = 135016448, tf_ebp = -1077938448, tf_isp = -19468332, tf_ebx = 134959244, tf_edx = 4, tf_ecx -1077938376, tf_eax = 2, tf_trapno = 22, tf_err = 2, tf_eip = 134672192, tf_cs = 31, tf_eflags = 582, tf_esp = -1077938476, tf_ss = 47}) at ../../i386/i386/trap.c:1175 #8 0xc027e09b in Xint0x80_syscall () #9 0x804b34f in ?? () #10 0x804d915 in ?? () #11 0x804d403 in ?? () #12 0x804d17e in ?? () #13 0x804b4fb in ?? () #14 0x804ac40 in ?? () #15 0x80526be in ?? () #16 0x805262a in ?? () #17 0x804813e in ?? () (kgdb) up #1 0xc0185beb in boot (howto=256) at ../../kern/kern_shutdown.c:316 316 dumpsys(); (kgdb) #2 0xc018605d in panic (fmt=0xc02e15e0 "pmap_new_proc: u_map allocation failed") at ../../kern/kern_shutdown.c:595 595 boot(bootopt); (kgdb) #3 0xc028d575 in pmap_new_proc (p=0xffa7f680) at ../../i386/i386/pmap.c:919 919 panic("pmap_new_proc: u_map allocation failed"); (kgdb) l 914 915 /* get a kernel virtual address for the UPAGES for this proc */ 916 if ((up = (vm_offset_t) p->p_addr) == 0) { 917 up = kmem_alloc_nofault(kernel_map, UPAGES * PAGE_SIZE); 918 if (up == 0) 919 panic("pmap_new_proc: u_map allocation failed"); 920 p->p_addr = (struct user *) up; 921 } 922 923 for(i = 0; i < UPAGES; i++) { (kgdb) p p $1 = (struct proc *) 0xffa7f680 (kgdb) p {struct proc *}p $2 = (struct proc *) 0xffa7f4e0 (kgdb) p {struct proc}p $3 = {p_procq = {tqe_next = 0xffa7f4e0, tqe_prev = 0x0}, p_list = {le_next 0xffa7f820, le_prev = 0xc032ac38}, p_cred = 0xd3cb9f20, p_fd = 0xd3c95d00, p_stats = 0x0, p_limit = 0xd3c9e100, p_upages_obj 0xff64de04, p_procsig = 0xd3ca7400, p_flag = 4, p_stat = 1 '\001', p_pad1 = "\000\000", p_pid = 26450, p_hash = {le_next = 0xfd812dc0, le_prev = 0xd2198d48}, p_pglist = {le_next = 0xff18c3c0, le_prev = 0xfed677bc}, p_pptr = 0xfed67780, p_sibling = {le_next = 0x0, le_prev = 0xfed677d0}, p_children = {lh_first = 0x0}, p_ithandle = {callout = 0x0}, p_oppid = 0, p_dupfd = 0, p_vmspace 0xffae2500, p_estcpu = 184, p_cpticks = 0, p_pctcpu = 0, p_wchan = 0x0, p_wmesg = 0x0, p_swtime = 0, p_slptime = 0, p_realtimer = {it_interval {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_runtime = 0, p_uu = 0, p_su = 0, p_iu = 0, p_uticks = 0, p_sticks = 0, p_iticks = 0, p_traceflag = 0, p_tracep = 0x0, p_siglist = { __bits = {0, 0, 0, 0}}, p_textvp = 0xf9b5c740, p_lock = 0 '\000', p_oncpu = 0 '\000', p_lastcpu = 0 '\000', p_rqindex = 0 '\000', p_locks = 0, p_simple_locks = 0, p_stops = 0, p_stype = 0, p_step = 0 '\000', p_pfsflags = 0 '\000', p_pad3 = "\000", p_retval = {0, 0}, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_oldsigmask = {__bits {0, 0, 0, 0}}, p_sig = 0, p_code = 0, p_klist = { slh_first = 0x0}, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_priority = 52 '4', p_usrpri = 52 '4', p_nice = 0 '\000', p_comm "sh\000pd\000r\000\000\000\000\000\000\000\000\000", p_pgrp = 0xd21ce460, p_sysent = 0xc02f13e0, p_rtprio = {type = 1, prio = 0}, p_prison = 0x0, p_args = 0xd3c97240, p_addr = 0x0, p_md = {md_regs = 0x0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0, p_nthreads = 0, p_aioinfo = 0x0, p_wakeup = 0, p_peers = 0x0, p_leader = 0xffa7f680, p_asleep = { as_priority = 0, as_timo = 0}, p_emuldata = 0x0, p_fdtol = 0x0} (kgdb) p kernel_map $4 = 0xc031a60c ----------------------------------------------------------------------- I have took a sight on the pmap and vm code, and did not figured out why the panic was caused. So I post it here, and I will have to say that -CURRENT may have the same problem, and it may affect the upcoming 5.2-RELEASE. For additional information, please fell free to contact me, and I can act as liaison. Any suggestions? Thank you in advance! Xin LI, FreeBSD (Simplified) Chinese Project
Bernd Walter
2003-Dec-17 16:54 UTC
[Backtrace] 4.9 and 5.1-RELEASE occasionly panic on RAM > 4GB without PAE (long)
On Thu, Dec 18, 2003 at 08:40:45AM +0800, Xin LI/???? wrote:> Hello, > > One of my friend is operating servers on a big ICP in China. They found that > if they run FreeBSD 4.9 or FreeBSD 5.1-RELEASE on their server, which has > more than 4GB RAM installed, then they will occasionaly have the server > panic'ed, or even rebooted without any reason. > > The server hardware is: P4-Xeon-2.4G*2, 4G RAM(DDR266), 146G*2 on asr(4) > RAID-1. The kernel is not quite different from GENERIC, except that we have > AUTO_EOI1 enabled. When PAE is enabled, then the kernel will not recognize > the asr(4) device, and thus would refuse to boot from it, so we have it > disabled. > > > This problem afffects 4.9-RELEASE and 5.1-RELEASE. As the ICP is a > conservative user, they will not run -STABLE, nor -CURRENT. I have turned > the debugging symbols on their server and grabbed the following backtraces. > Also, I have found how to reproduce this problem on a machine running with > > 4GB memory without a PAE configuration, which, in my opinion, should not > cause a kernel panic because it's just a resource hungry program and all it > will do should, at most, hang the system, or even being killed directly by > the operating system.Without PAE you can't use more than 4G on i386. You either have to use PAE or limit memory to 4G. This is strictly a hardware limitation there no other option left. The best that can be done would be autolimiting available memory instead of panicing later. I don't know how difficult this would be to implement. Telling in which way it panics if you try to do something that your hardware can't do is just wasted time. If PAE isn't compatible with asr(4) then maybe it's not busdma'ed or buggy. With PAE you have to use busdma aware drivers. That's a software thing and can be fixed. -- B.Walter BWCT http://www.bwct.de ticso@bwct.de info@bwct.de
Doug White
2003-Dec-17 16:59 UTC
[Backtrace] 4.9 and 5.1-RELEASE occasionly panic on RAM > 4GB without PAE (long)
On Thu, 18 Dec 2003, [gb2312] Xin LI/=C0=EE=F6=CE wrote:> One of my friend is operating servers on a big ICP in China. They found that > if they run FreeBSD 4.9 or FreeBSD 5.1-RELEASE on their server, which has > more than 4GB RAM installed, then they will occasionaly have the server > panic'ed, or even rebooted without any reason. > > The server hardware is: P4-Xeon-2.4G*2, 4G RAM(DDR266), 146G*2 on asr(4) > RAID-1. The kernel is not quite different from GENERIC, except that we have > AUTO_EOI1 enabled. When PAE is enabled, then the kernel will not recognize > the asr(4) device, and thus would refuse to boot from it, so we have it > disabled.PAE only has effect for >4GB (strictly), so exactly 4GB will work fine with !PAE.> This problem afffects 4.9-RELEASE and 5.1-RELEASE. As the ICP is a > conservative user, they will not run -STABLE, nor -CURRENT. I have turned > the debugging symbols on their server and grabbed the following backtraces. > Also, I have found how to reproduce this problem on a machine running with > > 4GB memory without a PAE configuration, which, in my opinion, should not > cause a kernel panic because it's just a resource hungry program and all it > will do should, at most, hang the system, or even being killed directly by > the operating system.Its probably due to poor tuning in this case. See the tuning(7) man page and innumerable mailing lists posts. Basically, you want to turn down maxusers (128 or so works best) and maybe change the kernel/user boundary and kmem sizes. vmstat -m is useful to keep track of kernel memory use. Also forkbombs (which is what your program is, effectively) are best controlled by using user resource limits. They're there for a reason. See the login.conf(5) man page.> panic: pmap_new_proc: u_map allocation failedkmem exhaustion condition in -stable. --=20 Doug White | FreeBSD: The Power to Serve dwhite@gumbysoft.com | www.FreeBSD.org