Royce Williams
2008-Jul-22 20:05 UTC
6.3-RELEASE-p3 recurring panics on multiple SM PDSMi+
We have 10 SuperMicro PDSMi+ 5015M-MTs that are panic'ing every few days. This started shortly after upgrade from 6.2-RELEASE to 6.3-RELEASE with freebsd-update.

Other than switching to a debugging kernel, a little sysctl tuning, and patching with freebsd-update, they are stock. The debugging kernel was built from source that is also being patched with freebsd-update.

These systems are running postfix and Courier imapd for an ISP with a userbase on the order of 10^4 users. They use gmirror, but the mailstore is over NFS. That NFS server is under pretty high load. All of the servers with this app and load pattern are panic'ing.

I have little experience with kernel debugging, but the box in question is out of our farm and available for testing, and I am motivated to cooperate. :-)

The full debugging kernel options I used are:

include SMP
options KDB
options KDB_TRACE
options DDB
options BREAK_TO_DEBUGGER
options WITNESS
options WITNESS_SKIPSPIN

db> trace
Tracing pid 71182 tid 100325 td 0xcc08b180
kdb_enter(c095f294) at kdb_enter+0x2b
panic(c09768ad,1000,14000000,c145bc88,1000,...) at panic+0x127
kmem_malloc(c14680c0,1000,102,eba6a8cc,c07e3fa5,...) at kmem_malloc+0x89
page_alloc(c1453780,1000,eba6a8bf,102,c06b8a84,...) at page_alloc+0x1a
slab_zalloc(c1453780,102,c14537e0,c1453780,c1460d5c,...) at slab_zalloc+0xa1
uma_zone_slab(c1453780,2) at uma_zone_slab+0xf0
uma_zalloc_bucket(c1453780,2) at uma_zalloc_bucket+0x11c
uma_zalloc_arg(c1453780,0,2) at uma_zalloc_arg+0x24c
cache_enter(cd02c220,c9e62880,eba6a9fc) at cache_enter+0xa6
nfs_readdirplusrpc(cd02c220,eba6aa60,cc0ab880) at nfs_readdirplusrpc+0x6a6
nfs_doio(cd02c220,dce59668,cc0ab880,cc08b180,dce59668,...) at nfs_doio+0x20f
nfs_bioread(cd02c220,eba6acb0,0,cc0ab880) at nfs_bioread+0xa64
nfs_readdir(eba6ac90) at nfs_readdir+0xe6
VOP_READDIR_APV(c09ebbc0,eba6ac90) at VOP_READDIR_APV+0x38
getdirentries(cc08b180,eba6ad04) at getdirentries+0x146
syscall(3b,3b,3b,9085f00,9085f00,...) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (196, FreeBSD ELF32, getdirentries), eip = 0xb825a79b, esp = 0xbfbfa1fc, ebp = 0xbfbfa228 ---

Royce

--
Royce D. Williams - http://royce.ws/
I don't like that man. I must get to know him better. - A. Lincoln
Royce Williams wrote:
> db> trace
> Tracing pid 71182 tid 100325 td 0xcc08b180
> kdb_enter(c095f294) at kdb_enter+0x2b
> panic(c09768ad,1000,14000000,c145bc88,1000,...) at panic+0x127
> kmem_malloc(c14680c0,1000,102,eba6a8cc,c07e3fa5,...) at kmem_malloc+0x89

You forgot to include the panic, but this is probably the "kmem_map too small" panic. It says that your kernel ran out of memory, and the solution is to fix that situation by giving more memory to the kernel. Increase the vm.kmem_size tunable until your system stops running out of memory on your workload.

Kris
Clifton Royston
2008-Jul-23 05:07 UTC
6.3-RELEASE-p3 recurring panics on multiple SM PDSMi+
On Tue, Jul 22, 2008 at 11:45:30AM -0800, Royce Williams wrote:
> We have 10 SuperMicro PDSMi+ 5015M-MTs that are panic'ing every few
> days. This started shortly after upgrade from 6.2-RELEASE to
> 6.3-RELEASE with freebsd-update.

I was having similar problems on some servers using 6.2-psomething, which also use an NFS server heavily (for shared configuration files), until I started using the same vm.kmem_size tunable Kris is recommending. That seemed to solve the problem for me. (I just slapped it up to 512M rather than try to binary search for some optimum value.)

-- Clifton

--
Clifton Royston -- cliftonr@iandicomputing.com / cliftonr@lava.net
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
Jeremy Chadwick
2008-Jul-23 05:34 UTC
6.3-RELEASE-p3 recurring panics on multiple SM PDSMi+
On Tue, Jul 22, 2008 at 11:45:30AM -0800, Royce Williams wrote:
> We have 10 SuperMicro PDSMi+ 5015M-MTs that are panic'ing every few
> days. This started shortly after upgrade from 6.2-RELEASE to
> 6.3-RELEASE with freebsd-update.

We use the same hardware (board and chassis), and have no such problems running both RELENG_6 and RELENG_7. I don't think your issue is specific to the board or chassis. Kris's explanation makes a lot more sense. :-)

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
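
For readers following up on the advice in this thread: vm.kmem_size is a boot-time loader tunable, so it is normally set in /boot/loader.conf and takes effect after a reboot. A minimal sketch, using the 512M figure Clifton mentions. Treat that value as an assumption to test against your own RAM and workload, not as a recommendation:

```conf
# /boot/loader.conf
# Enlarge the kernel memory map (kmem_map) that the panic reports as too small.
# 512M is the value reported earlier in this thread; adjust for your own
# hardware and load.
vm.kmem_size="512M"
```

On i386 the kernel address space is limited, so very large values may fail at boot or need further kernel tuning. Raising the value in steps and watching whether the panics stop, as Kris suggests, is the cautious approach.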