It seems that this mail was sent encoded... Here I try once more,
verified as 'plain text', from my private e-mail account.
**************************************************************************

Hello,

I finally managed to upgrade and test my server last weekend.

There is good news: so far kernel r261208 (FreeBSD 9.2-STABLE) runs
without problems.

I could not apply the patch you supplied, but I saw that the code had
been modified nonetheless, so I gave it a try :-)

It seems that the problem has been solved. Thank you very much! :-)

with best regards
Matthias Schuendehuette

> -----Original Message-----
> From: Rick Macklem [mailto:rmacklem at uoguelph.ca]
> Sent: Sunday, 19 January 2014 03:19
> To: Schuendehuette, Matthias
> Cc: Konstantin Belousov
> Subject: Re: Stack overflow with kernel r254683
>
> I just found a bug that causes a stack overflow in the file handle
> affinity code done by ken at . It occurs for an NFSv2 client mounting
> a server, where sizeof(fhandle_t) < 32.
>
> I've attached the patch that fixes this, in case you can test it?
>
> Since your stack trace looks completely different, I won't guess if
> this was the bug, but this bug definitely trashed the stack.
>
> rick
>
> ----- Original Message -----
> > On Mon, Aug 26, 2013 at 07:11:48PM -0400, Rick Macklem wrote:
> > > Matthias Schuendehuette wrote:
> > > > Hello,
> > > >
> > > > yesterday I got a kernel crash on my server (a ProLiant DL380
> > > > G5):
> > > >
> > > > "panic: stack overflow detected; backtrace may be corrupted"
> > > >
> > > > Kernel is "9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #7 r254683"
> > > >
> > > >
> > > > The stack trace reads:
> > > >
> > > > #0 doadump (textdump=1) at pcpu.h:249
> > > > 249 pcpu.h: No such file or directory.
> > > > in pcpu.h
> > > > (kgdb) #0 doadump (textdump=1) at pcpu.h:249
> > > > #1 0xc0668a4d in kern_reboot (howto=260)
> > > >     at /usr/src/sys/kern/kern_shutdown.c:449
> > > > #2 0xc0668f07 in panic (fmt=0x104 )
> > > >     at /usr/src/sys/kern/kern_shutdown.c:637
> > > > #3 0xc0691da2 in __stack_chk_fail ()
> > > >     at /usr/src/sys/kern/stack_protector.c:17
> > > > #4 0xc7fdb175 in nfsrvd_setattr (nd=0xc73b4400, isdgram=-952596480,
> > > >     vp=0xc8001140, p=0xf405ecc8, exp=0xc07af7f0)
> > > >     at /usr/src/sys/modules/nfsd/../../fs/nfsserver/nfs_nfsdserv.c:371
> > > > #5 0xc7fdb6e0 in nfsrvd_releaselckown (nd=0xc7442a00,
> > > >     isdgram=-952596480, vp=0xc7388848, p=0xf405ecb8, exp=0x0)
> > > >     at /usr/src/sys/modules/nfsd/../../fs/nfsserver/nfs_nfsdserv.c:3481
> > > > #6 0xc07af7f0 in svc_run_internal (pool=0xc7de8b80, ismaster=0)
> > > >     at /usr/src/sys/rpc/svc.c:1109
> > > > #7 0xc07b006d in svc_thread_start (arg=0xc7de8b80)
> > > >     at /usr/src/sys/rpc/svc.c:1200
> > > > #8 0xc06384f7 in fork_exit (callout=0xc07b0060 ,
> > > >     arg=0xc7de8b80, frame=0xf405ed08)
> > > >     at /usr/src/sys/kern/kern_fork.c:992
> > > > #9 0xc08787c4 in fork_trampoline ()
> > > >     at /usr/src/sys/i386/i386/exception.s:279
> > > >
> > > Well, when I've looked on i386, the nfsd threads normally don't use
> > > 1 page and the stacks are 2 pages, so I doubt an nfsd thread is
> > > blowing the stack.
> > It is overflowing the frame, not the whole stack. In other word,
> > something overwrote the canary which was put on the stack between
> > local variables and the return address, possibly corrupting the
> > return address as well.
> >
> > > Also, nfsrvd_releaselckown() doesn't call nfsrvd_setattr(), so the
> > > backtrace doesn't make much sense.
> > Yes, this might be one of the consequences of the stack smashing.
> >
> > >
> > > Afraid I can't help more than this.
> > > Good luck with it, rick
> > >
> > > >
> > > > I have all the files in /var/crash, so if someone wants additional
> > > > informations I should be able to deliver them.
> > > >
> > > > The kernel config file is customized in the sense that I have
> > > > removed kernel items, that aren't used on that machine.
> > > >
> > > > One major difference: I use
> > > >
> > > > < options   NFSCLIENT   # Network Filesystem Client
> > > > < options   NFSSERVER   # Network Filesystem Server
> > > >
> > > > instead of
> > > >
> > > > > options   NFSCL       # New Network Filesystem Client
> > > > > options   NFSD        # New Network Filesystem Server
> > > >
> > > > because a kernel a few weeks ago immediately crashed with the new
> > > > NFS-code.
> > > >
> > > > But it seems now, that the old NFS-code is also somehow damaged.
> > > >
> > > > Ah, and I still have from older releases of FreeBSD the following
> > > > loader options - do they still make sense?
> > > >
> > > > geom_vinum_load="YES"
> > > > kern.maxdsiz="734003200"
> > > > vm.pmap.shpgperproc=256
> > > > vm.pmap.pv_entry_max=3145728
> > > >
> > > >
> > > > 'geom_vinum' is used as LVM only, no RAIDs are configured.
> > > >
> > > > This server is primarily a Samba server with the SMB-shares
> > > > exported as NFS-shares as well for the other *nix-servers around.
> > > >
> > > > Because this is the most loaded production server, testing is a
> > > > bit difficult, restricted to the evening and the weekends.
> > > >
> > > > On my two other FreeBSD machines I have no problems at all, one
> > > > of them is an identical ProLiant server with a nearly identical
> > > > kernel config - runs like a charm...
> > > >
> > > > Has someone a good advice or further questions?
> > > >
> > > >
> > > >
> > > > with best regards
> > > > Matthias Schuendehuette
> > > >