Mike Gerdts
2007-Mar-30 22:10 UTC
[dtrace-discuss] File path of mmap'd file that goes ESTALE
I have some Apache processes with third-party modules that are dumping core after a few hours of uptime. I can see that they are getting a SIGBUS because mmap'd files on NFS are going stale, but I can't see which file is going stale, which would help me understand where to look next. When I look at one of the core files, I can see:

    t@1 (l@1) program terminated by signal BUS (Bus Error)
    0xff1b0940: strcmp+0x0160: ld [%o1 + %o2], %o3
    (dbx) where
    current thread: t@1
    =>[1] strcmp(0xfdd02890, 0xfe87512c, 0xff48d764, 0x0, 0x80808080, 0x1010101), at 0xff1b0940
    ...

The first argument to strcmp points into an mmap'd file:

    # pmap core.2030 | grep -i fdd0
    FDD02000 1032K rw---*

I used the following DTrace script to help me understand what could be causing the SIGBUS:

    #! /usr/sbin/dtrace -s

    #pragma D option quiet

    fbt::trapsig:entry
    / execname == "httpd" /
    {
        printf("%Y %s[%d] trapped signal %d\n", walltimestamp, execname, pid,
            args[0]->si_signo);
        printf(" si_code %d si_errno %d si_addr 0x%p\n", args[0]->si_code,
            args[0]->si_errno, args[0]->__data.__fault.__addr);
        printf(" %d@%s:%s\n", uid, zonename, cwd);
        printf(" %s\n", curpsinfo->pr_psargs);
        stack();
        ustack();
    }

From this I can see:

    2007 Mar 30 05:57:48 httpd[2030] trapped signal 10
     si_code 3 si_errno 151 si_addr 0xfdd02890
    ....

    # grep 151 /usr/include/sys/errno.h
    #define ESTALE 151 /* Stale NFS file handle */

Is there a way that I could use si_addr (or some other information available at this point) to get back to the vnode? Alternatively, can I use the vminfo provider to catch the failed page fault, then use some magic within it to take me to the vnode? Presumably the vnode will be able to tell me the path of the file that was opened.

Bonus points go to anyone who can tell me why another zone on a different machine is not crashing. Both machines have the same patches, the same shared NFS directory for application binaries, the same NFS server and file system for application data (different subtrees), and differ only in the hostnames in their configuration files. Given the vagueness of this description, lots of bonus points are due!

Mike
-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Stefan Parvu
2007-Mar-31 13:06 UTC
[dtrace-discuss] File path of mmap'd file that goes ESTALE
Hey Mike,

> I have some apache processes with third-party modules that are core
> dumping after a few hours of uptime.

Try opensnoop from the DTraceToolkit (DTT). You could run:

    $ opensnoop -A -p <apache_pid>

Check the errno value for each file. Is your docroot on NFS?

> Bonus points go to anyone who can tell me why another zone on a
> different machine would not be crashing. Both machines have the same
> patches, same shared NFS directory for application binaries, same NFS
> server and file system for application data (different subtrees), and
> only have hostname differences in configuration files. Given the
> vagueness of this description, lots of bonus points are due!

Is Apache configured exactly the same way in both zones? If you try "EnableMMap off" for Apache [1], does that make a difference?

Stefan

[1] http://httpd.apache.org/docs/2.2/mod/core.html