Adam McDougall
2013-Dec-20 17:34 UTC
"panic: vm_fault: fault on nofault entry" in nvidia module on 10
I know I should submit a PR and I fully intend to, but I don't have all the details gathered yet and had to defer to more pressing bugs or issues. But since 10.0 is very near, I should say at least something. 6 times on my home desktop, and twice this week on my work desktop I've had a kernel panic that looks like it came from inside the nvidia kernel module:

Info from /var/crash/core.txt.#:

Unread portion of the kernel message buffer:
[175718] panic: vm_fault: fault on nofault entry, addr: fffffe0005f13000
[175718] cpuid = 3
[175718] Uptime: 2d0h48m38s
[175718] Dumping 5442 out of 16321 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0 doadump (textdump=1) at pcpu.h:219
219 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump (textdump=1) at pcpu.h:219
#1 0xffffffff805cb045 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
#2 0xffffffff805cb424 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:754
#3 0xffffffff807c811d in vm_fault_hold (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:279
#4 0xffffffff807c6be7 in vm_fault (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:224
#5 0xffffffff8080d01b in trap_pfault (frame=0xfffffe08491e5630, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:775
#6 0xffffffff8080c8d6 in trap (frame=0xfffffe08491e5630) at /usr/src/sys/amd64/amd64/trap.c:463
#7 0xffffffff807f2ca2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#8 0xffffffff8129056b in _nv000222rm () from /boot/modules/nvidia.ko
#9 0xfffffe000bfd0000 in ?? ()
#10 0xfffff8008597ac00 in ?? ()
#11 0xfffffe08491e5820 in ?? ()
#12 0xfffff8000dc58c00 in ?? ()
#13 0xfffff8008597ac00 in ?? ()
#14 0xffffffff81781558 in _nv000768rm () from /boot/modules/nvidia.ko
#15 0xfffffe000bfd0000 in ?? ()
#16 0xfffff8008597ac00 in ?? ()
#17 0xfffffe08491e5820 in ?? ()
#18 0xfffff8000dc58c00 in ?? ()
#19 0xfffff8008597ac00 in ?? ()
#20 0xffffffff817838c6 in rm_free_unused_clients () from /boot/modules/nvidia.ko
#21 0x0000000000018764 in ?? ()
#22 0x134198d054ad9910 in ?? ()
#23 0x134198d14318c110 in ?? ()
#24 0x134198d14318c110 in ?? ()
#25 0x134198d0cbe32d10 in ?? ()
#26 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb)

Other traces I've found are similar but not necessarily exact, although they are ALL in nvidia.ko. My home desktop crashed with this panic string on Nov 6, Nov 25, Dec 4 (twice), Dec 14, and Dec 15th. Often it would crash when I was opening thunderbird on the right monitor which is rotated vertically.
One of the panics was with the new nvidia driver from ports/184352, which I quickly abandoned for now because it didn't solve the panic and it caused my second monitor to be rotated wrongly. Otherwise it was nvidia-driver-319.32, driving a 'G96 [Quadro FX 380]'.

When I saw the very first panic I knew I should report it, but it sure seemed odd that only the one desktop was having trouble. Not having time every night lately to debug this properly, I was starting to blame the hardware until it started happening on my work desktop too. Now I have to take it seriously, although I'll only have remote access to my work desktop for the rest of the year after I go home today.

My work desktop crashed with this panic string on Dec 18th and 20th, but both times it happened when I was trying to start a VM in VirtualBox. Both my monitors are rotated vertically at work (in case this is a factor). That machine has only run nvidia-driver-319.32, driving a 'G92 [GeForce 8800 GT]'.

Both of these computers have built-in Intel graphics of some sort, but I'm pretty sure I'd just be running away from the problem if I went that route, as interesting as it may be. All I really need is decent performance with the ability to rotate one or both DVI digital outputs. I have not configured Intel graphics for X in years so that is even lower on the list.

I don't think the build of 10 has made any difference. Some of the panics were on r257230 (BETA2-ish) and the more recent ones on r258899 (BETA4-ish). The nvidia driver was always compiled by a similar version jail in poudriere (I don't have exact details, nor did I think of trying to compile locally yet). FreeBSD 9.x was always fine in this regard. Both of these systems used to run 9.x before I switched to a fresh install of 10 in a new zfs.

I feel I could pretty easily agitate my home computer into panicking if I had a set of things to try at home, but I was hoping to think of a way to make more symbols show up in the nvidia module so the backtrace would make sense.
Its largest component is a binary blob from the source which claims to be unstripped (as does the resulting nvidia.ko, also based on its size).

Anyone else seen this? Anyone have any tips to try, or think of test scenarios I should explore that might help track it down? A way to see symbols from the nvidia driver in a backtrace? I can think of some ideas to try such as dumping rotation and I was planning on brainstorming a more concrete example with more info, but I'm running low on time to spend on it this year and 10.0 is at the door. I'm not so much concerned about this issue for my own sake, but for the greater good, assuming someone else will fall into it. I'll plug away at it as I have time but good suggestions might help my efficiency.

Thanks.
Volodymyr Kostyrko
2014-Mar-20 17:59 UTC
"panic: vm_fault: fault on nofault entry" in nvidia module on 10
20.12.2013 19:34, Adam McDougall wrote:
> Anyone else seen this? Anyone have any tips to try, or think of tests scenarios
> I should explore that might help track it down? A way to see symbols from the
> nvidia driver in a backtrace? I can think of some ideas to try such as dumping
> rotation and I was planning on brainstorming a more concrete example with more
> info, but I'm running low on time to spend on it this year and 10.0 is at the
> door. I'm not so much concerned about this issue for my own sake, but for the
> greater good, assuming someone else will fall into it. I'll plug away at it
> as I have time but good suggestions might help my efficiency. Thanks.

I'm seeing this too. Rarely.

Also, sometimes when the window manager is creating or destroying modal windows with some effects, Xorg hangs and the program that was creating the windows hangs in uwait. If Xorg or the program is killed, I can restart X and continue working. All other programs are intact.

--
Sphinx of black quartz, judge my vow.