Roger Leigh
2014-Dec-12 19:34 UTC
Hard system lockups with 10.1, probably drm/newcons/radeonkms-related
Hi folks,

With 10.1-RELEASE, I've enabled newcons at boot with

  kern.vty="vt"

in loader.conf. With the latest Xorg/drm installed with pkg, I'm seeing intermittent hangs and hard lockups of the system. I've included the logs for one which recovered earlier today, but later on it just locked up completely and I don't have logs for that, since I had to do a hard reset. I had to install and enable hal+dbus to get a working keyboard and mouse when running X, despite both working fine on the console!

Not sure what the trigger is. Possibly also related to input. The first hard hang was after logging in with "mwm" via kdm4. It didn't start mwm, so I ran "mwm&" in the xterm; it locked up when I clicked and dragged the window title, i.e. when initiating the drag event. The second hang was while typing into a tmux session inside a konsole window. Nothing particularly special was happening at the moment it locked up.

I'm happy to do further debugging, but given that it locks up the whole system, I'm not sure how to go about getting any useful information at that point.

The graphics card is an AMD Radeon HD 6800 Series using /dev/dri/card0.
Starting X11 automatically loads the needed modules:

# kldstat
Id Refs Address            Size     Name
 1   59 0xffffffff80200000 1755658  kernel
 2    1 0xffffffff81956000 267f48   zfs.ko
 3    2 0xffffffff81bbe000 6780     opensolaris.ko
 4    1 0xffffffff81c11000 2b58     uhid.ko
 5    1 0xffffffff81c14000 357f     ums.ko
 6    2 0xffffffff81c18000 28c0     vboxnetflt.ko
 7    2 0xffffffff81c1b000 b998     netgraph.ko
 8    2 0xffffffff81c27000 434c0    vboxdrv.ko
 9    1 0xffffffff81c6b000 40a7     ng_ether.ko
10    1 0xffffffff81c70000 3ec0     vboxnetadp.ko
11    1 0xffffffff81c74000 11a57a   radeonkms.ko
12    1 0xffffffff81d8f000 47f80    drm2.ko
13    4 0xffffffff81dd7000 1ff2     iicbus.ko
14    1 0xffffffff81dd9000 1a46     iic.ko
15    1 0xffffffff81ddb000 1e48     iicbb.ko
16    1 0xffffffff81ddd000 18f3     radeonkmsfw_BARTS_pfp.ko
17    1 0xffffffff81ddf000 1ce8     radeonkmsfw_BARTS_me.ko
18    1 0xffffffff81de1000 136f     radeonkmsfw_BTC_rlc.ko
19    1 0xffffffff81de3000 6585     radeonkmsfw_BARTS_mc.ko

Kernel log for the recoverable hang:

Dec 12 13:23:23 sorilea kernel: drmn0: error: GPU lockup CP stall for more than 10000msec
Dec 12 13:23:23 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087184 last fence id 0x0000000000087177)
Dec 12 13:23:23 sorilea kernel: drmn0: info: Saved 407 dwords of commands on ring 0.
Dec 12 13:23:23 sorilea kernel: drmn0: info: GPU softreset: 0x00000003
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS = 0xA0003828
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS_SE0 = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS_SE1 = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info: SRBM_STATUS = 0x200000C0
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008678_CP_STALLED_STAT2 = 0x00010100
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_00867C_CP_BUSY_STAT = 0x00020182
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008680_CP_STAT = 0x80038243
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_SOFT_RESET=0x00007F6B
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS = 0x00003828
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS_SE0 = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info: GRBM_STATUS_SE1 = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info: SRBM_STATUS = 0x200000C0
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008678_CP_STALLED_STAT2 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_00867C_CP_BUSY_STAT = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: R_008680_CP_STAT = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
Dec 12 13:23:23 sorilea kernel: info: [drm] probing gen 2 caps for device 1002:5a16 = 2/0
Dec 12 13:23:23 sorilea kernel: info: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Dec 12 13:23:23 sorilea kernel: info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Dec 12 13:23:23 sorilea kernel: drmn0: info: WB enabled
Dec 12 13:23:23 sorilea kernel: drmn0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xfffff8007e940c00
Dec 12 13:23:23 sorilea kernel: drmn0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xfffff8007e940c0c
Dec 12 13:23:23 sorilea kernel: info: [drm] ring test on 0 succeeded in 4 usecs
Dec 12 13:23:23 sorilea kernel: info: [drm] ring test on 3 succeeded in 2 usecs
Dec 12 13:23:33 sorilea kernel: drmn0: error: GPU lockup CP stall for more than 10000msec
Dec 12 13:23:33 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087185 last fence id 0x0000000000087177)
Dec 12 13:23:33 sorilea kernel: error: [drm:pid939:r600_ib_test] *ERROR* radeon: fence wait failed (-11).
Dec 12 13:23:33 sorilea kernel: error: [drm:pid939:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-11).
Dec 12 13:23:33 sorilea kernel: drmn0: error: ib ring test failed (-11).
Dec 12 13:23:33 sorilea kernel: drmn0: info: GPU softreset: 0x00000003
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS = 0xA0003828
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS_SE0 = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS_SE1 = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info: SRBM_STATUS = 0x200000C0
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008678_CP_STALLED_STAT2 = 0x00004100
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_00867C_CP_BUSY_STAT = 0x00020182
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008680_CP_STAT = 0x80028243
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_SOFT_RESET=0x00007F6B
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS = 0x00003828
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS_SE0 = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info: GRBM_STATUS_SE1 = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info: SRBM_STATUS = 0x200000C0
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008678_CP_STALLED_STAT2 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_00867C_CP_BUSY_STAT = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: R_008680_CP_STAT = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
Dec 12 13:23:33 sorilea kernel: info: [drm] probing gen 2 caps for device 1002:5a16 = 2/0
Dec 12 13:23:33 sorilea kernel: info: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Dec 12 13:23:33 sorilea kernel: info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Dec 12 13:23:33 sorilea kernel: drmn0: info: WB enabled
Dec 12 13:23:33 sorilea kernel: drmn0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xfffff8007e940c00
Dec 12 13:23:33 sorilea kernel: drmn0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xfffff8007e940c0c
Dec 12 13:23:33 sorilea kernel: info: [drm] ring test on 0 succeeded in 4 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ring test on 3 succeeded in 2 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ib test on ring 0 succeeded in 0 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ib test on ring 3 succeeded in 1 usecs

It worked perfectly for 5 hours after this recovery.

Thanks all,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Xorg.0.log.old.xz
Type: application/octet-stream
Size: 8636 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20141212/062fcba5/attachment.obj>
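Because the hard lockups leave no logs behind, the recoverable hangs are the only place the driver's state gets recorded. A minimal sketch for pulling those events out of a saved /var/log/messages is shown below, assuming the exact drmn0 message wording in Roger's log (other driver versions may phrase it differently). `find_lockups`, `Lockup` and `SAMPLE` are hypothetical names; the sample lines are copied from the log above. For each "GPU lockup" warning it records the two fence ids the driver printed and whether a "GPU reset succeeded" line followed.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: the warning text matches the drmn0 lines in the log above.
LOCKUP_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ kernel: drmn\d+: warning: GPU lockup "
    r"\(waiting for 0x(?P<want>[0-9a-fA-F]+) last fence id 0x(?P<last>[0-9a-fA-F]+)\)"
)
RESET_OK_RE = re.compile(r"GPU reset succeeded")


@dataclass
class Lockup:
    timestamp: str          # syslog timestamp of the "GPU lockup" warning
    waiting_for: int        # fence id the driver said it was waiting for
    last_fence: int         # "last fence id" printed in the same warning
    reset_succeeded: bool   # True if a "GPU reset succeeded" line followed


def find_lockups(log_text: str) -> List[Lockup]:
    """Collect GPU lockup warnings and note whether a successful reset followed."""
    events: List[Lockup] = []
    for line in log_text.splitlines():
        m = LOCKUP_RE.match(line)
        if m:
            events.append(
                Lockup(m["ts"], int(m["want"], 16), int(m["last"], 16), False)
            )
        elif RESET_OK_RE.search(line) and events:
            # Attribute the reset to the most recent lockup warning.
            events[-1].reset_succeeded = True
    return events


# Lines copied from the kernel log in the report above.
SAMPLE = """\
Dec 12 13:23:23 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087184 last fence id 0x0000000000087177)
Dec 12 13:23:23 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
Dec 12 13:23:33 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087185 last fence id 0x0000000000087177)
Dec 12 13:23:33 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
"""

for ev in find_lockups(SAMPLE):
    # Gap between the two fence ids printed in each warning.
    print(ev.timestamp, ev.waiting_for - ev.last_fence, ev.reset_succeeded)
```

On this excerpt the gap between the two printed fence ids is 13 for the first lockup and 14 for the second, with a successful reset after each. Passing the contents of a real log file instead of `SAMPLE` would make it easy to compare many recoverable hangs.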
Pete French
2014-Dec-13 14:58 UTC
> Subject: Hard system lockups with 10.1, probably drm/newcons/radeonkms-related
>
> Hi folks,
>
> With 10.1-RELEASE, I've enabled newcons at boot with
> kern.vty="vt"
> in loader.conf. With the latest Xorg/drm installed with pkg, I'm
> seeing intermittent hangs and hard lockups of the system. I've
> included the logs for one which recovered earlier today, but later
> on it just locked up completely and I don't have logs for that
> since I had to do a hard reset. I had to install and enable
> hal+dbus to get a working keyboard and mouse when running X,
> despite both working fine on the console!

Interesting, as I was about to write something very similar about the latest Xorg under 9.3-STABLE. I have been seeing the same hangs and hard lockups. I thought it was the card, so I replaced it, but the result is the same. I am also using the new console, and I am also using Radeons.

This is definitely video related: it never happens when using the system remotely, and sometimes it is just the graphics which lock up (I can ssh in and shut down). Occasionally the lockup clears itself and the graphics come back. Very puzzling; it can go for days without a lockup, then sometimes I will get several in the space of an afternoon.

-pete.