Robin Murphy
2016-Apr-29 11:18 UTC
[Nouveau] [PATCH] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"
This reverts commit 1733a2ad36741b1812cf8b3f3037c28d0af53f50.

There is apparently something amiss with the way the TTM code handles
DMA buffers, which the above commit was attempting to work around for
arm64 systems with non-coherent PCI. Unfortunately, this completely
breaks systems *with* coherent PCI (which appear to be the majority).

Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
a machine with a PCI GPU having coherent dma_map_ops (in this case a
7600GT card plugged into an ARM Juno board) results in a fatal crash:

[ 2.803438] nouveau 0000:06:00.0: DRM: allocated 1024x768 fb: 0x9000, bo ffffffc976141c00
[ 2.897662] Unable to handle kernel NULL pointer dereference at virtual address 000001ac
[ 2.897666] pgd = ffffff8008e00000
[ 2.897675] [000001ac] *pgd=00000009ffffe003, *pud=00000009ffffe003, *pmd=0000000000000000
[ 2.897680] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[ 2.897685] Modules linked in:
[ 2.897692] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc5+ #543
[ 2.897694] Hardware name: ARM Juno development board (r1) (DT)
[ 2.897699] task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
[ 2.897711] PC is at __memcpy+0x7c/0x180
[ 2.897719] LR is at OUT_RINGp+0x34/0x70
[ 2.897724] pc : [<ffffff80083465fc>] lr : [<ffffff800854248c>] pstate: 80000045
[ 2.897726] sp : ffffffc9768ab360
[ 2.897732] x29: ffffffc9768ab360 x28: 0000000000000001
[ 2.897738] x27: ffffffc97624c000 x26: 0000000000000000
[ 2.897744] x25: 0000000000000080 x24: 0000000000006c00
[ 2.897749] x23: 0000000000000005 x22: ffffffc97624c010
[ 2.897755] x21: 0000000000000004 x20: 0000000000000004
[ 2.897761] x19: ffffffc9763da000 x18: ffffffc976b2491c
[ 2.897766] x17: 0000000000000007 x16: 0000000000000006
[ 2.897771] x15: 0000000000000001 x14: 0000000000000001
[ 2.897777] x13: 0000000000e31b70 x12: ffffffc9768a0080
[ 2.897783] x11: 0000000000000000 x10: fffffffffffffb00
[ 2.897788] x9 : 0000000000000000 x8 : 0000000000000000
[ 2.897793] x7 : 0000000000000000 x6 : 00000000000001ac
[ 2.897799] x5 : 00000000ffffffff x4 : 0000000000000000
[ 2.897804] x3 : 0000000000000010 x2 : 0000000000000010
[ 2.897810] x1 : ffffffc97624c010 x0 : 00000000000001ac
...
[ 2.898494] Call trace:
[ 2.898499] Exception stack(0xffffffc9768ab1a0 to 0xffffffc9768ab2c0)
[ 2.898506] b1a0: ffffffc9763da000 0000000000000004 ffffffc9768ab360 ffffff80083465fc
[ 2.898513] b1c0: ffffffc976801e00 ffffffc9762b8000 ffffffc9768ab1f0 ffffff80080ec158
[ 2.898520] b1e0: ffffffc9768ab230 ffffff8008496d04 ffffffc975ce6d80 ffffffc9768ab36e
[ 2.898527] b200: ffffffc9768ab36f ffffffc9768ab29d ffffffc9768ab29e ffffffc9768a0000
[ 2.898533] b220: ffffffc9768ab250 ffffff80080e70c0 ffffffc9768ab270 ffffff8008496e44
[ 2.898540] b240: 00000000000001ac ffffffc97624c010 0000000000000010 0000000000000010
[ 2.898546] b260: 0000000000000000 00000000ffffffff 00000000000001ac 0000000000000000
[ 2.898552] b280: 0000000000000000 0000000000000000 fffffffffffffb00 0000000000000000
[ 2.898558] b2a0: ffffffc9768a0080 0000000000e31b70 0000000000000001 0000000000000001
[ 2.898566] [<ffffff80083465fc>] __memcpy+0x7c/0x180
[ 2.898574] [<ffffff800853e164>] nv04_fbcon_imageblit+0x1d4/0x2e8
[ 2.898582] [<ffffff800853d6d0>] nouveau_fbcon_imageblit+0xd8/0xe0
[ 2.898591] [<ffffff80083c4db4>] soft_cursor+0x154/0x1d8
[ 2.898598] [<ffffff80083c47b4>] bit_cursor+0x4fc/0x538
[ 2.898605] [<ffffff80083c0cfc>] fbcon_cursor+0x134/0x1a8
[ 2.898613] [<ffffff800841c280>] hide_cursor+0x38/0xa0
[ 2.898620] [<ffffff800841d420>] redraw_screen+0x120/0x228
[ 2.898628] [<ffffff80083bf268>] fbcon_prepare_logo+0x370/0x3f8
[ 2.898635] [<ffffff80083bf640>] fbcon_init+0x350/0x560
[ 2.898641] [<ffffff800841c634>] visual_init+0xac/0x108
[ 2.898648] [<ffffff800841df14>] do_bind_con_driver+0x1c4/0x3a8
[ 2.898655] [<ffffff800841e4f4>] do_take_over_console+0x174/0x1e8
[ 2.898662] [<ffffff80083bf8c4>] do_fbcon_takeover+0x74/0x100
[ 2.898669] [<ffffff80083c3e44>] fbcon_event_notify+0x8cc/0x920
[ 2.898680] [<ffffff80080d7e38>] notifier_call_chain+0x50/0x90
[ 2.898685] [<ffffff80080d8214>] __blocking_notifier_call_chain+0x4c/0x90
[ 2.898691] [<ffffff80080d826c>] blocking_notifier_call_chain+0x14/0x20
[ 2.898696] [<ffffff80083c5e1c>] fb_notifier_call_chain+0x1c/0x28
[ 2.898703] [<ffffff80083c81ac>] register_framebuffer+0x1cc/0x2e0
[ 2.898712] [<ffffff800845da80>] drm_fb_helper_initial_config+0x288/0x3e8
[ 2.898719] [<ffffff800853da20>] nouveau_fbcon_init+0xe0/0x118
[ 2.898727] [<ffffff800852d2f8>] nouveau_drm_load+0x268/0x890
[ 2.898734] [<ffffff8008466e24>] drm_dev_register+0xbc/0xc8
[ 2.898740] [<ffffff8008468a88>] drm_get_pci_dev+0xa0/0x180
[ 2.898747] [<ffffff800852cb28>] nouveau_drm_probe+0x1a0/0x1e0
[ 2.898755] [<ffffff80083a32e0>] pci_device_probe+0x98/0x110
[ 2.898763] [<ffffff800858e434>] driver_probe_device+0x204/0x2b0
[ 2.898770] [<ffffff800858e58c>] __driver_attach+0xac/0xb0
[ 2.898777] [<ffffff800858c3e0>] bus_for_each_dev+0x60/0xa0
[ 2.898783] [<ffffff800858dbc0>] driver_attach+0x20/0x28
[ 2.898789] [<ffffff800858d7b0>] bus_add_driver+0x1d0/0x238
[ 2.898796] [<ffffff800858ed50>] driver_register+0x60/0xf8
[ 2.898802] [<ffffff80083a20dc>] __pci_register_driver+0x3c/0x48
[ 2.898809] [<ffffff8008468eb4>] drm_pci_init+0xf4/0x120
[ 2.898818] [<ffffff8008c56fc0>] nouveau_drm_init+0x21c/0x230
[ 2.898825] [<ffffff80080829d4>] do_one_initcall+0x8c/0x190
[ 2.898832] [<ffffff8008c31af4>] kernel_init_freeable+0x14c/0x1f0
[ 2.898839] [<ffffff80088a0c20>] kernel_init+0x10/0x100
[ 2.898845] [<ffffff8008085e10>] ret_from_fork+0x10/0x40
[ 2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
[ 2.898871] ---[ end trace d5713dcad023ee04 ]---
[ 2.898888] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

In a toss-up between the GPU seeing stale data artefacts on some systems
vs. catastrophic kernel crashes on other systems, the latter would seem
to take precedence, so revert this change until the real underlying
problem can be fixed.

Signed-off-by: Robin Murphy <robin.murphy at arm.com>
---

Alex, Ben, Dave,

I know Alex was looking into this, but since we're nearly at -rc6 already
it looks like the only thing to do for 4.6 is pick the lesser of two evils...

Thanks,
Robin.

 drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
index 18fab3973ce5..62ad0300cfa5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
@@ -1614,7 +1614,7 @@ nvkm_device_pci_func = {
 	.fini = nvkm_device_pci_fini,
 	.resource_addr = nvkm_device_pci_resource_addr,
 	.resource_size = nvkm_device_pci_resource_size,
-	.cpu_coherent = !IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_ARM64),
+	.cpu_coherent = !IS_ENABLED(CONFIG_ARM),
 };
 
 int
-- 
2.8.1.dirty
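
For readers following the one-line change, here is a minimal userspace sketch (not kernel code) of how the two .cpu_coherent initializers in the patch evaluate per architecture. The struct build_config type and both function names are hypothetical stand-ins introduced only for illustration; in the kernel, IS_ENABLED(CONFIG_...) is resolved at compile time from Kconfig rather than passed around at run time.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the two Kconfig symbols the patch tests.
 * In the real driver these are compile-time constants seen through
 * IS_ENABLED(), not run-time fields.
 */
struct build_config {
	bool arm;	/* models CONFIG_ARM (32-bit ARM) */
	bool arm64;	/* models CONFIG_ARM64 */
};

/*
 * Initializer removed by the revert:
 *   !IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_ARM64)
 * Every arm64 build is treated as non-CPU-coherent.
 */
bool cpu_coherent_before_revert(struct build_config c)
{
	return !c.arm && !c.arm64;
}

/*
 * Initializer restored by the revert:
 *   !IS_ENABLED(CONFIG_ARM)
 * Only 32-bit ARM builds are treated as non-CPU-coherent.
 */
bool cpu_coherent_after_revert(struct build_config c)
{
	return !c.arm;
}
```

In this model the two expressions differ only for an arm64 build: the reverted commit marks it non-coherent, while the restored expression marks it coherent again. Either way the value is fixed per architecture at build time, so one kernel image cannot distinguish a coherent arm64 PCI system from a non-coherent one through this field.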
Alexandre Courbot
2016-May-09 10:28 UTC
[Nouveau] [PATCH] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"
On 04/29/2016 08:18 PM, Robin Murphy wrote:
> This reverts commit 1733a2ad36741b1812cf8b3f3037c28d0af53f50.
>
> There is apparently something amiss with the way the TTM code handles
> DMA buffers, which the above commit was attempting to work around for
> arm64 systems with non-coherent PCI. Unfortunately, this completely
> breaks systems *with* coherent PCI (which appear to be the majority).
>
> Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
> a machine with a PCI GPU having coherent dma_map_ops (in this case a
> 7600GT card plugged into an ARM Juno board) results in a fatal crash:
>
> [...]
>
> In a toss-up between the GPU seeing stale data artefacts on some systems
> vs. catastrophic kernel crashes on other systems, the latter would seem
> to take precedence, so revert this change until the real underlying
> problem can be fixed.
>
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
> ---
>
> Alex, Ben, Dave,
>
> I know Alex was looking into this, but since we're nearly at -rc6 already
> it looks like the only thing to do for 4.6 is pick the lesser of two evils...

Hi Robin,

Sorry for the delayed reply - I was offline last week. You are right, so let's pick this patch for now.

Reviewed-by: Alexandre Courbot <acourbot at nvidia.com>