Ard Biesheuvel
2017-Mar-21 17:08 UTC
[Nouveau] use-after-free bug with GT218 on arm64 machine
Hello all,

I am trying to debug an elusive memory corruption bug on my arm64 machine which appears to be in the nouveau driver. I got the following splat from the refcount debugging code:

"""
refcount_t: underflow; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3366 at lib/refcount.c:128 refcount_sub_and_test+0xe8/0x108
Modules linked in: fuse nouveau ttm drm_kms_helper drm ip_tables x_tables ipv6
CPU: 0 PID: 3366 Comm: gnome-shell Not tainted 4.11.0-rc3-00407-g97da3854c526 #1
Hardware name: AMD Seattle/Seattle, BIOS 11:14:27 Mar 20 2017
task: ffffffd6cca5b600 task.stack: ffffffd6c8120000
PC is at refcount_sub_and_test+0xe8/0x108
LR is at refcount_sub_and_test+0xe8/0x108
pc : [<ffffffa1f81af7f0>] lr : [<ffffffa1f81af7f0>] pstate: 20000145
sp : ffffffd6c81237c0
x29: ffffffd6c81237c0 x28: 0000000000000028
x27: ffffffd6cca5b610 x26: ffffffd6b4417080
x25: ffffffd68715a000 x24: ffffffd6cca5b610
x23: ffffffd68b513680 x22: ffffffa1f8473a30
x21: ffffffd6c7cc9e00 x20: 0000000000000001
x19: ffffffd68b513500 x18: 0000000000000020
x17: 0000007fb64d3db0 x16: ffffffa1f8473f18
x15: 000000003040d230 x14: 0000000200000000
x13: 0000000200000010 x12: ffffffffffffffff
x11: 1ffffff43f3e8e1f x10: ffffff843f3e8e1f
x9 : dfffff9000000000 x8 : ffffffa1f9f470fc
x7 : 0000000000000000 x6 : ffffff843f3e8e20
x5 : ffffff843f3e8e20 x4 : 0000000000000000
x3 : 0000003600d52000 x2 : dfffff9000000000
x1 : ffffff8ad90246c6 x0 : 0000000000000026
---[ end trace d188d18d5d3d25db ]---
Call trace:
Exception stack(0xffffffd6c8123590 to 0xffffffd6c81236c0)
3580:                                   ffffffd68b513500 0000008000000000
35a0: ffffffd6c81237c0 ffffffa1f81af7f0 0000000020000145 000000000000003d
35c0: ffffffd68715a000 ffffffa1f7e591b8 0000000041b58ab3 ffffffa1f8fe11b8
35e0: ffffffa1f7cf15b8 ffffffa1f8473a30 ffffffd68b513680 ffffffd6cca5b610
3600: ffffffd68715a000 ffffffd6b4417080 ffffffd6c81237c0 ffffffd6c81237c0
3620: ffffffd6c8123780 00000000ffffffc8 0000000041b58ab3 ffffffa1f8fea138
3640: ffffffa1f7dc0540 ffffffa1f7cfad28 ffffffa1f7ee9ee8 ffffff9001162e10
3660: ffffff900113f084 ffffff9000ed84b8 ffffff900113200c ffffffa1f7f2c910
3680: ffffffa1f7f2d20c ffffffa1f7cf3730 0000000000000026 ffffff8ad90246c6
36a0: dfffff9000000000 0000003600d52000 0000000000000000 ffffff843f3e8e20
[<ffffffa1f81af7f0>] refcount_sub_and_test+0xe8/0x108
[<ffffffa1f81af824>] refcount_dec_and_test+0x14/0x20
[<ffffffa1f847405c>] reservation_object_add_excl_fence+0x144/0x1e0
[<ffffff900113cce0>] nouveau_bo_fence+0x50/0x60 [nouveau]
[<ffffff900113d1dc>] validate_fini_no_ticket+0xc4/0x190 [nouveau]
[<ffffff900113e1fc>] nouveau_gem_ioctl_pushbuf+0x49c/0x1c78 [nouveau]
[<ffffff9000ed84b8>] drm_ioctl+0x280/0x590 [drm]
[<ffffff900113200c>] nouveau_drm_ioctl+0x8c/0x100 [nouveau]
[<ffffffa1f7f2c910>] do_vfs_ioctl+0x130/0x9a0
[<ffffffa1f7f2d20c>] SyS_ioctl+0x8c/0xa0
[<ffffffa1f7cf3730>] el0_svc_naked+0x24/0x28
"""

Enabling KASAN gives some additional information, many reports similar to

"""
=================================================================
BUG: KASAN: use-after-free in nouveau_fence_sync+0x154/0x398 [nouveau] at addr ffffffd69064f808
Read of size 8 by task gnome-shell/3366
CPU: 4 PID: 3366 Comm: gnome-shell Tainted: G W 4.11.0-rc3-00407-g97da3854c526 #1
Hardware name: AMD Seattle/Seattle, BIOS 11:14:27 Mar 20 2017
Call trace:
[<ffffffa1f7cfb1f8>] dump_backtrace+0x0/0x300
[<ffffffa1f7cfb50c>] show_stack+0x14/0x20
[<ffffffa1f8188788>] dump_stack+0xa8/0xd0
[<ffffffa1f7eea964>] kasan_object_err+0x24/0x80
[<ffffffa1f7eeabec>] kasan_report_error+0x1cc/0x4f0
[<ffffffa1f7eeb2e8>] kasan_report+0x38/0x40
[<ffffffa1f7ee985c>] __asan_load8+0x84/0x98
[<ffffff9001162a9c>] nouveau_fence_sync+0x154/0x398 [nouveau]
[<ffffff900113e904>] nouveau_gem_ioctl_pushbuf+0xba4/0x1c78 [nouveau]
[<ffffff9000ed84b8>] drm_ioctl+0x280/0x590 [drm]
[<ffffff900113200c>] nouveau_drm_ioctl+0x8c/0x100 [nouveau]
[<ffffffa1f7f2c910>] do_vfs_ioctl+0x130/0x9a0
[<ffffffa1f7f2d20c>] SyS_ioctl+0x8c/0xa0
[<ffffffa1f7cf3730>] el0_svc_naked+0x24/0x28
Object at ffffffd69064f800, in cache kmalloc-256 size: 256
Allocated:
PID = 3366
 save_stack_trace_tsk+0x0/0x220
 save_stack_trace+0x18/0x20
 kasan_kmalloc+0xd8/0x188
 nouveau_fence_new+0xb0/0x150 [nouveau]
 nouveau_gem_ioctl_pushbuf+0x1324/0x1c78 [nouveau]
 drm_ioctl+0x280/0x590 [drm]
 nouveau_drm_ioctl+0x8c/0x100 [nouveau]
 do_vfs_ioctl+0x130/0x9a0
 SyS_ioctl+0x8c/0xa0
 el0_svc_naked+0x24/0x28
Freed:
PID = 0
 save_stack_trace_tsk+0x0/0x220
 save_stack_trace+0x18/0x20
 kasan_slab_free+0x88/0x188
 kfree+0x70/0x1e0
 rcu_process_callbacks+0x290/0x6a8
 __do_softirq+0x1a0/0x328
Memory state around the buggy address:
 ffffffd69064f700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffffffd69064f780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffd69064f800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
 ffffffd69064f880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffffffd69064f900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
=================================================================
"""

I guess these are both results of the fact that a dma_fence object was freed but still turns up in some list.

So my question is whether anyone can tell what's going on right off the bat (hey, I can always try), and if not, may I please have some suggestions on how to proceed debugging this?

Regards,
Ard.
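
The suspected failure mode (a fence object that is released while something still points at it) can be illustrated with a minimal userspace sketch in C. This is not nouveau or kernel code: struct toy_fence, struct toy_slot and all toy_* functions are hypothetical names invented for the illustration, and a "released" flag stands in for the actual free so that the example stays well-defined. It assumes, as the first backtrace suggests, that the holder of a fence drops a reference on the old fence when a new one is installed; if the old fence had been stored without taking a reference, that drop is one too many.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a refcounted fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
    int refcount;   /* number of outstanding references */
    bool released;  /* set where a real implementation would free the object */
    bool underflow; /* set when a put arrives with no references left */
};

static void toy_fence_init(struct toy_fence *f)
{
    f->refcount = 1; /* the creator holds the initial reference */
    f->released = false;
    f->underflow = false;
}

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
    assert(f != NULL);
    f->refcount++;
    return f;
}

static void toy_fence_put(struct toy_fence *f)
{
    assert(f != NULL);
    if (f->refcount <= 0) {
        /* analogous to "refcount_t: underflow; use-after-free" */
        f->underflow = true;
        return;
    }
    if (--f->refcount == 0)
        f->released = true; /* real code would free the object here */
}

/* A slot that is supposed to own one reference to the fence it points at. */
struct toy_slot {
    struct toy_fence *fence;
};

/* Install a new fence; take_ref == false models the assumed bug. */
static void toy_slot_set(struct toy_slot *s, struct toy_fence *f, bool take_ref)
{
    struct toy_fence *old = s->fence;

    s->fence = take_ref ? toy_fence_get(f) : f;
    if (old)
        toy_fence_put(old); /* drop the reference the slot believes it holds */
}

/* Returns true if replacing the first fence triggers an underflow. */
bool toy_scenario(bool take_ref)
{
    struct toy_fence a, b;
    struct toy_slot slot = { .fence = NULL };

    toy_fence_init(&a);
    toy_fence_init(&b);
    toy_slot_set(&slot, &a, take_ref); /* store a, with or without a ref */
    toy_fence_put(&a);                 /* creator drops its own reference */
    toy_slot_set(&slot, &b, true);     /* replacing a drops "the slot's" ref */
    return a.underflow;
}
```

With balanced get/put, toy_scenario(true) returns false; with the reference omitted, toy_scenario(false) returns true, because the fence has already reached zero by the time the slot drops it. Where an unbalanced put or a missing get would actually occur in the driver is exactly what remains to be found.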