Peter Hurley
2013-Mar-23 11:47 UTC
[Nouveau] [bisected][3.9.0-rc3] NULL ptr dereference from nv50_disp_intr()
On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote:
> On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when
> the user X session is coming up:

Perhaps I wasn't clear that this happens on every boot and is a
regression from 3.8

I'd be happy to help resolve this but time is of the essence; it would
be a shame to have to revert all of this for 3.9

Regards,
Peter Hurley

> BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> IP: [<0000000000000001>] 0x0
> PGD 0
> Oops: 0010 [#1] PREEMPT SMP
> Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ...<snip>...
> CPU 3
> Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision WorkStation T5400 /0RW203
> RIP: 0010:[<0000000000000001>] [<0000000000000001>] 0x0
> RSP: 0018:ffff8802afcc3d80 EFLAGS: 00010087
> RAX: ffff88029f6e5808 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000096 RSI: 0000000000000001 RDI: ffff88029f6e5808
> RBP: ffff8802afcc3dc8 R08: 0000000000000000 R09: 0000000000000004
> R10: 000000000000002c R11: ffff88029e559a98 R12: ffff8802a376cb78
> R13: ffff88029f6e57e0 R14: ffff88029f6e57f8 R15: ffff88029f6e5808
> FS: 0000000000000000(0000) GS:ffff8802afcc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000001 CR3: 000000029fa67000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper/3 (pid: 0, threadinfo ffff8802a355e000, task ffff8802a3535c40)
> Stack:
>  ffffffffa0159d8a 0000000000000082 ffff88029f6e5820 0000000000000001
>  ffff88029f71aa00 0000000000000000 0000000000000000 0000000004000000
>  0000000004000000 ffff8802afcc3e38 ffffffffa01843b5 ffff8802afcc3df8
> Call Trace:
>  <IRQ>
>  [<ffffffffa0159d8a>] ? nouveau_event_trigger+0xaa/0xe0 [nouveau]
>  [<ffffffffa01843b5>] nv50_disp_intr+0xc5/0x200 [nouveau]
>  [<ffffffff816fbacc>] ? _raw_spin_unlock_irqrestore+0x2c/0x50
>  [<ffffffff816ff98d>] ? notifier_call_chain+0x4d/0x70
>  [<ffffffffa017a105>] nouveau_mc_intr+0xb5/0x110 [nouveau]
>  [<ffffffffa01d45ff>] nouveau_irq_handler+0x6f/0x80 [nouveau]
>  [<ffffffff810eec95>] handle_irq_event_percpu+0x75/0x260
>  [<ffffffff810eeec8>] handle_irq_event+0x48/0x70
>  [<ffffffff810f205a>] handle_fasteoi_irq+0x5a/0x100
>  [<ffffffff810182f2>] handle_irq+0x22/0x40
>  [<ffffffff8170561a>] do_IRQ+0x5a/0xd0
>  [<ffffffff816fc2ad>] common_interrupt+0x6d/0x6d
>  <EOI>
>  [<ffffffff810449b6>] ? native_safe_halt+0x6/0x10
>  [<ffffffff8101ea1d>] default_idle+0x3d/0x170
>  [<ffffffff8101f736>] cpu_idle+0x116/0x130
>  [<ffffffff816e2a06>] start_secondary+0x251/0x258
> Code: Bad RIP value.
> RIP [<0000000000000001>] 0x0
>  RSP <ffff8802afcc3d80>
> CR2: 0000000000000001
> ---[ end trace 907323cb8ce6f301 ]---
>
> git bisect from 3.8.0 (good) to 3.9.0-rc3 (bad) blames (bisect log
> attached):
>
> 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b is the first bad commit
> commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b
> Author: Ben Skeggs <bskeggs at redhat.com>
> Date: Thu Jan 31 09:23:34 2013 +1000
>
>     drm/nouveau/disp: port vblank handling to event interface
>
>     This removes the nastiness with the interactions between display and
>     software engines when handling vblank semaphore release interrupts.
>
>     Now, all the semantics are handled in one place (sw) \o/.
>
>     Signed-off-by: Ben Skeggs <bskeggs at redhat.com>
>
> :040000 040000 fbd44f8566271415fd2775ab4b6346efef7e82fe a0730be0f35feaa1476b1447b1d65c4b3b3c0686 M drivers
>
> On this hardware:
> nouveau [ DEVICE][0000:02:00.0] BOOT0 : 0x084e00a2
> nouveau [ DEVICE][0000:02:00.0] Chipset: G84 (NV84)
> nouveau [ DEVICE][0000:02:00.0] Family : NV50
> nouveau [ VBIOS][0000:02:00.0] checking PRAMIN for image...
> nouveau [ VBIOS][0000:02:00.0] ... appears to be valid
> nouveau [ VBIOS][0000:02:00.0] using image from PRAMIN
> nouveau [ VBIOS][0000:02:00.0] BIT signature found
> nouveau [ VBIOS][0000:02:00.0] version 60.84.63.00.11
> nouveau [ PFB][0000:02:00.0] RAM type: DDR2
> nouveau [ PFB][0000:02:00.0] RAM size: 256 MiB
> nouveau [ PFB][0000:02:00.0] ZCOMP: 1892 tags
> nouveau [ DRM] VRAM: 256 MiB
> nouveau [ DRM] GART: 512 MiB
> nouveau [ DRM] BIT BIOS found
> nouveau [ DRM] Bios version 60.84.63.00
> nouveau [ DRM] TMDS table version 2.0
> nouveau [ DRM] DCB version 4.0
> nouveau [ DRM] DCB outp 00: 02000300 00000028
> nouveau [ DRM] DCB outp 01: 01000302 00000030
> nouveau [ DRM] DCB outp 02: 04011310 00000028
> nouveau [ DRM] DCB outp 03: 02011312 00000030
> nouveau [ DRM] DCB conn 00: 1030
> nouveau [ DRM] DCB conn 01: 2130
> nouveau [ DRM] 2 available performance level(s)
> nouveau [ DRM] 0: core 208MHz shader 416MHz memory 100MHz voltage 1200mV fanspeed 100%
> nouveau [ DRM] 1: core 460MHz shader 920MHz memory 400MHz voltage 1200mV fanspeed 100%
> nouveau [ DRM] c: core 459MHz shader 918MHz memory 399MHz voltage 1200mV
> nouveau [ DRM] MM: using CRYPT for buffer copies
> nouveau [ DRM] allocated 1680x1050 fb: 0x60000, bo ffff88029ef50400
> fbcon: nouveaufb (fb0) is primary device
> nouveau 0000:02:00.0: fb0: nouveaufb frame buffer device
> nouveau 0000:02:00.0: registered panic notifier
> [drm] Initialized nouveau 1.1.0 20120801 for 0000:02:00.0 on minor 0
>
> 02:00.0 VGA compatible controller: NVIDIA Corporation G84 [Quadro FX 570] (rev a1) (prog-if 00 [VGA controller])
>     Subsystem: NVIDIA Corporation Device 0474
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 52
>     Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
>     Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
>     Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
>     Region 5: I/O ports at dc80 [size=128]
>     Expansion ROM at fbd00000 [disabled] [size=128K]
>     Capabilities: [60] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
>         Address: 0000000000000000 Data: 0000
>     Capabilities: [78] Express (v1) Endpoint, MSI 00
>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>         DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
>             RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>             MaxPayload 128 bytes, MaxReadReq 512 bytes
>         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>         LnkCap: Port #8, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
>             ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>     Capabilities: [100 v1] Virtual Channel
>         Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>         Arb: Fixed- WRR32- WRR64- WRR128-
>         Ctrl: ArbSelect=Fixed
>         Status: InProgress-
>         VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>             Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>             Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>             Status: NegoPending- InProgress-
>     Capabilities: [128 v1] Power Budgeting <?>
>     Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
>     Kernel driver in use: nouveau
>     Kernel modules: nouveau, nvidiafb
Maarten Lankhorst
2013-Mar-24 11:56 UTC
[Nouveau] [PATCH] drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
On 23-03-13 12:47, Peter Hurley wrote:
> On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote:
>> On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when
>> the user X session is coming up:
> Perhaps I wasn't clear that this happens on every boot and is a
> regression from 3.8
>
> I'd be happy to help resolve this but time is of the essence; it would
> be a shame to have to revert all of this for 3.9

Well, it broke on my system too, so it was easy to fix. I didn't even need gdm to trigger it!

>8----
This fixes a regression caused by 1d7c71a3e2f7 (drm/nouveau/disp: port vblank
handling to event interface), which causes an oops in the following way:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<0000000000000001>] 0x0
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ...<snip>...
CPU 3
Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision WorkStation T5400 /0RW203
RIP: 0010:[<0000000000000001>] [<0000000000000001>] 0x0
RSP: 0018:ffff8802afcc3d80 EFLAGS: 00010087
RAX: ffff88029f6e5808 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000096 RSI: 0000000000000001 RDI: ffff88029f6e5808
RBP: ffff8802afcc3dc8 R08: 0000000000000000 R09: 0000000000000004
R10: 000000000000002c R11: ffff88029e559a98 R12: ffff8802a376cb78
R13: ffff88029f6e57e0 R14: ffff88029f6e57f8 R15: ffff88029f6e5808
FS: 0000000000000000(0000) GS:ffff8802afcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000001 CR3: 000000029fa67000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/3 (pid: 0, threadinfo ffff8802a355e000, task ffff8802a3535c40)
Stack:
 ffffffffa0159d8a 0000000000000082 ffff88029f6e5820 0000000000000001
 ffff88029f71aa00 0000000000000000 0000000000000000 0000000004000000
 0000000004000000 ffff8802afcc3e38 ffffffffa01843b5 ffff8802afcc3df8
Call Trace:
 <IRQ>
 [<ffffffffa0159d8a>] ? nouveau_event_trigger+0xaa/0xe0 [nouveau]
 [<ffffffffa01843b5>] nv50_disp_intr+0xc5/0x200 [nouveau]
 [<ffffffff816fbacc>] ? _raw_spin_unlock_irqrestore+0x2c/0x50
 [<ffffffff816ff98d>] ? notifier_call_chain+0x4d/0x70
 [<ffffffffa017a105>] nouveau_mc_intr+0xb5/0x110 [nouveau]
 [<ffffffffa01d45ff>] nouveau_irq_handler+0x6f/0x80 [nouveau]
 [<ffffffff810eec95>] handle_irq_event_percpu+0x75/0x260
 [<ffffffff810eeec8>] handle_irq_event+0x48/0x70
 [<ffffffff810f205a>] handle_fasteoi_irq+0x5a/0x100
 [<ffffffff810182f2>] handle_irq+0x22/0x40
 [<ffffffff8170561a>] do_IRQ+0x5a/0xd0
 [<ffffffff816fc2ad>] common_interrupt+0x6d/0x6d
 <EOI>
 [<ffffffff810449b6>] ? native_safe_halt+0x6/0x10
 [<ffffffff8101ea1d>] default_idle+0x3d/0x170
 [<ffffffff8101f736>] cpu_idle+0x116/0x130
 [<ffffffff816e2a06>] start_secondary+0x251/0x258
Code: Bad RIP value.
RIP [<0000000000000001>] 0x0
 RSP <ffff8802afcc3d80>
CR2: 0000000000000001
---[ end trace 907323cb8ce6f301 ]---

Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index d109936..c95decf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -72,11 +72,25 @@ module_param_named(modeset, nouveau_modeset, int, 0400);
 static struct drm_driver driver;
 
 static int
+nouveau_drm_vblank_handler(struct nouveau_eventh *event, int head)
+{
+	struct nouveau_drm *drm =
+		container_of(event, struct nouveau_drm, vblank[head]);
+	drm_handle_vblank(drm->dev, head);
+	return NVKM_EVENT_KEEP;
+}
+
+static int
 nouveau_drm_vblank_enable(struct drm_device *dev, int head)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
 	struct nouveau_disp *pdisp = nouveau_disp(drm->device);
-	nouveau_event_get(pdisp->vblank, head, &drm->vblank);
+
+	if (WARN_ON_ONCE(head > ARRAY_SIZE(drm->vblank)))
+		return -EIO;
+	WARN_ON_ONCE(drm->vblank[head].func);
+	drm->vblank[head].func = nouveau_drm_vblank_handler;
+	nouveau_event_get(pdisp->vblank, head, &drm->vblank[head]);
 	return 0;
 }
 
@@ -85,16 +99,11 @@ nouveau_drm_vblank_disable(struct drm_device *dev, int head)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
 	struct nouveau_disp *pdisp = nouveau_disp(drm->device);
-	nouveau_event_put(pdisp->vblank, head, &drm->vblank);
-}
-
-static int
-nouveau_drm_vblank_handler(struct nouveau_eventh *event, int head)
-{
-	struct nouveau_drm *drm =
-		container_of(event, struct nouveau_drm, vblank);
-	drm_handle_vblank(drm->dev, head);
-	return NVKM_EVENT_KEEP;
+	if (drm->vblank[head].func)
+		nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]);
+	else
+		WARN_ON_ONCE(1);
+	drm->vblank[head].func = NULL;
 }
 
 static u64
@@ -292,7 +301,6 @@ nouveau_drm_load(struct drm_device *dev, unsigned long flags)
 
 	dev->dev_private = drm;
 	drm->dev = dev;
-	drm->vblank.func = nouveau_drm_vblank_handler;
 
 	INIT_LIST_HEAD(&drm->clients);
 	spin_lock_init(&drm->tile.lock);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.h b/drivers/gpu/drm/nouveau/nouveau_drm.h
index b25df37..9c85601 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.h
@@ -113,7 +113,7 @@ struct nouveau_drm {
 	struct nvbios vbios;
 	struct nouveau_display *display;
 	struct backlight_device *backlight;
-	struct nouveau_eventh vblank;
+	struct nouveau_eventh vblank[16];
 
 	/* power management */
 	struct nouveau_pm *pm;
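
For readers following the patch: it replaces the single shared handler with one slot per head, and the handler finds its way back to the owning structure from the slot it was called with. The userspace sketch below models only that idea. The names eventh, fake_drm, EVENT_KEEP, trigger() and demo() are hypothetical stand-ins, not nouveau code, and the bounds check is this sketch's own. The patch indexes the member directly as vblank[head] inside container_of(); the sketch steps back to element 0 first, which is the same address computation written in portable C.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); the kernel macro also type-checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_HEADS 16	/* mirrors vblank[16] in the patch */
#define EVENT_KEEP 1	/* hypothetical stand-in for NVKM_EVENT_KEEP */

/* Hypothetical stand-in for struct nouveau_eventh. */
struct eventh {
	int (*func)(struct eventh *, int);
};

/* Hypothetical stand-in for struct nouveau_drm. */
struct fake_drm {
	int vblank_count[MAX_HEADS];
	struct eventh vblank[MAX_HEADS];	/* one handler slot per head */
};

static int vblank_handler(struct eventh *event, int head)
{
	/* Step back to slot 0, then subtract the offset of the array. */
	struct fake_drm *drm =
		container_of(event - head, struct fake_drm, vblank[0]);

	drm->vblank_count[head]++;
	return EVENT_KEEP;
}

/* Install the handler when vblank is enabled on a head. */
static int vblank_enable(struct fake_drm *drm, int head)
{
	if (head < 0 || head >= MAX_HEADS)
		return -1;
	drm->vblank[head].func = vblank_handler;
	return 0;
}

/* Clear the slot on disable so a stale pointer is never called. */
static void vblank_disable(struct fake_drm *drm, int head)
{
	drm->vblank[head].func = NULL;
}

/* Deliver an event to one head, skipping heads with no handler. */
static int trigger(struct fake_drm *drm, int head)
{
	struct eventh *event = &drm->vblank[head];

	return event->func ? event->func(event, head) : 0;
}

/* Returns 0 if two heads are tracked independently. */
static int demo(void)
{
	struct fake_drm drm = {0};

	if (vblank_enable(&drm, 0) || vblank_enable(&drm, 1))
		return -1;
	trigger(&drm, 0);
	trigger(&drm, 1);
	trigger(&drm, 1);
	vblank_disable(&drm, 1);
	trigger(&drm, 1);	/* ignored: head 1 has no handler now */

	return (drm.vblank_count[0] == 1 && drm.vblank_count[1] == 2) ? 0 : -1;
}
```

Calling demo() from a test harness should return 0: each head keeps its own count, and disabling one head leaves the other untouched.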