Philipp Stanner
2025-Apr-10 09:24 UTC
[PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()
Contains two patches improving nouveau_fence_done(), and one addressing
an actual bug (race):

[ 39.848463] WARNING: CPU: 21 PID: 1734 at drivers/gpu/drm/nouveau/nouveau_fence.c:509 nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848551] Modules linked in: snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi snd_soc_acpi_intel_sdca_quirks snd_sof_intel_hda_mlink snd_soc_sdca snd_soc_avs snd_ctl_led snd_soc_hda_codec intel_rapl_msr snd_hda_codec_realtek snd_hda_ext_core intel_rapl_common snd_hda_codec_generic snd_soc_core snd_hda_scodec_component intel_uncore_frequency intel_uncore_frequency_common snd_hda_codec_hdmi intel_ifs snd_compress i10nm_edac skx_edac_common nfit snd_hda_intel snd_intel_dspcfg libnvdimm snd_hda_codec binfmt_misc snd_hwdep snd_hda_core snd_seq snd_seq_device dell_wmi
[ 39.848575] dell_pc x86_pkg_temp_thermal spi_nor platform_profile sparse_keymap intel_powerclamp dax_hmem snd_pcm cxl_acpi coretemp cxl_port iTCO_wdt mtd rapl intel_pmc_bxt pmt_telemetry cxl_core dell_wmi_sysman pmt_class iTCO_vendor_support snd_timer isst_if_mmio vfat intel_cstate dell_smbios dcdbas fat dell_wmi_ddv dell_smm_hwmon dell_wmi_descriptor firmware_attributes_class wmi_bmof intel_uncore einj pcspkr isst_if_mbox_pci atlantic snd isst_if_common intel_vsec e1000e macsec mei_me i2c_i801 spi_intel_pci soundcore i2c_smbus spi_intel mei joydev loop nfnetlink zram nouveau drm_ttm_helper ttm polyval_clmulni iaa_crypto gpu_sched polyval_generic rtsx_pci_sdmmc ghash_clmulni_intel i2c_algo_bit mmc_core drm_gpuvm sha512_ssse3 nvme drm_exec drm_display_helper sha256_ssse3 idxd sha1_ssse3 cec nvme_core idxd_bus rtsx_pci nvme_auth pinctrl_alderlake ip6_tables ip_tables fuse
[ 39.848603] CPU: 21 UID: 42 PID: 1734 Comm: gnome-shell Tainted: G W 6.14.0-rc4+ #11
[ 39.848605] Tainted: [W]=WARN
[ 39.848606] Hardware name: Dell Inc. Precision 7960 Tower/01G0M6, BIOS 2.7.0 12/17/2024
[ 39.848607] RIP: 0010:nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848688] Code: db 74 17 48 8d 7b 38 b8 ff ff ff ff f0 0f c1 43 38 83 f8 01 74 29 85 c0 7e 17 31 c0 5b 5d c3 cc cc cc cc e8 76 b2 c5 f0 eb 96 <0f> 0b e9 67 ff ff ff be 03 00 00 00 e8 83 76 33 f1 31 c0 eb dd e8
[ 39.848690] RSP: 0018:ff1cc1ffc5c039f0 EFLAGS: 00010046
[ 39.848691] RAX: 0000000000000001 RBX: ff175a3b504da980 RCX: ff175a3b4801e008
[ 39.848692] RDX: ff175a3b43e7bad0 RSI: ffffffffc09d3fda RDI: ff175a3b504da980
[ 39.848693] RBP: ff175a3b504da9c0 R08: ffffffffc09e39df R09: 0000000000000001
[ 39.848694] R10: 0000000000000001 R11: 0000000000000000 R12: ff175a3b6d97de00
[ 39.848695] R13: 0000000000000246 R14: ff1cc1ffc5c03c60 R15: 0000000000000001
[ 39.848696] FS: 00007fc5477846c0(0000) GS:ff175a5a50280000(0000) knlGS:0000000000000000
[ 39.848698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.848699] CR2: 000055cb7613d1a8 CR3: 000000012e5ce004 CR4: 0000000000f71ef0
[ 39.848700] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.848701] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 39.848702] PKRU: 55555554
[ 39.848703] Call Trace:
[ 39.848704] <TASK>
[ 39.848705] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848782] ? __warn.cold+0x93/0xfa
[ 39.848785] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848861] ? report_bug+0xff/0x140
[ 39.848863] ? handle_bug+0x58/0x90
[ 39.848865] ? exc_invalid_op+0x17/0x70
[ 39.848866] ? asm_exc_invalid_op+0x1a/0x20
[ 39.848870] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848943] nouveau_fence_enable_signaling+0x32/0x80 [nouveau]
[ 39.849016] ? __pfx_nouveau_fence_cleanup_cb+0x10/0x10 [nouveau]
[ 39.849088] __dma_fence_enable_signaling+0x33/0xc0
[ 39.849090] dma_fence_add_callback+0x4b/0xd0
[ 39.849093] nouveau_fence_emit+0xa3/0x260 [nouveau]
[ 39.849166] nouveau_fence_new+0x7d/0xf0 [nouveau]
[ 39.849242] nouveau_gem_ioctl_pushbuf+0xe8f/0x1300 [nouveau]
[ 39.849338] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849431] drm_ioctl_kernel+0xad/0x100
[ 39.849433] drm_ioctl+0x288/0x550
[ 39.849435] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849526] nouveau_drm_ioctl+0x57/0xb0 [nouveau]
[ 39.849620] __x64_sys_ioctl+0x94/0xc0
[ 39.849621] do_syscall_64+0x82/0x160
[ 39.849623] ? drm_ioctl+0x2b7/0x550
[ 39.849625] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849719] ? ktime_get_mono_fast_ns+0x38/0xd0
[ 39.849721] ? __pm_runtime_suspend+0x69/0xc0
[ 39.849724] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[ 39.849726] ? syscall_exit_to_user_mode+0x10/0x200
[ 39.849729] ? do_syscall_64+0x8e/0x160
[ 39.849730] ? exc_page_fault+0x7e/0x1a0
[ 39.849733] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 39.849735] RIP: 0033:0x7fc5576fe0ad
[ 39.849736] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 39.849737] RSP: 002b:00007ffc002688a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 39.849739] RAX: ffffffffffffffda RBX: 000055cb74e316c0 RCX: 00007fc5576fe0ad
[ 39.849740] RDX: 00007ffc00268960 RSI: 00000000c0406481 RDI: 000000000000000e
[ 39.849741] RBP: 00007ffc002688f0 R08: 0000000000000000 R09: 000055cb74e35560
[ 39.849742] R10: 0000000000000014 R11: 0000000000000246 R12: 00007ffc00268960
[ 39.849744] R13: 00000000c0406481 R14: 000000000000000e R15: 000055cb74e3cd10
[ 39.849746] </TASK>
[ 39.849746] ---[ end trace 0000000000000000 ]---
[ 39.849776] ------------[ cut here ]------------

This is the first WARN_ON() in dma_fence_set_error(), called by
nouveau_fence_context_kill().

It's rare, but it is a bug, or rather: the archetype of a race, since
(as Christian pointed out) nouveau_fence_update() later at some point
will remove the signaled fence (by signaling it again).

P.

Philipp Stanner (3):
drm/nouveau: Prevent signaled fences in pending list
drm/nouveau: Remove surplus if-branch
drm/nouveau: Add helper to check base fence

drivers/gpu/drm/nouveau/nouveau_fence.c | 32 ++++++++++++++-----------
1 file changed, 18 insertions(+), 14 deletions(-)
--
2.48.1
Philipp Stanner
2025-Apr-10 09:24 UTC
[PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list
Nouveau currently relies on the assumption that dma_fences will only
ever get signaled through nouveau_fence_signal(), which takes care of
removing a signaled fence from the list nouveau_fence_chan.pending.

This self-imposed rule is violated in nouveau_fence_done(), where
dma_fence_is_signaled() (somewhat surprisingly, considering its name)
can signal the fence without removing it from the list. This enables
accesses to already signaled fences through the list, which is a bug.

In particular, it can race with nouveau_fence_context_kill(), which
would then attempt to set an error code on an already signaled fence,
which is illegal.

In nouveau_fence_done(), the call to nouveau_fence_update() already
ensures to signal all ready fences. Thus, the signaling potentially
performed by dma_fence_is_signaled() is actually not necessary.

Replace the call to dma_fence_is_signaled() with
nouveau_fence_base_is_signaled().

Cc: <stable at vger.kernel.org> # 4.10+, precise commit not to be determined
Signed-off-by: Philipp Stanner <phasta at kernel.org>
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 7cc84472cece..33535987d8ed 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
nvif_event_block(&fctx->event);
spin_unlock_irqrestore(&fctx->lock, flags);
}
- return dma_fence_is_signaled(&fence->base);
+ return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
}
static long
--
2.48.1
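The failure mode described in patch 1/3 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_fence and every toy_* function are hypothetical names invented for illustration, not kernel or Nouveau APIs, and the locking and real list handling of the driver are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
	unsigned int seqno;
	bool signaled;   /* models DMA_FENCE_FLAG_SIGNALED_BIT */
	bool on_pending; /* models membership in nouveau_fence_chan.pending */
};

/* Models the driver's own signal path: set the flag and unlink together. */
static void toy_signal_and_unlink(struct toy_fence *f)
{
	f->signaled = true;
	f->on_pending = false;
}

/* Models a query that may signal as a side effect but never unlinks. */
static bool toy_query_with_side_effect(struct toy_fence *f, unsigned int hw_seqno)
{
	if (!f->signaled && hw_seqno >= f->seqno)
		f->signaled = true; /* the list entry is left behind */
	return f->signaled;
}

/* Models the plain flag test used by the patch: read-only. */
static bool toy_flag_is_set(const struct toy_fence *f)
{
	return f->signaled;
}

/* True if a walk over the pending list would reach a signaled fence. */
static bool toy_invariant_broken(const struct toy_fence *f)
{
	return f->on_pending && f->signaled;
}

/* Walk-through of both behaviours; the asserts document expected states. */
static void toy_demo(void)
{
	struct toy_fence a = { .seqno = 1, .signaled = false, .on_pending = true };
	struct toy_fence b = a;

	/* Side-effect query: the fence is signaled while still on the list. */
	assert(toy_query_with_side_effect(&a, 1));
	assert(toy_invariant_broken(&a));

	/* Read-only test: nothing changes until the signal path runs. */
	assert(!toy_flag_is_set(&b));
	toy_signal_and_unlink(&b);
	assert(toy_flag_is_set(&b) && !toy_invariant_broken(&b));
}
```

Calling toy_demo() from a main() exercises both paths. In the model, only the side-effect query leaves a signaled fence reachable through the pending list, which is the state a later walk such as the one in nouveau_fence_context_kill() must not encounter.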
Philipp Stanner
[PATCH 2/3] drm/nouveau: Remove surplus if-branch
nouveau_fence_done() contains an if-branch which checks for the
existence of either of two fence backend ops. Those two are the only
backend ops existing in Nouveau, however; and at least one backend ops
must be in use for the entire driver to be able to work. The if branch
is, therefore, surplus.
Remove the if-branch.
Signed-off-by: Philipp Stanner <phasta at kernel.org>
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 33535987d8ed..db6f4494405c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -259,21 +259,19 @@ nouveau_fence_emit(struct nouveau_fence *fence)
bool
nouveau_fence_done(struct nouveau_fence *fence)
{
- if (fence->base.ops == &nouveau_fence_ops_legacy ||
- fence->base.ops == &nouveau_fence_ops_uevent) {
- struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
- struct nouveau_channel *chan;
- unsigned long flags;
+ struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
+ struct nouveau_channel *chan;
+ unsigned long flags;
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
- return true;
+ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
+ return true;
+
+ spin_lock_irqsave(&fctx->lock, flags);
+ chan = rcu_dereference_protected(fence->channel,
lockdep_is_held(&fctx->lock));
+ if (chan && nouveau_fence_update(chan, fctx))
+ nvif_event_block(&fctx->event);
+ spin_unlock_irqrestore(&fctx->lock, flags);
- spin_lock_irqsave(&fctx->lock, flags);
- chan = rcu_dereference_protected(fence->channel,
lockdep_is_held(&fctx->lock));
- if (chan && nouveau_fence_update(chan, fctx))
- nvif_event_block(&fctx->event);
- spin_unlock_irqrestore(&fctx->lock, flags);
- }
return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
}
--
2.48.1
Philipp Stanner
2025-Apr-10 09:24 UTC
[PATCH 3/3] drm/nouveau: Add helper to check base fence
Nouveau, unfortunately, checks whether a dma_fence is already signaled
at various different places with, at times, different methods. In
nouveau_fence_update() it generally signals all fences the hardware is
done with by evaluating the sequence number. That mechanism then has no
way to tell the caller nouveau_fence_done() whether a particular fence
is actually signaled, which is why the internal bits of the dma_fence
get checked.
This can be made more readable by providing a new wrapper, which can
then later be helpful to solve an unrelated bug.
Add nouveau_fence_base_is_signaled().
Signed-off-by: Philipp Stanner <phasta at kernel.org>
---
drivers/gpu/drm/nouveau/nouveau_fence.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index db6f4494405c..0d58a81b3402 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -256,6 +256,12 @@ nouveau_fence_emit(struct nouveau_fence *fence)
return ret;
}
+static inline bool
+nouveau_fence_base_is_signaled(struct nouveau_fence *fence)
+{
+ return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
+}
+
bool
nouveau_fence_done(struct nouveau_fence *fence)
{
@@ -263,7 +269,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
struct nouveau_channel *chan;
unsigned long flags;
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
+ if (nouveau_fence_base_is_signaled(fence))
return true;
spin_lock_irqsave(&fctx->lock, flags);
@@ -272,7 +278,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
nvif_event_block(&fctx->event);
spin_unlock_irqrestore(&fctx->lock, flags);
- return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
+ return nouveau_fence_base_is_signaled(fence);
}
static long
--
2.48.1
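The sequence-number mechanism that patch 3/3 attributes to nouveau_fence_update() can be sketched in userspace as follows. This is an illustrative model, not the driver's code: all toy_* names are hypothetical, the pending fences are assumed to be stored in emission order, and sequence numbers are assumed to be 32 bits wide and compared in a wraparound-tolerant way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending fence; not the kernel's types. */
struct toy_fence {
	uint32_t seqno;
	bool signaled;
};

/*
 * True if sequence number a is at or past b, tolerating 32-bit wraparound.
 * Relies on the usual two's complement conversion to int32_t.
 */
static bool toy_seq_reached(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Signal every fence whose seqno the simulated hardware counter has
 * reached. Assumes 'pending' is sorted in emission order, so the walk can
 * stop at the first fence that is not ready. Returns how many fences were
 * marked signaled; it cannot tell the caller about one particular fence.
 */
static size_t toy_fence_update(struct toy_fence *pending, size_t n, uint32_t hw_seq)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_seq_reached(hw_seq, pending[i].seqno))
			break;
		pending[i].signaled = true;
		done++;
	}
	return done;
}

/* Read-only check of one fence, analogous in spirit to the new helper. */
static bool toy_fence_is_signaled(const struct toy_fence *f)
{
	return f->signaled;
}

/* Small walk-through: the hardware counter has reached seqno 11. */
static void toy_demo(void)
{
	struct toy_fence pending[3] = { { 10, false }, { 11, false }, { 12, false } };

	assert(toy_fence_update(pending, 3, 11) == 2);
	assert(toy_fence_is_signaled(&pending[1]));
	assert(!toy_fence_is_signaled(&pending[2]));
}
```

Because the update step works on the whole list, the caller still needs a separate per-fence query afterwards, which is the role the model gives to toy_fence_is_signaled().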
Philipp Stanner
2025-Apr-10 09:51 UTC
[PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()
On Thu, 2025-04-10 at 11:24 +0200, Philipp Stanner wrote:
> Contains two patches improving nouveau_fence_done(), and one addressing
> an actual bug (race):

Oops, that's the wrong calltrace. Here we go:

[ 85.791794] Call Trace:
[ 85.791796] <TASK>
[ 85.791797] ? nouveau_fence_context_kill (/home/imperator/linux/./include/linux/dma-fence.h:587 (discriminator 9) /home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_fence.c:94 (discriminator 9)) nouveau
[ 85.791874] ? __warn.cold (/home/imperator/linux/kernel/panic.c:748)
[ 85.791878] ? nouveau_fence_context_kill (/home/imperator/linux/./include/linux/dma-fence.h:587 (discriminator 9) /home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_fence.c:94 (discriminator 9)) nouveau
[ 85.791950] ? report_bug (/home/imperator/linux/lib/bug.c:180 /home/imperator/linux/lib/bug.c:219)
[ 85.791953] ? handle_bug (/home/imperator/linux/arch/x86/kernel/traps.c:260)
[ 85.791956] ? exc_invalid_op (/home/imperator/linux/arch/x86/kernel/traps.c:309 (discriminator 1))
[ 85.791957] ? asm_exc_invalid_op (/home/imperator/linux/./arch/x86/include/asm/idtentry.h:621)
[ 85.791960] ? nouveau_fence_context_kill (/home/imperator/linux/./include/linux/dma-fence.h:587 (discriminator 9) /home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_fence.c:94 (discriminator 9)) nouveau
[ 85.792028] drm_sched_fini.cold (/home/imperator/linux/./include/trace/../../drivers/gpu/drm/scheduler/gpu_scheduler_trace.h:72 (discriminator 1)) gpu_sched
[ 85.792033] ? drm_sched_entity_kill.part.0 (/home/imperator/linux/drivers/gpu/drm/scheduler/sched_entity.c:243 (discriminator 2)) gpu_sched
[ 85.792037] nouveau_sched_destroy (/home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_sched.c:509 /home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_sched.c:518) nouveau
[ 85.792122] nouveau_abi16_chan_fini.isra.0 (/home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_abi16.c:188) nouveau
[ 85.792191] nouveau_abi16_fini (/home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_abi16.c:224 (discriminator 3)) nouveau
[ 85.792263] nouveau_drm_postclose (/home/imperator/linux/drivers/gpu/drm/nouveau/nouveau_drm.c:1240) nouveau
[ 85.792349] drm_file_free (/home/imperator/linux/drivers/gpu/drm/drm_file.c:255)
[ 85.792353] drm_release (/home/imperator/linux/./arch/x86/include/asm/atomic.h:67 (discriminator 1) /home/imperator/linux/./include/linux/atomic/atomic-arch-fallback.h:2278 (discriminator 1) /home/imperator/linux/./include/linux/atomic/atomic-instrumented.h:1384 (discriminator 1) /home/imperator/linux/drivers/gpu/drm/drm_file.c:428 (discriminator 1))
[ 85.792355] __fput (/home/imperator/linux/fs/file_table.c:464)
[ 85.792357] task_work_run (/home/imperator/linux/kernel/task_work.c:227)
[ 85.792360] do_exit (/home/imperator/linux/kernel/exit.c:939)
[ 85.792362] do_group_exit (/home/imperator/linux/kernel/exit.c:1069)
[ 85.792364] get_signal (/home/imperator/linux/kernel/signal.c:3036)
[ 85.792366] arch_do_signal_or_restart (/home/imperator/linux/./arch/x86/include/asm/syscall.h:38 /home/imperator/linux/arch/x86/kernel/signal.c:264 /home/imperator/linux/arch/x86/kernel/signal.c:339)
[ 85.792369] syscall_exit_to_user_mode (/home/imperator/linux/kernel/entry/common.c:113 /home/imperator/linux/./include/linux/entry-common.h:329 /home/imperator/linux/kernel/entry/common.c:207 /home/imperator/linux/kernel/entry/common.c:218)
[ 85.792372] do_syscall_64 (/home/imperator/linux/./arch/x86/include/asm/cpufeature.h:172 /home/imperator/linux/arch/x86/entry/common.c:98)
[ 85.792373] ? syscall_exit_to_user_mode_prepare (/home/imperator/linux/./include/linux/audit.h:357 /home/imperator/linux/kernel/entry/common.c:166 /home/imperator/linux/kernel/entry/common.c:200)
[ 85.792376] ? syscall_exit_to_user_mode (/home/imperator/linux/./arch/x86/include/asm/paravirt.h:686 /home/imperator/linux/./include/linux/entry-common.h:232 /home/imperator/linux/kernel/entry/common.c:206 /home/imperator/linux/kernel/entry/common.c:218)
[ 85.792377] ? do_syscall_64 (/home/imperator/linux/./arch/x86/include/asm/cpufeature.h:172 /home/imperator/linux/arch/x86/entry/common.c:98)
[ 85.792378] entry_SYSCALL_64_after_hwframe (/home/imperator/linux/arch/x86/entry/entry_64.S:130)
[ 85.792381] RIP: 0033:0x7ff950b6af70
[ 85.792383] Code: Unable to access opcode bytes at 0x7ff950b6af46.
objdump: '/tmp/tmp.sfPRl5k2te.o': No such file
Code starting with the faulting instruction
===========================================
[ 85.792383] RSP: 002b:00007ff93cdfb6f0 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
[ 85.792385] RAX: fffffffffffffdfe RBX: 000055d386d61870 RCX: 00007ff950b6af70
[ 85.792386] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007ff928000b90
[ 85.792387] RBP: 00007ff93cdfb740 R08: 0000000000000008 R09: 0000000000000000
[ 85.792388] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001
[ 85.792388] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ff951b10b40
[ 85.792390] </TASK>
[ 85.792391] ---[ end trace 0000000000000000 ]---

By the way, for reference:
I did try whether it could be done to have nouveau_fence_signal()
incorporated into nouveau_fence_update() and nouveau_fence_done().
This, however, would then cause a race with the list_del() in
nouveau_fence_no_signaling(), WARNing because of the list poison.

So the "solution" space is:
* A cleanup callback on the dma_fence.
* Keeping the current race or
* replacing it with another race with another function.
* Just preventing nouveau_fence_done() from signaling fences other
than through nouveau_fence_update/signal

The latter seems clearly like the cleanest solution to me. The
alternative would be a work-intensive rework of all the misdesigns in
nouveau_fence.c.

P.
Christian König
2025-Apr-10 12:18 UTC
[PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()
On 10.04.25 at 11:51, Philipp Stanner wrote:
> On Thu, 2025-04-10 at 11:24 +0200, Philipp Stanner wrote:
>> Contains two patches improving nouveau_fence_done(), and one addressing
>> an actual bug (race):
> Oops, that's the wrong calltrace. Here we go:

I think I understand the problem now as well, but that backtrace is
completely mangled in the mail.

It would be nice if you could send that out again.

Thanks,
Christian.
syscall_exit_to_user_mode_prepare+0x15e/0x1a0 >> [?? 39.849726]? ? syscall_exit_to_user_mode+0x10/0x200 >> [?? 39.849729]? ? do_syscall_64+0x8e/0x160 >> [?? 39.849730]? ? exc_page_fault+0x7e/0x1a0 >> [?? 39.849733]? entry_SYSCALL_64_after_hwframe+0x76/0x7e >> [?? 39.849735] RIP: 0033:0x7fc5576fe0ad >> [?? 39.849736] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 >> c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 >> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 >> 00 00 00 >> [?? 39.849737] RSP: 002b:00007ffc002688a0 EFLAGS: 00000246 ORIG_RAX: >> 0000000000000010 >> [?? 39.849739] RAX: ffffffffffffffda RBX: 000055cb74e316c0 RCX: >> 00007fc5576fe0ad >> [?? 39.849740] RDX: 00007ffc00268960 RSI: 00000000c0406481 RDI: >> 000000000000000e >> [?? 39.849741] RBP: 00007ffc002688f0 R08: 0000000000000000 R09: >> 000055cb74e35560 >> [?? 39.849742] R10: 0000000000000014 R11: 0000000000000246 R12: >> 00007ffc00268960 >> [?? 39.849744] R13: 00000000c0406481 R14: 000000000000000e R15: >> 000055cb74e3cd10 >> [?? 39.849746]? </TASK> >> [?? 39.849746] ---[ end trace 0000000000000000 ]--- >> [?? 39.849776] ------------[ cut here ]------------ >> >> >> This is the first WARN_ON() in dma_fence_set_error(), called by >> nouveau_fence_context_kill(). >> >> It's rare, but it is a bug, or rather: the archetype of a race, since >> (as Christian pointed out) nouveau_fence_update() later at some point >> will remove the signaled fence (by signaling it again). >> >> >> P. >> >> >> Philipp Stanner (3): >> ? drm/nouveau: Prevent signaled fences in pending list >> ? drm/nouveau: Remove surplus if-branch >> ? drm/nouveau: Add helper to check base fence >> >> ?drivers/gpu/drm/nouveau/nouveau_fence.c | 32 ++++++++++++++--------- >> -- >> ?1 file changed, 18 insertions(+), 14 deletions(-) >>