On Tue, 6 Aug 2019 21:00:10 +0300, Jaak Ristioja <jaak at ristioja.ee> wrote:
> Hello!
>
> I'm writing to report a crash in the QXL / DRM code in the Linux kernel.
> I originally filed the issue on Launchpad and more details can be found
> there, although I doubt whether these details are useful.
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620
>
> I first experienced these issues with:
>
> * Ubuntu 18.04 (probably kernel 4.15.something)
> * Ubuntu 18.10 (kernel 4.18.0-13)
> * Ubuntu 19.04 (kernel 5.0.0-13-generic)
> * Ubuntu 19.04 (mainline kernel 5.1-rc7)
> * Ubuntu 19.04 (mainline kernel 5.2.0-050200rc1-generic)
>
> Here is the crash output from dmesg:
>
> [354073.713350] INFO: task Xorg:920 blocked for more than 120 seconds.
> [354073.717755]       Not tainted 5.2.0-050200rc1-generic #201905191930
> [354073.722277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [354073.738332] Xorg            D    0   920    854 0x00404004
> [354073.738334] Call Trace:
> [354073.738340]  __schedule+0x2ba/0x650
> [354073.738342]  schedule+0x2d/0x90
> [354073.738343]  schedule_preempt_disabled+0xe/0x10
> [354073.738345]  __ww_mutex_lock.isra.11+0x3e0/0x750
> [354073.738346]  __ww_mutex_lock_slowpath+0x16/0x20
> [354073.738347]  ww_mutex_lock+0x34/0x50
> [354073.738352]  ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
> [354073.738356]  qxl_release_reserve_list+0x67/0x150 [qxl]
> [354073.738358]  ? qxl_bo_pin+0xaa/0x190 [qxl]
> [354073.738359]  qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
> [354073.738367]  drm_atomic_helper_commit_planes+0xb9/0x220 [drm_kms_helper]
> [354073.738371]  drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
> [354073.738374]  commit_tail+0x67/0x70 [drm_kms_helper]
> [354073.738378]  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
> [354073.738390]  drm_atomic_commit+0x4a/0x50 [drm]
> [354073.738394]  drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
> [354073.738402]  __setplane_atomic+0xd3/0x120 [drm]
> [354073.738410]  drm_mode_cursor_universal+0x142/0x270 [drm]
> [354073.738418]  drm_mode_cursor_common+0xcb/0x220 [drm]
> [354073.738425]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [354073.738432]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> [354073.738438]  drm_ioctl_kernel+0xb0/0x100 [drm]
> [354073.738440]  ? ___sys_recvmsg+0x16c/0x200
> [354073.738445]  drm_ioctl+0x233/0x410 [drm]
> [354073.738452]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [354073.738454]  ? timerqueue_add+0x57/0x90
> [354073.738456]  ? enqueue_hrtimer+0x3c/0x90
> [354073.738458]  do_vfs_ioctl+0xa9/0x640
> [354073.738459]  ? fput+0x13/0x20
> [354073.738461]  ? __sys_recvmsg+0x88/0xa0
> [354073.738462]  ksys_ioctl+0x67/0x90
> [354073.738463]  __x64_sys_ioctl+0x1a/0x20
> [354073.738465]  do_syscall_64+0x5a/0x140
> [354073.738467]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [354073.738468] RIP: 0033:0x7ffad14d3417
> [354073.738472] Code: Bad RIP value.
> [354073.738472] RSP: 002b:00007ffdd5679978 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
> [354073.738473] RAX: ffffffffffffffda RBX: 000056428a474610 RCX: 00007ffad14d3417
> [354073.738474] RDX: 00007ffdd56799b0 RSI: 00000000c02464bb RDI: 000000000000000e
> [354073.738474] RBP: 00007ffdd56799b0 R08: 0000000000000040 R09: 0000000000000010
> [354073.738475] R10: 000000000000003f R11: 0000000000003246 R12: 00000000c02464bb
> [354073.738475] R13: 000000000000000e R14: 0000000000000000 R15: 000056428a4721d0
> [354073.738511] INFO: task kworker/1:0:27625 blocked for more than 120 seconds.
> [354073.745154]       Not tainted 5.2.0-050200rc1-generic #201905191930
> [354073.751900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [354073.762197] kworker/1:0     D    0 27625      2 0x80004000
> [354073.762205] Workqueue: events qxl_client_monitors_config_work_func [qxl]
> [354073.762206] Call Trace:
> [354073.762211]  __schedule+0x2ba/0x650
> [354073.762214]  schedule+0x2d/0x90
> [354073.762215]  schedule_preempt_disabled+0xe/0x10
> [354073.762216]  __ww_mutex_lock.isra.11+0x3e0/0x750
> [354073.762217]  ? __switch_to_asm+0x34/0x70
> [354073.762218]  ? __switch_to_asm+0x40/0x70
> [354073.762219]  ? __switch_to_asm+0x40/0x70
> [354073.762220]  __ww_mutex_lock_slowpath+0x16/0x20
> [354073.762221]  ww_mutex_lock+0x34/0x50
> [354073.762235]  drm_modeset_lock+0x35/0xb0 [drm]
> [354073.762243]  drm_modeset_lock_all_ctx+0x5d/0xe0 [drm]
> [354073.762251]  drm_modeset_lock_all+0x5e/0xb0 [drm]
> [354073.762252]  qxl_display_read_client_monitors_config+0x1e1/0x370 [qxl]
> [354073.762254]  qxl_client_monitors_config_work_func+0x15/0x20 [qxl]
> [354073.762256]  process_one_work+0x20f/0x410
> [354073.762257]  worker_thread+0x34/0x400
> [354073.762259]  kthread+0x120/0x140
> [354073.762260]  ? process_one_work+0x410/0x410
> [354073.762261]  ? __kthread_parkme+0x70/0x70
> [354073.762262]  ret_from_fork+0x35/0x40

--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
 			   struct list_head *dups, bool del_lru)
 {
 	struct ttm_bo_global *glob;
-	struct ttm_validate_buffer *entry;
+	struct ttm_validate_buffer *entry, *last_entry;
 	int ret;
+	bool locked = false;
 
 	if (list_empty(list))
 		return 0;
@@ -112,7 +113,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
 	list_for_each_entry(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
 
+		last_entry = entry;
 		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), ticket);
+		if (!ret)
+			locked = true;
 		if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
 			reservation_object_unlock(bo->resv);
 
@@ -151,6 +155,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
 				ret = 0;
 			}
 		}
+		if (!ret)
+			locked = true;
+		else
+			locked = false;
 
 		if (!ret && entry->num_shared)
 			ret = reservation_object_reserve_shared(bo->resv,
@@ -163,6 +171,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
 				ww_acquire_done(ticket);
 				ww_acquire_fini(ticket);
 			}
+			if (locked)
+				ttm_eu_backoff_reservation_reverse(list, entry);
 			return ret;
 		}
 
@@ -172,6 +182,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
 		list_del(&entry->head);
 		list_add(&entry->head, list);
 	}
+	if (locked)
+		ttm_eu_backoff_reservation_reverse(list, last_entry);
 
 	if (del_lru) {
 		spin_lock(&glob->lru_lock);
--
Frediano Ziglio
2019-Sep-06 20:27 UTC
[Spice-devel] Xorg indefinitely hangs in kernelspace
> > On Tue, 6 Aug 2019 21:00:10 +0300, Jaak Ristioja <jaak at ristioja.ee> wrote:
> > Hello!
> >
> > I'm writing to report a crash in the QXL / DRM code in the Linux kernel.

[ full bug report and both stack traces quoted here; see above ]

> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq

[ ... ]

> @@ -151,6 +155,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
> 				ret = 0;
> 			}
> 		}
> +		if (!ret)
> +			locked = true;
> +		else
> +			locked = false;

locked = !ret; ?

[ ... ]

Where does this patch come from? Is it already somewhere? Is it
supposed to fix this issue? Does it affect any other card besides QXL?

Frediano
From: Frediano Ziglio <fziglio at redhat.com>

> Where does this patch come from?

My fingers tapping the keyboard.

> Is it already somewhere?

No idea yet.

> Is it supposed to fix this issue?

It may do nothing else as far as I can tell.

> Does it affect any other card besides QXL?

Perhaps.
Hi,

--verbose please. Do you see the same hang? Does the patch fix it?

> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
> 			   struct list_head *dups, bool del_lru)

[ ... ]

> +	if (locked)
> +		ttm_eu_backoff_reservation_reverse(list, entry);

Hmm, I think the patch is wrong. As far as I know it is the qxl driver's
job to call ttm_eu_backoff_reservation(). Doing that automatically in
ttm will most likely break other ttm users.

So I guess the call is missing in the qxl driver somewhere, most likely
in some error handling code path, given that this bug is a relatively
rare event.

There is only a single ttm_eu_reserve_buffers() call in qxl.
So how about this?

----------------------- cut here --------------------
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 312216caeea2..2f9950fa0b8d 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -262,18 +262,20 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
 				     !no_intr, NULL, true);
 	if (ret)
-		return ret;
+		goto err_backoff;
 
 	list_for_each_entry(entry, &release->bos, tv.head) {
 		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
 
 		ret = qxl_release_validate_bo(bo);
-		if (ret) {
-			ttm_eu_backoff_reservation(&release->ticket, &release->bos);
-			return ret;
-		}
+		if (ret)
+			goto err_backoff;
 	}
 	return 0;
+
+err_backoff:
+	ttm_eu_backoff_reservation(&release->ticket, &release->bos);
+	return ret;
 }
 
 void qxl_release_backoff_reserve_list(struct qxl_release *release)
----------------------- cut here --------------------

cheers,
  Gerd
Hi,

On Mon, 9 Sep 2019, Gerd Hoffmann <kraxel at redhat.com> wrote:
>
> Hmm, I think the patch is wrong. As far as I know it is the qxl driver's
> job to call ttm_eu_backoff_reservation(). Doing that automatically in
> ttm will most likely break other ttm users.

Perhaps.

> So I guess the call is missing in the qxl driver somewhere, most likely
> in some error handling code path, given that this bug is a relatively
> rare event.
>
> There is only a single ttm_eu_reserve_buffers() call in qxl.
> So how about this?

No preference either way if it is the right cure.

BTW, a quick peep at the mainline tree shows that not every
ttm_eu_reserve_buffers() call pairs with a ttm_eu_backoff_reservation(),
even leaving qxl out of account.

Hillf
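[Editorial note: Hillf's observation about unpaired call sites can be checked mechanically. A rough heuristic sketch — it lists files that call ttm_eu_reserve_buffers() but never ttm_eu_backoff_reservation(); the demo runs on a synthetic tree, so point KSRC at a real kernel checkout instead. Hits are only candidates for review, since the reserve and the backoff may legitimately live in different files (e.g. behind a driver-local wrapper):]

```shell
# Synthetic stand-in tree for the demo; use a kernel checkout in practice.
KSRC=$(mktemp -d)
mkdir -p "$KSRC/drivers/gpu/drm"
printf 'ttm_eu_reserve_buffers(...);\nttm_eu_backoff_reservation(...);\n' \
	> "$KSRC/drivers/gpu/drm/good.c"
printf 'ttm_eu_reserve_buffers(...);\n' > "$KSRC/drivers/gpu/drm/leaky.c"

# Files that reserve but never back off.
grep -rl 'ttm_eu_reserve_buffers' "$KSRC/drivers/gpu/drm" | while read -r f; do
	grep -q 'ttm_eu_backoff_reservation' "$f" || echo "unpaired: $f"
done
```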