When rebooting the machine, I got this crash from blktap.  The RIP maps to
line 262:

0xffffffff812548a1 is in blktap_request_pool_free
(/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
257             spin_lock_irqsave(&pool.lock, flags);
258
259             pool.status = BLKTAP_POOL_CLOSING;
260             while (atomic_read(&pool.reqs_in_use)) {
261                     spin_unlock_irqrestore(&pool.lock, flags);
262                     wait_event(pool.wait_queue, !atomic_read(&pool.reqs_in_use));
263                     spin_lock_irqsave(&pool.lock, flags);
264             }
265
266             for (i = 0; i < MAX_BUCKETS; i++) {

blktap_ring_vm_close: unmapping ring 6
blktap_ring_release: freeing device 6
general protection fault: 0000 [#2] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/net/eth0/address
CPU 1
Modules linked in: e1000 evdev ahci dm_mod sd_mod mptspi mptscsih mptbase scsi_]
Pid: 993, comm: tapdisk2 Tainted: G      D    2.6.32.8 #355 PowerEdge 1850
RIP: e030:[<ffffffff8125413b>]  [<ffffffff8125413b>] blktap_device_restart+0x7a/0xa8
RSP: e02b:ffff88002d767be8  EFLAGS: 00010092
RAX: ffff88002ea06b08 RBX: ffff88002f319090 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88002d767bf8 R08: 0000000000000002 R09: 0000000000000001
R10: ffffffff8125412d R11: ffffffff811eaa4a R12: ffff88002f319330
R13: ffff88002f3191b8 R14: ffff8800242a3a50 R15: 0000000000000001
FS:  00007f7e3234a740(0000) GS:ffff8800028fb000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000036a05a8d84 CR3: 000000002d364000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tapdisk2 (pid: 993, threadinfo ffff88002d766000, task ffff8800242c3d00)
Stack:
 ffff88002f319090 ffff88002f319238 ffff88002d767c28 ffffffff81251b3b
<0> ffff8800242a3a50 ffff88002f2c2870 ffff880002909820 ffff88002400ad60
<0> ffff88002d767c48 ffffffff8109aead ffff8800242a3a50 ffff88002400ad00
Call Trace:
 [<ffffffff81251b3b>] blktap_ring_vm_close+0x39/0x12d
 [<ffffffff8109aead>] remove_vma+0x3b/0x71
 [<ffffffff8109b036>] exit_mmap+0x153/0x175
 [<ffffffff8103eef6>] mmput+0x3e/0xd9
 [<ffffffff81042b83>] exit_mm+0x100/0x10b
 [<ffffffff81044416>] do_exit+0x1b9/0x638
 [<ffffffff8104d797>] ? get_signal_to_deliver+0x2dd/0x36e
 [<ffffffff8100efef>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81044908>] do_group_exit+0x73/0x9c
 [<ffffffff8104d809>] get_signal_to_deliver+0x34f/0x36e
 [<ffffffff810111c4>] do_signal+0x6d/0x6b0
 [<ffffffff8104ef1f>] ? sys_getsid+0x88/0xaf
 [<ffffffff810bd680>] ? poll_select_copy_remaining+0xc9/0xee
 [<ffffffff8101182e>] do_notify_resume+0x27/0x47
 [<ffffffff81390f80>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810549ef>] ? remove_wait_queue+0x12/0x45
 [<ffffffff81011f56>] int_signal+0x12/0x17
Code: a8 01 74 0a 48 89 df e8 24 e6 ff ff eb 46 4c 8d a3 a0 02 00 00 4c 89 e7 e
RIP  [<ffffffff8125413b>] blktap_device_restart+0x7a/0xa8
 RSP <ffff88002d767be8>
---[ end trace 1b88501e9b8effb5 ]---

J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jake, any immediate ideas?

Daniel

On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> When rebooting the machine, I got this crash from blktap.  The RIP maps to
> line 262 in blktap_request_pool_free
> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> [full oops quoted in the previous message]
On 02/24/2010 03:20 PM, Daniel Stodden wrote:
> Jake, any immediate ideas?

Just got another one on domain shutdown.  The crashing instruction is:

0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14

r12 = 6b6b6b6b6b6b6c8b

0x6b is the use-after-free poison value.  So I think a use-after-free.

0xffffffff8104a3f2 is in lock_timer_base (/home/jeremy/git/linux/kernel/timer.c:620).
615             __acquires(timer->base->lock)
616     {
617             struct tvec_base *base;
618
619             for (;;) {
620                     struct tvec_base *prelock_base = timer->base;
621                     base = tbase_get_base(prelock_base);
622                     if (likely(base != NULL)) {
623                             spin_lock_irqsave(&base->lock, *flags);
624                             if (likely(prelock_base == timer->base))

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/blktap2/blktap0/remove
CPU 1
Modules linked in: e1000 evdev ahci dm_mod sd_mod mptspi mptscsih mptbase scsi_]
Pid: 6533, comm: xend Not tainted 2.6.32.9 #356 PowerEdge 1850
RIP: e030:[<ffffffff8104a3f2>]  [<ffffffff8104a3f2>] lock_timer_base+0x11/0x4d
RSP: e02b:ffff880021a73ce8  EFLAGS: 00010286
RAX: ffff88001d858f40 RBX: 6b6b6b6b6b6b6c8b RCX: 0000000000000000
RDX: ffffffff8104abda RSI: ffff880021a73d20 RDI: 6b6b6b6b6b6b6c8b
RBP: ffff880021a73d08 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff8104abda R11: ffff880003cd1810 R12: 6b6b6b6b6b6b6c8b
R13: ffff880021a73d20 R14: 000000000000011e R15: ffff880021a73e20
FS:  00007f164dffb910(0000) GS:ffff8800028fb000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001d62140 CR3: 000000002eac8000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process xend (pid: 6533, threadinfo ffff880021a72000, task ffff88001d858f40)
Stack:
 6b6b6b6b6b6b6c8b 00000000ffffffff ffff88002f2802e8 000000000000011e
<0> ffff880021a73d38 ffffffff8104a7b5 0000000000000001 ffffffff8104abda
<0> 6b6b6b6b6b6b6c8b 6b6b6b6b6b6b6cbb ffff880021a73d78 ffffffff8104ac68
Call Trace:
 [<ffffffff8104a7b5>] try_to_del_timer_sync+0x1b/0x81
 [<ffffffff8104abda>] ? del_timer_sync+0x0/0xa1
 [<ffffffff8104ac68>] del_timer_sync+0x8e/0xa1
 [<ffffffff8104abda>] ? del_timer_sync+0x0/0xa1
 [<ffffffff811e79b7>] ? kobject_release+0x0/0x66
 [<ffffffff811d842c>] blk_sync_queue+0x18/0x34
 [<ffffffff811d8457>] blk_cleanup_queue+0xf/0x4b
 [<ffffffff81254039>] blktap_device_destroy+0xad/0xd7
 [<ffffffff812512a5>] blktap_control_destroy_device+0x55/0x154
 [<ffffffff81390438>] ? mutex_lock_nested+0x2a5/0x2b4
 [<ffffffff81254de5>] blktap_sysfs_remove_device+0x39/0x49
 [<ffffffff81294170>] dev_attr_store+0x1b/0x1d
 [<ffffffff810fa9d4>] sysfs_write_file+0xf6/0x132
 [<ffffffff810b03f0>] vfs_write+0xad/0x14e
 [<ffffffff810b0c1b>] ? fget_light+0x52/0xeb
 [<ffffffff811eab92>] ? __up_read+0x1c/0xa2
 [<ffffffff810b054a>] sys_write+0x45/0x6c
 [<ffffffff81011c82>] system_call_fastpath+0x16/0x1b
Code: 55 31 d2 48 89 e5 31 f6 65 48 8b 3c 25 c0 cb 00 00 e8 95 77 00 00 c9 48 9
RIP  [<ffffffff8104a3f2>] lock_timer_base+0x11/0x4d
 RSP <ffff880021a73ce8>
---[ end trace 767ddf28dd1b4a3e ]---

> Daniel
>
> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>> When rebooting the machine, I got this crash from blktap.  The RIP maps to
>> line 262 in blktap_request_pool_free
>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>> [full oops quoted in the first message]
On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> When rebooting the machine, I got this crash from blktap.  The RIP maps to
> line 262 in blktap_request_pool_free
> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).

Uhm, where did that RIP come from?

pool_free is on the module exit path.  The stack trace below looks like a
crash from the broadcasted SIGTERM before reboot.

Daniel

> [full oops quoted in the first message]
On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>> When rebooting the machine, I got this crash from blktap.  The RIP maps to
>> line 262 in blktap_request_pool_free
>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>
> Uhm, where did that RIP come from?
>
> pool_free is on the module exit path.  The stack trace below looks like a
> crash from the broadcasted SIGTERM before reboot.

Ignore it; I generated it from a different kernel from the one that
crashed.  But the other oops I posted should be all consistent and
meaningful.

J
On Wed, 2010-02-24 at 18:26 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 03:20 PM, Daniel Stodden wrote:
>> Jake, any immediate ideas?
>
> Just got another one on domain shutdown.  The crashing instruction is:
>
> 0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14

Oh, a classic.

I think I had the same issue somewhere in blktap1 when moving to 2.6.27.

Coming.

Thanks,
Daniel
On Wed, 2010-02-24 at 19:12 -0500, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 18:26 -0500, Jeremy Fitzhardinge wrote:
>> Just got another one on domain shutdown.  The crashing instruction is:
>>
>> 0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14
>
> Oh, a classic.
>
> I think I had the same issue somewhere in blktap1 when moving to 2.6.27.
>
> Coming.

This should do.  100% untested.

--snip---

blktap/device: Fix wild ptr deref during device destruction.

A put_disk() before blk_cleanup_queue() would free gd before gd->queue
is read.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>

diff -r 7d0b5bd0725f drivers/xen/blktap/device.c
--- a/drivers/xen/blktap/device.c	Fri Feb 05 11:12:24 2010 -0800
+++ b/drivers/xen/blktap/device.c	Wed Feb 24 16:13:26 2010 -0800
@@ -1027,8 +1027,8 @@
 #endif
 
 	del_gendisk(dev->gd);
+	blk_cleanup_queue(dev->gd->queue);
 	put_disk(dev->gd);
-	blk_cleanup_queue(dev->gd->queue);
 
 	dev->gd = NULL;
Daniel Stodden
2010-Feb-25 00:24 UTC
[Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
C&P results might not have been so great.  Diff attached.

Cheers,
Daniel
On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
>> Uhm, where did that RIP come from?
>>
>> pool_free is on the module exit path.  The stack trace below looks like a
>> crash from the broadcasted SIGTERM before reboot.
>
> Ignore it; I generated it from a different kernel from the one that
> crashed.  But the other oops I posted should be all consistent and
> meaningful.

Ignore only the debuginfo quote, right?
Cos this looks like a different issue to me.

Thanks,
Daniel
On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
>> Ignore it; I generated it from a different kernel from the one that
>> crashed.  But the other oops I posted should be all consistent and
>> meaningful.
>
> Ignore only the debuginfo quote, right?
> Cos this looks like a different issue to me.

Perhaps.  I got all the others on normal domain shutdown, but this one
was on machine reboot.  I'll try to repro (as I boot the test kernel
with your patch in it).

J
On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
>> Ignore only the debuginfo quote, right?
>> Cos this looks like a different issue to me.
>
> Perhaps.  I got all the others on normal domain shutdown, but this one
> was on machine reboot.  I'll try to repro (as I boot the test kernel
> with your patch in it).

(gdb) list *(blktap_device_restart+0x7a)
0x2a73 is in blktap_device_restart
(/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
915             /* Re-enable calldowns. */
916             if (blk_queue_stopped(dev->gd->queue))
917                     blk_start_queue(dev->gd->queue);
918
919             /* Kick things off immediately. */
920             blktap_device_do_request(dev->gd->queue);
921
922             spin_unlock_irq(&dev->lock);
923     }
924

Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
racing against device_restart.

Would take

 * Tapdisk killed on the other thread, which goes through into a
   device_restart().  Which is what your stacktrace shows.

 * Device removal pending, blocking until device->users drops to 0,
   then doing the device_destroy().  That might have happened during
   bdev .release.

Both running at the same time sounds like what happens if you kill them
all at once.

That clearly takes another patch then.

Daniel
On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> (gdb) list *(blktap_device_restart+0x7a)
> [listing quoted in the previous message]
>
> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> racing against device_restart.
>
> Both running at the same time sounds like what happens if you kill them
> all at once.
>
> That clearly takes another patch then.

Jeremy, can you try out the attached patch for me?  This should close
the above shutdown race as well.  Should be nowhere as frequent as the
timer_sync crash fixed earlier.

Thanks,
Daniel
Jan Beulich
2010-Feb-25 08:28 UTC
[Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
Wouldn't it be better to move blk_cleanup_queue() even before
del_gendisk()?

Jan

>>> Daniel Stodden <daniel.stodden@citrix.com> 25.02.10 01:24 >>>
C&P results might not have been so great.  Diff attached.

Cheers,
Daniel
Daniel Stodden
2010-Feb-25 09:57 UTC
Re: [Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
> Wouldn't it be better to move blk_cleanup_queue() even before
> del_gendisk()?

No.

[2009-09-22 12:48:58 UTC] Call Trace:
[2009-09-22 12:48:58 UTC]  [<c01d0186>] ? sysfs_remove_dir+0x46/0xa0
[2009-09-22 12:48:58 UTC]  [<c020180f>] ? kobject_del+0xf/0x30
[2009-09-22 12:48:58 UTC]  [<c01f107c>] ? __elv_unregister_queue+0x1c/0x20
[2009-09-22 12:48:58 UTC]  [<c01f108f>] ? elv_unregister_queue+0xf/0x20
[2009-09-22 12:48:58 UTC]  [<c01f512a>] ? blk_unregister_queue+0x2a/0x70
[2009-09-22 12:48:58 UTC]  [<c01fa55a>] ? unlink_gendisk+0x2a/0x40
[2009-09-22 12:48:58 UTC]  [<c01c9b10>] ? del_gendisk+0x60/0xd0
[2009-09-22 12:48:58 UTC]  [<c028066e>] ? destroy_backdev+0x7e/0x100
[2009-09-22 12:48:58 UTC]  [<c027f05b>] ? tap_blkif_schedule+0x5cb/0x830
[2009-09-22 12:48:58 UTC]  [<c011ed51>] ? pick_next_task_fair+0x91/0xd0
[2009-09-22 12:48:58 UTC]  [<c013dd70>] ? autoremove_wake_function+0x0/0x50
[2009-09-22 12:48:58 UTC]  [<c027ea90>] ? tap_blkif_schedule+0x0/0x830
[2009-09-22 12:48:58 UTC]  [<c013da12>] ? kthread+0x42/0x70
[2009-09-22 12:48:58 UTC]  [<c013d9d0>] ? kthread+0x0/0x70
[2009-09-22 12:48:58 UTC]  [<c010561b>] ? kernel_thread_helper+0x7/0x10

changeset:   660:88fe4866b738
user:        Daniel Stodden <daniel.stodden@citrix.com>
date:        Wed Oct 07 13:54:16 2009 -0700
files:       CA-32943-wild-ptr-deref.diff series
description:
CA-33070: Fix and reenable my broken CA-30953.diff & co.

A del_gendisk() definitely wants to find a queue on the disk.  Which in
turn will have dropped to zero right after the cleanup call.  Because
that crackbrained gendisk, as the only queue holder which really matters
in that entire game, is also the single entity left short of maintaining
that ref on its own here.

In summary, it apparently has to be *this* particular order.

+diff -r ebd0574c414a drivers/xen/blktap/backdev.c
+--- a/drivers/xen/blktap/backdev.c	Mon Sep 21 16:09:37 2009 -0700
++++ b/drivers/xen/blktap/backdev.c	Tue Sep 22 17:16:52 2009 -0700
+@@ -99,10 +99,9 @@
 	spin_unlock_irq(&backdev_io_lock);
 
+	del_gendisk(info->gd);
+	blk_cleanup_queue(info->gd->queue);
-+
-	del_gendisk(info->gd);
 	put_disk(info->gd);
 
-	blk_cleanup_queue(info->gd->queue);
Daniel Stodden
2010-Feb-25 10:02 UTC
Re: [Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 04:57 -0500, Daniel Stodden wrote:
> On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
>> Wouldn't it be better to move blk_cleanup_queue() even before
>> del_gendisk()?
>
> No.
>
> [call trace and changeset quoted in the previous message]

Well, I beg you to differ.  Maybe this changed, after all this is 2.6.3x.

Cheers,
Daniel
Daniel Stodden
2010-Feb-25 22:54 UTC
Re: [Xen-devel] Yet another [PATCH] blkfront: Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 05:02 -0500, Daniel Stodden wrote:
> On Thu, 2010-02-25 at 04:57 -0500, Daniel Stodden wrote:
>> On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
>>> Wouldn't it be better to move blk_cleanup_queue() even before
>>> del_gendisk()?
>>
>> No.
>
> Well, I beg you to differ.  Maybe this changed, after all this is 2.6.3x.

Oh, I guess the answer is no.  I just came across the same issue in a
debian/lenny while detaching a CD on 2.6.32.

Daniel

Feb 25 13:33:18 debian kernel: [  455.074625] *pdpt = 000000000eff8027 *pde = 0000000000000000
Feb 25 13:33:18 debian kernel: [  455.074660] Modules linked in: xenfs nls_utf8 isofs nls_base loop evdev snd_pcsp snd_pcm snd_timer snd soundcore xen_netfront snd_page_alloc ext3 jbd mbcache xen_blkfront thermal_sys
Feb 25 13:33:18 debian kernel: [  455.074727]
Feb 25 13:33:18 debian kernel: [  455.074733] Pid: 1114, comm: umount Not tainted (2.6.32-2-686-bigmem #1)
Feb 25 13:33:18 debian kernel: [  455.074743] EIP: 0061:[<c1139509>] EFLAGS: 00010206 CPU: 0
Feb 25 13:33:18 debian kernel: [  455.074751] EIP is at kobject_uevent_env+0x3d/0x35c
Feb 25 13:33:18 debian kernel: [  455.074759] EAX: 00000ad1 EBX: cf9562a8 ECX: 00000000 EDX: 00000ad1
Feb 25 13:33:18 debian kernel: [  455.074768] ESI: cfb00800 EDI: cfb00200 EBP: cf9562a8 ESP: ced73eac
Feb 25 13:33:18 debian kernel: [  455.074777] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Feb 25 13:33:18 debian kernel: [  455.074801]  00000000 00000001 00000ad1 00000000 c1309b85 ced73ec0 ced73ec0 cf915300
Feb 25 13:33:18 debian kernel: [  455.074828] <0> c12f8540 cfb00200 cf9562a8 cfb00800 cfb00200 00000000 c1125113 ce6ceeb0
Feb 25 13:33:18 debian kernel: [  455.074859] <0> c112d430 cfb00800 cfb00800 c1130ba3 0000000a c10f509d cfb00800 00000000
Feb 25 13:33:18 debian kernel: [  455.074903]  [<c1125113>] ? elv_unregister_queue+0x17/0x21
Feb 25 13:33:18 debian kernel: [  455.074915]  [<c112d430>] ? blk_unregister_queue+0x26/0x59
Feb 25 13:33:18 debian kernel: [  455.074926]  [<c1130ba3>] ? unlink_gendisk+0x27/0x3b
Feb 25 13:33:18 debian kernel: [  455.074937]  [<c10f509d>] ? del_gendisk+0x7b/0xf6
Feb 25 13:33:18 debian kernel: [  455.074949]  [<d082fc73>] ? blkfront_closing+0x68/0x72 [xen_blkfront]
Feb 25 13:33:18 debian kernel: [  455.074961]  [<d08300c4>] ? blkif_release+0x38/0x3d [xen_blkfront]
Feb 25 13:33:18 debian kernel: [  455.074974]  [<c10d9744>] ? __blkdev_put+0x7a/0x10f
Feb 25 13:33:18 debian kernel: [  455.074985]  [<c10ea727>] ? vfs_quota_off+0x0/0xd
Feb 25 13:33:18 debian kernel: [  455.074996]  [<c10bc913>] ? deactivate_super+0x4a/0x5f
Feb 25 13:33:18 debian kernel: [  455.075007]  [<c10cc6c5>] ? sys_umount+0x28b/0x2b1
Feb 25 13:33:18 debian kernel: [  455.075017]  [<c10cc6f6>] ? sys_oldumount+0xb/0xe
Feb 25 13:33:18 debian kernel: [  455.075029]  [<c1007f7b>] ? sysenter_do_call+0x12/0x28
Feb 25 13:33:18 debian kernel: [  455.075243] ---[ end trace 91b332cfeb23bfaf ]---
Jeremy Fitzhardinge
2010-Feb-25 23:18 UTC
Re: [Xen-devel] [PATCH] Re: Crash on blktap shutdown
On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
>
>> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
>>
>>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
>>>
>>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
>>>>
>>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
>>>>>
>>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>>>>>>
>>>>>>> When rebooting the machine, I got this crash from blktap. The rip maps to line 262 in
>>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>>>>>>>
>>>>>> Uhm, where did that RIP come from?
>>>>>>
>>>>>> pool_free is on the module exit path. The stack trace below looks like a
>>>>>> crash from the broadcasted SIGTERM before reboot.
>>>>>>
>>>>> Ignore it; I generated it from a different kernel from the one that
>>>>> crashed. But the other oops I posted should be all consistent and
>>>>> meaningful.
>>>>>
>>>> Ignore only the debuginfo quote, right?
>>>> Cos this looks like a different issue to me.
>>>>
>>> Perhaps. I got all the others on normal domain shutdown, but this one
>>> was on machine reboot. I'll try to repro (as I boot the test kernel
>>> with your patch in it).
>>>
>> (gdb) list *(blktap_device_restart+0x7a)
>> 0x2a73 is in blktap_device_restart
>> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
>> 915         /* Re-enable calldowns. */
>> 916         if (blk_queue_stopped(dev->gd->queue))
>> 917                 blk_start_queue(dev->gd->queue);
>> 918
>> 919         /* Kick things off immediately. */
>> 920         blktap_device_do_request(dev->gd->queue);
>> 921
>> 922         spin_unlock_irq(&dev->lock);
>> 923 }
>> 924
>>
>> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
>> racing against device_restart.
>>
>> Would take
>>
>> * Tapdisk killed on the other thread, which goes through into
>>   a device_restart(). Which is what your stacktrace shows.
>>
>> * Device removal pending, blocking until
>>   device->users drops to 0, then doing the device_destroy().
>>   That might have happened during bdev .release.
>>
>> Both running at the same time sounds like what happens if you kill them
>> all at once.
>>
>> That clearly takes another patch then.
>>
> Jeremy,
>
> can you try out the attached patch for me?
>
> This should close the above shutdown race as well.
>
> Should be nowhere as frequent as the timer_sync crash fixed earlier.
>

Hm, the two patches changed things but I'm still seeing problems on
domain shutdown. Still looks like use-after-free.

blktap_device_destroy: destroy device 0 users 0
blktap_ring_vm_close: unmapping ring 0
blktap_ring_release: freeing device 0
blktap_sysfs_destroy
=============================================================================
BUG kmalloc-512: Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff88002e9e2048-0xffff88002e9e2048. First byte 0x6a instead of 0x6b
INFO: Allocated in device_create_vargs+0x47/0xd7 age=7705 cpu=0 pid=3072
INFO: Freed in device_create_release+0x9/0xb age=14 cpu=0 pid=3320
INFO: Slab 0xffff880003cca5b0 objects=14 used=2 fp=0xffff88002e9e2000 flags=0xa3
INFO: Object 0xffff88002e9e2000 @offset=0 fp=0xffff88002e9e2248
Object 0xffff88002e9e2000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2040: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b kkkkkkkkjkkkkkkk
Object 0xffff88002e9e2050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2100: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2140: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2150: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2160: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2170: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2180: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone 0xffff88002e9e2200: bb bb bb bb bb bb bb bb ........
Padding 0xffff88002e9e2240: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 3327, comm: ifdown Not tainted 2.6.32 #358
Call Trace:
 [<ffffffff810a83f9>] print_trailer+0x16a/0x173
 [<ffffffff810a89a0>] check_bytes_and_report+0xb5/0xe6
 [<ffffffff810a8a96>] check_object+0xc5/0x237
 [<ffffffff810aa588>] __slab_alloc+0x493/0x591
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810ab06f>] __kmalloc+0xbe/0x12f
 [<ffffffff810e8fea>] load_elf_binary+0xe2/0x17d8
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100eff2>] ? check_events+0x12/0x20
 [<ffffffff810b3ee9>] ? search_binary_handler+0x18f/0x278
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e2c>] ? search_binary_handler+0xd2/0x278
 [<ffffffff8100efdf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81064f38>] ? lock_release+0x15a/0x166
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e39>] search_binary_handler+0xdf/0x278
 [<ffffffff810e8f08>] ? load_elf_binary+0x0/0x17d8
 [<ffffffff810b5453>] do_execve+0x185/0x27a
 [<ffffffff81010673>] sys_execve+0x3e/0x5c
 [<ffffffff8101209a>] stub_execve+0x6a/0xc0
FIX kmalloc-512: Restoring 0xffff88002e9e2048-0xffff88002e9e2048=0x6b

J
On Thu, 2010-02-25 at 18:18 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> > On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> >
> >> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> >>
> >>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> >>>
> >>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> >>>>
> >>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> >>>>>
> >>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> >>>>>>
> >>>>>>> When rebooting the machine, I got this crash from blktap. The rip maps to line 262 in
> >>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> >>>>>>>
> >>>>>> Uhm, where did that RIP come from?
> >>>>>>
> >>>>>> pool_free is on the module exit path. The stack trace below looks like a
> >>>>>> crash from the broadcasted SIGTERM before reboot.
> >>>>>>
> >>>>> Ignore it; I generated it from a different kernel from the one that
> >>>>> crashed. But the other oops I posted should be all consistent and
> >>>>> meaningful.
> >>>>>
> >>>> Ignore only the debuginfo quote, right?
> >>>> Cos this looks like a different issue to me.
> >>>>
> >>> Perhaps. I got all the others on normal domain shutdown, but this one
> >>> was on machine reboot. I'll try to repro (as I boot the test kernel
> >>> with your patch in it).
> >>>
> >> (gdb) list *(blktap_device_restart+0x7a)
> >> 0x2a73 is in blktap_device_restart
> >> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> >> 915         /* Re-enable calldowns. */
> >> 916         if (blk_queue_stopped(dev->gd->queue))
> >> 917                 blk_start_queue(dev->gd->queue);
> >> 918
> >> 919         /* Kick things off immediately. */
> >> 920         blktap_device_do_request(dev->gd->queue);
> >> 921
> >> 922         spin_unlock_irq(&dev->lock);
> >> 923 }
> >> 924
> >>
> >> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> >> racing against device_restart.
> >>
> >> Would take
> >>
> >> * Tapdisk killed on the other thread, which goes through into
> >>   a device_restart(). Which is what your stacktrace shows.
> >>
> >> * Device removal pending, blocking until
> >>   device->users drops to 0, then doing the device_destroy().
> >>   That might have happened during bdev .release.
> >>
> >> Both running at the same time sounds like what happens if you kill them
> >> all at once.
> >>
> >> That clearly takes another patch then.
> >>
> > Jeremy,
> >
> > can you try out the attached patch for me?
> >
> > This should close the above shutdown race as well.
> >
> > Should be nowhere as frequent as the timer_sync crash fixed earlier.
> >
>
> Hm, the two patches changed things but I'm still seeing problems on
> domain shutdown. Still looks like use-after-free.

All these new-fashioned debug switches. Only causing trouble.

This is yet a different piece. The sysfs code was causing a double unref
on the ring device.

Daniel