When rebooting the machine, I got this crash from blktap.  The RIP maps to
line 262:

0xffffffff812548a1 is in blktap_request_pool_free
(/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
257             spin_lock_irqsave(&pool.lock, flags);
258
259             pool.status = BLKTAP_POOL_CLOSING;
260             while (atomic_read(&pool.reqs_in_use)) {
261                     spin_unlock_irqrestore(&pool.lock, flags);
262                     wait_event(pool.wait_queue, !atomic_read(&pool.reqs_in_use));
263                     spin_lock_irqsave(&pool.lock, flags);
264             }
265
266             for (i = 0; i < MAX_BUCKETS; i++) {

blktap_ring_vm_close: unmapping ring 6
blktap_ring_release: freeing device 6
general protection fault: 0000 [#2] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/net/eth0/address
CPU 1
Modules linked in: e1000 evdev ahci dm_mod sd_mod mptspi mptscsih mptbase scsi_]
Pid: 993, comm: tapdisk2 Tainted: G      D    2.6.32.8 #355 PowerEdge 1850
RIP: e030:[<ffffffff8125413b>]  [<ffffffff8125413b>] blktap_device_restart+0x7a/0xa8
RSP: e02b:ffff88002d767be8  EFLAGS: 00010092
RAX: ffff88002ea06b08 RBX: ffff88002f319090 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88002d767bf8 R08: 0000000000000002 R09: 0000000000000001
R10: ffffffff8125412d R11: ffffffff811eaa4a R12: ffff88002f319330
R13: ffff88002f3191b8 R14: ffff8800242a3a50 R15: 0000000000000001
FS:  00007f7e3234a740(0000) GS:ffff8800028fb000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000036a05a8d84 CR3: 000000002d364000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tapdisk2 (pid: 993, threadinfo ffff88002d766000, task ffff8800242c3d00)
Stack:
 ffff88002f319090 ffff88002f319238 ffff88002d767c28 ffffffff81251b3b
<0> ffff8800242a3a50 ffff88002f2c2870 ffff880002909820 ffff88002400ad60
<0> ffff88002d767c48 ffffffff8109aead ffff8800242a3a50 ffff88002400ad00
Call Trace:
 [<ffffffff81251b3b>] blktap_ring_vm_close+0x39/0x12d
 [<ffffffff8109aead>] remove_vma+0x3b/0x71
 [<ffffffff8109b036>] exit_mmap+0x153/0x175
 [<ffffffff8103eef6>] mmput+0x3e/0xd9
 [<ffffffff81042b83>] exit_mm+0x100/0x10b
 [<ffffffff81044416>] do_exit+0x1b9/0x638
 [<ffffffff8104d797>] ? get_signal_to_deliver+0x2dd/0x36e
 [<ffffffff8100efef>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81044908>] do_group_exit+0x73/0x9c
 [<ffffffff8104d809>] get_signal_to_deliver+0x34f/0x36e
 [<ffffffff810111c4>] do_signal+0x6d/0x6b0
 [<ffffffff8104ef1f>] ? sys_getsid+0x88/0xaf
 [<ffffffff810bd680>] ? poll_select_copy_remaining+0xc9/0xee
 [<ffffffff8101182e>] do_notify_resume+0x27/0x47
 [<ffffffff81390f80>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810549ef>] ? remove_wait_queue+0x12/0x45
 [<ffffffff81011f56>] int_signal+0x12/0x17
Code: a8 01 74 0a 48 89 df e8 24 e6 ff ff eb 46 4c 8d a3 a0 02 00 00 4c 89 e7 e
RIP  [<ffffffff8125413b>] blktap_device_restart+0x7a/0xa8
 RSP <ffff88002d767be8>
---[ end trace 1b88501e9b8effb5 ]---

J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jake, any immediate ideas?

Daniel

On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> When rebooting the machine, I got this crash from blktap.  The RIP maps to
> line 262 in blktap_request_pool_free
> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> [full oops quoted in the previous message]
On 02/24/2010 03:20 PM, Daniel Stodden wrote:
> Jake, any immediate ideas?

Just got another one on domain shutdown.  The crashing instruction is:

0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14

r12 = 6b6b6b6b6b6b6c8b

0x6b is the use-after-free poison value.  So I think a use-after-free.

0xffffffff8104a3f2 is in lock_timer_base (/home/jeremy/git/linux/kernel/timer.c:620).
615             __acquires(timer->base->lock)
616     {
617             struct tvec_base *base;
618
619             for (;;) {
620                     struct tvec_base *prelock_base = timer->base;
621                     base = tbase_get_base(prelock_base);
622                     if (likely(base != NULL)) {
623                             spin_lock_irqsave(&base->lock, *flags);
624                             if (likely(prelock_base == timer->base))

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/blktap2/blktap0/remove
CPU 1
Modules linked in: e1000 evdev ahci dm_mod sd_mod mptspi mptscsih mptbase scsi_]
Pid: 6533, comm: xend Not tainted 2.6.32.9 #356 PowerEdge 1850
RIP: e030:[<ffffffff8104a3f2>]  [<ffffffff8104a3f2>] lock_timer_base+0x11/0x4d
RSP: e02b:ffff880021a73ce8  EFLAGS: 00010286
RAX: ffff88001d858f40 RBX: 6b6b6b6b6b6b6c8b RCX: 0000000000000000
RDX: ffffffff8104abda RSI: ffff880021a73d20 RDI: 6b6b6b6b6b6b6c8b
RBP: ffff880021a73d08 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff8104abda R11: ffff880003cd1810 R12: 6b6b6b6b6b6b6c8b
R13: ffff880021a73d20 R14: 000000000000011e R15: ffff880021a73e20
FS:  00007f164dffb910(0000) GS:ffff8800028fb000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001d62140 CR3: 000000002eac8000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process xend (pid: 6533, threadinfo ffff880021a72000, task ffff88001d858f40)
Stack:
 6b6b6b6b6b6b6c8b 00000000ffffffff ffff88002f2802e8 000000000000011e
<0> ffff880021a73d38 ffffffff8104a7b5 0000000000000001 ffffffff8104abda
<0> 6b6b6b6b6b6b6c8b 6b6b6b6b6b6b6cbb ffff880021a73d78 ffffffff8104ac68
Call Trace:
 [<ffffffff8104a7b5>] try_to_del_timer_sync+0x1b/0x81
 [<ffffffff8104abda>] ? del_timer_sync+0x0/0xa1
 [<ffffffff8104ac68>] del_timer_sync+0x8e/0xa1
 [<ffffffff8104abda>] ? del_timer_sync+0x0/0xa1
 [<ffffffff811e79b7>] ? kobject_release+0x0/0x66
 [<ffffffff811d842c>] blk_sync_queue+0x18/0x34
 [<ffffffff811d8457>] blk_cleanup_queue+0xf/0x4b
 [<ffffffff81254039>] blktap_device_destroy+0xad/0xd7
 [<ffffffff812512a5>] blktap_control_destroy_device+0x55/0x154
 [<ffffffff81390438>] ? mutex_lock_nested+0x2a5/0x2b4
 [<ffffffff81254de5>] blktap_sysfs_remove_device+0x39/0x49
 [<ffffffff81294170>] dev_attr_store+0x1b/0x1d
 [<ffffffff810fa9d4>] sysfs_write_file+0xf6/0x132
 [<ffffffff810b03f0>] vfs_write+0xad/0x14e
 [<ffffffff810b0c1b>] ? fget_light+0x52/0xeb
 [<ffffffff811eab92>] ? __up_read+0x1c/0xa2
 [<ffffffff810b054a>] sys_write+0x45/0x6c
 [<ffffffff81011c82>] system_call_fastpath+0x16/0x1b
Code: 55 31 d2 48 89 e5 31 f6 65 48 8b 3c 25 c0 cb 00 00 e8 95 77 00 00 c9 48 9
RIP  [<ffffffff8104a3f2>] lock_timer_base+0x11/0x4d
 RSP <ffff880021a73ce8>
---[ end trace 767ddf28dd1b4a3e ]---

> Daniel
>
> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>> When rebooting the machine, I got this crash from blktap.  The RIP maps to
>> line 262 in blktap_request_pool_free
>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>> [full oops quoted in the first message]
On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> When rebooting the machine, I got this crash from blktap.  The RIP maps to
> line 262 in blktap_request_pool_free
> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).

Uhm, where did that RIP come from?

pool_free is on the module exit path.  The stack trace below looks like a
crash from the broadcasted SIGTERM before reboot.

Daniel

> [full oops quoted in the first message]
On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>> When rebooting the machine, I got this crash from blktap.  The RIP maps to
>> line 262 in blktap_request_pool_free
>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>
> Uhm, where did that RIP come from?
>
> pool_free is on the module exit path.  The stack trace below looks like a
> crash from the broadcasted SIGTERM before reboot.

Ignore it; I generated it from a different kernel from the one that
crashed.  But the other oops I posted should be all consistent and
meaningful.

J
On Wed, 2010-02-24 at 18:26 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 03:20 PM, Daniel Stodden wrote:
>> Jake, any immediate ideas?
>
> Just got another one on domain shutdown.  The crashing instruction is:
>
> 0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14

Oh, a classic.

I think I had the same issue somewhere in blktap1 when moving to 2.6.27.

Coming.

Thanks,
Daniel
On Wed, 2010-02-24 at 19:12 -0500, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 18:26 -0500, Jeremy Fitzhardinge wrote:
>> Just got another one on domain shutdown.  The crashing instruction is:
>>
>> 0xffffffff8104a3f2 <lock_timer_base+17>:        mov    0x28(%r12),%r14
>
> Oh, a classic.
>
> I think I had the same issue somewhere in blktap1 when moving to 2.6.27.
>
> Coming.

This should do.  100% untested.

--snip---

blktap/device: Fix wild ptr deref during device destruction.

A put_disk() before blk_cleanup_queue() would free gd before gd->queue
is read.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>

diff -r 7d0b5bd0725f drivers/xen/blktap/device.c
--- a/drivers/xen/blktap/device.c	Fri Feb 05 11:12:24 2010 -0800
+++ b/drivers/xen/blktap/device.c	Wed Feb 24 16:13:26 2010 -0800
@@ -1027,8 +1027,8 @@
 #endif
 
 	del_gendisk(dev->gd);
+	blk_cleanup_queue(dev->gd->queue);
 	put_disk(dev->gd);
-	blk_cleanup_queue(dev->gd->queue);
 
 	dev->gd = NULL;
Daniel Stodden
2010-Feb-25 00:24 UTC
[Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
C&P results might not have been so great.  Diff attached.

Cheers,
Daniel
On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
>> Uhm, where did that RIP come from?
>>
>> pool_free is on the module exit path.  The stack trace below looks like a
>> crash from the broadcasted SIGTERM before reboot.
>
> Ignore it; I generated it from a different kernel from the one that
> crashed.  But the other oops I posted should be all consistent and
> meaningful.

Ignore only the debuginfo quote, right?
Cos this looks like a different issue to me.

Thanks,
Daniel
On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
>> Ignore it; I generated it from a different kernel from the one that
>> crashed.  But the other oops I posted should be all consistent and
>> meaningful.
>
> Ignore only the debuginfo quote, right?
> Cos this looks like a different issue to me.

Perhaps.  I got all the others on normal domain shutdown, but this one
was on machine reboot.  I'll try to repro (as I boot the test kernel
with your patch in it).

J
On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
>> Ignore only the debuginfo quote, right?
>> Cos this looks like a different issue to me.
>
> Perhaps.  I got all the others on normal domain shutdown, but this one
> was on machine reboot.  I'll try to repro (as I boot the test kernel
> with your patch in it).

(gdb) list *(blktap_device_restart+0x7a)
0x2a73 is in blktap_device_restart
(/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
915             /* Re-enable calldowns. */
916             if (blk_queue_stopped(dev->gd->queue))
917                     blk_start_queue(dev->gd->queue);
918
919             /* Kick things off immediately. */
920             blktap_device_do_request(dev->gd->queue);
921
922             spin_unlock_irq(&dev->lock);
923     }
924

Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
racing against device_restart.

Would take

 * Tapdisk killed on the other thread, which goes through into a
   device_restart().  Which is what your stacktrace shows.

 * Device removal pending, blocking until device->users drops to 0,
   then doing the device_destroy().  That might have happened during
   bdev .release.

Both running at the same time sounds like what happens if you kill them
all at once.

That clearly takes another patch then.

Daniel
On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> (gdb) list *(blktap_device_restart+0x7a)
> [listing quoted in the previous message]
>
> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> racing against device_restart.
>
> Both running at the same time sounds like what happens if you kill them
> all at once.
>
> That clearly takes another patch then.

Jeremy, can you try out the attached patch for me?  This should close
the above shutdown race as well.  Should be nowhere as frequent as the
timer_sync crash fixed earlier.

Thanks,
Daniel
Jan Beulich
2010-Feb-25 08:28 UTC
[Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
Wouldn't it be better to move blk_cleanup_queue() even before
del_gendisk()?

Jan

>>> Daniel Stodden <daniel.stodden@citrix.com> 25.02.10 01:24 >>>
C&P results might not have been so great.  Diff attached.

Cheers,
Daniel
Daniel Stodden
2010-Feb-25 09:57 UTC
Re: [Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
> Wouldn't it be better to move blk_cleanup_queue() even before
> del_gendisk()?

No.

[2009-09-22 12:48:58 UTC] Call Trace:
[2009-09-22 12:48:58 UTC]  [<c01d0186>] ? sysfs_remove_dir+0x46/0xa0
[2009-09-22 12:48:58 UTC]  [<c020180f>] ? kobject_del+0xf/0x30
[2009-09-22 12:48:58 UTC]  [<c01f107c>] ? __elv_unregister_queue+0x1c/0x20
[2009-09-22 12:48:58 UTC]  [<c01f108f>] ? elv_unregister_queue+0xf/0x20
[2009-09-22 12:48:58 UTC]  [<c01f512a>] ? blk_unregister_queue+0x2a/0x70
[2009-09-22 12:48:58 UTC]  [<c01fa55a>] ? unlink_gendisk+0x2a/0x40
[2009-09-22 12:48:58 UTC]  [<c01c9b10>] ? del_gendisk+0x60/0xd0
[2009-09-22 12:48:58 UTC]  [<c028066e>] ? destroy_backdev+0x7e/0x100
[2009-09-22 12:48:58 UTC]  [<c027f05b>] ? tap_blkif_schedule+0x5cb/0x830
[2009-09-22 12:48:58 UTC]  [<c011ed51>] ? pick_next_task_fair+0x91/0xd0
[2009-09-22 12:48:58 UTC]  [<c013dd70>] ? autoremove_wake_function+0x0/0x50
[2009-09-22 12:48:58 UTC]  [<c027ea90>] ? tap_blkif_schedule+0x0/0x830
[2009-09-22 12:48:58 UTC]  [<c013da12>] ? kthread+0x42/0x70
[2009-09-22 12:48:58 UTC]  [<c013d9d0>] ? kthread+0x0/0x70
[2009-09-22 12:48:58 UTC]  [<c010561b>] ? kernel_thread_helper+0x7/0x10

changeset:   660:88fe4866b738
user:        Daniel Stodden <daniel.stodden@citrix.com>
date:        Wed Oct 07 13:54:16 2009 -0700
files:       CA-32943-wild-ptr-deref.diff series
description:
CA-33070: Fix and reenable my broken CA-30953.diff & co.

A del_gendisk() definitely wants to find a queue on the disk.  Which in
turn will have dropped to zero right after the cleanup call.  Because
that crackbrained gendisk, as the only queue holder which really matters
in that entire game, is also the single entity left short of maintaining
that ref on its own here.

In summary, it apparently has to be *this* particular order.

+diff -r ebd0574c414a drivers/xen/blktap/backdev.c
+--- a/drivers/xen/blktap/backdev.c	Mon Sep 21 16:09:37 2009 -0700
++++ b/drivers/xen/blktap/backdev.c	Tue Sep 22 17:16:52 2009 -0700
+@@ -99,10 +99,9 @@
 	spin_unlock_irq(&backdev_io_lock);
 
+	del_gendisk(info->gd);
+	blk_cleanup_queue(info->gd->queue);
-+
-	del_gendisk(info->gd);
 	put_disk(info->gd);
 
-	blk_cleanup_queue(info->gd->queue);
Daniel Stodden
2010-Feb-25 10:02 UTC
Re: [Xen-devel] Re: [PATCH] Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 04:57 -0500, Daniel Stodden wrote:
> On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
>> Wouldn't it be better to move blk_cleanup_queue() even before
>> del_gendisk()?
>
> No.
>
> [call trace and changeset quoted in the previous message]

Well, I beg you to differ.  Maybe this changed, after all this is 2.6.3x.

Cheers,
Daniel
Daniel Stodden
2010-Feb-25 22:54 UTC
Re: [Xen-devel] Yet another [PATCH] blkfront: Fix wild ptr deref during device destruction.
On Thu, 2010-02-25 at 05:02 -0500, Daniel Stodden wrote:
> On Thu, 2010-02-25 at 04:57 -0500, Daniel Stodden wrote:
>> On Thu, 2010-02-25 at 03:28 -0500, Jan Beulich wrote:
>>> Wouldn't it be better to move blk_cleanup_queue() even before
>>> del_gendisk()?
>>
>> No.
>
> Well, I beg you to differ.  Maybe this changed, after all this is 2.6.3x.

Oh, I guess the answer is no.  I just came across the same issue in a
debian/lenny while detaching a CD on 2.6.32.

Daniel

Feb 25 13:33:18 debian kernel: [  455.074625] *pdpt = 000000000eff8027 *pde = 0000000000000000
Feb 25 13:33:18 debian kernel: [  455.074660] Modules linked in: xenfs nls_utf8 isofs nls_base loop evdev snd_pcsp snd_pcm snd_timer snd soundcore xen_netfront snd_page_alloc ext3 jbd mbcache xen_blkfront thermal_sys
Feb 25 13:33:18 debian kernel: [  455.074727]
Feb 25 13:33:18 debian kernel: [  455.074733] Pid: 1114, comm: umount Not tainted (2.6.32-2-686-bigmem #1)
Feb 25 13:33:18 debian kernel: [  455.074743] EIP: 0061:[<c1139509>] EFLAGS: 00010206 CPU: 0
Feb 25 13:33:18 debian kernel: [  455.074751] EIP is at kobject_uevent_env+0x3d/0x35c
Feb 25 13:33:18 debian kernel: [  455.074759] EAX: 00000ad1 EBX: cf9562a8 ECX: 00000000 EDX: 00000ad1
Feb 25 13:33:18 debian kernel: [  455.074768] ESI: cfb00800 EDI: cfb00200 EBP: cf9562a8 ESP: ced73eac
Feb 25 13:33:18 debian kernel: [  455.074777] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Feb 25 13:33:18 debian kernel: [  455.074801]  00000000 00000001 00000ad1 00000000 c1309b85 ced73ec0 ced73ec0 cf915300
Feb 25 13:33:18 debian kernel: [  455.074828] <0> c12f8540 cfb00200 cf9562a8 cfb00800 cfb00200 00000000 c1125113 ce6ceeb0
Feb 25 13:33:18 debian kernel: [  455.074859] <0> c112d430 cfb00800 cfb00800 c1130ba3 0000000a c10f509d cfb00800 00000000
Feb 25 13:33:18 debian kernel: [  455.074903]  [<c1125113>] ? elv_unregister_queue+0x17/0x21
Feb 25 13:33:18 debian kernel: [  455.074915]  [<c112d430>] ? blk_unregister_queue+0x26/0x59
Feb 25 13:33:18 debian kernel: [  455.074926]  [<c1130ba3>] ? unlink_gendisk+0x27/0x3b
Feb 25 13:33:18 debian kernel: [  455.074937]  [<c10f509d>] ? del_gendisk+0x7b/0xf6
Feb 25 13:33:18 debian kernel: [  455.074949]  [<d082fc73>] ? blkfront_closing+0x68/0x72 [xen_blkfront]
Feb 25 13:33:18 debian kernel: [  455.074961]  [<d08300c4>] ? blkif_release+0x38/0x3d [xen_blkfront]
Feb 25 13:33:18 debian kernel: [  455.074974]  [<c10d9744>] ? __blkdev_put+0x7a/0x10f
Feb 25 13:33:18 debian kernel: [  455.074985]  [<c10ea727>] ? vfs_quota_off+0x0/0xd
Feb 25 13:33:18 debian kernel: [  455.074996]  [<c10bc913>] ? deactivate_super+0x4a/0x5f
Feb 25 13:33:18 debian kernel: [  455.075007]  [<c10cc6c5>] ? sys_umount+0x28b/0x2b1
Feb 25 13:33:18 debian kernel: [  455.075017]  [<c10cc6f6>] ? sys_oldumount+0xb/0xe
Feb 25 13:33:18 debian kernel: [  455.075029]  [<c1007f7b>] ? sysenter_do_call+0x12/0x28
Feb 25 13:33:18 debian kernel: [  455.075243] ---[ end trace 91b332cfeb23bfaf ]---
Jeremy Fitzhardinge
2010-Feb-25 23:18 UTC
Re: [Xen-devel] [PATCH] Re: Crash on blktap shutdown
On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
>
>> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
>>
>>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
>>>
>>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
>>>>
>>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
>>>>>
>>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>>>>>>
>>>>>>> When rebooting the machine, I got this crash from blktap. The rip maps to line 262 in
>>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>>>>>>>
>>>>>> Uhm, where did that RIP come from?
>>>>>>
>>>>>> pool_free is on the module exit path. The stack trace below looks like a
>>>>>> crash from the broadcasted SIGTERM before reboot.
>>>>>>
>>>>> Ignore it; I generated it from a different kernel from the one that
>>>>> crashed. But the other oops I posted should be all consistent and
>>>>> meaningful.
>>>>>
>>>> Ignore only the debuginfo quote, right?
>>>> Cos this looks like a different issue to me.
>>>>
>>> Perhaps. I got all the others on normal domain shutdown, but this one
>>> was on machine reboot. I'll try to repro (as I boot the test kernel
>>> with your patch in it).
>>>
>> (gdb) list *(blktap_device_restart+0x7a)
>> 0x2a73 is in blktap_device_restart
>> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
>> 915         /* Re-enable calldowns. */
>> 916         if (blk_queue_stopped(dev->gd->queue))
>> 917                 blk_start_queue(dev->gd->queue);
>> 918
>> 919         /* Kick things off immediately. */
>> 920         blktap_device_do_request(dev->gd->queue);
>> 921
>> 922         spin_unlock_irq(&dev->lock);
>> 923 }
>> 924
>>
>> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
>> racing against device_restart.
>>
>> Would take
>>
>> * Tapdisk killed on the other thread, which goes through into
>>   a device_restart(). Which is what your stacktrace shows.
>>
>> * Device removal pending, blocking until
>>   device->users drops to 0, then doing the device_destroy().
>>   That might have happened during bdev .release.
>>
>> Both running at the same time sounds like what happens if you kill them
>> all at once.
>>
>> That clearly takes another patch then.
>>
> Jeremy,
>
> can you try out the attached patch for me?
>
> This should close the above shutdown race as well.
>
> Should be nowhere as frequent as the timer_sync crash fixed earlier.
>

Hm, the two patches changed things but I'm still seeing problems on
domain shutdown. Still looks like use-after-free.

blktap_device_destroy: destroy device 0 users 0
blktap_ring_vm_close: unmapping ring 0
blktap_ring_release: freeing device 0
blktap_sysfs_destroy
=============================================================================
BUG kmalloc-512: Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff88002e9e2048-0xffff88002e9e2048. First byte 0x6a instead of 0x6b
INFO: Allocated in device_create_vargs+0x47/0xd7 age=7705 cpu=0 pid=3072
INFO: Freed in device_create_release+0x9/0xb age=14 cpu=0 pid=3320
INFO: Slab 0xffff880003cca5b0 objects=14 used=2 fp=0xffff88002e9e2000 flags=0xa3
INFO: Object 0xffff88002e9e2000 @offset=0 fp=0xffff88002e9e2248
Object 0xffff88002e9e2000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2040: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b kkkkkkkkjkkkkkkk
Object 0xffff88002e9e2050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e20f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2100: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2140: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2150: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2160: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2170: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2180: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e2190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xffff88002e9e21f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone 0xffff88002e9e2200: bb bb bb bb bb bb bb bb ........
Padding 0xffff88002e9e2240: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 3327, comm: ifdown Not tainted 2.6.32 #358
Call Trace:
 [<ffffffff810a83f9>] print_trailer+0x16a/0x173
 [<ffffffff810a89a0>] check_bytes_and_report+0xb5/0xe6
 [<ffffffff810a8a96>] check_object+0xc5/0x237
 [<ffffffff810aa588>] __slab_alloc+0x493/0x591
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810e8fea>] ? load_elf_binary+0xe2/0x17d8
 [<ffffffff810ab06f>] __kmalloc+0xbe/0x12f
 [<ffffffff810e8fea>] load_elf_binary+0xe2/0x17d8
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e921>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100eff2>] ? check_events+0x12/0x20
 [<ffffffff810b3ee9>] ? search_binary_handler+0x18f/0x278
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e2c>] ? search_binary_handler+0xd2/0x278
 [<ffffffff8100efdf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81064f38>] ? lock_release+0x15a/0x166
 [<ffffffff810e0208>] ? flock_to_posix_lock+0x4/0xe1
 [<ffffffff810b3e39>] search_binary_handler+0xdf/0x278
 [<ffffffff810e8f08>] ? load_elf_binary+0x0/0x17d8
 [<ffffffff810b5453>] do_execve+0x185/0x27a
 [<ffffffff81010673>] sys_execve+0x3e/0x5c
 [<ffffffff8101209a>] stub_execve+0x6a/0xc0
FIX kmalloc-512: Restoring 0xffff88002e9e2048-0xffff88002e9e2048=0x6b

J
On Thu, 2010-02-25 at 18:18 -0500, Jeremy Fitzhardinge wrote:
> On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> > On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> >
> >> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> >>
> >>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> >>>
> >>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> >>>>
> >>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> >>>>>
> >>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> >>>>>>
> >>>>>>> When rebooting the machine, I got this crash from blktap. The rip maps to line 262 in
> >>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> >>>>>>>
> >>>>>> Uhm, where did that RIP come from?
> >>>>>>
> >>>>>> pool_free is on the module exit path. The stack trace below looks like a
> >>>>>> crash from the broadcasted SIGTERM before reboot.
> >>>>>>
> >>>>> Ignore it; I generated it from a different kernel from the one that
> >>>>> crashed. But the other oops I posted should be all consistent and
> >>>>> meaningful.
> >>>>>
> >>>> Ignore only the debuginfo quote, right?
> >>>> Cos this looks like a different issue to me.
> >>>>
> >>> Perhaps. I got all the others on normal domain shutdown, but this one
> >>> was on machine reboot. I'll try to repro (as I boot the test kernel
> >>> with your patch in it).
> >>>
> >> (gdb) list *(blktap_device_restart+0x7a)
> >> 0x2a73 is in blktap_device_restart
> >> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> >> 915         /* Re-enable calldowns. */
> >> 916         if (blk_queue_stopped(dev->gd->queue))
> >> 917                 blk_start_queue(dev->gd->queue);
> >> 918
> >> 919         /* Kick things off immediately. */
> >> 920         blktap_device_do_request(dev->gd->queue);
> >> 921
> >> 922         spin_unlock_irq(&dev->lock);
> >> 923 }
> >> 924
> >>
> >> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> >> racing against device_restart.
> >>
> >> Would take
> >>
> >> * Tapdisk killed on the other thread, which goes through into
> >>   a device_restart(). Which is what your stacktrace shows.
> >>
> >> * Device removal pending, blocking until
> >>   device->users drops to 0, then doing the device_destroy().
> >>   That might have happened during bdev .release.
> >>
> >> Both running at the same time sounds like what happens if you kill them
> >> all at once.
> >>
> >> That clearly takes another patch then.
> >>
> > Jeremy,
> >
> > can you try out the attached patch for me?
> >
> > This should close the above shutdown race as well.
> >
> > Should be nowhere as frequent as the timer_sync crash fixed earlier.
> >
>
> Hm, the two patches changed things but I'm still seeing problems on
> domain shutdown. Still looks like use-after-free.

All these new-fashioned debug switches. Only causing trouble.

This is yet a different piece. The sysfs code was causing a double unref
on the ring device.

Daniel