Sheng Yang
2010-Sep-15 07:10 UTC
[Xen-devel] [PATCH] C6 state with EOI issue fix for some Intel processors
There is an erratum in some Intel processors:

AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During an Interrupt Service Routine

If core C6 is entered after the start of an interrupt service routine but before a write to the APIC EOI register, the core may not send an EOI transaction (if needed) and further interrupts from the same priority level or lower may be blocked.

This patch fixes the issue by checking whether an ISR is pending before entering a deep Cx state. If so, it uses power->safe_state instead of the deep Cx state, which prevents the problem described above.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
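The check described in the patch summary can be illustrated with a small, self-contained sketch. This is not the patch itself and not Xen's actual code: the names `cx_state`, `isr_in_service`, and `select_cx` are hypothetical, the In-Service Register is modelled as a plain array rather than read from the local APIC, and it assumes that "ISR is pending" means at least one bit is set in that register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The local APIC In-Service Register is 256 bits wide and is exposed as
 * eight 32-bit words. Here it is modelled as an ordinary array so the
 * sketch can run anywhere (assumption: no real APIC access). */
#define APIC_ISR_WORDS 8

/* Hypothetical C-state descriptor; not Xen's real structure. */
struct cx_state {
    int  type;  /* 1 = C1, 2 = C2, 3 = C3, ... */
    bool deep;  /* true if entering this state may put the core in C6 */
};

/* Return true if any vector is marked in service, i.e. a handler has
 * started but the EOI register has not been written yet. */
bool isr_in_service(const uint32_t isr[APIC_ISR_WORDS])
{
    assert(isr != NULL);
    for (size_t i = 0; i < APIC_ISR_WORDS; i++) {
        if (isr[i] != 0)
            return true;
    }
    return false;
}

/* Choose the state to enter: fall back to the safe state only when the
 * requested state is deep and an interrupt is still in service. */
const struct cx_state *select_cx(const struct cx_state *target,
                                 const struct cx_state *safe,
                                 const uint32_t isr[APIC_ISR_WORDS])
{
    if (target->deep && isr_in_service(isr))
        return safe;
    return target;
}
```

In this model, vector 37 being in service corresponds to bit 5 of word 1 (`isr[1] = 1u << 5`); with that bit set, a request for a deep state yields the safe state, while a request for a shallow state is left unchanged.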
Sheng Yang
2010-Sep-15 07:18 UTC
Re: [Xen-devel] [PATCH] C6 state with EOI issue fix for some Intel processors
On Wednesday 15 September 2010 15:10:43 Sheng Yang wrote:
> There is an errata in some of Intel processors.
>
> AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During
> an Interrupt Service Routine
>
> If core C6 is entered after the start of an interrupt service routine but
> before a write to the APIC EOI register, the core may not send an EOI
> transaction (if needed) and further interrupts from the same priority
> level or lower may be blocked.
>
> This patch fix this issue, by checking if ISR is pending before enter deep
> Cx state. If so, it would use power->safe_state instead of deep Cx state
> to prevent the above issue happen.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>

--
regards
Yang, Sheng
Keir Fraser
2010-Sep-15 07:32 UTC
[Xen-devel] Re: [PATCH] C6 state with EOI issue fix for some Intel processors
Aieee! :-)

 K.

On 15/09/2010 08:10, "Sheng Yang" <sheng@linux.intel.com> wrote:
> There is an errata in some of Intel processors.
>
> AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6 During
> an Interrupt Service Routine
>
> If core C6 is entered after the start of an interrupt service routine but
> before a write to the APIC EOI register, the core may not send an EOI
> transaction (if needed) and further interrupts from the same priority
> level or lower may be blocked.
>
> This patch fix this issue, by checking if ISR is pending before enter deep
> Cx state. If so, it would use power->safe_state instead of deep Cx state
> to prevent the above issue happen.
Keir Fraser
2010-Sep-15 08:03 UTC
[Xen-devel] Re: [PATCH] C6 state with EOI issue fix for some Intel processors
On 15/09/2010 08:10, "Sheng Yang" <sheng@linux.intel.com> wrote:
> This patch fix this issue, by checking if ISR is pending before enter deep
> Cx state. If so, it would use power->safe_state instead of deep Cx state
> to prevent the above issue happen.

Thanks. I reworked this patch substantially and applied as xen-unstable:22160 and xen-4.0-testing:21348.

 -- Keir
Andreas Kinzler
2010-Sep-15 13:42 UTC
Re: [Xen-devel] Re: [PATCH] C6 state with EOI issue fix for some Intel processors
On 15.09.2010 10:03, Keir Fraser wrote:
>> This patch fix this issue, by checking if ISR is pending before enter deep
>> Cx state. If so, it would use power->safe_state instead of deep Cx state
>> to prevent the above issue happen.
> Thanks. I reworked this patch substantially and applied as
> xen-unstable:22160 and xen-4.0-testing:21348.

I tested the patch on vanilla 4.0.1 and it does help a bit. Uptime was now over 100 minutes instead of under 3 minutes. But problems still occurred (aacraid reset, eth reset).

With my patch from (http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html) the machine uptime was over 10 days when I stopped the test.

Regards Andreas

Sep 15 14:55:19 virt kernel: ------------[ cut here ]------------
Sep 15 14:55:19 virt kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x220/0x230()
Sep 15 14:55:19 virt kernel: Hardware name: X8SIL
Sep 15 14:55:19 virt kernel: NETDEV WATCHDOG: peth0 (e1000e): transmit queue 0 timed out
Sep 15 14:55:19 virt kernel: Modules linked in: bridge stp llc iptable_filter xt_MARK xt_mark xt_iprange xt_conntrack nf_conntrack ip_tables x_tables tun loop e1000e
Sep 15 14:55:19 virt kernel: Pid: 4088, comm: blkback.1.hdc Not tainted 2.6.32.18-pvops0-ak3 #1
Sep 15 14:55:19 virt kernel: Call Trace:
Sep 15 14:55:19 virt kernel: <IRQ> [<ffffffff810458f6>] warn_slowpath_common+0x76/0xb0
Sep 15 14:55:19 virt kernel: [<ffffffff8104598c>] warn_slowpath_fmt+0x3c/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff812f1b70>] dev_watchdog+0x220/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff81050820>] ? mod_timer+0x110/0x180
Sep 15 14:55:19 virt kernel: [<ffffffff81091c40>] ? sync_supers_timer_fn+0x0/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff812f1950>] ? dev_watchdog+0x0/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff810502fc>] run_timer_softirq+0x14c/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff8104b72f>] __do_softirq+0xaf/0x140
Sep 15 14:55:19 virt kernel: [<ffffffff811c5c09>] ? __xen_evtchn_do_upcall+0x219/0x230
Sep 15 14:55:19 virt kernel: [<ffffffff8101357c>] call_softirq+0x1c/0x30
Sep 15 14:55:19 virt kernel: [<ffffffff81015675>] do_softirq+0x65/0xa0
Sep 15 14:55:19 virt kernel: [<ffffffff8104b3fd>] irq_exit+0x8d/0x90
Sep 15 14:55:19 virt kernel: [<ffffffff811c5cdd>] xen_evtchn_do_upcall+0x3d/0x60
Sep 15 14:55:19 virt kernel: [<ffffffff810135ce>] xen_do_hypervisor_callback+0x1e/0x30
Sep 15 14:55:19 virt kernel: <EOI> [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1010
Sep 15 14:55:19 virt kernel: [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1010
Sep 15 14:55:19 virt kernel: [<ffffffff8100ed7d>] ? xen_force_evtchn_callback+0xd/0x10
Sep 15 14:55:19 virt kernel: [<ffffffff8100f712>] ? check_events+0x12/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff8100f6b9>] ? xen_irq_enable_direct_end+0x0/0x7
Sep 15 14:55:19 virt kernel: [<ffffffff8135e9dd>] ? _spin_unlock_irq+0xd/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff81151955>] ? generic_unplug_device+0x35/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff811cf456>] ? unplug_queue+0x26/0x50
Sep 15 14:55:19 virt kernel: [<ffffffff811d001e>] ? blkif_schedule+0xde/0x320
Sep 15 14:55:19 virt kernel: [<ffffffff8105c530>] ? autoremove_wake_function+0x0/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff8135ea42>] ? _spin_unlock_irqrestore+0x32/0x40
Sep 15 14:55:19 virt kernel: [<ffffffff811cff40>] ? blkif_schedule+0x0/0x320
Sep 15 14:55:19 virt kernel: [<ffffffff8105c24e>] ? kthread+0x8e/0xa0
Sep 15 14:55:19 virt kernel: [<ffffffff8101347a>] ? child_rip+0xa/0x20
Sep 15 14:55:19 virt kernel: [<ffffffff81012626>] ? int_ret_from_sys_call+0x7/0x1b
Sep 15 14:55:19 virt kernel: [<ffffffff81012de1>] ? retint_restore_args+0x5/0x6
Sep 15 14:55:19 virt kernel: [<ffffffff81013470>] ? child_rip+0x0/0x20
Sep 15 14:55:19 virt kernel: ---[ end trace 6548e737c4c22ec9 ]---
Sep 15 14:55:19 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 14:55:19 virt kernel: eth0: port 1(peth0) entering disabled state
Sep 15 14:55:19 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 14:55:22 virt kernel: e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Sep 15 14:55:22 virt kernel: eth0: port 1(peth0) entering forwarding state
Sep 15 15:16:29 virt kernel: hrtimer: interrupt took 10082426 ns
Sep 15 15:24:06 virt kernel: aacraid: Host adapter abort request (0,0,1,0)
Sep 15 15:24:06 virt kernel: aacraid: Host adapter abort request (0,0,1,0)
Sep 15 15:24:06 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Sep 15 15:24:06 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 15:24:06 virt kernel: eth0: port 1(peth0) entering disabled state
Sep 15 15:24:06 virt kernel: e1000e 0000:04:00.0: peth0: Reset adapter
Sep 15 15:24:09 virt kernel: e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Sheng Yang
2010-Sep-16 00:23 UTC
Re: [Xen-devel] Re: [PATCH] C6 state with EOI issue fix for some Intel processors
On Wednesday 15 September 2010 21:42:53 Andreas Kinzler wrote:
> On 15.09.2010 10:03, Keir Fraser wrote:
> >> This patch fix this issue, by checking if ISR is pending before enter
> >> deep Cx state. If so, it would use power->safe_state instead of deep Cx
> >> state to prevent the above issue happen.
> >
> > Thanks. I reworked this patch substantially and applied as
> > xen-unstable:22160 and xen-4.0-testing:21348.
>
> I tested the patch on vanilla 4.0.1 and it does help a bit. Uptime was
> now over 100 minutes instead of under 3 minutes. But problems still
> occurred (aacraid reset, eth reset).

To determine whether the issue is caused by the erratum, you can try disabling the C6 state in the BIOS. The erratum only occurs when the C6 state is involved.

--
regards
Yang, Sheng

> With my patch from
> (http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html)
> the machine uptime was over 10 days when I stopped the test.
>
> Regards Andreas