Li, Xin B
2006-Jun-23 05:14 UTC
[Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
Fix virtual apic irq distribution. But currently we inject PIT irqs to cpu0 only. Also mute some warning messages.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Xin Li <xin.b.li@intel.com>
Keir Fraser
2006-Jun-27 09:53 UTC
Re: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
On 23 Jun 2006, at 06:14, Li, Xin B wrote:

> Fix virtual apic irq distribution.
> But currently we inject PIT irqs to cpu0 only. Also mute some warning
> messages.

Does anything break if we inject PIT irqs to other than cpu0? It seems an odd restriction -- native hardware wouldn't treat that line specially if IRQ0 is routed through the IO-APIC, would it?

I think the patch is okay apart from that.

 -- Keir
Li, Xin B
2006-Jun-27 11:39 UTC
RE: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
>> Fix virtual apic irq distribution.
>> But currently we inject PIT irqs to cpu0 only. Also mute some warning
>> messages.
>
> Does anything break if we inject PIT irqs to other than cpu0? It seems
> an odd restriction -- native hardware wouldn't treat that line
> specially if IRQ0 is routed through the IO-APIC, would it?

You're right: on native hardware the PIT irq can be routed to any processor, and the PIT irq handler usually keeps OS time by checking the TSC. On native hardware the TSCs are naturally synchronized across processors, so that is not a problem. But in our VM we have to resynchronize the TSC from time to time, so the PIT irq handler running on different vcpus may see a large TSC difference, complain about an unreliable TSC, and perhaps try to do its own TSC sync, which makes guest timekeeping complex and unreliable.

Ideally we should not have this restriction, but for now I think it keeps guest timekeeping simple and reliable. Eddie should have more comments on this :-)

-Xin
Keir Fraser
2006-Jun-27 12:52 UTC
Re: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
On 27 Jun 2006, at 12:39, Li, Xin B wrote:

> But in our VM we have to resynchronize the TSC from time to time, so the
> PIT irq handler running on different vcpus may see a large TSC difference,
> complain about an unreliable TSC, and perhaps try to do its own TSC sync,
> which makes guest timekeeping complex and unreliable.

How out-of-sync do the TSCs of different VCPUs get?

I'd like to see a non-optimised TSC mode in which RDTSC always vmexits and we implement a constant-rate, always-synced TSC in Xen. It'd be good to see if we could at least get that time mode working properly, and we'll need it anyway for save/restore/migration between machines with (even only slightly) different clock speeds, or the guest will get very confused.

 -- Keir
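[To make the suggestion concrete, here is a minimal user-space sketch of what a trap-and-emulate, constant-rate TSC could look like. It is only an illustration of the idea, not Xen code: the names vtsc_khz, base_ns and domain_time_ns are invented for this example, and a real implementation would live in the hypervisor's VMEXIT path and use its own time source.]

/* Sketch: RDTSC always vmexits; the hypervisor answers with a constant-rate
 * TSC derived from one shared monotonic clock, so every VCPU of a guest sees
 * the same, always-synchronized TSC value for the same instant.
 * All names below are invented for this illustration. */

#include <stdint.h>
#include <time.h>

struct vtsc {
    uint64_t vtsc_khz;   /* fixed guest TSC rate, e.g. 1000000 == 1 GHz  */
    uint64_t base_ns;    /* domain time at which the guest TSC read zero */
};

/* Stand-in for a domain-wide monotonic time source in the hypervisor.   */
static uint64_t domain_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* What the VMEXIT handler would return when the guest executes RDTSC.   */
static uint64_t emulate_rdtsc(const struct vtsc *v)
{
    uint64_t elapsed_ns = domain_time_ns() - v->base_ns;

    /* ticks = ns * kHz / 1e6; independent of which physical CPU the VCPU
     * runs on, so all VCPUs stay in sync.  A real implementation would
     * use 128-bit scaling here to avoid overflow on long uptimes.        */
    return elapsed_ns * v->vtsc_khz / 1000000ULL;
}

[Because every VCPU derives its TSC from the same domain time and a fixed rate, the value is identical across VCPUs and survives migration to a host with a different physical TSC frequency: only vtsc_khz and base_ns need to be carried across.]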
Dong, Eddie
2006-Jun-28 03:54 UTC
RE: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
Keir:
There is a long history behind this :-) It is not caused by TSC hardware acceleration, but by how we handle the periodic timer IRQ.

In an x86 VMM, a guest platform timer such as the PIT or RTC has a periodic IRQ (the same holds for the guest local APIC timer) whose pending interrupts are accumulated (pending_intr_nr) when host time runs ahead. In this periodic timer model, the guest time seen through the PIT can be expressed as:

    pit_time = total_pit_int_nr * T0 + CNT0 + Off0

    T0:   the PIT period
    CNT0: the time elapsed since the last IRQ fired; it can be derived from
          the "Counter" register as CNT0 = (LATCH - 1 - "Counter") * T0 / LATCH
    Off0: a constant offset standing for the time when the first IRQ happens

Here CNT0 is always less than T0. Say the guest sees guest_pit0 at the time of the last PIT IRQ injection; then before we inject the next PIT IRQ, the guest time seen through the guest PIT is confined to (guest_pit0, guest_pit0 + T0), no matter how much physical time has elapsed. If we synced the guest TSC with the host TSC using a fixed offset, the guest time seen through the TSC could run far ahead of the guest PIT time whenever a lot of host time has elapsed. Because of this, we freeze guest (TSC) time at domain switch time to solve the problem. It works fine for UP.

In an SMP guest, each VP has its own TSC time and local APIC timer, which we need to keep in sync with each other, as in the UP approach above. Ideally this per-VP time should also be synced with the platform timer, e.g. the PIT (and thus the guest time of all VPs would be synced). But the problem is that some VPs may be descheduled, so there is no way to inject the periodic timer IRQ for them, whether for the local APIC timer or for the PIT. A full solution would be very complicated and might require scheduler changes so that VPs of the same VM are switched in and out together.

After some investigation we think this is too complicated for now, and we want to go with a simple solution first: within each VP, guest time is kept in sync, but guest time across different VPs is out of sync whenever pending_intr_nr is non-zero. A guest application (which may migrate between processors) is fine with this, because pending_intr_nr is always zero before the application gets to run on a VP, so the application consistently sees guest time moving forward even if it migrates to another VP. We assume guest kernel critical paths do not migrate between processors.

With this approach, the platform timer (the PIT here) must be pinned to one processor (VP0 here), and guest TSC, PIT and local APIC time are synced on VP0 (the other VPs sync only TSC time and local APIC time). If the PIT IRQ were routed to a different VP dynamically, say VP1, the guest time seen within that VP would not be synced, and that causes the "well-known" "guest lost too many ticks" issue (refer to changesets 7478 and 9324).

thx,eddie
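[For readers following the arithmetic above, here is a small C sketch of the guest-PIT time model Eddie describes. The constants and names (PIT_LATCH for a ~100 Hz tick, guest_pit_ns, ...) are chosen for illustration only and are not taken from the Xen sources.]

/* Illustrative sketch of: pit_time = total_pit_int_nr * T0 + CNT0 + Off0 */

#include <stdint.h>

#define PIT_FREQ_HZ   1193182ULL   /* i8254 input clock                   */
#define PIT_LATCH     11932ULL     /* reload value giving roughly 100 Hz  */
#define NS_PER_SEC    1000000000ULL

/* T0: the period of one PIT interrupt at the programmed LATCH, in ns.    */
static uint64_t pit_period_ns(void)
{
    return PIT_LATCH * NS_PER_SEC / PIT_FREQ_HZ;
}

/*
 * total_pit_int_nr : number of PIT IRQs injected so far
 * counter          : current value of the PIT "Counter" register
 *                    (assumed to be in the range 0 .. LATCH-1)
 * off0_ns          : constant offset, the time of the first IRQ
 */
static uint64_t guest_pit_ns(uint64_t total_pit_int_nr,
                             uint64_t counter, uint64_t off0_ns)
{
    uint64_t t0   = pit_period_ns();
    /* CNT0 = (LATCH - 1 - Counter) * T0 / LATCH, always less than T0     */
    uint64_t cnt0 = (PIT_LATCH - 1 - counter) * t0 / PIT_LATCH;

    return total_pit_int_nr * t0 + cnt0 + off0_ns;
}

[As the sketch shows, until the next IRQ is injected total_pit_int_nr does not change, so guest PIT time can advance by at most one T0, which is the confinement to (guest_pit0, guest_pit0 + T0) described above.]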
Keir Fraser
2006-Jun-28 07:00 UTC
Re: [Xen-devel] [PATCH] [HVM] Fix virtual apic irq distribution
On 28 Jun 2006, at 04:54, Dong, Eddie wrote:

> In an SMP guest, each VP has its own TSC time and local APIC timer, which
> we need to keep in sync with each other, as in the UP approach above.
> Ideally this per-VP time should also be synced with the platform timer,
> e.g. the PIT (and thus the guest time of all VPs would be synced). But
> the problem is that some VPs may be descheduled, so there is no way to
> inject the periodic timer IRQ for them, whether for the local APIC timer
> or for the PIT. A full solution would be very complicated and might
> require scheduler changes so that VPs of the same VM are switched in and
> out together.

Yes, it sounds like the PIT, LAPIC and TSC of all VCPUs will need to lag, in sync, with the furthest-behind LAPIC or TSC of any VCPU. Yield-to in the scheduler will be useful here.

Thanks for the detailed explanation.

 -- Keir
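[A purely hypothetical sketch of that policy, with invented names, might clamp the guest-visible platform time to the furthest-behind VCPU:]

/* "Lag in sync with the furthest-behind VCPU": the platform time the
 * guest may see advances only as far as the slowest VCPU has caught up.
 * All names here are invented for illustration, not Xen code.            */

#include <stdint.h>

#define MAX_VCPUS 4

struct vm_time {
    uint64_t vcpu_time_ns[MAX_VCPUS];  /* per-VCPU synced LAPIC/TSC time  */
    unsigned int nr_vcpus;
};

/* Guest-visible platform time: the minimum over all VCPUs.               */
static uint64_t guest_platform_time_ns(const struct vm_time *vt)
{
    uint64_t floor = vt->vcpu_time_ns[0];

    for (unsigned int i = 1; i < vt->nr_vcpus; i++)
        if (vt->vcpu_time_ns[i] < floor)
            floor = vt->vcpu_time_ns[i];

    return floor;
}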