When hotplug hvm vcpu by ''xm vcpu-set'' command, if it add/remove many vcpus by 1 ''xm vcpu-set'' command, it has a bug that it cannot add/remove all vcpus that want to be added/removed. This patch is to fix the bug. It delays trigger sci until all xenstore cpu node status are watched. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Liu, Jinsong writes ("[Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"):> When hotplug hvm vcpu by ''xm vcpu-set'' command, if it add/remove > many vcpus by 1 ''xm vcpu-set'' command, it has a bug that it cannot > add/remove all vcpus that want to be added/removed. > This patch is to fix the bug. It delays trigger sci until all xenstore > cpu node status are watched.This patch seems to arrange to take multiple CPU hot-add/remove events and coalesce them into a single event. It is obvious how this avoids triggering a race, but I''m not convinced that it''s a correct fix. The core problem seems to be that somehow the SCI IRQ is lost ? Perhaps the real problem is this code: qemu_set_irq(sci_irq, 1); qemu_set_irq(sci_irq, 0); I''m not familiar with the way SCI is supposed to work but clearing the irq in the qemu add/remove function seems wrong. Surely the host should clear the interrupt when it has serviced the interrupt. Can you explain what I''m missing ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson wrote:> Liu, Jinsong writes ("[Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"): >> When hotplug hvm vcpu by ''xm vcpu-set'' command, if it add/remove >> many vcpus by 1 ''xm vcpu-set'' command, it has a bug that it cannot >> add/remove all vcpus that want to be added/removed. >> This patch is to fix the bug. It delays trigger sci until all >> xenstore cpu node status are watched. > > This patch seems to arrange to take multiple CPU hot-add/remove events > and coalesce them into a single event. It is obvious how this avoids > triggering a race, but I''m not convinced that it''s a correct fix.It''s used to avoid inconsistency of cpu status map (producer: qemu watch xenstore cpu nodes; customer: SCI \_L02 control method), so it delay trigger SCI until all cpu node are watched.> > The core problem seems to be that somehow the SCI IRQ is lost ? > Perhaps the real problem is this code: > qemu_set_irq(sci_irq, 1); > qemu_set_irq(sci_irq, 0); > > I''m not familiar with the way SCI is supposed to work but clearing the > irq in the qemu add/remove function seems wrong. Surely the host > should clear the interrupt when it has serviced the interrupt. > > Can you explain what I''m missing ? > > Ian.Yes, you are right. In fact, it make me puzzling how and when to drop sci_irq to 0. According to acpi spec, there are 2 level logic: gpe and sci. 1). gpe_en and gpe_sts ''AND'' to trigger a gpe event (like pci-hotplug, cpu-hotplug); 2). multi gpe event ''OR'' to trigger sci, which now wired to i8259[9]; Current qemu-xen implement gpe logic, and directly wired gpe to i8259[9]. Since qemu-xen now only support pci-hotplug event, it can work doing so. However, if we want to support multi hotplug event, qemu-xen now didn''t have ''gpe events OR to trigger sci'' logic. I think qemu-xen should add this logic level, so that it can support more gpe events in the future. BTW, at qemu-kvm, it do same as our current patch: qemu_set_irq(sci_irq, 1); qemu_set_irq(sci_irq, 0); it triggers a sci pulse and works fine. Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"):> I think qemu-xen should add this logic level, so that it can support > more gpe events in the future.Yes. But what clears the interrupt ? Is it edge or level triggered ?> BTW, at qemu-kvm, it do same as our current patch: > qemu_set_irq(sci_irq, 1); > qemu_set_irq(sci_irq, 0); > it triggers a sci pulse and works fine.If this works fine it is just lucky, I think. It has the same race as the one you''ve identified. Batching up the different events is just a way to make the race probably not happen in that particular case. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug > bug"): >> I think qemu-xen should add this logic level, so that it can support >> more gpe events in the future. > > Yes. But what clears the interrupt ? Is it edge or level triggered ? >SCI is a shareable level interrupt. Per my understanding, sci-gpe logic is like |------- gpe_sts x |-----AND | |------- gpe_en x sci int | ------------------------------------ OR ...... (wired to i8259[9] etc) | | |------- gpe_sts y |-----AND |------- gpe_en y Currently qemu-xen didn''t have ''gep event x OR gpe event y --> sci'' logic, it should add this logic level. To clear the interrupt: 1). gpe_sts x & gpe_en x --> low gpe event x 2). all gpe events low --> low sci (low is of logic meaning here, not real electric value) Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"):> Per my understanding, sci-gpe logic is like > [diagram] > > Currently qemu-xen didn''t have ''gep event x OR gpe event y --> sci'' > logic, it should add this logic level.Yes, thanks, I''d understood that. But:> To clear the interrupt: > 1). gpe_sts x & gpe_en x --> low gpe event xI think you mean "gpe_sts goes low" ? Since your diagram is in terms of levels and I guess "sts" is the actual interrupt source, and "sts_en" is the interrupt enable flag. So in real hardware, what makes gpe_sts go low ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug > bug"): >> Per my understanding, sci-gpe logic is like >> [diagram] >> >> Currently qemu-xen didn''t have ''gep event x OR gpe event y --> sci'' >> logic, it should add this logic level. > > Yes, thanks, I''d understood that. But: > >> To clear the interrupt: >> 1). gpe_sts x & gpe_en x --> low gpe event x > > I think you mean "gpe_sts goes low" ? Since your diagram is in terms > of levels and I guess "sts" is the actual interrupt source, and > "sts_en" is the interrupt enable flag.Yes.> > So in real hardware, what makes gpe_sts go low ? > > Ian.ospm will do. Its sequence is (i.e. linux 2.6.32, ospm dispatch control method case) 1). clear pge_en x by writing ''0'' to the bit; 2). asynchronic call control method; 3). clear gpe_sts x by writing ''1'' to the bit; 4). re-enable gpe_en x by writing ''1'' to the bit; Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"):> ospm will do. Its sequence is (i.e. linux 2.6.32, ospm dispatch > control method case)"ospm" is "OS power management" ie in our case the guest kernel I take it ?> 1). clear pge_en x by writing ''0'' to the bit; > 2). asynchronic call control method; > 3). clear gpe_sts x by writing ''1'' to the bit; > 4). re-enable gpe_en x by writing ''1'' to the bit;So the code in qemu should never clear gpe_sts itself. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug > bug"): >> ospm will do. Its sequence is (i.e. linux 2.6.32, ospm dispatch >> control method case) > > "ospm" is "OS power management" ie in our case the guest kernel I take > it ?hmm, ''ospm'' is just a generial speaking, more accurate speaking is ''acpica''. i.e. hvm linux 2.6.32 handle sci/gpe at: drivers/acpi/acpica/evsci.c drivers/acpi/acpica/evgpe.c> >> 1). clear pge_en x by writing ''0'' to the bit; >> 2). asynchronic call control method; >> 3). clear gpe_sts x by writing ''1'' to the bit; >> 4). re-enable gpe_en x by writing ''1'' to the bit; > > So the code in qemu should never clear gpe_sts itself. > > Ian.No, that''s just what qemu should do, to simulate hardware behavior. Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"):> Ian Jackson wrote: > > Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug bug"): > >> [linux/drivers/acpi/acpica/ will:] > >> 1). clear pge_en x by writing ''0'' to the bit; > >> 2). asynchronic call control method; > >> 3). clear gpe_sts x by writing ''1'' to the bit; > >> 4). re-enable gpe_en x by writing ''1'' to the bit; > > > > So the code in qemu should never clear gpe_sts itself. > > No, that''s just what qemu should do, to simulate hardware behavior.Obviously I still haven''t understood. If the guest kernel driver is supposed to clear this bit as you seem to say above, then it should not be cleared automatically by qemu-dm as part of the hotplug notification. Obviously the register ought to be emulated by qemu and the bit ought to be cleared in qemu when the kernel driver writes an inactive value to it. None of this seems to be implemented in qemu right now AFAICT ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug > bug"): >> Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH] Fix hvm vcpu hotplug >>> bug"): >>>> [linux/drivers/acpi/acpica/ will:] >>>> 1). clear pge_en x by writing ''0'' to the bit; >>>> 2). asynchronic call control method; >>>> 3). clear gpe_sts x by writing ''1'' to the bit; >>>> 4). re-enable gpe_en x by writing ''1'' to the bit; >>> >>> So the code in qemu should never clear gpe_sts itself. >> >> No, that''s just what qemu should do, to simulate hardware behavior. > > Obviously I still haven''t understood. If the guest kernel driver is > supposed to clear this bit as you seem to say above, then it should > not be cleared automatically by qemu-dm as part of the hotplug > notification.Of course hotplug notification will not clear gpe_sts. At qemu side, ''hotplug notification'' and ''clear gpe_sts'' are asynchronic: 1) qemu: hotplug notification trigger sci, 2) guest kernel: a). clear pge_en x by writing ''0'' to the bit; b). asynchronically clear gpe_sts x by writing ''1'' to the bit; c). re-enable gpe_en x by writing ''1'' to the bit; 3) qemu: gpe_en_write/gpe_sts_write will simulate the hardware action to set or clear gpe_en/gpe_sts. What I mean is, currently at qmeu side, it need add the logic of ''multi gpe --> OR --> sci'' logic. (i.e, at current qemu hw/piix4acpi.c, gpe_sts_write and gpe_en_write cannot handle multi gpe). Only after add the logic, it could be clean to handle ''vcpu hotplug'' at qmeu. Thanks, Jinsong> > Obviously the register ought to be emulated by qemu and the bit ought > to be cleared in qemu when the kernel driver writes an inactive value > to it. > > None of this seems to be implemented in qemu right now AFAICT ? > > Ian._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Reasonably Related Threads
- [PATCH] qemu-traditional - ACPI vCPU hotplug fixes for Xen 4.3 (v2).
- [PATCH] Fix QEMU HVM hotplug race in QEMU traditional (Xen 4.1, Xen 4.2, and Xen 4.3) (v1).
- [PATCH 2/2] RFC: Xen pad logic
- [PATCH v3 0/6] initial suspend support
- ACPI fixed event or General Purpose Event to HVM guest.