Anthony PERARD
2013-Dec-09 18:30 UTC
[PATCH] firmware: Fix vcpu hotplug race with qemu-xen
When hotplugging more than one vcpu, some of those vcpus might not be seen as plugged by the guest. By using edged-triggered General-Purpose Event instead of a level-triggered GPE, the race is avoided. This is in sync with how a different QEMU guest is handeling CPU hotplug event. There is more information about why in SeaBIOS''s commit 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c. Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> --- tools/firmware/hvmloader/acpi/mk_dsdt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c index 996f30b..ecbe4cd 100644 --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c @@ -220,9 +220,13 @@ int main(int argc, char **argv) pop_block(); - /* Define GPE control method ''_L02''. */ + /* Define GPE control method. */ push_block("Scope", "\\_GPE"); - push_block("Method", "_L02"); + if (dm_version == QEMU_XEN) { + push_block("Method", "_E02"); + } else { + push_block("Method", "_L02"); + } stmt("Return", "\\_SB.PRSC()"); pop_block(); pop_block(); -- Anthony PERARD
>>> On 09.12.13 at 19:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > When hotplugging more than one vcpu, some of those vcpus might not be > seen as plugged by the guest. > > By using edged-triggered General-Purpose Event instead of a > level-triggered GPE, the race is avoided. This is in sync with how a > different QEMU guest is handeling CPU hotplug event.What "different QEMU guest" are you talking about here? That second sentence is rather cryptic to me.> There is more information about why in SeaBIOS''s commit > 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c.That commit''s description talks of reducing the chance of a race; it doesn''t say it''s being avoided completely. Can you clarify the discrepancy? Also, I personally think referring to a foreign tree''s commit in a commit message like this is putting too high expectations on the reader. Naming commit ID _and title_ would be nice, but you should really have quoted the core part of the explanation from there.> --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c > +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c > @@ -220,9 +220,13 @@ int main(int argc, char **argv) > > pop_block(); > > - /* Define GPE control method ''_L02''. */ > + /* Define GPE control method. */ > push_block("Scope", "\\_GPE"); > - push_block("Method", "_L02"); > + if (dm_version == QEMU_XEN) { > + push_block("Method", "_E02"); > + } else { > + push_block("Method", "_L02"); > + }The commit description says nothing at all about why this needs to be done differently depending on dm_version. Furthermore, to more clearly reflect what the real difference is, I''d suggest using a conditional operator for passing the one argument that varies rather than using an if/else construct. Jan
Anthony PERARD
2013-Dec-11 16:12 UTC
Re: [PATCH] firmware: Fix vcpu hotplug race with qemu-xen
On Tue, Dec 10, 2013 at 08:43:20AM +0000, Jan Beulich wrote:> >>> On 09.12.13 at 19:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > > When hotplugging more than one vcpu, some of those vcpus might not be > > seen as plugged by the guest. > > > > By using edged-triggered General-Purpose Event instead of a > > level-triggered GPE, the race is avoided. This is in sync with how a > > different QEMU guest is handeling CPU hotplug event. > > What "different QEMU guest" are you talking about here? That > second sentence is rather cryptic to me.Well, I''ve tested only KVM, but the ACPI table are provided by the SeaBIOS project. So the "different QEMU guest" are any guest that use the default machine (Standard PC (i440FX + PIIX, 1996)) with SeaBIOS and default ACPI tables.> > There is more information about why in SeaBIOS''s commit > > 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c. > > That commit''s description talks of reducing the chance of a race; > it doesn''t say it''s being avoided completely. Can you clarify the > discrepancy?Yes, I did not really understand the change, but it is now clear, the change here does not fix the race. So there is no more discrepancy. I will rewrite the patch comment.> Also, I personally think referring to a foreign tree''s commit in a > commit message like this is putting too high expectations on the > reader. Naming commit ID _and title_ would be nice, but you > should really have quoted the core part of the explanation from > there.Yes, that make sense, I will fix that.> > --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c > > +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c > > @@ -220,9 +220,13 @@ int main(int argc, char **argv) > > > > pop_block(); > > > > - /* Define GPE control method ''_L02''. */ > > + /* Define GPE control method. */ > > push_block("Scope", "\\_GPE"); > > - push_block("Method", "_L02"); > > + if (dm_version == QEMU_XEN) { > > + push_block("Method", "_E02"); > > + } else { > > + push_block("Method", "_L02"); > > + } > > The commit description says nothing at all about why this needs to > be done differently depending on dm_version.I though it did, the title have "with qemu-xen". I want to avoid changing the table for qemu-traditional where the race have been fix in qemu. Thanks, -- Anthony PERARD
Anthony PERARD
2013-Dec-12 15:30 UTC
[PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
This should help to reduce a CPU hotplug race window where a cpu hotplug event while not be seen by the OS. When hotplugging more than one vcpu, some of those vcpus might not be seen as plugged by the guest. This is what is currently happenning: 1. hw adds cpu, sets GPE.2 bit and sends SCI 2. OSPM gets SCI, reads GPE00.sts and masks GPE.2 bit in GPE00.en 3. OSPM executes _L02 (level-triggered event associate to cpu hotplug) 4. hw adds second cpu and sets GPE.2 bit but SCI is not asserted since GPE00.en masks event 5. OSPM resets GPE.2 bit in GPE00.sts and umasks it in GPE00.en as result event for step 4 is lost because step 5 clears it and OS will not see added second cpu. ACPI 50 spec: 5.6.4 General-Purpose Event Handling defines GPE event handling as following: 1. Disables the interrupt source (GPEx_BLK EN bit). 2. If an edge event, clears the status bit. 3. Performs one of the following: * Dispatches to an ACPI-aware device driver. * Queues the matching control method for execution. * Manages a wake event using device _PRW objects. 4. If a level event, clears the status bit. 5. Enables the interrupt source. So, by using edge-triggered General-Purpose Event instead of a level-triggered GPE, OSPM is less likely to clear the status bit of the addition of the second CPU. On step 5, QEMU will resend an interrupt if the status bit is set. This description apply also for PCI hotplug since the same step are follow by QEMU, so we also change the GPE event type for PCI hotplug. Patch and description inspired by SeaBIOS''s commit: Replace level gpe event with edge gpe event for hot-plug handlers 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c from Igor Mammedov <imammedo@redhat.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> --- Change in V2: - better patch comment: patch those not fix race, but reduce the window include patch description of the quoted commit - change also apply to pci hotplug. --- tools/firmware/hvmloader/acpi/mk_dsdt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c index 996f30b..a4b693b 100644 --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c @@ -220,9 +220,13 @@ int main(int argc, char **argv) pop_block(); - /* Define GPE control method ''_L02''. */ + /* Define GPE control method. */ push_block("Scope", "\\_GPE"); - push_block("Method", "_L02"); + if (dm_version == QEMU_XEN_TRADITIONAL) { + push_block("Method", "_L02"); + } else { + push_block("Method", "_E02"); + } stmt("Return", "\\_SB.PRSC()"); pop_block(); pop_block(); @@ -428,7 +432,7 @@ int main(int argc, char **argv) decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify); pop_block(); } else { - push_block("Method", "_L01"); + push_block("Method", "_E01"); for (slot = 1; slot <= 31; slot++) { push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot); stmt("Notify", "\\_SB.PCI0.S%i, 1", slot); -- Anthony PERARD
Jan Beulich
2013-Dec-13 11:51 UTC
[PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
>>> On 12.12.13 at 16:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > Change in V2: > - better patch comment: > patch those not fix race, but reduce the window > include patch description of the quoted commitThanks - quite a bit better to understand.> - change also apply to pci hotplug.The one thing I''m still missing for both changes is a brief word on why qemu-xen-traditional doesn''t want/need this and - as iirc you said - if that one manages to get this implemented without a similar race, why upstream qemu can''t do things in a similar way. Also iirc someone responded to v1 with a comment saying a race like this is an OS bug - I don#t think I''ve seen a response to this. Jan
Anthony PERARD
2013-Dec-13 17:11 UTC
Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
On Fri, Dec 13, 2013 at 11:51:19AM +0000, Jan Beulich wrote:> >>> On 12.12.13 at 16:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > > Change in V2: > > - better patch comment: > > patch those not fix race, but reduce the window > > include patch description of the quoted commit > > Thanks - quite a bit better to understand. > > > - change also apply to pci hotplug. > > The one thing I''m still missing for both changes is a brief word on > why qemu-xen-traditional doesn''t want/need this and - as iirc > you said - if that one manages to get this implemented without a > similar race, why upstream qemu can''t do things in a similar way.In qemu-trad, instead of sending an SCI interrupt for every new vcpu, we loop through every xenstore key that represent vcpus availability and only send an SCI only when the loop is over. And it looks like one `xl vcpu-set` provoc only one loop. But adding a vcpu in qemu-xen is done via a QMP command, "cpu-add id=X". qemu-xen have no way to know if there will be a next cpu-add command, so we can not apply the same thing. For why qemu-xen-traditional doesn''t not need this: - a single `xl vcpu-set` can not trigger the race For why qemu-xen-traditional doesn''t not want this: - avoid unnecessary change, especially in the ACPI table About the PCI hotplug change and qemu-xen-traditional, I actually don''t know what would be the effect. Why qemu-xen-traditional would like this change for vcpu only: - we might hit the race window via several consecutive `xl vcpu-set` calls, or via direct change of xenstore entries. but after few try, I could not manage to have a missing cpu hotplug event. So should I just add in the patch comment something like: "Since some measure have been put in place in qemu-xen-traditional against this race, we don''t change the ACPI table for it." ?> Also iirc someone responded to v1 with a comment saying a race > like this is an OS bug - I don#t think I''ve seen a response to this.I can not find this comment. But the fact that a race can happen does not look like an OS bug. Also, I''ve discrabed in the patch comment the behavior of QEMU (hardware) and what an OS is suppose to do with the AML as specified in the ACPI document. -- Anthony PERARD
Ian Jackson
2013-Dec-13 17:32 UTC
Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
Anthony PERARD writes ("Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen"):> For why qemu-xen-traditional doesn''t not need this: > - a single `xl vcpu-set` can not trigger the race > > For why qemu-xen-traditional doesn''t not want this: > - avoid unnecessary change, especially in the ACPI table > > About the PCI hotplug change and qemu-xen-traditional, I actually don''t > know what would be the effect. > > Why qemu-xen-traditional would like this change for vcpu only: > - we might hit the race window via several consecutive `xl vcpu-set` > calls, or via direct change of xenstore entries. > but after few try, I could not manage to have a missing cpu hotplug > event.I think this suggests to me that we should take this patch. There''s no point leaving races sitting there to bite people, even if we think they''re theoretical. Ian.
Konrad Rzeszutek Wilk
2013-Dec-13 17:44 UTC
Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
On Fri, Dec 13, 2013 at 05:11:58PM +0000, Anthony PERARD wrote:> On Fri, Dec 13, 2013 at 11:51:19AM +0000, Jan Beulich wrote: > > >>> On 12.12.13 at 16:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > > > Change in V2: > > > - better patch comment: > > > patch those not fix race, but reduce the window > > > include patch description of the quoted commit > > > > Thanks - quite a bit better to understand. > > > > > - change also apply to pci hotplug. > > > > The one thing I''m still missing for both changes is a brief word on > > why qemu-xen-traditional doesn''t want/need this and - as iirc > > you said - if that one manages to get this implemented without a > > similar race, why upstream qemu can''t do things in a similar way. > > In qemu-trad, instead of sending an SCI interrupt for every new vcpu, we > loop through every xenstore key that represent vcpus availability and > only send an SCI only when the loop is over. And it looks like one `xl > vcpu-set` provoc only one loop. > > But adding a vcpu in qemu-xen is done via a QMP command, "cpu-add id=X". > qemu-xen have no way to know if there will be a next cpu-add command, so > we can not apply the same thing.Can it have a workqueue (does such thing exist in QEMU?) with a list - so that every time you get an cpu-add it puts the ''vcpuX'' on this command list and the thread wakes up, reads up all of the commands it needs, and then dispatches it? That could also be used for VCPU hotplug .. Actually it could be used for any QMP command.> > For why qemu-xen-traditional doesn''t not need this: > - a single `xl vcpu-set` can not trigger the race > > For why qemu-xen-traditional doesn''t not want this: > - avoid unnecessary change, especially in the ACPI tableThat is not right. The reason I didn''t do it is b/c it was not enough. I could still trigger the the race with the change.> > About the PCI hotplug change and qemu-xen-traditional, I actually don''t > know what would be the effect. > > Why qemu-xen-traditional would like this change for vcpu only: > - we might hit the race window via several consecutive `xl vcpu-set` > calls, or via direct change of xenstore entries. > but after few try, I could not manage to have a missing cpu hotplug > event. > > > So should I just add in the patch comment something like: > "Since some measure have been put in place in qemu-xen-traditional > against this race, we don''t change the ACPI table for it." > ? > > > Also iirc someone responded to v1 with a comment saying a race > > like this is an OS bug - I don#t think I''ve seen a response to this. > > I can not find this comment. But the fact that a race can happen does > not look like an OS bug. Also, I''ve discrabed in the patch comment the > behavior of QEMU (hardware) and what an OS is suppose to do with the > AML as specified in the ACPI document. > > -- > Anthony PERARD
Anthony PERARD
2013-Dec-13 18:17 UTC
Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
On Fri, Dec 13, 2013 at 12:44:08PM -0500, Konrad Rzeszutek Wilk wrote:> On Fri, Dec 13, 2013 at 05:11:58PM +0000, Anthony PERARD wrote: > > On Fri, Dec 13, 2013 at 11:51:19AM +0000, Jan Beulich wrote: > > > >>> On 12.12.13 at 16:30, Anthony PERARD <anthony.perard@citrix.com> wrote: > > > > Change in V2: > > > > - better patch comment: > > > > patch those not fix race, but reduce the window > > > > include patch description of the quoted commit > > > > > > Thanks - quite a bit better to understand. > > > > > > > - change also apply to pci hotplug. > > > > > > The one thing I''m still missing for both changes is a brief word on > > > why qemu-xen-traditional doesn''t want/need this and - as iirc > > > you said - if that one manages to get this implemented without a > > > similar race, why upstream qemu can''t do things in a similar way. > > > > In qemu-trad, instead of sending an SCI interrupt for every new vcpu, we > > loop through every xenstore key that represent vcpus availability and > > only send an SCI only when the loop is over. And it looks like one `xl > > vcpu-set` provoc only one loop. > > > > But adding a vcpu in qemu-xen is done via a QMP command, "cpu-add id=X". > > qemu-xen have no way to know if there will be a next cpu-add command, so > > we can not apply the same thing. > > Can it have a workqueue (does such thing exist in QEMU?) with a list - so > that every time you get an cpu-add it puts the ''vcpuX'' on this command list > and the thread wakes up, reads up all of the commands it needs, and then > dispatches it? > > That could also be used for VCPU hotplug .. Actually it could be used > for any QMP command. > > > > > For why qemu-xen-traditional doesn''t not need this: > > - a single `xl vcpu-set` can not trigger the race > > > > For why qemu-xen-traditional doesn''t not want this: > > - avoid unnecessary change, especially in the ACPI table > > That is not right. The reason I didn''t do it is b/c it was not enough. > I could still trigger the the race with the change.Oh, I see, so this change won''t be very useful to qemu-trad. I did not read qemu-trad''s code properly. Thanks, -- Anthony PERARD
Anthony PERARD
2013-Dec-13 19:17 UTC
[PATCH V3] firmware: Change level-triggered GPE event to a edge one for qemu-xen
This should help to reduce a CPU hotplug race window where a cpu hotplug event while not be seen by the OS. When hotplugging more than one vcpu, some of those vcpus might not be seen as plugged by the guest. This is what is currently happenning: 1. hw adds cpu, sets GPE.2 bit and sends SCI 2. OSPM gets SCI, reads GPE00.sts and masks GPE.2 bit in GPE00.en 3. OSPM executes _L02 (level-triggered event associate to cpu hotplug) 4. hw adds second cpu and sets GPE.2 bit but SCI is not asserted since GPE00.en masks event 5. OSPM resets GPE.2 bit in GPE00.sts and umasks it in GPE00.en as result event for step 4 is lost because step 5 clears it and OS will not see added second cpu. ACPI 50 spec: 5.6.4 General-Purpose Event Handling defines GPE event handling as following: 1. Disables the interrupt source (GPEx_BLK EN bit). 2. If an edge event, clears the status bit. 3. Performs one of the following: * Dispatches to an ACPI-aware device driver. * Queues the matching control method for execution. * Manages a wake event using device _PRW objects. 4. If a level event, clears the status bit. 5. Enables the interrupt source. So, by using edge-triggered General-Purpose Event instead of a level-triggered GPE, OSPM is less likely to clear the status bit of the addition of the second CPU. On step 5, QEMU will resend an interrupt if the status bit is set. This description apply also for PCI hotplug since the same step are followed by QEMU, so we also change the GPE event type for PCI hotplug. This does not apply to qemu-xen-traditional because it does not resend an interrupt if necessary as a result of step 5. Patch and description inspired by SeaBIOS''s commit: Replace level gpe event with edge gpe event for hot-plug handlers 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c from Igor Mammedov <imammedo@redhat.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> --- Change in V3: - description for: does not apply to qemu-dm Change in V2: - better patch comment: patch does not fix race, but reduce the window include patch description of the quoted commit - change also apply to pci hotplug. --- tools/firmware/hvmloader/acpi/mk_dsdt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c index 996f30b..a4b693b 100644 --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c @@ -220,9 +220,13 @@ int main(int argc, char **argv) pop_block(); - /* Define GPE control method ''_L02''. */ + /* Define GPE control method. */ push_block("Scope", "\\_GPE"); - push_block("Method", "_L02"); + if (dm_version == QEMU_XEN_TRADITIONAL) { + push_block("Method", "_L02"); + } else { + push_block("Method", "_E02"); + } stmt("Return", "\\_SB.PRSC()"); pop_block(); pop_block(); @@ -428,7 +432,7 @@ int main(int argc, char **argv) decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify); pop_block(); } else { - push_block("Method", "_L01"); + push_block("Method", "_E01"); for (slot = 1; slot <= 31; slot++) { push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot); stmt("Notify", "\\_SB.PCI0.S%i, 1", slot); -- Anthony PERARD
Jan Beulich
2013-Dec-16 07:51 UTC
Re: [PATCH V2] firmware: Change level-triggered GPE event to a edge one for qemu-xen
>>> On 13.12.13 at 18:11, Anthony PERARD <anthony.perard@citrix.com> wrote: > On Fri, Dec 13, 2013 at 11:51:19AM +0000, Jan Beulich wrote: >> >>> On 12.12.13 at 16:30, Anthony PERARD <anthony.perard@citrix.com> wrote: >> > Change in V2: >> > - better patch comment: >> > patch those not fix race, but reduce the window >> > include patch description of the quoted commit >> >> Thanks - quite a bit better to understand. >> >> > - change also apply to pci hotplug. >> >> The one thing I''m still missing for both changes is a brief word on >> why qemu-xen-traditional doesn''t want/need this and - as iirc >> you said - if that one manages to get this implemented without a >> similar race, why upstream qemu can''t do things in a similar way. > > In qemu-trad, instead of sending an SCI interrupt for every new vcpu, we > loop through every xenstore key that represent vcpus availability and > only send an SCI only when the loop is over. And it looks like one `xl > vcpu-set` provoc only one loop. > > But adding a vcpu in qemu-xen is done via a QMP command, "cpu-add id=X". > qemu-xen have no way to know if there will be a next cpu-add command, so > we can not apply the same thing. > > For why qemu-xen-traditional doesn''t not need this: > - a single `xl vcpu-set` can not trigger the race > > For why qemu-xen-traditional doesn''t not want this: > - avoid unnecessary change, especially in the ACPI table > > About the PCI hotplug change and qemu-xen-traditional, I actually don''t > know what would be the effect. > > Why qemu-xen-traditional would like this change for vcpu only: > - we might hit the race window via several consecutive `xl vcpu-set` > calls, or via direct change of xenstore entries. > but after few try, I could not manage to have a missing cpu hotplug > event. > > > So should I just add in the patch comment something like: > "Since some measure have been put in place in qemu-xen-traditional > against this race, we don''t change the ACPI table for it." > ?Yes please. Jan