Qing He
2009-Nov-06 09:17 UTC
[Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
Current pt_pci_write_config always writes back to real pci conf space. However, in the case of MSI address and data registers, if guest changes the affinity of the interrupt, stale data will be written to these registers. This is particularly a problem if Xen uses per-CPU vector, where the interrupt in question fails to work. This patch fixes this by adding an option to disable the write back of certain controls. Signed-off-by: Qing He <qing.he@intel.com> --- diff --git a/hw/pass-through.c b/hw/pass-through.c index 8d80755..b1a3b09 100644 --- a/hw/pass-through.c +++ b/hw/pass-through.c @@ -626,6 +626,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = { .init_val = 0x00000000, .ro_mask = 0x00000003, .emu_mask = 0xFFFFFFFF, + .no_wb = 1, .init = pt_common_reg_init, .u.dw.read = pt_long_reg_read, .u.dw.write = pt_msgaddr32_reg_write, @@ -638,6 +639,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = { .init_val = 0x00000000, .ro_mask = 0x00000000, .emu_mask = 0xFFFFFFFF, + .no_wb = 1, .init = pt_msgaddr64_reg_init, .u.dw.read = pt_long_reg_read, .u.dw.write = pt_msgaddr64_reg_write, @@ -650,6 +652,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = { .init_val = 0x0000, .ro_mask = 0x0000, .emu_mask = 0xFFFF, + .no_wb = 1, .init = pt_msgdata_reg_init, .u.w.read = pt_word_reg_read, .u.w.write = pt_msgdata_reg_write, @@ -662,6 +665,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = { .init_val = 0x0000, .ro_mask = 0x0000, .emu_mask = 0xFFFF, + .no_wb = 1, .init = pt_msgdata_reg_init, .u.w.read = pt_word_reg_read, .u.w.write = pt_msgdata_reg_write, @@ -1550,10 +1554,12 @@ static void pt_pci_write_config(PCIDevice *d, uint32_t address, uint32_t val, val >>= ((address & 3) << 3); out: - ret = pci_write_block(pci_dev, address, (uint8_t *)&val, len); + if (!reg->no_wb) { + ret = pci_write_block(pci_dev, address, (uint8_t *)&val, len); - if (!ret) - PT_LOG("Error: pci_write_block failed. return value[%d].\n", ret); + if (!ret) + PT_LOG("Error: pci_write_block failed. return value[%d].\n", ret); + } if (pm_state != NULL && pm_state->flags & PT_FLAG_TRANSITING) /* set QEMUTimer */ diff --git a/hw/pass-through.h b/hw/pass-through.h index 028a03e..3c79885 100644 --- a/hw/pass-through.h +++ b/hw/pass-through.h @@ -364,6 +364,8 @@ struct pt_reg_info_tbl { uint32_t ro_mask; /* reg emulate field mask (ON:emu, OFF:passthrough) */ uint32_t emu_mask; + /* no write back allowed */ + uint32_t no_wb; /* emul reg initialize method */ conf_reg_init init; union { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2009-Nov-06 18:16 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
Qing He writes ("[Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):> Current pt_pci_write_config always writes back to real pci conf > space. However, in the case of MSI address and data registers, > if guest changes the affinity of the interrupt, stale data will > be written to these registers. This is particularly a problem > if Xen uses per-CPU vector, where the interrupt in question fails > to work. This patch fixes this by adding an option to disable the > write back of certain controls.Thanks for this patch, which I have applied. But I do have a question about it. I hope you''ll forgive my ignorance about MSIs (I haven''t read the reference manuals). I don''t think I fully understand the problem this is trying to fix. There are two ways of updating the MSI address and data registers ? Are they available via the space directly mapped into the guest as well as via config space then ? One of them is pt_pci_write_config (called when the guest writes to PCI config space) and the other is used by the guest when it changes affinity ? Under what circumstances does pt_pci_write_config get used for these registers ? Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Nov-09 03:03 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
On Sat, 2009-11-07 at 02:16 +0800, Ian Jackson wrote:> But I do have a question about it. I hope you''ll forgive my ignorance > about MSIs (I haven''t read the reference manuals). > > I don''t think I fully understand the problem this is trying to fix.I know that patch note is not clear enough :-)> > There are two ways of updating the MSI address and data registers ? > Are they available via the space directly mapped into the guest as > well as via config space then ?The data and address registers is the sole way to update MSI vector and affinity (at least when not using intremap), but the problem here is that QEmu overwrite the hypervisor changes using stale data. As we know, guest MSI is virtual, this means guest MSI address and data are all emulated, and guest vector has nothing to do with real vector. QEmu needs to map and bind MSI through Xen. via the following two calls: xc_physdev_map_pirq_msi xc_domain_bind_pt_irq The physical content of MSI data/address is then decided and written by Xen. xc_physdev_map_pirq_msi is also used to update guest MSI, including vector and affinity. Now come to the pt_pci_write_config logic: pci_read_block(&read_val); reg->u.dw.write(read_val, &val); // the handler pci_write_block(val); Since MSI data/address is fully emulated, val always equals to read_val, i.e. write what is read back to the register. This would be OK for most of the time, however, when the guest changes MSI affinity, something happens between read and write. the handler calls xc_physdev_map_pirq_msi to update the MSI, hypervisor changes the affinity and write a new vector/affinity to the real registers. When the handler returns, pci_write_block(val) overwrites the real registers, all the HV changes are lost, making the MSI fail. Thanks, Qing _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2009-Nov-09 10:34 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
On Mon, 9 Nov 2009, Qing He wrote:> The data and address registers is the sole way to update MSI vector and > affinity (at least when not using intremap), but the problem here is > that QEmu overwrite the hypervisor changes using stale data. > > As we know, guest MSI is virtual, this means guest MSI address and data > are all emulated, and guest vector has nothing to do with real vector. > QEmu needs to map and bind MSI through Xen. via the following two calls: > > xc_physdev_map_pirq_msi > xc_domain_bind_pt_irq > > The physical content of MSI data/address is then decided and written by Xen. > xc_physdev_map_pirq_msi is also used to update guest MSI, including vector > and affinity. > > Now come to the pt_pci_write_config logic: > > pci_read_block(&read_val); > reg->u.dw.write(read_val, &val); // the handler > pci_write_block(val); > > Since MSI data/address is fully emulated, val always equals to read_val, > i.e. write what is read back to the register. This would be OK for most of > the time, however, when the guest changes MSI affinity, something happens > between read and write. the handler calls xc_physdev_map_pirq_msi to update > the MSI, hypervisor changes the affinity and write a new vector/affinity > to the real registers. When the handler returns, pci_write_block(val) > overwrites the real registers, all the HV changes are lost, making the > MSI fail. >If "val always equals to read_val", why do we need to call pci_write_block at all? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Nov-09 14:58 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
On Mon, 2009-11-09 at 18:34 +0800, Stefano Stabellini wrote:> On Mon, 9 Nov 2009, Qing He wrote: > > The data and address registers is the sole way to update MSI vector and > > affinity (at least when not using intremap), but the problem here is > > that QEmu overwrite the hypervisor changes using stale data. > > > > As we know, guest MSI is virtual, this means guest MSI address and data > > are all emulated, and guest vector has nothing to do with real vector. > > QEmu needs to map and bind MSI through Xen. via the following two calls: > > > > xc_physdev_map_pirq_msi > > xc_domain_bind_pt_irq > > > > The physical content of MSI data/address is then decided and written by Xen. > > xc_physdev_map_pirq_msi is also used to update guest MSI, including vector > > and affinity. > > > > Now come to the pt_pci_write_config logic: > > > > pci_read_block(&read_val); > > reg->u.dw.write(read_val, &val); // the handler > > pci_write_block(val); > > > > Since MSI data/address is fully emulated, val always equals to read_val, > > i.e. write what is read back to the register. This would be OK for most of > > the time, however, when the guest changes MSI affinity, something happens > > between read and write. the handler calls xc_physdev_map_pirq_msi to update > > the MSI, hypervisor changes the affinity and write a new vector/affinity > > to the real registers. When the handler returns, pci_write_block(val) > > overwrites the real registers, all the HV changes are lost, making the > > MSI fail. > > > > If "val always equals to read_val", why do we need to call > pci_write_block at all? >val == read_val is not necessarily true for registers other than MSI data/address, but pt_pci_write_config tend to be generic for all pci config. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2009-Nov-09 17:19 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
Qing He writes ("Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):> The physical content of MSI data/address is then decided and written by Xen. > xc_physdev_map_pirq_msi is also used to update guest MSI, including vector > and affinity. > > Now come to the pt_pci_write_config logic: > > pci_read_block(&read_val); > reg->u.dw.write(read_val, &val); // the handler > pci_write_block(val); > > Since MSI data/address is fully emulated, val always equals to read_val, > i.e. write what is read back to the register. This would be OK for most of > the time, however, when the guest changes MSI affinity, something happens > between read and write. the handler calls xc_physdev_map_pirq_msi to update > the MSI, hypervisor changes the affinity and write a new vector/affinity > to the real registers. When the handler returns, pci_write_block(val) > overwrites the real registers, all the HV changes are lost, making the > MSI fail.So as I understand it the problem is as follows, if I may put what you have said in different words: 1. Write accesses by the guest to PCI config space are done by reading the corresponding real PCI config register to get the old value, and passing the old and new values to the write handler. 2. Depending on the write handler, some action will be taken. For an MSI affinity change, this is pt_msgaddr32_reg_write which calls pt_msi_update. 3. The write access unconditionally writes the value from the write handler into the real device. The problem is that pt_msi_update will itself actually change the real affinity in the real PCI config space to a different value. And then in step 3 we overwrite that correct value with a wrong one. It is not convenient for pt_msgaddr32_reg_write to return the correct value for writing into the real PCI config space (a) because it''s computed by Xen and we don''t have the value accessible (b) there might be races and complicationswith reading it again (c) there might be races in writing it twice. The problem occurs because of the previous assumption that guest PCI config space writes are all write-through, possibly with some modification to the written value. The new flag prevents the write-through (not a write _back_). In which case I think it''s fine if somewhat misnamed. But it would be better to consider whether the assumpion is actually valid; perhaps it would be better for the write handlers to explicitly do the write to real config space themselves if they need it ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Nov-10 01:36 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
On Tue, 2009-11-10 at 01:19 +0800, Ian Jackson wrote:> The problem occurs because of the previous assumption that guest PCI > config space writes are all write-through, possibly with some > modification to the written value. The new flag prevents the > write-through (not a write _back_).Well, if you consider it as the register write back stage of an instruction pipeline, the term becomes natural. It''s the write of a read-execute-write pattern, that''s why `back'' is used.> > In which case I think it''s fine if somewhat misnamed. But it would be > better to consider whether the assumpion is actually valid; perhaps it > would be better for the write handlers to explicitly do the write to > real config space themselves if they need it ?I''d like to make the change as small. If the write is moved to the handler, all the handlers have to change. And for the current generalized pci config space algorithm, I think its logics is quite clear. There was even discussion that long run letting QEmu write physical pci config space is not desirable, either it be moved to some place like pcistub or hypervisor audit every write. For now, I think a few line change is fine. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2009-Nov-12 17:06 UTC
Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write
Qing He writes ("Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):> On Tue, 2009-11-10 at 01:19 +0800, Ian Jackson wrote: > > [stuff] > > [explanation] > > There was even discussion that long run letting QEmu write physical > pci config space is not desirable, either it be moved to some place > like pcistub or hypervisor audit every write. For now, I think a few > line change is fine.Right, yes, thanks for the explanation. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel