Wei Wang
2012-Jun-12 12:02 UTC
[PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Hi Jan & Andrew,

I have attached a revised patch, please check it. I found that the
following Linux commit triggers this issue. It has been included into
3.4 pv_ops:

"commit a776c491ca5e38c26d9f66923ff574d041e747f4
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Mon Oct 17 11:46:06 2011 -0700

    PCI: msi: Disable msi interrupts when we initialize a pci device"

Thanks,
Wei

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Jun-12 15:13 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
> I had attached a revised patch, please check it.

While the patch technically looks better now, you didn't eliminate my
objections to the approach you take, nor did you comment on the
proposed alternative.

A fundamental problem is that your IOMMUs show up as "normal" PCI
devices, breaking the separation between what is being managed by the
hypervisor vs. by the Dom0 kernel. (This even allows something as odd
as passing through an IOMMU to a DomU, which would clearly upset the
hypervisor.)

> I found that the following Linux commit triggers this issue. It has been
> included into 3.4 pv_ops.
>
> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date: Mon Oct 17 11:46:06 2011 -0700
>
> PCI: msi: Disable msi interrupts when we initialize a pci device "

Thanks for locating this. As it stands, it is incomplete anyway: if the
kexec-ed kernel is built without CONFIG_PCI_MSI, it won't have a means
to suppress the "screaming" interrupts.

Jan
Andrew Cooper
2012-Jun-12 16:08 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 12/06/12 16:13, Jan Beulich wrote:
>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>> I had attached a revised patch, please check it.
> While the patch technically looks better now, you didn't eliminate
> my objections to the approach you take, nor did you comment on
> the proposed alternative.
>
> A fundamental problem is that your IOMMUs show up as "normal"
> PCI devices, breaking the separation between what is being
> managed by the hypervisor vs. by the Dom0 kernel. (This even
> allows something as odd as passing through an IOMMU to a
> DomU, which would clearly upset the hypervisor.)
>
>> I found that the following Linux commit triggers this issue. It has been
>> included into 3.4 pv_ops.
>>
>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>> Author: Eric W. Biederman <ebiederm@xmission.com>
>> Date: Mon Oct 17 11:46:06 2011 -0700
>>
>> PCI: msi: Disable msi interrupts when we initialize a pci device "
> Thanks for locating this. As it stands, it is incomplete though
> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
> it won't have a means to suppress the "screaming" interrupts.
>
> Jan

I feel that the correct solution would be for Xen to hide the PCI
devices from dom0. Xen already hides the DMAR ACPI table (by turning it
into an XMAR table), and this would be the logical way to proceed now
that IOMMU internals are appearing as PCI devices.

(A similar issue comes into play now that newer generations of CPUs are
exposing CPU internals as PCI devices.)

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2012-Jun-12 16:43 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 12/06/12 16:13, Jan Beulich wrote:
>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>> I had attached a revised patch, please check it.
>> While the patch technically looks better now, you didn't eliminate
>> my objections to the approach you take, nor did you comment on
>> the proposed alternative.
>>
>> A fundamental problem is that your IOMMUs show up as "normal"
>> PCI devices, breaking the separation between what is being
>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>> allows something as odd as passing through an IOMMU to a
>> DomU, which would clearly upset the hypervisor.)
>>
>>> I found that the following Linux commit triggers this issue. It has been
>>> included into 3.4 pv_ops.
>>>
>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>
>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>> Thanks for locating this. As it stands, it is incomplete though
>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>> it won't have a means to suppress the "screaming" interrupts.
>
> I feel that the correct solution would be for Xen to hide the PCI
> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
> into an XMAR table), and this would be the logical way to proceed now that
> IOMMU internals are appearing as PCI devices.

That is precisely what I suggested in my response to the first version
of this patch, and I'd also volunteer to look into putting together a
first draft implementation if we sort of agree that this is the way to
go.

> (A similar issue comes into play now that newer generations of CPUs are
> exposing CPU internals as PCI devices)

Indeed, good point. Might be a little tricky though to determine which
one(s) they are, and still avoid conflicting with things like the EDAC
drivers in Dom0.

Jan
Wei Wang
2012-Jun-14 12:13 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Am 12.06.2012 18:43, schrieb Jan Beulich:
>>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 12/06/12 16:13, Jan Beulich wrote:
>>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>>> I had attached a revised patch, please check it.
>>> While the patch technically looks better now, you didn't eliminate
>>> my objections to the approach you take, nor did you comment on
>>> the proposed alternative.
>>>
>>> A fundamental problem is that your IOMMUs show up as "normal"
>>> PCI devices, breaking the separation between what is being
>>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>>> allows something as odd as passing through an IOMMU to a
>>> DomU, which would clearly upset the hypervisor.)
>>>
>>>> I found that the following Linux commit triggers this issue. It has been
>>>> included into 3.4 pv_ops.
>>>>
>>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>>
>>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>>> Thanks for locating this. As it stands, it is incomplete though
>>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>>> it won't have a means to suppress the "screaming" interrupts.
>>
>> I feel that the correct solution would be for Xen to hide the PCI
>> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
>> into an XMAR table), and this would be the logical way to proceed now that
>> IOMMU internals are appearing as PCI devices.

That sounds absolutely good to me. Thanks for the suggestion.

> That is precisely what I suggested in my response to the first
> version of this patch, and I'd also volunteer to look into putting
> together a first draft implementation if we sort of agree that
> this is the way to go.

Cool! Thanks for doing that. Looking forward to it in Xen 4.2, since
IOMMU MSI is really broken with recent Linux dom0...

Thanks,
Wei

>> (A similar issue comes into play now that newer generations of CPUs are
>> exposing CPU internals as PCI devices)
>
> Indeed, good point. Might be a little tricky though to determine
> which one(s) they are, and still avoid conflicting with things like
> the EDAC drivers in Dom0.
>
> Jan
Jan Beulich
2012-Jun-14 14:18 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
> Am 12.06.2012 18:43, schrieb Jan Beulich:
>> That is precisely what I suggested in my response to the first
>> version of this patch, and I'd also volunteer to look into putting
>> together a first draft implementation if we sort of agree that
>> this is the way to go.
>
> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
> iommu msi is really broken with recent Linux dom0...

That's unlikely to be for 4.2 - it's going to have a reasonable risk of
regressions, and functionality like that isn't nice to rush into a
feature-frozen tree, the more so as it provides a workaround for the
combination of a questionable design and an improper kernel-side fix.

Have you at all considered getting this fixed on the kernel side? As I
don't have direct access to any AMD IOMMU capable system - can one,
other than by enumerating the respective PCI IDs or reading ACPI
tables, reasonably easily identify the devices in question (e.g. via
vendor/class/sub-class or some such)? That might permit skipping those
in the offending kernel code...

Jan
Wei Wang
2012-Jun-14 15:15 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
>> Am 12.06.2012 18:43, schrieb Jan Beulich:
>>> That is precisely what I suggested in my response to the first
>>> version of this patch, and I'd also volunteer to look into putting
>>> together a first draft implementation if we sort of agree that
>>> this is the way to go.
>>
>> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
>> iommu msi is really broken with recent Linux dom0...
>
> That's unlikely to be for 4.2 - it's going to have a reasonable risk
> of regressions, and functionality like that I don't think is nice to
> rush into a feature frozen tree, the more that it provides a
> workaround for the combination of questionable design and an
> improper kernel side fix.

Yes, it looks improper to me as well, but it would also be nice to have
something in Xen to handle this situation. Maybe we should not trust
the guest OS, even dom0, sometimes...

> Have you at all considered getting this fixed on the kernel side?
> As I don't have direct access to any AMD IOMMU capable
> system - can one, other than by enumerating the respective
> PCI IDs or reading ACPI tables, reasonably easily identify the
> devices in question (e.g. via vendor/class/sub-class or some
> such)? That might permit skipping those in the offending kernel
> code...

AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
should be enough to identify an AMD IOMMU device. I could send you a
kernel patch for review using this approach. I believe that fixing this
issue in 4.2, no matter how, is really important for the AMD IOMMU.

Thanks,
Wei
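[Editor's note: the identification rule described above (vendor AMD, base
class 0x08, sub-class 0x06) can be sketched as a small stand-alone predicate.
This is not code from the thread's patches; `is_amd_iommu` is a hypothetical
helper, and the only facts assumed are PCI_VENDOR_ID_AMD being 0x1022 and the
24-bit PCI class code layout (base class, sub-class, programming interface).]

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD      0x1022
/* Base class 0x08 (System Base Peripheral), sub-class 0x06 (IOMMU). */
#define PCI_CLASS_SYSTEM_IOMMU 0x0806

/* class_code is the 24-bit value from config space offset 0x09-0x0b:
 * (base class << 16) | (sub-class << 8) | programming interface. */
static int is_amd_iommu(uint16_t vendor, uint32_t class_code)
{
    /* Shift off the prog-if byte and compare base class + sub-class. */
    return vendor == PCI_VENDOR_ID_AMD &&
           (class_code >> 8) == PCI_CLASS_SYSTEM_IOMMU;
}
```

This mirrors the check Jan's kernel patch later in the thread performs on
`dev->class >> 8` and `dev->vendor`.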
Jan Beulich
2012-Jun-14 15:27 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
> Am 14.06.2012 16:18, schrieb Jan Beulich:
>> Have you at all considered getting this fixed on the kernel side?
>> As I don't have direct access to any AMD IOMMU capable
>> system - can one, other than by enumerating the respective
>> PCI IDs or reading ACPI tables, reasonably easily identify the
>> devices in question (e.g. via vendor/class/sub-class or some
>> such)? That might permit skipping those in the offending kernel
>> code...
>
> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
> should be enough to identify an AMD IOMMU device. I could send you a
> kernel patch for review using this approach.

Yes, please.

Jan
Jan Beulich
2012-Jun-20 15:45 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
> Am 12.06.2012 18:43, schrieb Jan Beulich:
>>>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 12/06/12 16:13, Jan Beulich wrote:
>>>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>>>> I had attached a revised patch, please check it.
>>>> While the patch technically looks better now, you didn't eliminate
>>>> my objections to the approach you take, nor did you comment on
>>>> the proposed alternative.
>>>>
>>>> A fundamental problem is that your IOMMUs show up as "normal"
>>>> PCI devices, breaking the separation between what is being
>>>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>>>> allows something as odd as passing through an IOMMU to a
>>>> DomU, which would clearly upset the hypervisor.)
>>>>
>>>>> I found that the following Linux commit triggers this issue. It has been
>>>>> included into 3.4 pv_ops.
>>>>>
>>>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>>>
>>>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>>>> Thanks for locating this. As it stands, it is incomplete though
>>>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>>>> it won't have a means to suppress the "screaming" interrupts.
>>>
>>> I feel that the correct solution would be for Xen to hide the PCI
>>> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
>>> into an XMAR table), and this would be the logical way to proceed now that
>>> IOMMU internals are appearing as PCI devices.
>
> That sounds absolutely good to me. Thanks for the suggestion.
>
>> That is precisely what I suggested in my response to the first
>> version of this patch, and I'd also volunteer to look into putting
>> together a first draft implementation if we sort of agree that
>> this is the way to go.
>
> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
> iommu msi is really broken with recent Linux dom0...

Rather than submitting it for inclusion immediately, I'd rather see you
first use it successfully. Below/attached is what appears to work fine
for me in contrived situations (i.e. hiding bridges with nothing
connected, as I still don't have any AMD IOMMU system at hand).

Jan

Note that due to ptwr_do_page_fault() being run first, there'll be a
MEM_LOG() issued for each such attempt. If that's undesirable, the
order of the calls would need to be swapped.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
             break;
         case 1:
             l1e_remove_flags(nl1e, _PAGE_RW);
+            rc = 0;
             break;
         }
         if ( page )
@@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u
     return 0;
 }
 
+#ifdef __x86_64__
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+struct mmio_ro_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+};
+
+static int mmio_ro_emulated_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int mmio_ro_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt =
+        container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt);
+
+    /* Only allow naturally-aligned stores at the original %cr2 address. */
+    if ( ((bytes | offset) & (bytes - 1)) || offset != mmio_ro_ctxt->cr2 )
+    {
+        MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+                mmio_ro_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = mmio_ro_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32on64_domain(v->domain) ?
+                             32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = {
+        .ctxt.regs = regs,
+        .ctxt.addr_size = addr_size,
+        .ctxt.sp_size = addr_size,
+        .cr2 = addr
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(v, addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(mfn) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    rc = x86_emulate(&mmio_ro_ctxt.ctxt, &mmio_ro_emulate_ops);
+
+    return rc != X86EMUL_UNHANDLEABLE ? EXCRET_fault_fixed : 0;
+}
+#endif /* __x86_64__ */
+
 void free_xen_pagetable(void *v)
 {
     if ( system_state == SYS_STATE_early_boot )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon
         return 0;
     }
 
-    if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) &&
-         guest_kernel_mode(v, regs) )
-    {
-        unsigned int mbs = PFEC_write_access;
-        unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch;
-
-        /* Do not check if access-protection fault since the page may
-           legitimately be not present in shadow page tables */
-        if ( !paging_mode_enabled(d) )
-            mbs |= PFEC_page_present;
-
-        if ( ((regs->error_code & (mbs | mbz)) == mbs) &&
+    if ( guest_kernel_mode(v, regs) &&
+         !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
+         (regs->error_code & PFEC_write_access) )
+    {
+        if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) &&
+             /* Do not check if access-protection fault since the page may
+                legitimately be not present in shadow page tables */
+             (paging_mode_enabled(d) ||
+              (regs->error_code & PFEC_page_present)) &&
              ptwr_do_page_fault(v, addr, regs) )
             return EXCRET_fault_fixed;
+
+#ifdef __x86_64__
+        if ( IS_PRIV(d) && (regs->error_code & PFEC_page_present) &&
+             mmio_ro_do_page_fault(v, addr, regs) )
+            return EXCRET_fault_fixed;
+#endif
     }
 
     /* For non-external shadowed guests, we fix up both their own
@@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d,
         return 0;
 
     machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
     start = d->arch.pci_cf8 & 0xFF;
     end = start + size - 1;
     if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
--- a/xen/arch/x86/x86_32/pci.c
+++ b/xen/arch/x86/x86_32/pci.c
@@ -6,6 +6,7 @@
 
 #include <xen/spinlock.h>
 #include <xen/pci.h>
+#include <xen/init.h>
 #include <asm/io.h>
 
 #define PCI_CONF_ADDRESS(bus, dev, func, reg) \
@@ -70,3 +71,7 @@ void pci_conf_write32(
     BUG_ON((bus > 255) || (dev > 31) || (func > 7) || (reg > 255));
     pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data);
 }
+
+void __init arch_pci_ro_device(int seg, int bdf)
+{
+}
--- a/xen/arch/x86/x86_64/mmconfig_64.c
+++ b/xen/arch/x86/x86_64/mmconfig_64.c
@@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const
     return (void __iomem *) virt;
 }
 
+void arch_pci_ro_device(int seg, int bdf)
+{
+    unsigned int idx, bus = PCI_BUS(bdf);
+
+    for (idx = 0; idx < pci_mmcfg_config_num; ++idx) {
+        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
+        unsigned long mfn = (cfg->address >> PAGE_SHIFT) + bdf;
+
+        if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg ||
+            cfg->start_bus_number > bus || cfg->end_bus_number < bus)
+            continue;
+
+        if (rangeset_add_singleton(mmio_ro_ranges, mfn))
+            printk(XENLOG_ERR
+                   "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) read-only\n",
+                   cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf),
+                   mfn);
+    }
+}
+
 int pci_mmcfg_arch_enable(unsigned int idx)
 {
     const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg;
+    const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment);
 
     if (pci_mmcfg_virt[idx].virt)
         return 0;
@@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i
     }
     printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n",
            cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
+    if (ro_map) {
+        unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0);
+        unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1);
+
+        while ((bdf = find_next_bit(ro_map, end + 1, bdf)) <= end) {
+            arch_pci_ro_device(cfg->pci_segment, bdf);
+            if (bdf++ == end)
+                break;
+        }
+    }
 
     return 0;
 }
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi(
     if ( rt )
         return -ENODEV;
 
+    rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func));
+    if ( rt )
+        printk(XENLOG_ERR
+               "Could not mark config space of %04x:%02x:%02x.%u read-only (%d)\n",
+               iommu->seg, bus, dev, func, rt);
+
     list_add_tail(&iommu->list, &amd_iommu_head);
 
     return 0;
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi
  unlock:
     spin_unlock(&d->event_lock);
 }
-
-static int __init setup_mmio_ro_ranges(void)
-{
-    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
-                                  RANGESETF_prettyprint_hex);
-    return 0;
-}
-__initcall(setup_mmio_ro_ranges);
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -36,6 +36,7 @@
 struct pci_seg {
     struct list_head alldevs_list;
     u16 nr;
+    unsigned long *ro_map;
     /* bus2bridge_lock protects bus2bridge array */
     spinlock_t bus2bridge_lock;
 #define MAX_BUSES 256
@@ -106,6 +107,8 @@ void __init pt_pci_init(void)
     radix_tree_init(&pci_segments);
     if ( !alloc_pseg(0) )
         panic("Could not initialize PCI segment 0\n");
+    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
+                                  RANGESETF_prettyprint_hex);
 }
 
 int __init pci_add_segment(u16 seg)
@@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg)
     return alloc_pseg(seg) ? 0 : -ENOMEM;
 }
 
+const unsigned long *pci_get_ro_map(u16 seg)
+{
+    struct pci_seg *pseg = get_pseg(seg);
+
+    return pseg ? pseg->ro_map : NULL;
+}
+
 static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
 {
     struct pci_dev *pdev;
@@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps
     xfree(pdev);
 }
 
+int __init pci_ro_device(int seg, int bus, int devfn)
+{
+    struct pci_seg *pseg = alloc_pseg(seg);
+    struct pci_dev *pdev;
+
+    if ( !pseg )
+        return -ENOMEM;
+    pdev = alloc_pdev(pseg, bus, devfn);
+    if ( !pdev )
+        return -ENOMEM;
+
+    if ( !pseg->ro_map )
+    {
+        size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long);
+
+        pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0);
+        if ( !pseg->ro_map )
+            return -ENOMEM;
+        memset(pseg->ro_map, 0, sz);
+    }
+
+    __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map);
+    arch_pci_ro_device(seg, PCI_BDF2(bus, devfn));
+
+    return 0;
+}
+
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn)
 {
     struct pci_seg *pseg = get_pseg(seg);
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p);
 
 int  ptwr_do_page_fault(struct vcpu *, unsigned long,
                         struct cpu_user_regs *);
+int  mmio_ro_do_page_fault(struct vcpu *, unsigned long,
+                           struct cpu_user_regs *);
 
 int audit_adjust_pgtables(struct domain *d, int dir, int noisy);
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev(
 void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *));
 void pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
+const unsigned long *pci_get_ro_map(u16 seg);
 int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
+int pci_ro_device(int seg, int bus, int devfn);
+void arch_pci_ro_device(int seg, int bdf);
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn);
 struct pci_dev *pci_get_pdev_by_domain(
     struct domain *, int seg, int bus, int devfn);
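[Editor's note: the hypervisor-side mechanism in Jan's draft patch boils down
to a per-segment bitmap indexed by BDF (bus/device/function): pci_ro_device()
sets a bit for the IOMMU's BDF, and pci_cfg_ok() refuses config-space writes
when that bit is set. The following is a minimal stand-alone sketch of that
idea, not Xen code; `mark_ro` and `cfg_write_ok` are hypothetical names, and
only the BDF encoding (bus << 8 | devfn) is taken from the patch.]

```c
#include <assert.h>
#include <stdint.h>

/* One bit per BDF: 256 buses x 32 devices x 8 functions = 65536 bits. */
#define PCI_BDF(bus, devfn) ((uint16_t)((((bus) & 0xff) << 8) | ((devfn) & 0xff)))
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static unsigned long ro_map[65536 / (8 * sizeof(unsigned long))];

/* Analogous to pci_ro_device(): mark one device's config space read-only. */
static void mark_ro(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = PCI_BDF(bus, devfn);
    ro_map[bdf / BITS_PER_LONG] |= 1UL << (bdf % BITS_PER_LONG);
}

/* Analogous to the pci_cfg_ok() write path: a write is allowed only
 * if the target BDF is not in the read-only map. */
static int cfg_write_ok(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = PCI_BDF(bus, devfn);
    return !((ro_map[bdf / BITS_PER_LONG] >> (bdf % BITS_PER_LONG)) & 1UL);
}
```

The same bitmap doubles as the trigger for marking the device's MCFG page
read-only in the page tables, which is what the mmio_ro_ranges/
mmio_ro_do_page_fault() part of the patch handles.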
Jan Beulich
2012-Jun-21 09:59 UTC
[PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0 (was: Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it)
>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
> Am 14.06.2012 16:18, schrieb Jan Beulich:
>> Have you at all considered getting this fixed on the kernel side?
>> As I don't have direct access to any AMD IOMMU capable
>> system - can one, other than by enumerating the respective
>> PCI IDs or reading ACPI tables, reasonably easily identify the
>> devices in question (e.g. via vendor/class/sub-class or some
>> such)? That might permit skipping those in the offending kernel
>> code...
>
> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
> should be enough to identify an AMD IOMMU device. I could send you a
> kernel patch for review using this approach. I believe that fixing this
> issue in 4.2, no matter how, is really important for the AMD IOMMU.

As you didn't come forward with anything, here's my first take on this:

Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
interrupts when we initialize a pci device") caused MSI to get disabled
on Xen Dom0 despite it having got turned on purposefully by the
hypervisor. As an immediate band aid, suppress the disabling in this
specific case.

I don't think, however, that either the original change or this fix are
really appropriate. The original fix, besides leaving aside the
presence of a hypervisor like Xen, doesn't deal with all possible cases
(in particular it has no effect if the secondary kernel was built with
CONFIG_PCI_MSI unset). And taking into account a hypervisor like Xen,
logic like this should probably be skipped altogether (i.e. it should
be entirely left to the hypervisor, being the entity in control of
IRQs).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@kernel.org

---
 drivers/pci/msi.c       |    7 +++++++
 include/linux/pci_ids.h |    1 +
 2 files changed, 8 insertions(+)

--- 3.5-rc3/drivers/pci/msi.c
+++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
@@ -20,6 +20,7 @@
 #include <linux/errno.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <xen/xen.h>
 
 #include "pci.h"
 #include "msi.h"
@@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
 	/* Disable the msi hardware to avoid screaming interrupts
 	 * during boot.  This is the power on reset default so
 	 * usually this should be a noop.
+	 * But on a Xen host don't do this for IOMMUs which the hypervisor
+	 * is in control of (and hence has already enabled on purpose).
 	 */
+	if (xen_initial_domain()
+	    && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
+	    && dev->vendor == PCI_VENDOR_ID_AMD)
+		return;
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
 	if (pos)
 		msi_set_enable(dev, pos, 0);
--- 3.5-rc3/include/linux/pci_ids.h
+++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
@@ -75,6 +75,7 @@
 #define PCI_CLASS_SYSTEM_RTC		0x0803
 #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
 #define PCI_CLASS_SYSTEM_SDHCI		0x0805
+#define PCI_CLASS_SYSTEM_IOMMU		0x0806
 #define PCI_CLASS_SYSTEM_OTHER		0x0880
 
 #define PCI_BASE_CLASS_INPUT		0x09
Eric W. Biederman
2012-Jun-21 11:08 UTC
Re: [PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0
"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>> Have you at all considered getting this fixed on the kernel side?
>>> As I don't have direct access to any AMD IOMMU capable
>>> system - can one, other than by enumerating the respective
>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>> devices in question (e.g. via vendor/class/sub-class or some
>>> such)? That might permit skipping those in the offending kernel
>>> code...
>>
>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>> should be enough to identify an AMD IOMMU device. I could send you a
>> kernel patch for review using this approach. I believe that fixing this
>> issue in 4.2, no matter how, is really important for the AMD IOMMU.
>
> As you didn't come forward with anything, here's my first
> take on this:
>
> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
> interrupts when we initialize a pci device") caused MSI to get disabled
> on Xen Dom0 despite it having got turned on purposefully by the
> hypervisor. As an immediate band aid, suppress the disabling in this
> specific case.
>
> I don't think, however, that either the original change or this fix are
> really appropriate. The original fix, besides leaving aside the
> presence of a hypervisor like Xen, doesn't deal with all possible
> cases (in particular it has no effect if the secondary kernel was built
> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
> Xen, logic like this should probably be skipped altogether (i.e. it
> should be entirely left to the hypervisor, being the entity in control
> of IRQs).

The original fix is fine. Xen dom0 remains insane, although perhaps
somewhat less so than it once was.

Why does the dom0 kernel even get any access to PCI devices that Xen
doesn't want it to touch? As far as I can tell, my patch has revealed a
design bug in Xen. If you don't want to be messed up by the kernel,
don't let the kernel touch things.

At the very least we need an abstraction much cleaner than the patch
below.

Eric

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@kernel.org
>
> ---
>  drivers/pci/msi.c       |    7 +++++++
>  include/linux/pci_ids.h |    1 +
>  2 files changed, 8 insertions(+)
>
> --- 3.5-rc3/drivers/pci/msi.c
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include <linux/errno.h>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <xen/xen.h>
>
>  #include "pci.h"
>  #include "msi.h"
> @@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
>  	/* Disable the msi hardware to avoid screaming interrupts
>  	 * during boot.  This is the power on reset default so
>  	 * usually this should be a noop.
> +	 * But on a Xen host don't do this for IOMMUs which the hypervisor
> +	 * is in control of (and hence has already enabled on purpose).
>  	 */
> +	if (xen_initial_domain()
> +	    && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
> +	    && dev->vendor == PCI_VENDOR_ID_AMD)
> +		return;
>  	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
>  	if (pos)
>  		msi_set_enable(dev, pos, 0);
> --- 3.5-rc3/include/linux/pci_ids.h
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
> @@ -75,6 +75,7 @@
>  #define PCI_CLASS_SYSTEM_RTC		0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI		0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU		0x0806
>  #define PCI_CLASS_SYSTEM_OTHER		0x0880
>
>  #define PCI_BASE_CLASS_INPUT		0x09
Wei Wang
2012-Jun-21 11:21 UTC
Re: [PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>> Have you at all considered getting this fixed on the kernel side?
>>> As I don''t have direct access to any AMD IOMMU capable
>>> system - can one, other than by enumerating the respective
>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>> devices in question (e.g. via vendor/class/sub-class or some
>>> such)? That might permit skipping those in the offending kernel
>>> code...
>>
>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>> should be enough to identify an AMD IOMMU device. I could send you a
>> kernel patch for review using this approach. I believe that fixing
>> this issue in 4.2, no matter how, is really important for amd iommu.
>
> As you didn''t come forward with anything, here''s my first
> take on this:

Hi Jan
Thanks a lot for the patch. Actually I plan to send my version today,
which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!

I also evaluated the possibility of hiding the iommu device from dom0.
I think the change is not all that big, at least for io based pcicfg
access. A proof-of-concept patch is attached.

Thanks,
Wei

diff -r baa85434d0ec xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
+++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
@@ -73,6 +73,7 @@
 #include <asm/hpet.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
+#include <asm/hvm/svm/amd-iommu-proto.h>
 
 /*
  * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
@@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
 {
     uint32_t machine_bdf;
     uint16_t start, end;
+    struct amd_iommu *iommu;
+
     if (!IS_PRIV(d))
         return 0;
 
     machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
+
+    for_each_amd_iommu ( iommu )
+    {
+        if ( machine_bdf == iommu->bdf )
+            return 0;
+    }
+
     start = d->arch.pci_cf8 & 0xFF;
     end = start + size - 1;
     if (xsm_pci_config_permission(d, machine_bdf, start, end, write))

> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
> interrupts when we initialize a pci device") caused MSI to get disabled
> on Xen Dom0 despite it having got turned on purposefully by the
> hypervisor. As an immediate band aid, suppress the disabling in this
> specific case.
>
> I don''t think, however, that either the original change or this fix are
> really appropriate. The original fix, besides leaving aside the
> presence of a hypervisor like Xen, doesn''t deal with all possible
> cases (in particular it has no effect if the secondary kernel was built
> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
> Xen, logic like this should probably be skipped altogether (i.e. it
> should be entirely left to the hypervisor, being the entity in control
> of IRQs).
>
> Signed-off-by: Jan Beulich<jbeulich@suse.com>
> Cc: stable@kernel.org
>
> ---
>  drivers/pci/msi.c       |    7 +++++++
>  include/linux/pci_ids.h |    1 +
>  2 files changed, 8 insertions(+)
>
> --- 3.5-rc3/drivers/pci/msi.c
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include<linux/errno.h>
>  #include<linux/io.h>
>  #include<linux/slab.h>
> +#include<xen/xen.h>
>
>  #include "pci.h"
>  #include "msi.h"
> @@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
>      /* Disable the msi hardware to avoid screaming interrupts
>       * during boot. This is the power on reset default so
>       * usually this should be a noop.
> +     * But on a Xen host don''t do this for IOMMUs which the hypervisor
> +     * is in control of (and hence has already enabled on purpose).
>       */
> +    if (xen_initial_domain()
> +        && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
> +        && dev->vendor == PCI_VENDOR_ID_AMD)
> +        return;
>      pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
>      if (pos)
>          msi_set_enable(dev, pos, 0);
> --- 3.5-rc3/include/linux/pci_ids.h
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
> @@ -75,6 +75,7 @@
>  #define PCI_CLASS_SYSTEM_RTC            0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG    0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI          0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU          0x0806
>  #define PCI_CLASS_SYSTEM_OTHER          0x0880
>
>  #define PCI_BASE_CLASS_INPUT            0x09
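The vendor/class test Wei describes can be reduced to a predicate like the following - an illustrative sketch only, assuming Linux''s (base-class << 16 | sub-class << 8 | prog-if) encoding of dev->class; PCI_CLASS_SYSTEM_IOMMU (0x0806) only exists once the pci_ids.h hunk above is applied:

```c
#include <stdint.h>

#define PCI_VENDOR_ID_AMD       0x1022
#define PCI_CLASS_SYSTEM_IOMMU  0x0806  /* base class 0x08, sub-class 0x06 */

/*
 * class24 is the 24-bit class code (base << 16 | sub << 8 | prog-if);
 * shifting right by 8 leaves the 16-bit base/sub-class pair, matching
 * the (dev->class >> 8) comparison in the patch above.
 */
static int is_amd_iommu(uint16_t vendor, uint32_t class24)
{
    return vendor == PCI_VENDOR_ID_AMD &&
           (class24 >> 8) == PCI_CLASS_SYSTEM_IOMMU;
}
```

The same pair of checks works from user space too, since lspci reports the class as [0806] for these devices.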
Jan Beulich
2012-Jun-21 12:06 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 13:21, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> Have you at all considered getting this fixed on the kernel side?
>>>> As I don''t have direct access to any AMD IOMMU capable
>>>> system - can one, other than by enumerating the respective
>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>> such)? That might permit skipping those in the offending kernel
>>>> code...
>>>
>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>> should be enough to identify an AMD IOMMU device. I could send you a
>>> kernel patch for review using this approach. I believe that fixing
>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>
>> As you didn''t come forward with anything, here''s my first
>> take on this:
>
> Hi Jan
> Thanks a lot for the patch. Actually I plan to send my version today,
> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
>
> I also evaluated the possibility of hiding the iommu device from dom0.
> I think the change is not all that big, at least for io based pcicfg
> access. A proof-of-concept patch is attached.

This completely hides the device from Dom0, but only when
config space is accessed via method 1. Did you not see my
earlier patch doing this for MCFG as well (albeit only disallowing
writes, so allowing the device to still be seen by Dom0)?

Whether completely hiding the device is actually okay I can''t
easily tell: Would IOMMUs always be either at func 0 of a single-
function device, or at a non-zero func of a multi-function one? If
not, other devices may get hidden implicitly.
Also I noticed just now that guest_io_read() wouldn''t really
behave correctly when pci_cfg_ok() returned false - it might
pass back 0xff even for a multi-byte read. I''ll send a fix shortly.

Jan

> --- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
> +++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
> @@ -73,6 +73,7 @@
>  #include <asm/hpet.h>
>  #include <public/arch-x86/cpuid.h>
>  #include <xsm/xsm.h>
> +#include <asm/hvm/svm/amd-iommu-proto.h>
>
>  /*
>   * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>  {
>      uint32_t machine_bdf;
>      uint16_t start, end;
> +    struct amd_iommu *iommu;
> +
>      if (!IS_PRIV(d))
>          return 0;
>
>      machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
> +
> +    for_each_amd_iommu ( iommu )
> +    {
> +        if ( machine_bdf == iommu->bdf )
> +            return 0;
> +    }
> +
>      start = d->arch.pci_cf8 & 0xFF;
>      end = start + size - 1;
>      if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
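The guest_io_read() flaw Jan mentions is that a rejected config-space access should read back as all-ones across the whole access width, not just a single 0xff byte. A hedged sketch of the corrected fill value (the function name is illustrative, not what Xen actually uses):

```c
#include <stdint.h>

/*
 * All-ones pattern sized to the access width: a refused or absent PCI
 * config read must return 0xff, 0xffff or 0xffffffff depending on
 * whether 1, 2 or 4 bytes were requested.
 */
static uint32_t pci_cfg_refused_value(unsigned int bytes)
{
    return bytes >= 4 ? 0xffffffffu : (1u << (bytes * 8)) - 1;
}
```

Returning the full-width pattern matters because enumeration code tests a 16-bit vendor ID of 0xffff to conclude "no device here"; a single 0xff byte would make a hidden device look half-present.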
Wei Wang
2012-Jun-21 12:28 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>>> Have you at all considered getting this fixed on the kernel side?
>>>>> As I don''t have direct access to any AMD IOMMU capable
>>>>> system - can one, other than by enumerating the respective
>>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>>> such)? That might permit skipping those in the offending kernel
>>>>> code...
>>>>
>>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>>> should be enough to identify an AMD IOMMU device. I could send you a
>>>> kernel patch for review using this approach. I believe that fixing
>>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>>
>>> As you didn''t come forward with anything, here''s my first
>>> take on this:
>>
>> Hi Jan
>> Thanks a lot for the patch. Actually I plan to send my version today,
>> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
>>
>> I also evaluated the possibility of hiding the iommu device from dom0.
>> I think the change is not all that big, at least for io based pcicfg
>> access. A proof-of-concept patch is attached.
>
> This completely hides the device from Dom0, but only when
> config space is accessed via method 1. Did you not see my
> earlier patch doing this for MCFG as well

Could you please provide a particular c/s number? (I saw too many c/s
that might be related to this topic), so that I could work out a patch
to support both i/o and mmcfg.

> (albeit only disallowing
> writes, so allowing the device to still be seen by Dom0)?

Sounds better to me... this still allows the user to check iommu status
from lspci.

> Whether completely hiding the device is actually okay I can''t
> easily tell: Would IOMMUs always be either at func 0 of a single-
> function device, or at a non-zero func of a multi-function one? If
> not, other devices may get hidden implicitly.

AMD IOMMU is an independent pci-e endpoint, and this function is not
used for any purpose other than containing an iommu. So I don''t see
that the iommu will share a bdf value with other devices.

Thanks,
Wei

> Also I noticed just now that guest_io_read() wouldn''t really
> behave correctly when pci_cfg_ok() returned false - it might
> pass back 0xff even for a multi-byte read. I''ll send a fix shortly.
>
> Jan
>
>> --- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
>> +++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
>> @@ -73,6 +73,7 @@
>>  #include<asm/hpet.h>
>>  #include<public/arch-x86/cpuid.h>
>>  #include<xsm/xsm.h>
>> +#include<asm/hvm/svm/amd-iommu-proto.h>
>>
>>  /*
>>   * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
>> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>>  {
>>      uint32_t machine_bdf;
>>      uint16_t start, end;
>> +    struct amd_iommu *iommu;
>> +
>>      if (!IS_PRIV(d))
>>          return 0;
>>
>>      machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
>> +
>> +    for_each_amd_iommu ( iommu )
>> +    {
>> +        if ( machine_bdf == iommu->bdf )
>> +            return 0;
>> +    }
>> +
>>      start = d->arch.pci_cf8 & 0xFF;
>>      end = start + size - 1;
>>      if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
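For reference, the method-1 path that the proof-of-concept patch hooks works off the latched CONFIG_ADDRESS (port 0xCF8) value: pci_cfg_ok() recovers the 16-bit BDF from bits 8-23 and the register offset from bits 0-7. A sketch of that decoding (helper names here are illustrative, not Xen''s):

```c
#include <stdint.h>

/* Bits 8-23 of a method-1 CONFIG_ADDRESS latch hold bus:dev:func. */
static uint16_t cf8_to_bdf(uint32_t cf8)
{
    return (cf8 >> 8) & 0xffff;
}

/* Bits 0-7 hold the (dword-aligned) config register offset. */
static uint8_t cf8_to_reg(uint32_t cf8)
{
    return cf8 & 0xff;
}
```

For the AMD IOMMU at 00:00.2, a latch value such as 0x80000254 (enable bit set, register 0x54 - the MSI capability) decodes to BDF 0x0002, which is what the for_each_amd_iommu comparison above matches against.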
Jan Beulich
2012-Jun-21 12:28 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 13:08, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
>
>>>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> Have you at all considered getting this fixed on the kernel side?
>>>> As I don''t have direct access to any AMD IOMMU capable
>>>> system - can one, other than by enumerating the respective
>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>> such)? That might permit skipping those in the offending kernel
>>>> code...
>>>
>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>> should be enough to identify an AMD IOMMU device. I could send you a
>>> kernel patch for review using this approach. I believe that fixing
>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>
>> As you didn''t come forward with anything, here''s my first
>> take on this:
>>
>> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
>> interrupts when we initialize a pci device") caused MSI to get disabled
>> on Xen Dom0 despite it having got turned on purposefully by the
>> hypervisor. As an immediate band aid, suppress the disabling in this
>> specific case.
>>
>> I don''t think, however, that either the original change or this fix are
>> really appropriate. The original fix, besides leaving aside the
>> presence of a hypervisor like Xen, doesn''t deal with all possible
>> cases (in particular it has no effect if the secondary kernel was built
>> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
>> Xen, logic like this should probably be skipped altogether (i.e. it
>> should be entirely left to the hypervisor, being the entity in control
>> of IRQs).
>
> The original fix is fine. Xen dom0 remains insane. Although perhaps
> somewhat better than Xen dom0 once was.
>
> Why does the dom0 kernel even get any access to pci devices that
> Xen doesn''t want it to touch?
>
> As far as I can tell my patch has revealed a design bug in Xen. If you
> don''t want to be messed up by the kernel don''t let the kernel touch
> things.

I rather think this is a design bug of the AMD IOMMU - it should never
have got surfaced as a PCI device.

Furthermore, fully hiding _any_ PCI device from Dom0 has its own set of
problems - depending on the nature of the device, full emulation of all
devices may become necessary to get this implemented (because this can
implicitly hide other devices, or the Dom0 kernel re-assigning BARs
could get in conflict with what the hidden devices use). Xen''s model
has always been to allow Dom0 full control over all peripherals and
bridges, so if all of a sudden PCI devices of other kinds show up, we
can''t simply re-model everything.

> At the very least we need an abstraction much cleaner
> than the patch below.

Probably - any suggestion? As said in the mail I sent, this is meant to
overcome the immediate problem, and a more permanent fix would be
desirable (ideally suppressing the playing with the MSI related data
altogether, following the rest of the MSI support in Xen/Dom0).

Jan
Jan Beulich
2012-Jun-21 12:45 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 14:28, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>>> I also evaluated the possibility of hiding the iommu device from dom0.
>>> I think the change is not all that big, at least for io based pcicfg
>>> access. A proof-of-concept patch is attached.
>>
>> This completely hides the device from Dom0, but only when
>> config space is accessed via method 1. Did you not see my
>> earlier patch doing this for MCFG as well
> Could you please provide a particular c/s number? (I saw too many c/s
> that might be related to this topic), so that I could work out a patch
> to support both i/o and mmcfg.

I sent this to you yesterday, so you''d be able to test whether
it actually fulfills its purpose before we discuss whether this is
acceptable for 4.2. See
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

>> (albeit only disallowing
>> writes, so allowing the device to still be seen by Dom0)?
> Sounds better to me... this still allows the user to check iommu status
> from lspci.
>
>> Whether completely hiding the device is actually okay I can''t
>> easily tell: Would IOMMUs always be either at func 0 of a single-
>> function device, or at a non-zero func of a multi-function one? If
>> not, other devices may get hidden implicitly.
>
> AMD IOMMU is an independent pci-e endpoint, and this function is not
> used for any purpose other than containing an iommu. So I don''t see
> that the iommu will share a bdf value with other devices.

The question is not regarding bdf, but regarding whether under
the same seg:bus:dev there might be multiple functions, one of
which is the IOMMU, and if so, whether the IOMMU would be
guaranteed to have a non-zero function number.

Jan
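Background on the MCFG variant being discussed: in MMCONFIG/ECAM space every bus:dev:func owns one 4KiB configuration page, laid out by BDF, so the machine frame to write-protect for a given device is just the segment base frame plus the 16-bit BDF. An illustrative sketch of that arithmetic - it mirrors the computation in Jan''s arch_pci_ro_device() (quoted later in the thread), but the names here are not Xen''s:

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* ECAM gives each function a 4KiB config page */

/*
 * Machine frame number of the config page for a BDF within one MCFG
 * segment: base frame plus BDF, because functions are laid out
 * consecutively as (bus << 8 | dev << 3 | func) * 4KiB.
 */
static uint64_t mcfg_cfg_mfn(uint64_t mcfg_base, uint16_t bdf)
{
    return (mcfg_base >> PAGE_SHIFT) + bdf;
}
```

Marking just that one frame read-only is what lets Dom0 still see the device via lspci while its writes (including MSI disables) are dropped.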
Wei Wang
2012-Jun-21 13:10 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>>>> I also evaluated the possibility of hiding the iommu device from dom0.
>>>> I think the change is not all that big, at least for io based pcicfg
>>>> access. A proof-of-concept patch is attached.
>>>
>>> This completely hides the device from Dom0, but only when
>>> config space is accessed via method 1. Did you not see my
>>> earlier patch doing this for MCFG as well
>> Could you please provide a particular c/s number? (I saw too many c/s
>> that might be related to this topic), so that I could work out a patch
>> to support both i/o and mmcfg.
>
> I sent this to you yesterday, so you''d be able to test whether
> it actually fulfills its purpose before we discuss whether this is
> acceptable for 4.2. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

Oh, yes I found it; my email filter did not work well, so I did not see
it in the right folder. I will test right now.

>>> (albeit only disallowing
>>> writes, so allowing the device to still be seen by Dom0)?
>> Sounds better to me... this still allows the user to check iommu status
>> from lspci.
>>
>>> Whether completely hiding the device is actually okay I can''t
>>> easily tell: Would IOMMUs always be either at func 0 of a single-
>>> function device, or at a non-zero func of a multi-function one? If
>>> not, other devices may get hidden implicitly.
>>
>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>> used for any purpose other than containing an iommu. So I don''t see
>> that the iommu will share a bdf value with other devices.
>
> The question is not regarding bdf, but regarding whether under
> the same seg:bus:dev there might be multiple functions, one of
> which is the IOMMU, and if so, whether the IOMMU would be
> guaranteed to have a non-zero function number.

In a real system (single or multiple iommu), the amd iommu shares the
same device number with the north bridge but has function number 2
(e.g. bus:00.2). However, according to the spec, it is not guaranteed
to have a non-zero function number. So what is the problem you see if
the iommu uses func 0 on a multi-function device?

Thanks,
Wei

> Jan
Jan Beulich
2012-Jun-21 13:24 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 15:10, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>>> used for any purpose other than containing an iommu. So I don''t see
>>> that the iommu will share a bdf value with other devices.
>>
>> The question is not regarding bdf, but regarding whether under
>> the same seg:bus:dev there might be multiple functions, one of
>> which is the IOMMU, and if so, whether the IOMMU would be
>> guaranteed to have a non-zero function number.
>
> In a real system (single or multiple iommu), the amd iommu shares the
> same device number with the north bridge but has function number 2
> (e.g. bus:00.2). However, according to the spec, it is not guaranteed
> to have a non-zero function number. So what is the problem you see if
> the iommu uses func 0 on a multi-function device?

If it''s on func 0 and gets hidden completely (as done by your
partial patch), other functions won''t be found when scanning
for them (because secondary functions get looked at only
when func 0 actually exists, as otherwise evaluating the header
type register is invalid).

Jan
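Jan''s point can be illustrated by how enumeration treats function 0: a scanner probes function 0 first, and only looks at functions 1-7 if function 0 responds and its header-type register has the multi-function bit (0x80) set. So completely hiding an IOMMU that happens to sit at func 0 would make every other function of that device vanish too. A simplified model of that rule - not actual Xen or Linux scan code:

```c
#include <stdint.h>

#define HDR_TYPE_MULTIFUNC 0x80  /* bit 7 of the header type register */

/*
 * vendors[fn] is the vendor ID read for each function of one device;
 * 0xffff means the function is absent (or hidden). hdr0_multifunc is
 * the multi-function bit of function 0''s header type register.
 */
static unsigned visible_functions(const uint16_t vendors[8], int hdr0_multifunc)
{
    unsigned fn, found;

    if (vendors[0] == 0xffff)
        return 0;              /* no function 0: the whole slot looks empty */
    found = 1;
    if (!hdr0_multifunc)
        return found;          /* single-function device: stop at func 0 */
    for (fn = 1; fn < 8; fn++)
        if (vendors[fn] != 0xffff)
            found++;
    return found;
}
```

Note how a device with a real function at fn 1 still reports zero visible functions once fn 0 reads as 0xffff - exactly the implicit-hiding hazard being debated.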
Wei Wang
2012-Jun-21 13:27 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 03:24 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 15:10, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>>>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>>>> used for any purpose other than containing an iommu. So I don''t see
>>>> that the iommu will share a bdf value with other devices.
>>>
>>> The question is not regarding bdf, but regarding whether under
>>> the same seg:bus:dev there might be multiple functions, one of
>>> which is the IOMMU, and if so, whether the IOMMU would be
>>> guaranteed to have a non-zero function number.
>>
>> In a real system (single or multiple iommu), the amd iommu shares the
>> same device number with the north bridge but has function number 2
>> (e.g. bus:00.2). However, according to the spec, it is not guaranteed
>> to have a non-zero function number. So what is the problem you see if
>> the iommu uses func 0 on a multi-function device?
>
> If it''s on func 0 and gets hidden completely (as done by your
> partial patch), other functions won''t be found when scanning
> for them (because secondary functions get looked at only
> when func 0 actually exists, as otherwise evaluating the header
> type register is invalid).

OK, understood. Then I think we do need to allow pci cfg reads for the
iommu device.

Thanks
Wei

> Jan
Wei Wang
2012-Jun-21 15:29 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 06/20/2012 05:45 PM, Jan Beulich wrote:
> Rather than submitting it for inclusion immediately, I''d rather see
> you first use it successfully. Below/attached what appears to
> work fine for me in contrived situations (i.e. hiding bridges with
> nothing connected, as I still don''t have any AMD IOMMU system
> at hand).

Hi Jan,
The patch works for me. The IOMMU MSI capability is shown as enabled
with this patch.

Thanks,
Wei

00:00.2 Generic system peripheral [0806]: Advanced Micro Devices [AMD]
Device 1419
	Subsystem: Advanced Micro Devices [AMD] Device 1419
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Capabilities: [40] Secure device <?>
	Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+

> Jan
>
> Note that due to ptwr_do_page_fault() being run first, there''ll be a
> MEM_LOG() issued for each such attempt. If that''s undesirable, the
> order of the calls would need to be swapped.
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
>          break;
>      case 1:
>          l1e_remove_flags(nl1e, _PAGE_RW);
> +        rc = 0;
>          break;
>      }
>      if ( page )
> @@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u
>      return 0;
>  }
>
> +#ifdef __x86_64__
> +/*************************
> + * fault handling for read-only MMIO pages
> + */
> +
> +struct mmio_ro_emulate_ctxt {
> +    struct x86_emulate_ctxt ctxt;
> +    unsigned long cr2;
> +};
> +
> +static int mmio_ro_emulated_read(
> +    enum x86_segment seg,
> +    unsigned long offset,
> +    void *p_data,
> +    unsigned int bytes,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    return X86EMUL_UNHANDLEABLE;
> +}
> +
> +static int mmio_ro_emulated_write(
> +    enum x86_segment seg,
> +    unsigned long offset,
> +    void *p_data,
> +    unsigned int bytes,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt =
> +        container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt);
> +
> +    /* Only allow naturally-aligned
stores at the original %cr2 address. */ > + if ( ((bytes | offset)& (bytes - 1)) || offset != mmio_ro_ctxt->cr2 ) > + { > + MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", > + mmio_ro_ctxt->cr2, offset, bytes); > + return X86EMUL_UNHANDLEABLE; > + } > + > + return X86EMUL_OKAY; > +} > + > +static const struct x86_emulate_ops mmio_ro_emulate_ops = { > + .read = mmio_ro_emulated_read, > + .insn_fetch = ptwr_emulated_read, > + .write = mmio_ro_emulated_write, > +}; > + > +/* Check if guest is trying to modify a r/o MMIO page. */ > +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, > + struct cpu_user_regs *regs) > +{ > + l1_pgentry_t pte; > + unsigned long mfn; > + unsigned int addr_size = is_pv_32on64_domain(v->domain) ? > + 32 : BITS_PER_LONG; > + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { > + .ctxt.regs = regs, > + .ctxt.addr_size = addr_size, > + .ctxt.sp_size = addr_size, > + .cr2 = addr > + }; > + int rc; > + > + /* Attempt to read the PTE that maps the VA being accessed. */ > + guest_get_eff_l1e(v, addr,&pte); > + > + /* We are looking only for read-only mappings of MMIO pages. */ > + if ( ((l1e_get_flags(pte)& (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) > + return 0; > + > + mfn = l1e_get_pfn(pte); > + if ( mfn_valid(mfn) ) > + { > + struct page_info *page = mfn_to_page(mfn); > + struct domain *owner = page_get_owner_and_reference(page); > + > + if ( owner ) > + put_page(page); > + if ( owner != dom_io ) > + return 0; > + } > + > + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) > + return 0; > + > + rc = x86_emulate(&mmio_ro_ctxt.ctxt,&mmio_ro_emulate_ops); > + > + return rc != X86EMUL_UNHANDLEABLE ? 
EXCRET_fault_fixed : 0; > +} > +#endif /* __x86_64__ */ > + > void free_xen_pagetable(void *v) > { > if ( system_state == SYS_STATE_early_boot ) > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon > return 0; > } > > - if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& > - guest_kernel_mode(v, regs) ) > - { > - unsigned int mbs = PFEC_write_access; > - unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch; > - > - /* Do not check if access-protection fault since the page may > - legitimately be not present in shadow page tables */ > - if ( !paging_mode_enabled(d) ) > - mbs |= PFEC_page_present; > - > - if ( ((regs->error_code& (mbs | mbz)) == mbs)&& > + if ( guest_kernel_mode(v, regs)&& > + !(regs->error_code& (PFEC_reserved_bit | PFEC_insn_fetch))&& > + (regs->error_code& PFEC_write_access) ) > + { > + if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& > + /* Do not check if access-protection fault since the page may > + legitimately be not present in shadow page tables */ > + (paging_mode_enabled(d) || > + (regs->error_code& PFEC_page_present))&& > ptwr_do_page_fault(v, addr, regs) ) > return EXCRET_fault_fixed; > + > +#ifdef __x86_64__ > + if ( IS_PRIV(d)&& (regs->error_code& PFEC_page_present)&& > + mmio_ro_do_page_fault(v, addr, regs) ) > + return EXCRET_fault_fixed; > +#endif > } > > /* For non-external shadowed guests, we fix up both their own > @@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d, > return 0; > > machine_bdf = (d->arch.pci_cf8>> 8)& 0xFFFF; > + if ( write ) > + { > + const unsigned long *ro_map = pci_get_ro_map(0); > + > + if ( ro_map&& test_bit(machine_bdf, ro_map) ) > + return 0; > + } > start = d->arch.pci_cf8& 0xFF; > end = start + size - 1; > if (xsm_pci_config_permission(d, machine_bdf, start, end, write)) > --- a/xen/arch/x86/x86_32/pci.c > +++ b/xen/arch/x86/x86_32/pci.c > @@ -6,6 +6,7 @@ > > #include<xen/spinlock.h> > #include<xen/pci.h> > 
+#include<xen/init.h> > #include<asm/io.h> > > #define PCI_CONF_ADDRESS(bus, dev, func, reg) \ > @@ -70,3 +71,7 @@ void pci_conf_write32( > BUG_ON((bus> 255) || (dev> 31) || (func> 7) || (reg> 255)); > pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data); > } > + > +void __init arch_pci_ro_device(int seg, int bdf) > +{ > +} > --- a/xen/arch/x86/x86_64/mmconfig_64.c > +++ b/xen/arch/x86/x86_64/mmconfig_64.c > @@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const > return (void __iomem *) virt; > } > > +void arch_pci_ro_device(int seg, int bdf) > +{ > + unsigned int idx, bus = PCI_BUS(bdf); > + > + for (idx = 0; idx< pci_mmcfg_config_num; ++idx) { > + const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg; > + unsigned long mfn = (cfg->address>> PAGE_SHIFT) + bdf; > + > + if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg || > + cfg->start_bus_number> bus || cfg->end_bus_number< bus) > + continue; > + > + if (rangeset_add_singleton(mmio_ro_ranges, mfn)) > + printk(XENLOG_ERR > + "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) read-only\n", > + cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf), > + mfn); > + } > +} > + > int pci_mmcfg_arch_enable(unsigned int idx) > { > const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg; > + const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment); > > if (pci_mmcfg_virt[idx].virt) > return 0; > @@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i > } > printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n", > cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number); > + if (ro_map) { > + unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0); > + unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1); > + > + while ((bdf = find_next_bit(ro_map, end + 1, bdf))<= end) { > + arch_pci_ro_device(cfg->pci_segment, bdf); > + if (bdf++ == end) > + break; > + } > + } > return 0; > } > > --- a/xen/drivers/passthrough/amd/iommu_detect.c > +++ 
b/xen/drivers/passthrough/amd/iommu_detect.c > @@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi( > if ( rt ) > return -ENODEV; > > + rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func)); > + if ( rt ) > + printk(XENLOG_ERR > + "Could not mark config space of %04x:%02x:%02x.%u read-only (%d)\n", > + iommu->seg, bus, dev, func, rt); > + > list_add_tail(&iommu->list,&amd_iommu_head); > > return 0; > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi > unlock: > spin_unlock(&d->event_lock); > } > - > -static int __init setup_mmio_ro_ranges(void) > -{ > - mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", > - RANGESETF_prettyprint_hex); > - return 0; > -} > -__initcall(setup_mmio_ro_ranges); > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -36,6 +36,7 @@ > struct pci_seg { > struct list_head alldevs_list; > u16 nr; > + unsigned long *ro_map; > /* bus2bridge_lock protects bus2bridge array */ > spinlock_t bus2bridge_lock; > #define MAX_BUSES 256 > @@ -106,6 +107,8 @@ void __init pt_pci_init(void) > radix_tree_init(&pci_segments); > if ( !alloc_pseg(0) ) > panic("Could not initialize PCI segment 0\n"); > + mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", > + RANGESETF_prettyprint_hex); > } > > int __init pci_add_segment(u16 seg) > @@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg) > return alloc_pseg(seg) ? 0 : -ENOMEM; > } > > +const unsigned long *pci_get_ro_map(u16 seg) > +{ > + struct pci_seg *pseg = get_pseg(seg); > + > + return pseg ? 
pseg->ro_map : NULL; > +} > + > static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) > { > struct pci_dev *pdev; > @@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps > xfree(pdev); > } > > +int __init pci_ro_device(int seg, int bus, int devfn) > +{ > + struct pci_seg *pseg = alloc_pseg(seg); > + struct pci_dev *pdev; > + > + if ( !pseg ) > + return -ENOMEM; > + pdev = alloc_pdev(pseg, bus, devfn); > + if ( !pdev ) > + return -ENOMEM; > + > + if ( !pseg->ro_map ) > + { > + size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long); > + > + pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0); > + if ( !pseg->ro_map ) > + return -ENOMEM; > + memset(pseg->ro_map, 0, sz); > + } > + > + __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map); > + arch_pci_ro_device(seg, PCI_BDF2(bus, devfn)); > + > + return 0; > +} > + > struct pci_dev *pci_get_pdev(int seg, int bus, int devfn) > { > struct pci_seg *pseg = get_pseg(seg); > --- a/xen/include/asm-x86/mm.h > +++ b/xen/include/asm-x86/mm.h > @@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p); > > int ptwr_do_page_fault(struct vcpu *, unsigned long, > struct cpu_user_regs *); > +int mmio_ro_do_page_fault(struct vcpu *, unsigned long, > + struct cpu_user_regs *); > > int audit_adjust_pgtables(struct domain *d, int dir, int noisy); > > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev( > void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *)); > void pci_release_devices(struct domain *d); > int pci_add_segment(u16 seg); > +const unsigned long *pci_get_ro_map(u16 seg); > int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *); > int pci_remove_device(u16 seg, u8 bus, u8 devfn); > +int pci_ro_device(int seg, int bus, int devfn); > +void arch_pci_ro_device(int seg, int bdf); > struct pci_dev *pci_get_pdev(int seg, int bus, int devfn); > struct pci_dev *pci_get_pdev_by_domain( > struct 
domain *, int seg, int bus, int devfn); >
Jan Beulich
2012-Jun-21 15:49 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 21.06.12 at 17:29, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/20/2012 05:45 PM, Jan Beulich wrote:
>> Rather than submitting it for inclusion immediately, I''d rather see
>> you first use it successfully. Below/attached what appears to
>> work fine for me in contrived situations (i.e. hiding bridges with
>> nothing connected, as I still don''t have any AMD IOMMU system
>> at hand).
>
> The patch works for me. The IOMMU MSI capability is shown as enabled
> with this patch.

Keir, the question now arises whether we really want this, and
particularly this late before 4.2. The Linux folks don''t seem to be
willing to take the straightforward workaround for the problem
introduced at their end, so we will likely need something (the more
that the questionable fix already made it into various stable trees)
before 4.2 goes out (and even the older trees would be affected, just
that putting a change like this there is even more questionable).

There are obviously more potential problems in this area: If any of
the MMIO addresses used by AMD''s IOMMU is configurable through one
of the BARs, and if Dom0 decides to re-assign MMIO space, neither
allowing the writes nor simply dropping them as done here will work.
Whether that''s a real problem I don''t know - Wei? And there might be
other issues arising from dropping all writes - we might just
currently not be aware of them.

Jan

> 00:00.2 Generic system peripheral [0806]: Advanced Micro Devices [AMD]
> Device 1419
> 	Subsystem: Advanced Micro Devices [AMD] Device 1419
> 	Flags: bus master, fast devsel, latency 0, IRQ 11
> 	Capabilities: [40] Secure device <?>
> 	Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 	Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>
>> Jan
>>
>> Note that due to ptwr_do_page_fault() being run first, there''ll be a
>> MEM_LOG() issued for each such attempt. If that''s undesirable, the
>> order of the calls would need to be swapped.
>> >> --- a/xen/arch/x86/mm.c >> +++ b/xen/arch/x86/mm.c >> @@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl >> break; >> case 1: >> l1e_remove_flags(nl1e, _PAGE_RW); >> + rc = 0; >> break; >> } >> if ( page ) >> @@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u >> return 0; >> } >> >> +#ifdef __x86_64__ >> +/************************* >> + * fault handling for read-only MMIO pages >> + */ >> + >> +struct mmio_ro_emulate_ctxt { >> + struct x86_emulate_ctxt ctxt; >> + unsigned long cr2; >> +}; >> + >> +static int mmio_ro_emulated_read( >> + enum x86_segment seg, >> + unsigned long offset, >> + void *p_data, >> + unsigned int bytes, >> + struct x86_emulate_ctxt *ctxt) >> +{ >> + return X86EMUL_UNHANDLEABLE; >> +} >> + >> +static int mmio_ro_emulated_write( >> + enum x86_segment seg, >> + unsigned long offset, >> + void *p_data, >> + unsigned int bytes, >> + struct x86_emulate_ctxt *ctxt) >> +{ >> + struct mmio_ro_emulate_ctxt *mmio_ro_ctxt >> + container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt); >> + >> + /* Only allow naturally-aligned stores at the original %cr2 address. */ >> + if ( ((bytes | offset)& (bytes - 1)) || offset != mmio_ro_ctxt->cr2 ) >> + { >> + MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, > bytes=%u)", >> + mmio_ro_ctxt->cr2, offset, bytes); >> + return X86EMUL_UNHANDLEABLE; >> + } >> + >> + return X86EMUL_OKAY; >> +} >> + >> +static const struct x86_emulate_ops mmio_ro_emulate_ops = { >> + .read = mmio_ro_emulated_read, >> + .insn_fetch = ptwr_emulated_read, >> + .write = mmio_ro_emulated_write, >> +}; >> + >> +/* Check if guest is trying to modify a r/o MMIO page. */ >> +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, >> + struct cpu_user_regs *regs) >> +{ >> + l1_pgentry_t pte; >> + unsigned long mfn; >> + unsigned int addr_size = is_pv_32on64_domain(v->domain) ? 
>> + 32 : BITS_PER_LONG; >> + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { >> + .ctxt.regs = regs, >> + .ctxt.addr_size = addr_size, >> + .ctxt.sp_size = addr_size, >> + .cr2 = addr >> + }; >> + int rc; >> + >> + /* Attempt to read the PTE that maps the VA being accessed. */ >> + guest_get_eff_l1e(v, addr,&pte); >> + >> + /* We are looking only for read-only mappings of MMIO pages. */ >> + if ( ((l1e_get_flags(pte)& (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) > ) >> + return 0; >> + >> + mfn = l1e_get_pfn(pte); >> + if ( mfn_valid(mfn) ) >> + { >> + struct page_info *page = mfn_to_page(mfn); >> + struct domain *owner = page_get_owner_and_reference(page); >> + >> + if ( owner ) >> + put_page(page); >> + if ( owner != dom_io ) >> + return 0; >> + } >> + >> + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) >> + return 0; >> + >> + rc = x86_emulate(&mmio_ro_ctxt.ctxt,&mmio_ro_emulate_ops); >> + >> + return rc != X86EMUL_UNHANDLEABLE ? EXCRET_fault_fixed : 0; >> +} >> +#endif /* __x86_64__ */ >> + >> void free_xen_pagetable(void *v) >> { >> if ( system_state == SYS_STATE_early_boot ) >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon >> return 0; >> } >> >> - if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& >> - guest_kernel_mode(v, regs) ) >> - { >> - unsigned int mbs = PFEC_write_access; >> - unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch; >> - >> - /* Do not check if access-protection fault since the page may >> - legitimately be not present in shadow page tables */ >> - if ( !paging_mode_enabled(d) ) >> - mbs |= PFEC_page_present; >> - >> - if ( ((regs->error_code& (mbs | mbz)) == mbs)&& >> + if ( guest_kernel_mode(v, regs)&& >> + !(regs->error_code& (PFEC_reserved_bit | PFEC_insn_fetch))&& >> + (regs->error_code& PFEC_write_access) ) >> + { >> + if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& >> + /* Do not check if access-protection fault since the page may >> + 
legitimately be not present in shadow page tables */ >> + (paging_mode_enabled(d) || >> + (regs->error_code& PFEC_page_present))&& >> ptwr_do_page_fault(v, addr, regs) ) >> return EXCRET_fault_fixed; >> + >> +#ifdef __x86_64__ >> + if ( IS_PRIV(d)&& (regs->error_code& PFEC_page_present)&& >> + mmio_ro_do_page_fault(v, addr, regs) ) >> + return EXCRET_fault_fixed; >> +#endif >> } >> >> /* For non-external shadowed guests, we fix up both their own >> @@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d, >> return 0; >> >> machine_bdf = (d->arch.pci_cf8>> 8)& 0xFFFF; >> + if ( write ) >> + { >> + const unsigned long *ro_map = pci_get_ro_map(0); >> + >> + if ( ro_map&& test_bit(machine_bdf, ro_map) ) >> + return 0; >> + } >> start = d->arch.pci_cf8& 0xFF; >> end = start + size - 1; >> if (xsm_pci_config_permission(d, machine_bdf, start, end, write)) >> --- a/xen/arch/x86/x86_32/pci.c >> +++ b/xen/arch/x86/x86_32/pci.c >> @@ -6,6 +6,7 @@ >> >> #include<xen/spinlock.h> >> #include<xen/pci.h> >> +#include<xen/init.h> >> #include<asm/io.h> >> >> #define PCI_CONF_ADDRESS(bus, dev, func, reg) \ >> @@ -70,3 +71,7 @@ void pci_conf_write32( >> BUG_ON((bus> 255) || (dev> 31) || (func> 7) || (reg> 255)); >> pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data); >> } >> + >> +void __init arch_pci_ro_device(int seg, int bdf) >> +{ >> +} >> --- a/xen/arch/x86/x86_64/mmconfig_64.c >> +++ b/xen/arch/x86/x86_64/mmconfig_64.c >> @@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const >> return (void __iomem *) virt; >> } >> >> +void arch_pci_ro_device(int seg, int bdf) >> +{ >> + unsigned int idx, bus = PCI_BUS(bdf); >> + >> + for (idx = 0; idx< pci_mmcfg_config_num; ++idx) { >> + const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg; >> + unsigned long mfn = (cfg->address>> PAGE_SHIFT) + bdf; >> + >> + if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg || >> + cfg->start_bus_number> bus || cfg->end_bus_number< bus) >> + continue; >> + >> + if 
(rangeset_add_singleton(mmio_ro_ranges, mfn)) >> + printk(XENLOG_ERR >> + "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) > read-only\n", >> + cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf), >> + mfn); >> + } >> +} >> + >> int pci_mmcfg_arch_enable(unsigned int idx) >> { >> const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg; >> + const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment); >> >> if (pci_mmcfg_virt[idx].virt) >> return 0; >> @@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i >> } >> printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n", >> cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number); >> + if (ro_map) { >> + unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0); >> + unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1); >> + >> + while ((bdf = find_next_bit(ro_map, end + 1, bdf))<= end) { >> + arch_pci_ro_device(cfg->pci_segment, bdf); >> + if (bdf++ == end) >> + break; >> + } >> + } >> return 0; >> } >> >> --- a/xen/drivers/passthrough/amd/iommu_detect.c >> +++ b/xen/drivers/passthrough/amd/iommu_detect.c >> @@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi( >> if ( rt ) >> return -ENODEV; >> >> + rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func)); >> + if ( rt ) >> + printk(XENLOG_ERR >> + "Could not mark config space of %04x:%02x:%02x.%u read-only > (%d)\n", >> + iommu->seg, bus, dev, func, rt); >> + >> list_add_tail(&iommu->list,&amd_iommu_head); >> >> return 0; >> --- a/xen/drivers/passthrough/io.c >> +++ b/xen/drivers/passthrough/io.c >> @@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi >> unlock: >> spin_unlock(&d->event_lock); >> } >> - >> -static int __init setup_mmio_ro_ranges(void) >> -{ >> - mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", >> - RANGESETF_prettyprint_hex); >> - return 0; >> -} >> -__initcall(setup_mmio_ro_ranges); >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -36,6 +36,7 @@ >> 
struct pci_seg { >> struct list_head alldevs_list; >> u16 nr; >> + unsigned long *ro_map; >> /* bus2bridge_lock protects bus2bridge array */ >> spinlock_t bus2bridge_lock; >> #define MAX_BUSES 256 >> @@ -106,6 +107,8 @@ void __init pt_pci_init(void) >> radix_tree_init(&pci_segments); >> if ( !alloc_pseg(0) ) >> panic("Could not initialize PCI segment 0\n"); >> + mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", >> + RANGESETF_prettyprint_hex); >> } >> >> int __init pci_add_segment(u16 seg) >> @@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg) >> return alloc_pseg(seg) ? 0 : -ENOMEM; >> } >> >> +const unsigned long *pci_get_ro_map(u16 seg) >> +{ >> + struct pci_seg *pseg = get_pseg(seg); >> + >> + return pseg ? pseg->ro_map : NULL; >> +} >> + >> static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) >> { >> struct pci_dev *pdev; >> @@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps >> xfree(pdev); >> } >> >> +int __init pci_ro_device(int seg, int bus, int devfn) >> +{ >> + struct pci_seg *pseg = alloc_pseg(seg); >> + struct pci_dev *pdev; >> + >> + if ( !pseg ) >> + return -ENOMEM; >> + pdev = alloc_pdev(pseg, bus, devfn); >> + if ( !pdev ) >> + return -ENOMEM; >> + >> + if ( !pseg->ro_map ) >> + { >> + size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long); >> + >> + pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0); >> + if ( !pseg->ro_map ) >> + return -ENOMEM; >> + memset(pseg->ro_map, 0, sz); >> + } >> + >> + __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map); >> + arch_pci_ro_device(seg, PCI_BDF2(bus, devfn)); >> + >> + return 0; >> +} >> + >> struct pci_dev *pci_get_pdev(int seg, int bus, int devfn) >> { >> struct pci_seg *pseg = get_pseg(seg); >> --- a/xen/include/asm-x86/mm.h >> +++ b/xen/include/asm-x86/mm.h >> @@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p); >> >> int ptwr_do_page_fault(struct vcpu *, unsigned long, >> struct cpu_user_regs *); >> +int mmio_ro_do_page_fault(struct vcpu 
*, unsigned long, >> + struct cpu_user_regs *); >> >> int audit_adjust_pgtables(struct domain *d, int dir, int noisy); >> >> --- a/xen/include/xen/pci.h >> +++ b/xen/include/xen/pci.h >> @@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev( >> void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *)); >> void pci_release_devices(struct domain *d); >> int pci_add_segment(u16 seg); >> +const unsigned long *pci_get_ro_map(u16 seg); >> int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info > *); >> int pci_remove_device(u16 seg, u8 bus, u8 devfn); >> +int pci_ro_device(int seg, int bus, int devfn); >> +void arch_pci_ro_device(int seg, int bdf); >> struct pci_dev *pci_get_pdev(int seg, int bus, int devfn); >> struct pci_dev *pci_get_pdev_by_domain( >> struct domain *, int seg, int bus, int devfn); >>
Keir Fraser
2012-Jun-21 16:31 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 21/06/2012 16:49, "Jan Beulich" <JBeulich@suse.com> wrote:

> the question now arises whether we really want this, and
> particularly this late before 4.2. The Linux folks don't seem to
> be willing to take the straightforward workaround for the
> problem introduced at their end, so we will likely need
> something (the more that the questionable fix already made
> it into various stable trees) before 4.2 goes out (and even
> the older trees would be affected, just that putting a change
> like this there is even more questionable).

I'd be okay seeing the proposed patch go in ahead of 4.2.0-rc1.

 -- Keir

> There are obviously more potential problems in this area: If
> any of the MMIO addresses used by AMD's IOMMU is
> configurable through one of the BARs, and if Dom0 decides to
> re-assign MMIO space, neither allowing the writes nor simply
> dropping them as done here will work. Whether that's a real
> problem I don't know - Wei? And there might be other issues
> arising from dropping all writes - we might just currently not be
> aware of them.
Wei Wang
2012-Jun-22 09:03 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 06/21/2012 05:49 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 17:29, Wei Wang <wei.wang2@amd.com> wrote:
>> On 06/20/2012 05:45 PM, Jan Beulich wrote:
>>> Rather than submitting it for inclusion immediately, I'd rather see
>>> you first use it successfully. Below/attached what appears to
>>> work fine for me in contrived situations (i.e. hiding bridges with
>>> nothing connected, as I still don't have any AMD IOMMU system
>>> at hand).
>>
>> The patch works for me. IOMMU MSI capability is shown as enabled with
>> this patch.
>
> Keir,
>
> the question now arises whether we really want this, and
> particularly this late before 4.2. The Linux folks don't seem to
> be willing to take the straightforward workaround for the
> problem introduced at their end, so we will likely need
> something (the more that the questionable fix already made
> it into various stable trees) before 4.2 goes out (and even
> the older trees would be affected, just that putting a change
> like this there is even more questionable).
>
> There are obviously more potential problems in this area: If
> any of the MMIO addresses used by AMD's IOMMU is
> configurable through one of the BARs, and if Dom0 decides to
> re-assign MMIO space, neither allowing the writes nor simply
> dropping them as done here will work. Whether that's a real
> problem I don't know - Wei? And there might be other issues
> arising from dropping all writes - we might just currently not be
> aware of them.

AMD IOMMU does not have any PCI BARs; all address configuration goes
through ACPI tables. So this should be fine, at least for the AMD IOMMU.

Thanks,
Wei

> Jan
>
> [...]