Wei Wang
2012-Jun-12 12:02 UTC
[PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Hi Jan & Andrew,

I have attached a revised patch, please check it. I found that the
following Linux commit triggers this issue. It has been included into
3.4 pv_ops:

"commit a776c491ca5e38c26d9f66923ff574d041e747f4
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Mon Oct 17 11:46:06 2011 -0700

    PCI: msi: Disable msi interrupts when we initialize a pci device"

Thanks,
Wei

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Jun-12 15:13 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
> I had attached a revised patch, please check it.

While the patch technically looks better now, you didn't eliminate my
objections to the approach you take, nor did you comment on the
proposed alternative.

A fundamental problem is that your IOMMUs show up as "normal" PCI
devices, breaking the separation between what is being managed by the
hypervisor vs. by the Dom0 kernel. (This even allows something as odd
as passing through an IOMMU to a DomU, which would clearly upset the
hypervisor.)

> I found that the following Linux commit triggers this issue. It has been
> included into 3.4 pv_ops.
>
> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date: Mon Oct 17 11:46:06 2011 -0700
>
> PCI: msi: Disable msi interrupts when we initialize a pci device "

Thanks for locating this. As it stands, it is incomplete anyway: if the
kexec-ed kernel is built without CONFIG_PCI_MSI, it won't have a means
to suppress the "screaming" interrupts.

Jan
Andrew Cooper
2012-Jun-12 16:08 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 12/06/12 16:13, Jan Beulich wrote:
>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>> I had attached a revised patch, please check it.
> While the patch technically looks better now, you didn't eliminate
> my objections to the approach you take, nor did you comment on
> the proposed alternative.
>
> A fundamental problem is that your IOMMUs show up as "normal"
> PCI devices, breaking the separation between what is being
> managed by the hypervisor vs. by the Dom0 kernel. (This even
> allows something as odd as passing through an IOMMU to a
> DomU, which would clearly upset the hypervisor.)
>
>> I found that the following Linux commit triggers this issue. It has been
>> included into 3.4 pv_ops.
>>
>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>> Author: Eric W. Biederman <ebiederm@xmission.com>
>> Date: Mon Oct 17 11:46:06 2011 -0700
>>
>> PCI: msi: Disable msi interrupts when we initialize a pci device "
> Thanks for locating this. As it stands, it is incomplete though
> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
> it won't have a means to suppress the "screaming" interrupts.
>
> Jan

I feel that the correct solution would be for Xen to hide the PCI
devices from dom0. Xen already hides the DMAR ACPI table (by turning it
into an XMAR table), and this would be the logical way to proceed now
that IOMMU internals are appearing as PCI devices.

(A similar issue comes into play now that newer generations of CPUs are
exposing CPU internals as PCI devices.)

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2012-Jun-12 16:43 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 12/06/12 16:13, Jan Beulich wrote:
>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>> I had attached a revised patch, please check it.
>> While the patch technically looks better now, you didn't eliminate
>> my objections to the approach you take, nor did you comment on
>> the proposed alternative.
>>
>> A fundamental problem is that your IOMMUs show up as "normal"
>> PCI devices, breaking the separation between what is being
>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>> allows something as odd as passing through an IOMMU to a
>> DomU, which would clearly upset the hypervisor.)
>>
>>> I found that the following Linux commit triggers this issue. It has been
>>> included into 3.4 pv_ops.
>>>
>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>
>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>> Thanks for locating this. As it stands, it is incomplete though
>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>> it won't have a means to suppress the "screaming" interrupts.
>
> I feel that the correct solution would be for Xen to hide the PCI
> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
> into an XMAR table), and this would be the logical way to proceed now that
> IOMMU internals are appearing as PCI devices.

That is precisely what I suggested in my response to the first version
of this patch, and I'd also volunteer to look into putting together a
first draft implementation if we sort of agree that this is the way to
go.

> (A similar issue comes into play now that newer generations of CPUs are
> exposing CPU internals as PCI devices)

Indeed, good point. Might be a little tricky though to determine which
one(s) they are, and still avoid conflicting with things like the EDAC
drivers in Dom0.

Jan
Wei Wang
2012-Jun-14 12:13 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Am 12.06.2012 18:43, schrieb Jan Beulich:
>>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 12/06/12 16:13, Jan Beulich wrote:
>>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>>> I had attached a revised patch, please check it.
>>> While the patch technically looks better now, you didn't eliminate
>>> my objections to the approach you take, nor did you comment on
>>> the proposed alternative.
>>>
>>> A fundamental problem is that your IOMMUs show up as "normal"
>>> PCI devices, breaking the separation between what is being
>>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>>> allows something as odd as passing through an IOMMU to a
>>> DomU, which would clearly upset the hypervisor.)
>>>
>>>> I found that the following Linux commit triggers this issue. It has been
>>>> included into 3.4 pv_ops.
>>>>
>>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>>
>>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>>> Thanks for locating this. As it stands, it is incomplete though
>>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>>> it won't have a means to suppress the "screaming" interrupts.
>>
>> I feel that the correct solution would be for Xen to hide the PCI
>> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
>> into an XMAR table), and this would be the logical way to proceed now that
>> IOMMU internals are appearing as PCI devices.

That sounds absolutely good to me. Thanks for the suggestion.

> That is precisely what I suggested in my response to the first
> version of this patch, and I'd also volunteer to look into putting
> together a first draft implementation if we sort of agree that
> this is the way to go.

Cool! Thanks for doing that. Looking forward to it in Xen 4.2, since
IOMMU MSI is really broken with recent Linux dom0...

Thanks,
Wei

>> (A similar issue comes into play now that newer generations of CPUs are
>> exposing CPU internals as PCI devices)
>
> Indeed, good point. Might be a little tricky though to determine
> which one(s) they are, and still avoid conflicting with things like
> the EDAC drivers in Dom0.
>
> Jan
Jan Beulich
2012-Jun-14 14:18 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
> Am 12.06.2012 18:43, schrieb Jan Beulich:
>> That is precisely what I suggested in my response to the first
>> version of this patch, and I'd also volunteer to look into putting
>> together a first draft implementation if we sort of agree that
>> this is the way to go.
>
> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
> iommu msi is really broken with recent Linux dom0...

That's unlikely to be for 4.2 - it's going to have a reasonable risk of
regressions, and functionality like that isn't nice to rush into a
feature-frozen tree, the more so as it provides a workaround for the
combination of a questionable design and an improper kernel-side fix.

Have you at all considered getting this fixed on the kernel side? As I
don't have direct access to any AMD IOMMU capable system - can one,
other than by enumerating the respective PCI IDs or reading ACPI
tables, reasonably easily identify the devices in question (e.g. via
vendor/class/sub-class or some such)? That might permit skipping those
in the offending kernel code...

Jan
Wei Wang
2012-Jun-14 15:15 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
>> Am 12.06.2012 18:43, schrieb Jan Beulich:
>>> That is precisely what I suggested in my response to the first
>>> version of this patch, and I'd also volunteer to look into putting
>>> together a first draft implementation if we sort of agree that
>>> this is the way to go.
>>
>> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
>> iommu msi is really broken with recent Linux dom0...
>
> That's unlikely to be for 4.2 - it's going to have a reasonable risk
> of regressions, and functionality like that I don't think is nice to
> rush into a feature frozen tree, the more that it provides a
> workaround for the combination of questionable design and an
> improper kernel side fix.

Yes, it looks improper to me as well, but it would also be nice to have
something in Xen to handle this situation. Maybe we should not trust
the guest OS, even dom0, sometimes...

> Have you at all considered getting this fixed on the kernel side?
> As I don't have direct access to any AMD IOMMU capable
> system - can one, other than by enumerating the respective
> PCI IDs or reading ACPI tables, reasonably easily identify the
> devices in question (e.g. via vendor/class/sub-class or some
> such)? That might permit skipping those in the offending kernel
> code...

AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
should be enough to identify an AMD IOMMU device. I could send you a
kernel patch for review using this approach. I believe that fixing this
issue in 4.2, no matter how, is really important for the AMD IOMMU.

Thanks,
Wei
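[Editor's note: the identification rule described above (vendor AMD, base
class 0x08, sub-class 0x06) can be sketched as a small stand-alone predicate.
This is not code from the thread's patches; `is_amd_iommu` is a hypothetical
helper, and the only facts assumed are PCI_VENDOR_ID_AMD being 0x1022 and the
24-bit PCI class code layout (base class, sub-class, programming interface).]

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD      0x1022
/* Base class 0x08 (System Base Peripheral), sub-class 0x06 (IOMMU). */
#define PCI_CLASS_SYSTEM_IOMMU 0x0806

/* class_code is the 24-bit value from config space offset 0x09-0x0b:
 * (base class << 16) | (sub-class << 8) | programming interface. */
static int is_amd_iommu(uint16_t vendor, uint32_t class_code)
{
    /* Shift off the prog-if byte and compare base class + sub-class. */
    return vendor == PCI_VENDOR_ID_AMD &&
           (class_code >> 8) == PCI_CLASS_SYSTEM_IOMMU;
}
```

This mirrors the check Jan's kernel patch later in the thread performs on
`dev->class >> 8` and `dev->vendor`.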
Jan Beulich
2012-Jun-14 15:27 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
> Am 14.06.2012 16:18, schrieb Jan Beulich:
>> Have you at all considered getting this fixed on the kernel side?
>> As I don't have direct access to any AMD IOMMU capable
>> system - can one, other than by enumerating the respective
>> PCI IDs or reading ACPI tables, reasonably easily identify the
>> devices in question (e.g. via vendor/class/sub-class or some
>> such)? That might permit skipping those in the offending kernel
>> code...
>
> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
> should be enough to identify an AMD IOMMU device. I could send you a
> kernel patch for review using this approach.

Yes, please.

Jan
Jan Beulich
2012-Jun-20 15:45 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 14.06.12 at 14:13, Wei Wang <wei.wang2@amd.com> wrote:
> Am 12.06.2012 18:43, schrieb Jan Beulich:
>>>>> On 12.06.12 at 18:08, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 12/06/12 16:13, Jan Beulich wrote:
>>>>>>> On 12.06.12 at 14:02, Wei Wang <wei.wang2@amd.com> wrote:
>>>>> I had attached a revised patch, please check it.
>>>> While the patch technically looks better now, you didn't eliminate
>>>> my objections to the approach you take, nor did you comment on
>>>> the proposed alternative.
>>>>
>>>> A fundamental problem is that your IOMMUs show up as "normal"
>>>> PCI devices, breaking the separation between what is being
>>>> managed by the hypervisor vs. by the Dom0 kernel. (This even
>>>> allows something as odd as passing through an IOMMU to a
>>>> DomU, which would clearly upset the hypervisor.)
>>>>
>>>>> I found that the following Linux commit triggers this issue. It has been
>>>>> included into 3.4 pv_ops.
>>>>>
>>>>> " commit a776c491ca5e38c26d9f66923ff574d041e747f4
>>>>> Author: Eric W. Biederman <ebiederm@xmission.com>
>>>>> Date: Mon Oct 17 11:46:06 2011 -0700
>>>>>
>>>>> PCI: msi: Disable msi interrupts when we initialize a pci device "
>>>> Thanks for locating this. As it stands, it is incomplete though
>>>> anyway: If the kexec-ed kernel is one built without CONFIG_PCI_MSI,
>>>> it won't have a means to suppress the "screaming" interrupts.
>>>
>>> I feel that the correct solution would be for Xen to hide the PCI
>>> devices from dom0. Xen already hides the DMAR ACPI table (by turning it
>>> into an XMAR table), and this would be the logical way to proceed now that
>>> IOMMU internals are appearing as PCI devices.
>
> That sounds absolutely good to me. Thanks for the suggestion.
>
>> That is precisely what I suggested in my response to the first
>> version of this patch, and I'd also volunteer to look into putting
>> together a first draft implementation if we sort of agree that
>> this is the way to go.
>
> Cool! Thanks for doing that. Looking forward to it in Xen 4.2 since
> iommu msi is really broken with recent Linux dom0...

Rather than submitting it for inclusion immediately, I'd rather see you
first use it successfully. Below/attached is what appears to work fine
for me in contrived situations (i.e. hiding bridges with nothing
connected, as I still don't have any AMD IOMMU system at hand).

Jan

Note that due to ptwr_do_page_fault() being run first, there'll be a
MEM_LOG() issued for each such attempt. If that's undesirable, the
order of the calls would need to be swapped.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
             break;
         case 1:
             l1e_remove_flags(nl1e, _PAGE_RW);
+            rc = 0;
             break;
         }
         if ( page )
@@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u
     return 0;
 }
 
+#ifdef __x86_64__
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+struct mmio_ro_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+};
+
+static int mmio_ro_emulated_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int mmio_ro_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt =
+        container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt);
+
+    /* Only allow naturally-aligned stores at the original %cr2 address. */
+    if ( ((bytes | offset) & (bytes - 1)) || offset != mmio_ro_ctxt->cr2 )
+    {
+        MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+                mmio_ro_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = mmio_ro_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32on64_domain(v->domain) ?
+                             32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = {
+        .ctxt.regs = regs,
+        .ctxt.addr_size = addr_size,
+        .ctxt.sp_size = addr_size,
+        .cr2 = addr
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(v, addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(mfn) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    rc = x86_emulate(&mmio_ro_ctxt.ctxt, &mmio_ro_emulate_ops);
+
+    return rc != X86EMUL_UNHANDLEABLE ? EXCRET_fault_fixed : 0;
+}
+#endif /* __x86_64__ */
+
 void free_xen_pagetable(void *v)
 {
     if ( system_state == SYS_STATE_early_boot )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon
         return 0;
     }
 
-    if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) &&
-         guest_kernel_mode(v, regs) )
-    {
-        unsigned int mbs = PFEC_write_access;
-        unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch;
-
-        /* Do not check if access-protection fault since the page may
-           legitimately be not present in shadow page tables */
-        if ( !paging_mode_enabled(d) )
-            mbs |= PFEC_page_present;
-
-        if ( ((regs->error_code & (mbs | mbz)) == mbs) &&
+    if ( guest_kernel_mode(v, regs) &&
+         !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
+         (regs->error_code & PFEC_write_access) )
+    {
+        if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) &&
+             /* Do not check if access-protection fault since the page may
+                legitimately be not present in shadow page tables */
+             (paging_mode_enabled(d) ||
+              (regs->error_code & PFEC_page_present)) &&
              ptwr_do_page_fault(v, addr, regs) )
             return EXCRET_fault_fixed;
+
+#ifdef __x86_64__
+        if ( IS_PRIV(d) && (regs->error_code & PFEC_page_present) &&
+             mmio_ro_do_page_fault(v, addr, regs) )
+            return EXCRET_fault_fixed;
+#endif
     }
 
     /* For non-external shadowed guests, we fix up both their own
@@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d,
         return 0;
 
     machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
     start = d->arch.pci_cf8 & 0xFF;
     end = start + size - 1;
     if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
--- a/xen/arch/x86/x86_32/pci.c
+++ b/xen/arch/x86/x86_32/pci.c
@@ -6,6 +6,7 @@
 
 #include <xen/spinlock.h>
 #include <xen/pci.h>
+#include <xen/init.h>
 #include <asm/io.h>
 
 #define PCI_CONF_ADDRESS(bus, dev, func, reg) \
@@ -70,3 +71,7 @@ void pci_conf_write32(
     BUG_ON((bus > 255) || (dev > 31) || (func > 7) || (reg > 255));
     pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data);
 }
+
+void __init arch_pci_ro_device(int seg, int bdf)
+{
+}
--- a/xen/arch/x86/x86_64/mmconfig_64.c
+++ b/xen/arch/x86/x86_64/mmconfig_64.c
@@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const
     return (void __iomem *) virt;
 }
 
+void arch_pci_ro_device(int seg, int bdf)
+{
+    unsigned int idx, bus = PCI_BUS(bdf);
+
+    for (idx = 0; idx < pci_mmcfg_config_num; ++idx) {
+        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
+        unsigned long mfn = (cfg->address >> PAGE_SHIFT) + bdf;
+
+        if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg ||
+            cfg->start_bus_number > bus || cfg->end_bus_number < bus)
+            continue;
+
+        if (rangeset_add_singleton(mmio_ro_ranges, mfn))
+            printk(XENLOG_ERR
+                   "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) read-only\n",
+                   cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf),
+                   mfn);
+    }
+}
+
 int pci_mmcfg_arch_enable(unsigned int idx)
 {
     const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg;
+    const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment);
 
     if (pci_mmcfg_virt[idx].virt)
         return 0;
@@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i
     }
     printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n",
            cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
+    if (ro_map) {
+        unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0);
+        unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1);
+
+        while ((bdf = find_next_bit(ro_map, end + 1, bdf)) <= end) {
+            arch_pci_ro_device(cfg->pci_segment, bdf);
+            if (bdf++ == end)
+                break;
+        }
+    }
 
     return 0;
 }
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi(
     if ( rt )
         return -ENODEV;
 
+    rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func));
+    if ( rt )
+        printk(XENLOG_ERR
+               "Could not mark config space of %04x:%02x:%02x.%u read-only (%d)\n",
+               iommu->seg, bus, dev, func, rt);
+
     list_add_tail(&iommu->list, &amd_iommu_head);
 
     return 0;
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi
  unlock:
     spin_unlock(&d->event_lock);
 }
-
-static int __init setup_mmio_ro_ranges(void)
-{
-    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
-                                  RANGESETF_prettyprint_hex);
-    return 0;
-}
-__initcall(setup_mmio_ro_ranges);
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -36,6 +36,7 @@
 struct pci_seg {
     struct list_head alldevs_list;
     u16 nr;
+    unsigned long *ro_map;
     /* bus2bridge_lock protects bus2bridge array */
     spinlock_t bus2bridge_lock;
 #define MAX_BUSES 256
@@ -106,6 +107,8 @@ void __init pt_pci_init(void)
     radix_tree_init(&pci_segments);
     if ( !alloc_pseg(0) )
         panic("Could not initialize PCI segment 0\n");
+    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
+                                  RANGESETF_prettyprint_hex);
 }
 
 int __init pci_add_segment(u16 seg)
@@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg)
     return alloc_pseg(seg) ? 0 : -ENOMEM;
 }
 
+const unsigned long *pci_get_ro_map(u16 seg)
+{
+    struct pci_seg *pseg = get_pseg(seg);
+
+    return pseg ? pseg->ro_map : NULL;
+}
+
 static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
 {
     struct pci_dev *pdev;
@@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps
     xfree(pdev);
 }
 
+int __init pci_ro_device(int seg, int bus, int devfn)
+{
+    struct pci_seg *pseg = alloc_pseg(seg);
+    struct pci_dev *pdev;
+
+    if ( !pseg )
+        return -ENOMEM;
+    pdev = alloc_pdev(pseg, bus, devfn);
+    if ( !pdev )
+        return -ENOMEM;
+
+    if ( !pseg->ro_map )
+    {
+        size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long);
+
+        pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0);
+        if ( !pseg->ro_map )
+            return -ENOMEM;
+        memset(pseg->ro_map, 0, sz);
+    }
+
+    __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map);
+    arch_pci_ro_device(seg, PCI_BDF2(bus, devfn));
+
+    return 0;
+}
+
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn)
 {
     struct pci_seg *pseg = get_pseg(seg);
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p);
 
 int  ptwr_do_page_fault(struct vcpu *, unsigned long,
                         struct cpu_user_regs *);
+int  mmio_ro_do_page_fault(struct vcpu *, unsigned long,
+                           struct cpu_user_regs *);
 
 int audit_adjust_pgtables(struct domain *d, int dir, int noisy);
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev(
 void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *));
 void pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
+const unsigned long *pci_get_ro_map(u16 seg);
 int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
+int pci_ro_device(int seg, int bus, int devfn);
+void arch_pci_ro_device(int seg, int bdf);
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn);
 struct pci_dev *pci_get_pdev_by_domain(
     struct domain *, int seg, int bus, int devfn);
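[Editor's note: the hypervisor-side mechanism in Jan's draft patch boils down
to a per-segment bitmap indexed by BDF (bus/device/function): pci_ro_device()
sets a bit for the IOMMU's BDF, and pci_cfg_ok() refuses config-space writes
when that bit is set. The following is a minimal stand-alone sketch of that
idea, not Xen code; `mark_ro` and `cfg_write_ok` are hypothetical names, and
only the BDF encoding (bus << 8 | devfn) is taken from the patch.]

```c
#include <assert.h>
#include <stdint.h>

/* One bit per BDF: 256 buses x 32 devices x 8 functions = 65536 bits. */
#define PCI_BDF(bus, devfn) ((uint16_t)((((bus) & 0xff) << 8) | ((devfn) & 0xff)))
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static unsigned long ro_map[65536 / (8 * sizeof(unsigned long))];

/* Analogous to pci_ro_device(): mark one device's config space read-only. */
static void mark_ro(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = PCI_BDF(bus, devfn);
    ro_map[bdf / BITS_PER_LONG] |= 1UL << (bdf % BITS_PER_LONG);
}

/* Analogous to the pci_cfg_ok() write path: a write is allowed only
 * if the target BDF is not in the read-only map. */
static int cfg_write_ok(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = PCI_BDF(bus, devfn);
    return !((ro_map[bdf / BITS_PER_LONG] >> (bdf % BITS_PER_LONG)) & 1UL);
}
```

The same bitmap doubles as the trigger for marking the device's MCFG page
read-only in the page tables, which is what the mmio_ro_ranges/
mmio_ro_do_page_fault() part of the patch handles.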
Jan Beulich
2012-Jun-21 09:59 UTC
[PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0 (was: Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it)
>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
> Am 14.06.2012 16:18, schrieb Jan Beulich:
>> Have you at all considered getting this fixed on the kernel side?
>> As I don't have direct access to any AMD IOMMU capable
>> system - can one, other than by enumerating the respective
>> PCI IDs or reading ACPI tables, reasonably easily identify the
>> devices in question (e.g. via vendor/class/sub-class or some
>> such)? That might permit skipping those in the offending kernel
>> code...
>
> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
> should be enough to identify an AMD IOMMU device. I could send you a
> kernel patch for review using this approach. I believe that fixing this
> issue in 4.2, no matter how, is really important for the AMD IOMMU.

As you didn't come forward with anything, here's my first take on this:

Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
interrupts when we initialize a pci device") caused MSI to get disabled
on Xen Dom0 despite it having got turned on purposefully by the
hypervisor. As an immediate band aid, suppress the disabling in this
specific case.

I don't think, however, that either the original change or this fix are
really appropriate. The original fix, besides leaving aside the
presence of a hypervisor like Xen, doesn't deal with all possible cases
(in particular it has no effect if the secondary kernel was built with
CONFIG_PCI_MSI unset). And taking into account a hypervisor like Xen,
logic like this should probably be skipped altogether (i.e. it should
be entirely left to the hypervisor, being the entity in control of
IRQs).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@kernel.org

---
 drivers/pci/msi.c       |    7 +++++++
 include/linux/pci_ids.h |    1 +
 2 files changed, 8 insertions(+)

--- 3.5-rc3/drivers/pci/msi.c
+++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
@@ -20,6 +20,7 @@
 #include <linux/errno.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <xen/xen.h>
 
 #include "pci.h"
 #include "msi.h"
@@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
 	/* Disable the msi hardware to avoid screaming interrupts
 	 * during boot.  This is the power on reset default so
 	 * usually this should be a noop.
+	 * But on a Xen host don't do this for IOMMUs which the hypervisor
+	 * is in control of (and hence has already enabled on purpose).
 	 */
+	if (xen_initial_domain()
+	    && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
+	    && dev->vendor == PCI_VENDOR_ID_AMD)
+		return;
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
 	if (pos)
 		msi_set_enable(dev, pos, 0);
--- 3.5-rc3/include/linux/pci_ids.h
+++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
@@ -75,6 +75,7 @@
 #define PCI_CLASS_SYSTEM_RTC		0x0803
 #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
 #define PCI_CLASS_SYSTEM_SDHCI		0x0805
+#define PCI_CLASS_SYSTEM_IOMMU		0x0806
 #define PCI_CLASS_SYSTEM_OTHER		0x0880
 
 #define PCI_BASE_CLASS_INPUT		0x09
Eric W. Biederman
2012-Jun-21 11:08 UTC
Re: [PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0
"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>> Have you at all considered getting this fixed on the kernel side?
>>> As I don't have direct access to any AMD IOMMU capable
>>> system - can one, other than by enumerating the respective
>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>> devices in question (e.g. via vendor/class/sub-class or some
>>> such)? That might permit skipping those in the offending kernel
>>> code...
>>
>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>> and sub-class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>> should be enough to identify an AMD IOMMU device. I could send you a
>> kernel patch for review using this approach. I believe that fixing this
>> issue in 4.2, no matter how, is really important for the AMD IOMMU.
>
> As you didn't come forward with anything, here's my first
> take on this:
>
> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
> interrupts when we initialize a pci device") caused MSI to get disabled
> on Xen Dom0 despite it having got turned on purposefully by the
> hypervisor. As an immediate band aid, suppress the disabling in this
> specific case.
>
> I don't think, however, that either the original change or this fix are
> really appropriate. The original fix, besides leaving aside the
> presence of a hypervisor like Xen, doesn't deal with all possible
> cases (in particular it has no effect if the secondary kernel was built
> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
> Xen, logic like this should probably be skipped altogether (i.e. it
> should be entirely left to the hypervisor, being the entity in control
> of IRQs).

The original fix is fine. Xen dom0 remains insane, although perhaps
somewhat less so than it once was.

Why does the dom0 kernel even get any access to PCI devices that Xen
doesn't want it to touch? As far as I can tell, my patch has revealed a
design bug in Xen. If you don't want to be messed up by the kernel,
don't let the kernel touch things.

At the very least we need an abstraction much cleaner than the patch
below.

Eric

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@kernel.org
>
> ---
>  drivers/pci/msi.c       |    7 +++++++
>  include/linux/pci_ids.h |    1 +
>  2 files changed, 8 insertions(+)
>
> --- 3.5-rc3/drivers/pci/msi.c
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include <linux/errno.h>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <xen/xen.h>
>
>  #include "pci.h"
>  #include "msi.h"
> @@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
>  	/* Disable the msi hardware to avoid screaming interrupts
>  	 * during boot.  This is the power on reset default so
>  	 * usually this should be a noop.
> +	 * But on a Xen host don't do this for IOMMUs which the hypervisor
> +	 * is in control of (and hence has already enabled on purpose).
>  	 */
> +	if (xen_initial_domain()
> +	    && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
> +	    && dev->vendor == PCI_VENDOR_ID_AMD)
> +		return;
>  	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
>  	if (pos)
>  		msi_set_enable(dev, pos, 0);
> --- 3.5-rc3/include/linux/pci_ids.h
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
> @@ -75,6 +75,7 @@
>  #define PCI_CLASS_SYSTEM_RTC		0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI		0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU		0x0806
>  #define PCI_CLASS_SYSTEM_OTHER		0x0880
>
>  #define PCI_BASE_CLASS_INPUT		0x09
Wei Wang
2012-Jun-21 11:21 UTC
Re: [PATCH] PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>> Have you at all considered getting this fixed on the kernel side?
>>> As I don''t have direct access to any AMD IOMMU capable
>>> system - can one, other than by enumerating the respective
>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>> devices in question (e.g. via vendor/class/sub-class or some
>>> such)? That might permit skipping those in the offending kernel
>>> code...
>>
>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>> should be enough to identify an AMD IOMMU device. I could send you a
>> kernel patch for review using this approach. I believe that fixing
>> this issue in 4.2, no matter how, is really important for amd iommu.
>
> As you didn''t come forward with anything, here''s my first
> take on this:

Hi Jan
Thanks a lot for the patch. Actually I plan to send my version today,
which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!

I also evaluated the possibility of hiding the iommu device from dom0.
I think the change is not all that big, at least for io based pcicfg
access. A proof-of-concept patch is attached.

Thanks,
Wei

diff -r baa85434d0ec xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
+++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
@@ -73,6 +73,7 @@
 #include <asm/hpet.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
+#include <asm/hvm/svm/amd-iommu-proto.h>
 
 /*
  * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
@@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
 {
     uint32_t machine_bdf;
     uint16_t start, end;
+    struct amd_iommu *iommu;
+
     if (!IS_PRIV(d))
         return 0;
 
     machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
+
+    for_each_amd_iommu ( iommu )
+    {
+        if ( machine_bdf == iommu->bdf )
+            return 0;
+    }
+
     start = d->arch.pci_cf8 & 0xFF;
     end = start + size - 1;
     if (xsm_pci_config_permission(d, machine_bdf, start, end, write))

> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
> interrupts when we initialize a pci device") caused MSI to get disabled
> on Xen Dom0 despite it having got turned on purposefully by the
> hypervisor. As an immediate band aid, suppress the disabling in this
> specific case.
>
> I don''t think, however, that either the original change or this fix are
> really appropriate. The original fix, besides leaving aside the
> presence of a hypervisor like Xen, doesn''t deal with all possible
> cases (in particular it has no effect if the secondary kernel was built
> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
> Xen, logic like this should probably be skipped altogether (i.e. it
> should be entirely left to the hypervisor, being the entity in control
> of IRQs).
>
> Signed-off-by: Jan Beulich<jbeulich@suse.com>
> Cc: stable@kernel.org
>
> ---
>  drivers/pci/msi.c       |    7 +++++++
>  include/linux/pci_ids.h |    1 +
>  2 files changed, 8 insertions(+)
>
> --- 3.5-rc3/drivers/pci/msi.c
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include<linux/errno.h>
>  #include<linux/io.h>
>  #include<linux/slab.h>
> +#include<xen/xen.h>
>
>  #include "pci.h"
>  #include "msi.h"
> @@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
>      /* Disable the msi hardware to avoid screaming interrupts
>       * during boot. This is the power on reset default so
>       * usually this should be a noop.
> +     * But on a Xen host don''t do this for IOMMUs which the hypervisor
> +     * is in control of (and hence has already enabled on purpose).
>       */
> +    if (xen_initial_domain()
> +        && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
> +        && dev->vendor == PCI_VENDOR_ID_AMD)
> +        return;
>      pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
>      if (pos)
>          msi_set_enable(dev, pos, 0);
> --- 3.5-rc3/include/linux/pci_ids.h
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
> @@ -75,6 +75,7 @@
>  #define PCI_CLASS_SYSTEM_RTC            0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG    0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI          0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU          0x0806
>  #define PCI_CLASS_SYSTEM_OTHER          0x0880
>
>  #define PCI_BASE_CLASS_INPUT            0x09
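The vendor/class test Wei describes can be reduced to a predicate like the following - an illustrative sketch only, assuming Linux''s (base-class << 16 | sub-class << 8 | prog-if) encoding of dev->class; PCI_CLASS_SYSTEM_IOMMU (0x0806) only exists once the pci_ids.h hunk above is applied:

```c
#include <stdint.h>

#define PCI_VENDOR_ID_AMD       0x1022
#define PCI_CLASS_SYSTEM_IOMMU  0x0806  /* base class 0x08, sub-class 0x06 */

/*
 * class24 is the 24-bit class code (base << 16 | sub << 8 | prog-if);
 * shifting right by 8 leaves the 16-bit base/sub-class pair, matching
 * the (dev->class >> 8) comparison in the patch above.
 */
static int is_amd_iommu(uint16_t vendor, uint32_t class24)
{
    return vendor == PCI_VENDOR_ID_AMD &&
           (class24 >> 8) == PCI_CLASS_SYSTEM_IOMMU;
}
```

The same pair of checks works from user space too, since lspci reports the class as [0806] for these devices.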
Jan Beulich
2012-Jun-21 12:06 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 13:21, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> Have you at all considered getting this fixed on the kernel side?
>>>> As I don''t have direct access to any AMD IOMMU capable
>>>> system - can one, other than by enumerating the respective
>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>> such)? That might permit skipping those in the offending kernel
>>>> code...
>>>
>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>> should be enough to identify an AMD IOMMU device. I could send you a
>>> kernel patch for review using this approach. I believe that fixing
>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>
>> As you didn''t come forward with anything, here''s my first
>> take on this:
>
> Hi Jan
> Thanks a lot for the patch. Actually I plan to send my version today,
> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
>
> I also evaluated the possibility of hiding the iommu device from dom0.
> I think the change is not all that big, at least for io based pcicfg
> access. A proof-of-concept patch is attached.

This completely hides the device from Dom0, but only when
config space is accessed via method 1. Did you not see my
earlier patch doing this for MCFG as well (albeit only disallowing
writes, so allowing the device to still be seen by Dom0)?

Whether completely hiding the device is actually okay I can''t
easily tell: Would IOMMUs always be either at func 0 of a single-
function device, or at a non-zero func of a multi-function one? If
not, other devices may get hidden implicitly.
Also I noticed just now that guest_io_read() wouldn''t really
behave correctly when pci_cfg_ok() returned false - it might
pass back 0xff even for a multi-byte read. I''ll send a fix shortly.

Jan

> --- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
> +++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
> @@ -73,6 +73,7 @@
>  #include <asm/hpet.h>
>  #include <public/arch-x86/cpuid.h>
>  #include <xsm/xsm.h>
> +#include <asm/hvm/svm/amd-iommu-proto.h>
>
>  /*
>   * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>  {
>      uint32_t machine_bdf;
>      uint16_t start, end;
> +    struct amd_iommu *iommu;
> +
>      if (!IS_PRIV(d))
>          return 0;
>
>      machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
> +
> +    for_each_amd_iommu ( iommu )
> +    {
> +        if ( machine_bdf == iommu->bdf )
> +            return 0;
> +    }
> +
>      start = d->arch.pci_cf8 & 0xFF;
>      end = start + size - 1;
>      if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
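The guest_io_read() flaw Jan mentions is that a rejected config-space access should read back as all-ones across the whole access width, not just a single 0xff byte. A hedged sketch of the corrected fill value (the function name is illustrative, not what Xen actually uses):

```c
#include <stdint.h>

/*
 * All-ones pattern sized to the access width: a refused or absent PCI
 * config read must return 0xff, 0xffff or 0xffffffff depending on
 * whether 1, 2 or 4 bytes were requested.
 */
static uint32_t pci_cfg_refused_value(unsigned int bytes)
{
    return bytes >= 4 ? 0xffffffffu : (1u << (bytes * 8)) - 1;
}
```

Returning the full-width pattern matters because enumeration code tests a 16-bit vendor ID of 0xffff to conclude "no device here"; a single 0xff byte would make a hidden device look half-present.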
Wei Wang
2012-Jun-21 12:28 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com> wrote:
>>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>>> Have you at all considered getting this fixed on the kernel side?
>>>>> As I don''t have direct access to any AMD IOMMU capable
>>>>> system - can one, other than by enumerating the respective
>>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>>> such)? That might permit skipping those in the offending kernel
>>>>> code...
>>>>
>>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>>> should be enough to identify an AMD IOMMU device. I could send you a
>>>> kernel patch for review using this approach. I believe that fixing
>>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>>
>>> As you didn''t come forward with anything, here''s my first
>>> take on this:
>>
>> Hi Jan
>> Thanks a lot for the patch. Actually I plan to send my version today,
>> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
>>
>> I also evaluated the possibility of hiding the iommu device from dom0.
>> I think the change is not all that big, at least for io based pcicfg
>> access. A proof-of-concept patch is attached.
>
> This completely hides the device from Dom0, but only when
> config space is accessed via method 1. Did you not see my
> earlier patch doing this for MCFG as well

Could you please provide a particular c/s number? (I saw too many c/s
that might be related to this topic), so that I could work out a patch
to support both i/o and mmcfg.

> (albeit only disallowing
> writes, so allowing the device to still be seen by Dom0)?

Sounds better to me... this still allows the user to check iommu status
from lspci.

> Whether completely hiding the device is actually okay I can''t
> easily tell: Would IOMMUs always be either at func 0 of a single-
> function device, or at a non-zero func of a multi-function one? If
> not, other devices may get hidden implicitly.

AMD IOMMU is an independent pci-e endpoint, and this function is not
used for any purpose other than containing an iommu. So I don''t see
that the iommu will share a bdf value with other devices.

Thanks,
Wei

> Also I noticed just now that guest_io_read() wouldn''t really
> behave correctly when pci_cfg_ok() returned false - it might
> pass back 0xff even for a multi-byte read. I''ll send a fix shortly.
>
> Jan
>
>> --- a/xen/arch/x86/traps.c	Thu Jun 21 11:30:59 2012 +0200
>> +++ b/xen/arch/x86/traps.c	Thu Jun 21 13:19:02 2012 +0200
>> @@ -73,6 +73,7 @@
>>  #include<asm/hpet.h>
>>  #include<public/arch-x86/cpuid.h>
>>  #include<xsm/xsm.h>
>> +#include<asm/hvm/svm/amd-iommu-proto.h>
>>
>>  /*
>>   * opt_nmi: one of ''ignore'', ''dom0'', or ''fatal''.
>> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>>  {
>>      uint32_t machine_bdf;
>>      uint16_t start, end;
>> +    struct amd_iommu *iommu;
>> +
>>      if (!IS_PRIV(d))
>>          return 0;
>>
>>      machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
>> +
>> +    for_each_amd_iommu ( iommu )
>> +    {
>> +        if ( machine_bdf == iommu->bdf )
>> +            return 0;
>> +    }
>> +
>>      start = d->arch.pci_cf8 & 0xFF;
>>      end = start + size - 1;
>>      if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
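For reference, the method-1 path that the proof-of-concept patch hooks works off the latched CONFIG_ADDRESS (port 0xCF8) value: pci_cfg_ok() recovers the 16-bit BDF from bits 8-23 and the register offset from bits 0-7. A sketch of that decoding (helper names here are illustrative, not Xen''s):

```c
#include <stdint.h>

/* Bits 8-23 of a method-1 CONFIG_ADDRESS latch hold bus:dev:func. */
static uint16_t cf8_to_bdf(uint32_t cf8)
{
    return (cf8 >> 8) & 0xffff;
}

/* Bits 0-7 hold the (dword-aligned) config register offset. */
static uint8_t cf8_to_reg(uint32_t cf8)
{
    return cf8 & 0xff;
}
```

For the AMD IOMMU at 00:00.2, a latch value such as 0x80000254 (enable bit set, register 0x54 - the MSI capability) decodes to BDF 0x0002, which is what the for_each_amd_iommu comparison above matches against.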
Jan Beulich
2012-Jun-21 12:28 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 13:08, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
>
>>>>> On 14.06.12 at 17:15, Wei Wang <wei.wang2@amd.com> wrote:
>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> Have you at all considered getting this fixed on the kernel side?
>>>> As I don''t have direct access to any AMD IOMMU capable
>>>> system - can one, other than by enumerating the respective
>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>> such)? That might permit skipping those in the offending kernel
>>>> code...
>>>
>>> AMD IOMMUs (both v1 and v2) use class id 08 (System Base Peripheral)
>>> and sub class id 06 (IOMMU). Combined with PCI_VENDOR_ID_AMD, this
>>> should be enough to identify an AMD IOMMU device. I could send you a
>>> kernel patch for review using this approach. I believe that fixing
>>> this issue in 4.2, no matter how, is really important for amd iommu.
>>
>> As you didn''t come forward with anything, here''s my first
>> take on this:
>>
>> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
>> interrupts when we initialize a pci device") caused MSI to get disabled
>> on Xen Dom0 despite it having got turned on purposefully by the
>> hypervisor. As an immediate band aid, suppress the disabling in this
>> specific case.
>>
>> I don''t think, however, that either the original change or this fix are
>> really appropriate. The original fix, besides leaving aside the
>> presence of a hypervisor like Xen, doesn''t deal with all possible
>> cases (in particular it has no effect if the secondary kernel was built
>> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
>> Xen, logic like this should probably be skipped altogether (i.e. it
>> should be entirely left to the hypervisor, being the entity in control
>> of IRQs).
>
> The original fix is fine. Xen dom0 remains insane. Although perhaps
> somewhat better than Xen dom0 once was.
>
> Why does the dom0 kernel even get any access to pci devices that
> Xen doesn''t want it to touch?
>
> As far as I can tell my patch has revealed a design bug in Xen. If you
> don''t want to be messed up by the kernel don''t let the kernel touch
> things.

I rather think this is a design bug of the AMD IOMMU - it should never
have got surfaced as a PCI device.

Furthermore, fully hiding _any_ PCI device from Dom0 has its own set of
problems - depending on the nature of the device, full emulation of all
devices may become necessary to get this implemented (because this can
implicitly hide other devices, or the Dom0 kernel re-assigning BARs
could get in conflict with what the hidden devices use). Xen''s model
has always been to allow Dom0 full control over all peripherals and
bridges, so if all of a sudden PCI devices of other kinds show up, we
can''t simply re-model everything.

> At the very least we need an abstraction much cleaner
> than the patch below.

Probably - any suggestion? As said in the mail I sent, this is meant to
overcome the immediate problem, and a more permanent fix would be
desirable (ideally suppressing the playing with the MSI related data
altogether, following the rest of the MSI support in Xen/Dom0).

Jan
Jan Beulich
2012-Jun-21 12:45 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 14:28, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>>> I also evaluated the possibility of hiding the iommu device from dom0.
>>> I think the change is not all that big, at least for io based pcicfg
>>> access. A proof-of-concept patch is attached.
>>
>> This completely hides the device from Dom0, but only when
>> config space is accessed via method 1. Did you not see my
>> earlier patch doing this for MCFG as well
> Could you please provide a particular c/s number? (I saw too many c/s
> that might be related to this topic), so that I could work out a patch
> to support both i/o and mmcfg.

I sent this to you yesterday, so you''d be able to test whether
it actually fulfills its purpose before we discuss whether this is
acceptable for 4.2. See
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

>> (albeit only disallowing
>> writes, so allowing the device to still be seen by Dom0)?
> Sounds better to me... this still allows the user to check iommu status
> from lspci.
>
>> Whether completely hiding the device is actually okay I can''t
>> easily tell: Would IOMMUs always be either at func 0 of a single-
>> function device, or at a non-zero func of a multi-function one? If
>> not, other devices may get hidden implicitly.
>
> AMD IOMMU is an independent pci-e endpoint, and this function is not
> used for any purpose other than containing an iommu. So I don''t see
> that the iommu will share a bdf value with other devices.

The question is not regarding bdf, but regarding whether under
the same seg:bus:dev there might be multiple functions, one of
which is the IOMMU, and if so, whether the IOMMU would be
guaranteed to have a non-zero function number.

Jan
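Background on the MCFG variant being discussed: in MMCONFIG/ECAM space every bus:dev:func owns one 4KiB configuration page, laid out by BDF, so the machine frame to write-protect for a given device is just the segment base frame plus the 16-bit BDF. An illustrative sketch of that arithmetic - it mirrors the computation in Jan''s arch_pci_ro_device() (quoted later in the thread), but the names here are not Xen''s:

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* ECAM gives each function a 4KiB config page */

/*
 * Machine frame number of the config page for a BDF within one MCFG
 * segment: base frame plus BDF, because functions are laid out
 * consecutively as (bus << 8 | dev << 3 | func) * 4KiB.
 */
static uint64_t mcfg_cfg_mfn(uint64_t mcfg_base, uint16_t bdf)
{
    return (mcfg_base >> PAGE_SHIFT) + bdf;
}
```

Marking just that one frame read-only is what lets Dom0 still see the device via lspci while its writes (including MSI disables) are dropped.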
Wei Wang
2012-Jun-21 13:10 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com> wrote:
>>>> I also evaluated the possibility of hiding the iommu device from dom0.
>>>> I think the change is not all that big, at least for io based pcicfg
>>>> access. A proof-of-concept patch is attached.
>>>
>>> This completely hides the device from Dom0, but only when
>>> config space is accessed via method 1. Did you not see my
>>> earlier patch doing this for MCFG as well
>> Could you please provide a particular c/s number? (I saw too many c/s
>> that might be related to this topic), so that I could work out a patch
>> to support both i/o and mmcfg.
>
> I sent this to you yesterday, so you''d be able to test whether
> it actually fulfills its purpose before we discuss whether this is
> acceptable for 4.2. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

Oh, yes I found it; my email filter did not work well, so I did not see
it in the right folder. I will test right now.

>>> (albeit only disallowing
>>> writes, so allowing the device to still be seen by Dom0)?
>> Sounds better to me... this still allows the user to check iommu status
>> from lspci.
>>
>>> Whether completely hiding the device is actually okay I can''t
>>> easily tell: Would IOMMUs always be either at func 0 of a single-
>>> function device, or at a non-zero func of a multi-function one? If
>>> not, other devices may get hidden implicitly.
>>
>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>> used for any purpose other than containing an iommu. So I don''t see
>> that the iommu will share a bdf value with other devices.
>
> The question is not regarding bdf, but regarding whether under
> the same seg:bus:dev there might be multiple functions, one of
> which is the IOMMU, and if so, whether the IOMMU would be
> guaranteed to have a non-zero function number.

In a real system (single or multiple iommu), the amd iommu shares the
same device number with the north bridge but has function number 2
(e.g. bus:00.2). However, according to the spec, it is not guaranteed
to have a non-zero function number. So what is the problem you see if
the iommu uses func 0 on a multi-function device?

Thanks,
Wei

> Jan
Jan Beulich
2012-Jun-21 13:24 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
>>> On 21.06.12 at 15:10, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>>> used for any purpose other than containing an iommu. So I don''t see
>>> that the iommu will share a bdf value with other devices.
>>
>> The question is not regarding bdf, but regarding whether under
>> the same seg:bus:dev there might be multiple functions, one of
>> which is the IOMMU, and if so, whether the IOMMU would be
>> guaranteed to have a non-zero function number.
>
> In a real system (single or multiple iommu), the amd iommu shares the
> same device number with the north bridge but has function number 2
> (e.g. bus:00.2). However, according to the spec, it is not guaranteed
> to have a non-zero function number. So what is the problem you see if
> the iommu uses func 0 on a multi-function device?

If it''s on func 0 and gets hidden completely (as done by your
partial patch), other functions won''t be found when scanning
for them (because secondary functions get looked at only
when func 0 actually exists, as otherwise evaluating the header
type register is invalid).

Jan
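Jan''s point can be illustrated by how enumeration treats function 0: a scanner probes function 0 first, and only looks at functions 1-7 if function 0 responds and its header-type register has the multi-function bit (0x80) set. So completely hiding an IOMMU that happens to sit at func 0 would make every other function of that device vanish too. A simplified model of that rule - not actual Xen or Linux scan code:

```c
#include <stdint.h>

#define HDR_TYPE_MULTIFUNC 0x80  /* bit 7 of the header type register */

/*
 * vendors[fn] is the vendor ID read for each function of one device;
 * 0xffff means the function is absent (or hidden). hdr0_multifunc is
 * the multi-function bit of function 0''s header type register.
 */
static unsigned visible_functions(const uint16_t vendors[8], int hdr0_multifunc)
{
    unsigned fn, found;

    if (vendors[0] == 0xffff)
        return 0;              /* no function 0: the whole slot looks empty */
    found = 1;
    if (!hdr0_multifunc)
        return found;          /* single-function device: stop at func 0 */
    for (fn = 1; fn < 8; fn++)
        if (vendors[fn] != 0xffff)
            found++;
    return found;
}
```

Note how a device with a real function at fn 1 still reports zero visible functions once fn 0 reads as 0xffff - exactly the implicit-hiding hazard being debated.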
Wei Wang
2012-Jun-21 13:27 UTC
Re: [PATCH] PCI/MSI: don''t disable AMD IOMMU MSI on Xen dom0
On 06/21/2012 03:24 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 15:10, Wei Wang<wei.wang2@amd.com> wrote:
>> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com> wrote:
>>>> AMD IOMMU is an independent pci-e endpoint, and this function is not
>>>> used for any purpose other than containing an iommu. So I don''t see
>>>> that the iommu will share a bdf value with other devices.
>>>
>>> The question is not regarding bdf, but regarding whether under
>>> the same seg:bus:dev there might be multiple functions, one of
>>> which is the IOMMU, and if so, whether the IOMMU would be
>>> guaranteed to have a non-zero function number.
>>
>> In a real system (single or multiple iommu), the amd iommu shares the
>> same device number with the north bridge but has function number 2
>> (e.g. bus:00.2). However, according to the spec, it is not guaranteed
>> to have a non-zero function number. So what is the problem you see if
>> the iommu uses func 0 on a multi-function device?
>
> If it''s on func 0 and gets hidden completely (as done by your
> partial patch), other functions won''t be found when scanning
> for them (because secondary functions get looked at only
> when func 0 actually exists, as otherwise evaluating the header
> type register is invalid).

OK, understood. Then I think we do need to allow pci cfg reads for the
iommu device.

Thanks
Wei

> Jan
Wei Wang
2012-Jun-21 15:29 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 06/20/2012 05:45 PM, Jan Beulich wrote:
> Rather than submitting it for inclusion immediately, I''d rather see
> you first use it successfully. Below/attached what appears to
> work fine for me in contrived situations (i.e. hiding bridges with
> nothing connected, as I still don''t have any AMD IOMMU system
> at hand).

Hi Jan,
The patch works for me. The IOMMU MSI capability is shown as enabled
with this patch.

Thanks,
Wei

00:00.2 Generic system peripheral [0806]: Advanced Micro Devices [AMD]
Device 1419
	Subsystem: Advanced Micro Devices [AMD] Device 1419
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Capabilities: [40] Secure device <?>
	Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+

> Jan
>
> Note that due to ptwr_do_page_fault() being run first, there''ll be a
> MEM_LOG() issued for each such attempt. If that''s undesirable, the
> order of the calls would need to be swapped.
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
>          break;
>      case 1:
>          l1e_remove_flags(nl1e, _PAGE_RW);
> +        rc = 0;
>          break;
>      }
>      if ( page )
> @@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u
>      return 0;
>  }
>
> +#ifdef __x86_64__
> +/*************************
> + * fault handling for read-only MMIO pages
> + */
> +
> +struct mmio_ro_emulate_ctxt {
> +    struct x86_emulate_ctxt ctxt;
> +    unsigned long cr2;
> +};
> +
> +static int mmio_ro_emulated_read(
> +    enum x86_segment seg,
> +    unsigned long offset,
> +    void *p_data,
> +    unsigned int bytes,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    return X86EMUL_UNHANDLEABLE;
> +}
> +
> +static int mmio_ro_emulated_write(
> +    enum x86_segment seg,
> +    unsigned long offset,
> +    void *p_data,
> +    unsigned int bytes,
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt =
> +        container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt);
> +
> +    /* Only allow naturally-aligned
stores at the original %cr2 address. */ > + if ( ((bytes | offset)& (bytes - 1)) || offset != mmio_ro_ctxt->cr2 ) > + { > + MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", > + mmio_ro_ctxt->cr2, offset, bytes); > + return X86EMUL_UNHANDLEABLE; > + } > + > + return X86EMUL_OKAY; > +} > + > +static const struct x86_emulate_ops mmio_ro_emulate_ops = { > + .read = mmio_ro_emulated_read, > + .insn_fetch = ptwr_emulated_read, > + .write = mmio_ro_emulated_write, > +}; > + > +/* Check if guest is trying to modify a r/o MMIO page. */ > +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, > + struct cpu_user_regs *regs) > +{ > + l1_pgentry_t pte; > + unsigned long mfn; > + unsigned int addr_size = is_pv_32on64_domain(v->domain) ? > + 32 : BITS_PER_LONG; > + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { > + .ctxt.regs = regs, > + .ctxt.addr_size = addr_size, > + .ctxt.sp_size = addr_size, > + .cr2 = addr > + }; > + int rc; > + > + /* Attempt to read the PTE that maps the VA being accessed. */ > + guest_get_eff_l1e(v, addr,&pte); > + > + /* We are looking only for read-only mappings of MMIO pages. */ > + if ( ((l1e_get_flags(pte)& (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) > + return 0; > + > + mfn = l1e_get_pfn(pte); > + if ( mfn_valid(mfn) ) > + { > + struct page_info *page = mfn_to_page(mfn); > + struct domain *owner = page_get_owner_and_reference(page); > + > + if ( owner ) > + put_page(page); > + if ( owner != dom_io ) > + return 0; > + } > + > + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) > + return 0; > + > + rc = x86_emulate(&mmio_ro_ctxt.ctxt,&mmio_ro_emulate_ops); > + > + return rc != X86EMUL_UNHANDLEABLE ? 
EXCRET_fault_fixed : 0; > +} > +#endif /* __x86_64__ */ > + > void free_xen_pagetable(void *v) > { > if ( system_state == SYS_STATE_early_boot ) > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon > return 0; > } > > - if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& > - guest_kernel_mode(v, regs) ) > - { > - unsigned int mbs = PFEC_write_access; > - unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch; > - > - /* Do not check if access-protection fault since the page may > - legitimately be not present in shadow page tables */ > - if ( !paging_mode_enabled(d) ) > - mbs |= PFEC_page_present; > - > - if ( ((regs->error_code& (mbs | mbz)) == mbs)&& > + if ( guest_kernel_mode(v, regs)&& > + !(regs->error_code& (PFEC_reserved_bit | PFEC_insn_fetch))&& > + (regs->error_code& PFEC_write_access) ) > + { > + if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& > + /* Do not check if access-protection fault since the page may > + legitimately be not present in shadow page tables */ > + (paging_mode_enabled(d) || > + (regs->error_code& PFEC_page_present))&& > ptwr_do_page_fault(v, addr, regs) ) > return EXCRET_fault_fixed; > + > +#ifdef __x86_64__ > + if ( IS_PRIV(d)&& (regs->error_code& PFEC_page_present)&& > + mmio_ro_do_page_fault(v, addr, regs) ) > + return EXCRET_fault_fixed; > +#endif > } > > /* For non-external shadowed guests, we fix up both their own > @@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d, > return 0; > > machine_bdf = (d->arch.pci_cf8>> 8)& 0xFFFF; > + if ( write ) > + { > + const unsigned long *ro_map = pci_get_ro_map(0); > + > + if ( ro_map&& test_bit(machine_bdf, ro_map) ) > + return 0; > + } > start = d->arch.pci_cf8& 0xFF; > end = start + size - 1; > if (xsm_pci_config_permission(d, machine_bdf, start, end, write)) > --- a/xen/arch/x86/x86_32/pci.c > +++ b/xen/arch/x86/x86_32/pci.c > @@ -6,6 +6,7 @@ > > #include<xen/spinlock.h> > #include<xen/pci.h> > 
+#include<xen/init.h> > #include<asm/io.h> > > #define PCI_CONF_ADDRESS(bus, dev, func, reg) \ > @@ -70,3 +71,7 @@ void pci_conf_write32( > BUG_ON((bus> 255) || (dev> 31) || (func> 7) || (reg> 255)); > pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data); > } > + > +void __init arch_pci_ro_device(int seg, int bdf) > +{ > +} > --- a/xen/arch/x86/x86_64/mmconfig_64.c > +++ b/xen/arch/x86/x86_64/mmconfig_64.c > @@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const > return (void __iomem *) virt; > } > > +void arch_pci_ro_device(int seg, int bdf) > +{ > + unsigned int idx, bus = PCI_BUS(bdf); > + > + for (idx = 0; idx< pci_mmcfg_config_num; ++idx) { > + const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg; > + unsigned long mfn = (cfg->address>> PAGE_SHIFT) + bdf; > + > + if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg || > + cfg->start_bus_number> bus || cfg->end_bus_number< bus) > + continue; > + > + if (rangeset_add_singleton(mmio_ro_ranges, mfn)) > + printk(XENLOG_ERR > + "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) read-only\n", > + cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf), > + mfn); > + } > +} > + > int pci_mmcfg_arch_enable(unsigned int idx) > { > const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg; > + const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment); > > if (pci_mmcfg_virt[idx].virt) > return 0; > @@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i > } > printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n", > cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number); > + if (ro_map) { > + unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0); > + unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1); > + > + while ((bdf = find_next_bit(ro_map, end + 1, bdf))<= end) { > + arch_pci_ro_device(cfg->pci_segment, bdf); > + if (bdf++ == end) > + break; > + } > + } > return 0; > } > > --- a/xen/drivers/passthrough/amd/iommu_detect.c > +++ 
b/xen/drivers/passthrough/amd/iommu_detect.c > @@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi( > if ( rt ) > return -ENODEV; > > + rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func)); > + if ( rt ) > + printk(XENLOG_ERR > + "Could not mark config space of %04x:%02x:%02x.%u read-only (%d)\n", > + iommu->seg, bus, dev, func, rt); > + > list_add_tail(&iommu->list,&amd_iommu_head); > > return 0; > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi > unlock: > spin_unlock(&d->event_lock); > } > - > -static int __init setup_mmio_ro_ranges(void) > -{ > - mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", > - RANGESETF_prettyprint_hex); > - return 0; > -} > -__initcall(setup_mmio_ro_ranges); > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -36,6 +36,7 @@ > struct pci_seg { > struct list_head alldevs_list; > u16 nr; > + unsigned long *ro_map; > /* bus2bridge_lock protects bus2bridge array */ > spinlock_t bus2bridge_lock; > #define MAX_BUSES 256 > @@ -106,6 +107,8 @@ void __init pt_pci_init(void) > radix_tree_init(&pci_segments); > if ( !alloc_pseg(0) ) > panic("Could not initialize PCI segment 0\n"); > + mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", > + RANGESETF_prettyprint_hex); > } > > int __init pci_add_segment(u16 seg) > @@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg) > return alloc_pseg(seg) ? 0 : -ENOMEM; > } > > +const unsigned long *pci_get_ro_map(u16 seg) > +{ > + struct pci_seg *pseg = get_pseg(seg); > + > + return pseg ? 
pseg->ro_map : NULL; > +} > + > static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) > { > struct pci_dev *pdev; > @@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps > xfree(pdev); > } > > +int __init pci_ro_device(int seg, int bus, int devfn) > +{ > + struct pci_seg *pseg = alloc_pseg(seg); > + struct pci_dev *pdev; > + > + if ( !pseg ) > + return -ENOMEM; > + pdev = alloc_pdev(pseg, bus, devfn); > + if ( !pdev ) > + return -ENOMEM; > + > + if ( !pseg->ro_map ) > + { > + size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long); > + > + pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0); > + if ( !pseg->ro_map ) > + return -ENOMEM; > + memset(pseg->ro_map, 0, sz); > + } > + > + __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map); > + arch_pci_ro_device(seg, PCI_BDF2(bus, devfn)); > + > + return 0; > +} > + > struct pci_dev *pci_get_pdev(int seg, int bus, int devfn) > { > struct pci_seg *pseg = get_pseg(seg); > --- a/xen/include/asm-x86/mm.h > +++ b/xen/include/asm-x86/mm.h > @@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p); > > int ptwr_do_page_fault(struct vcpu *, unsigned long, > struct cpu_user_regs *); > +int mmio_ro_do_page_fault(struct vcpu *, unsigned long, > + struct cpu_user_regs *); > > int audit_adjust_pgtables(struct domain *d, int dir, int noisy); > > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev( > void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *)); > void pci_release_devices(struct domain *d); > int pci_add_segment(u16 seg); > +const unsigned long *pci_get_ro_map(u16 seg); > int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *); > int pci_remove_device(u16 seg, u8 bus, u8 devfn); > +int pci_ro_device(int seg, int bus, int devfn); > +void arch_pci_ro_device(int seg, int bdf); > struct pci_dev *pci_get_pdev(int seg, int bus, int devfn); > struct pci_dev *pci_get_pdev_by_domain( > struct 
domain *, int seg, int bus, int devfn); >
Jan Beulich
2012-Jun-21 15:49 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
>>> On 21.06.12 at 17:29, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/20/2012 05:45 PM, Jan Beulich wrote:
>> Rather than submitting it for inclusion immediately, I''d rather see
>> you first use it successfully. Below/attached what appears to
>> work fine for me in contrived situations (i.e. hiding bridges with
>> nothing connected, as I still don''t have any AMD IOMMU system
>> at hand).
>
> The patch works for me. The IOMMU MSI capability is shown as enabled
> with this patch.

Keir, the question now arises whether we really want this, and
particularly this late before 4.2. The Linux folks don''t seem to be
willing to take the straightforward workaround for the problem
introduced at their end, so we will likely need something (the more
that the questionable fix already made it into various stable trees)
before 4.2 goes out (and even the older trees would be affected, just
that putting a change like this there is even more questionable).

There are obviously more potential problems in this area: If any of
the MMIO addresses used by AMD''s IOMMU is configurable through one
of the BARs, and if Dom0 decides to re-assign MMIO space, neither
allowing the writes nor simply dropping them as done here will work.
Whether that''s a real problem I don''t know - Wei? And there might be
other issues arising from dropping all writes - we might just
currently not be aware of them.

Jan

> 00:00.2 Generic system peripheral [0806]: Advanced Micro Devices [AMD]
> Device 1419
> 	Subsystem: Advanced Micro Devices [AMD] Device 1419
> 	Flags: bus master, fast devsel, latency 0, IRQ 11
> 	Capabilities: [40] Secure device <?>
> 	Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 	Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>
>> Jan
>>
>> Note that due to ptwr_do_page_fault() being run first, there''ll be a
>> MEM_LOG() issued for each such attempt. If that''s undesirable, the
>> order of the calls would need to be swapped.
>> >> --- a/xen/arch/x86/mm.c >> +++ b/xen/arch/x86/mm.c >> @@ -1875,6 +1875,7 @@ static int mod_l1_entry(l1_pgentry_t *pl >> break; >> case 1: >> l1e_remove_flags(nl1e, _PAGE_RW); >> + rc = 0; >> break; >> } >> if ( page ) >> @@ -5208,6 +5209,97 @@ int ptwr_do_page_fault(struct vcpu *v, u >> return 0; >> } >> >> +#ifdef __x86_64__ >> +/************************* >> + * fault handling for read-only MMIO pages >> + */ >> + >> +struct mmio_ro_emulate_ctxt { >> + struct x86_emulate_ctxt ctxt; >> + unsigned long cr2; >> +}; >> + >> +static int mmio_ro_emulated_read( >> + enum x86_segment seg, >> + unsigned long offset, >> + void *p_data, >> + unsigned int bytes, >> + struct x86_emulate_ctxt *ctxt) >> +{ >> + return X86EMUL_UNHANDLEABLE; >> +} >> + >> +static int mmio_ro_emulated_write( >> + enum x86_segment seg, >> + unsigned long offset, >> + void *p_data, >> + unsigned int bytes, >> + struct x86_emulate_ctxt *ctxt) >> +{ >> + struct mmio_ro_emulate_ctxt *mmio_ro_ctxt >> + container_of(ctxt, struct mmio_ro_emulate_ctxt, ctxt); >> + >> + /* Only allow naturally-aligned stores at the original %cr2 address. */ >> + if ( ((bytes | offset)& (bytes - 1)) || offset != mmio_ro_ctxt->cr2 ) >> + { >> + MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, > bytes=%u)", >> + mmio_ro_ctxt->cr2, offset, bytes); >> + return X86EMUL_UNHANDLEABLE; >> + } >> + >> + return X86EMUL_OKAY; >> +} >> + >> +static const struct x86_emulate_ops mmio_ro_emulate_ops = { >> + .read = mmio_ro_emulated_read, >> + .insn_fetch = ptwr_emulated_read, >> + .write = mmio_ro_emulated_write, >> +}; >> + >> +/* Check if guest is trying to modify a r/o MMIO page. */ >> +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, >> + struct cpu_user_regs *regs) >> +{ >> + l1_pgentry_t pte; >> + unsigned long mfn; >> + unsigned int addr_size = is_pv_32on64_domain(v->domain) ? 
>> + 32 : BITS_PER_LONG; >> + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { >> + .ctxt.regs = regs, >> + .ctxt.addr_size = addr_size, >> + .ctxt.sp_size = addr_size, >> + .cr2 = addr >> + }; >> + int rc; >> + >> + /* Attempt to read the PTE that maps the VA being accessed. */ >> + guest_get_eff_l1e(v, addr,&pte); >> + >> + /* We are looking only for read-only mappings of MMIO pages. */ >> + if ( ((l1e_get_flags(pte)& (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) > ) >> + return 0; >> + >> + mfn = l1e_get_pfn(pte); >> + if ( mfn_valid(mfn) ) >> + { >> + struct page_info *page = mfn_to_page(mfn); >> + struct domain *owner = page_get_owner_and_reference(page); >> + >> + if ( owner ) >> + put_page(page); >> + if ( owner != dom_io ) >> + return 0; >> + } >> + >> + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) >> + return 0; >> + >> + rc = x86_emulate(&mmio_ro_ctxt.ctxt,&mmio_ro_emulate_ops); >> + >> + return rc != X86EMUL_UNHANDLEABLE ? EXCRET_fault_fixed : 0; >> +} >> +#endif /* __x86_64__ */ >> + >> void free_xen_pagetable(void *v) >> { >> if ( system_state == SYS_STATE_early_boot ) >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -1349,20 +1349,23 @@ static int fixup_page_fault(unsigned lon >> return 0; >> } >> >> - if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& >> - guest_kernel_mode(v, regs) ) >> - { >> - unsigned int mbs = PFEC_write_access; >> - unsigned int mbz = PFEC_reserved_bit | PFEC_insn_fetch; >> - >> - /* Do not check if access-protection fault since the page may >> - legitimately be not present in shadow page tables */ >> - if ( !paging_mode_enabled(d) ) >> - mbs |= PFEC_page_present; >> - >> - if ( ((regs->error_code& (mbs | mbz)) == mbs)&& >> + if ( guest_kernel_mode(v, regs)&& >> + !(regs->error_code& (PFEC_reserved_bit | PFEC_insn_fetch))&& >> + (regs->error_code& PFEC_write_access) ) >> + { >> + if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables)&& >> + /* Do not check if access-protection fault since the page may >> + 
legitimately be not present in shadow page tables */ >> + (paging_mode_enabled(d) || >> + (regs->error_code& PFEC_page_present))&& >> ptwr_do_page_fault(v, addr, regs) ) >> return EXCRET_fault_fixed; >> + >> +#ifdef __x86_64__ >> + if ( IS_PRIV(d)&& (regs->error_code& PFEC_page_present)&& >> + mmio_ro_do_page_fault(v, addr, regs) ) >> + return EXCRET_fault_fixed; >> +#endif >> } >> >> /* For non-external shadowed guests, we fix up both their own >> @@ -1690,6 +1693,13 @@ static int pci_cfg_ok(struct domain *d, >> return 0; >> >> machine_bdf = (d->arch.pci_cf8>> 8)& 0xFFFF; >> + if ( write ) >> + { >> + const unsigned long *ro_map = pci_get_ro_map(0); >> + >> + if ( ro_map&& test_bit(machine_bdf, ro_map) ) >> + return 0; >> + } >> start = d->arch.pci_cf8& 0xFF; >> end = start + size - 1; >> if (xsm_pci_config_permission(d, machine_bdf, start, end, write)) >> --- a/xen/arch/x86/x86_32/pci.c >> +++ b/xen/arch/x86/x86_32/pci.c >> @@ -6,6 +6,7 @@ >> >> #include<xen/spinlock.h> >> #include<xen/pci.h> >> +#include<xen/init.h> >> #include<asm/io.h> >> >> #define PCI_CONF_ADDRESS(bus, dev, func, reg) \ >> @@ -70,3 +71,7 @@ void pci_conf_write32( >> BUG_ON((bus> 255) || (dev> 31) || (func> 7) || (reg> 255)); >> pci_conf_write(PCI_CONF_ADDRESS(bus, dev, func, reg), 0, 4, data); >> } >> + >> +void __init arch_pci_ro_device(int seg, int bdf) >> +{ >> +} >> --- a/xen/arch/x86/x86_64/mmconfig_64.c >> +++ b/xen/arch/x86/x86_64/mmconfig_64.c >> @@ -145,9 +145,30 @@ static void __iomem *mcfg_ioremap(const >> return (void __iomem *) virt; >> } >> >> +void arch_pci_ro_device(int seg, int bdf) >> +{ >> + unsigned int idx, bus = PCI_BUS(bdf); >> + >> + for (idx = 0; idx< pci_mmcfg_config_num; ++idx) { >> + const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg; >> + unsigned long mfn = (cfg->address>> PAGE_SHIFT) + bdf; >> + >> + if (!pci_mmcfg_virt[idx].virt || cfg->pci_segment != seg || >> + cfg->start_bus_number> bus || cfg->end_bus_number< bus) >> + continue; >> + >> + if 
(rangeset_add_singleton(mmio_ro_ranges, mfn)) >> + printk(XENLOG_ERR >> + "%04x:%02x:%02x.%u: could not mark MCFG (mfn %#lx) > read-only\n", >> + cfg->pci_segment, bus, PCI_SLOT(bdf), PCI_FUNC(bdf), >> + mfn); >> + } >> +} >> + >> int pci_mmcfg_arch_enable(unsigned int idx) >> { >> const typeof(pci_mmcfg_config[0]) *cfg = pci_mmcfg_virt[idx].cfg; >> + const unsigned long *ro_map = pci_get_ro_map(cfg->pci_segment); >> >> if (pci_mmcfg_virt[idx].virt) >> return 0; >> @@ -159,6 +180,16 @@ int pci_mmcfg_arch_enable(unsigned int i >> } >> printk(KERN_INFO "PCI: Using MCFG for segment %04x bus %02x-%02x\n", >> cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number); >> + if (ro_map) { >> + unsigned int bdf = PCI_BDF(cfg->start_bus_number, 0, 0); >> + unsigned int end = PCI_BDF(cfg->end_bus_number, -1, -1); >> + >> + while ((bdf = find_next_bit(ro_map, end + 1, bdf))<= end) { >> + arch_pci_ro_device(cfg->pci_segment, bdf); >> + if (bdf++ == end) >> + break; >> + } >> + } >> return 0; >> } >> >> --- a/xen/drivers/passthrough/amd/iommu_detect.c >> +++ b/xen/drivers/passthrough/amd/iommu_detect.c >> @@ -153,6 +153,12 @@ int __init amd_iommu_detect_one_acpi( >> if ( rt ) >> return -ENODEV; >> >> + rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func)); >> + if ( rt ) >> + printk(XENLOG_ERR >> + "Could not mark config space of %04x:%02x:%02x.%u read-only > (%d)\n", >> + iommu->seg, bus, dev, func, rt); >> + >> list_add_tail(&iommu->list,&amd_iommu_head); >> >> return 0; >> --- a/xen/drivers/passthrough/io.c >> +++ b/xen/drivers/passthrough/io.c >> @@ -593,11 +593,3 @@ void hvm_dpci_eoi(struct domain *d, unsi >> unlock: >> spin_unlock(&d->event_lock); >> } >> - >> -static int __init setup_mmio_ro_ranges(void) >> -{ >> - mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", >> - RANGESETF_prettyprint_hex); >> - return 0; >> -} >> -__initcall(setup_mmio_ro_ranges); >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -36,6 +36,7 @@ >> 
struct pci_seg { >> struct list_head alldevs_list; >> u16 nr; >> + unsigned long *ro_map; >> /* bus2bridge_lock protects bus2bridge array */ >> spinlock_t bus2bridge_lock; >> #define MAX_BUSES 256 >> @@ -106,6 +107,8 @@ void __init pt_pci_init(void) >> radix_tree_init(&pci_segments); >> if ( !alloc_pseg(0) ) >> panic("Could not initialize PCI segment 0\n"); >> + mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", >> + RANGESETF_prettyprint_hex); >> } >> >> int __init pci_add_segment(u16 seg) >> @@ -113,6 +116,13 @@ int __init pci_add_segment(u16 seg) >> return alloc_pseg(seg) ? 0 : -ENOMEM; >> } >> >> +const unsigned long *pci_get_ro_map(u16 seg) >> +{ >> + struct pci_seg *pseg = get_pseg(seg); >> + >> + return pseg ? pseg->ro_map : NULL; >> +} >> + >> static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) >> { >> struct pci_dev *pdev; >> @@ -198,6 +208,33 @@ static void free_pdev(struct pci_seg *ps >> xfree(pdev); >> } >> >> +int __init pci_ro_device(int seg, int bus, int devfn) >> +{ >> + struct pci_seg *pseg = alloc_pseg(seg); >> + struct pci_dev *pdev; >> + >> + if ( !pseg ) >> + return -ENOMEM; >> + pdev = alloc_pdev(pseg, bus, devfn); >> + if ( !pdev ) >> + return -ENOMEM; >> + >> + if ( !pseg->ro_map ) >> + { >> + size_t sz = BITS_TO_LONGS(PCI_BDF(-1, -1, -1) + 1) * sizeof(long); >> + >> + pseg->ro_map = alloc_xenheap_pages(get_order_from_bytes(sz), 0); >> + if ( !pseg->ro_map ) >> + return -ENOMEM; >> + memset(pseg->ro_map, 0, sz); >> + } >> + >> + __set_bit(PCI_BDF2(bus, devfn), pseg->ro_map); >> + arch_pci_ro_device(seg, PCI_BDF2(bus, devfn)); >> + >> + return 0; >> +} >> + >> struct pci_dev *pci_get_pdev(int seg, int bus, int devfn) >> { >> struct pci_seg *pseg = get_pseg(seg); >> --- a/xen/include/asm-x86/mm.h >> +++ b/xen/include/asm-x86/mm.h >> @@ -555,6 +555,8 @@ void memguard_unguard_stack(void *p); >> >> int ptwr_do_page_fault(struct vcpu *, unsigned long, >> struct cpu_user_regs *); >> +int mmio_ro_do_page_fault(struct vcpu 
*, unsigned long, >> + struct cpu_user_regs *); >> >> int audit_adjust_pgtables(struct domain *d, int dir, int noisy); >> >> --- a/xen/include/xen/pci.h >> +++ b/xen/include/xen/pci.h >> @@ -98,8 +98,11 @@ struct pci_dev *pci_lock_domain_pdev( >> void setup_dom0_pci_devices(struct domain *, void (*)(struct pci_dev *)); >> void pci_release_devices(struct domain *d); >> int pci_add_segment(u16 seg); >> +const unsigned long *pci_get_ro_map(u16 seg); >> int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info > *); >> int pci_remove_device(u16 seg, u8 bus, u8 devfn); >> +int pci_ro_device(int seg, int bus, int devfn); >> +void arch_pci_ro_device(int seg, int bdf); >> struct pci_dev *pci_get_pdev(int seg, int bus, int devfn); >> struct pci_dev *pci_get_pdev_by_domain( >> struct domain *, int seg, int bus, int devfn); >>
Keir Fraser
2012-Jun-21 16:31 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 21/06/2012 16:49, "Jan Beulich" <JBeulich@suse.com> wrote:

> the question now arises whether we really want this, and
> particularly this late before 4.2. The Linux folks don't seem to
> be willing to take the straightforward workaround for the
> problem introduced at their end, so we will likely need
> something (the more that the questionable fix already made
> it into various stable trees) before 4.2 goes out (and even
> the older trees would be affected, just that putting a change
> like this there is even more questionable).

I'd be okay seeing the proposed patch go in ahead of 4.2.0-rc1.

 -- Keir

> There are obviously more potential problems in this area: If
> any of the MMIO addresses used by AMD's IOMMU is
> configurable through one of the BARs, and if Dom0 decides to
> re-assign MMIO space, neither allowing the writes nor simply
> dropping them as done here will work. Whether that's a real
> problem I don't know - Wei? And there might be other issues
> arising from dropping all writes - we might just currently not be
> aware of them.
Wei Wang
2012-Jun-22 09:03 UTC
Re: [PATCH V2] amd iommu: re-enable iommu msi if dom0 disabled it
On 06/21/2012 05:49 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 17:29, Wei Wang <wei.wang2@amd.com> wrote:
>> On 06/20/2012 05:45 PM, Jan Beulich wrote:
>>> Rather than submitting it for inclusion immediately, I'd rather see
>>> you first use it successfully. Below/attached what appears to
>>> work fine for me in contrived situations (i.e. hiding bridges with
>>> nothing connected, as I still don't have any AMD IOMMU system
>>> at hand).
>>
>> The patch works for me. IOMMU MSI capability is shown as enabled with
>> this patch.
>
> Keir,
>
> the question now arises whether we really want this, and
> particularly this late before 4.2. The Linux folks don't seem to
> be willing to take the straightforward workaround for the
> problem introduced at their end, so we will likely need
> something (the more that the questionable fix already made
> it into various stable trees) before 4.2 goes out (and even
> the older trees would be affected, just that putting a change
> like this there is even more questionable).
>
> There are obviously more potential problems in this area: If
> any of the MMIO addresses used by AMD's IOMMU is
> configurable through one of the BARs, and if Dom0 decides to
> re-assign MMIO space, neither allowing the writes nor simply
> dropping them as done here will work. Whether that's a real
> problem I don't know - Wei? And there might be other issues
> arising from dropping all writes - we might just currently not be
> aware of them.

AMD IOMMU does not have any PCI BARs; all address configuration goes
through ACPI tables. So this should be fine, at least for the AMD IOMMU.

Thanks,
Wei

> Jan
>
> [...]