Jan Beulich
2010-Jul-07 10:14 UTC
[Xen-devel] granting access to MSI-X table and pending bit array
The original implementation (c/s 17536) disallowed access to these after granting access to all BAR-specified resources (i.e. this was almost correct, except for a small time window during which the memory was accessible to the guest, and except for hiding the pending bit array from the guest), but this got reverted with c/s 20171.

Afaics this is a security problem, as CPU accesses to the granted memory don't go through any IOMMU and hence there's no place these could be filtered out even in a supposedly secure environment (not that I think device accesses would be filtered at present, but for those this would at least be possible), and such accesses could inadvertently or maliciously unmask masked vectors or modify the message address/data fields.

Imo the guest must be granted read-only access to the pending bit array (instead of either full access or no access at all), with the potential side effect of also granting read-only access to the table. And I would even think that this shouldn't be done in the tools, but rather in Xen itself (since it knows of all the PCI devices and their respective eventual MSI-X address ranges), thus at once eliminating any timing windows.

Jan
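For readers without the spec at hand, a minimal C sketch of the two structures in question (layout per the PCI MSI-X definition; the identifiers are illustrative, not Xen's):

#include <stdint.h>

/* One MSI-X table entry: 16 bytes, fully guest-writable today. */
struct msix_table_entry {
    uint32_t msg_addr_lo;    /* message address, low 32 bits */
    uint32_t msg_addr_hi;    /* message address, high 32 bits */
    uint32_t msg_data;       /* data written to trigger the vector */
    uint32_t vector_ctrl;    /* bit 0: per-vector mask */
};

#define MSIX_VECTOR_CTRL_MASKBIT 0x1u

/*
 * The pending bit array is just a bit vector, one bit per vector,
 * packed into 64-bit words.  A guest only ever needs to read it:
 */
static inline int msix_vector_pending(const uint64_t *pba, unsigned int nr)
{
    return (pba[nr / 64] >> (nr % 64)) & 1;
}

A guest with write access to the table page can flip the mask bit or rewrite the address/data pair; the PBA is the part that only needs read access, hence the read-only proposal above.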
Konrad Rzeszutek Wilk
2010-Jul-07 14:14 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
On Wed, Jul 07, 2010 at 11:14:04AM +0100, Jan Beulich wrote:
> The original implementation (c/s 17536) disallowed access to these
> after granting access to all BAR-specified resources [...], but this
> got reverted with c/s 20171.
>
> [...]
>
> Imo the guest must be granted read-only access to the pending bit
> array (instead of either full access or no access at all), with the
> potential side effect of also granting read-only access to the table.
> And I would even think that this shouldn't be done in the tools, but
> rather in Xen itself (since it knows of all the PCI devices and their
> respective eventual MSI-X address ranges), thus at once eliminating
> any timing windows.

That sounds sensible. You got a patch ready?
Jan Beulich
2010-Jul-07 14:31 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
>>> On 07.07.10 at 16:14, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Jul 07, 2010 at 11:14:04AM +0100, Jan Beulich wrote:
>> [...]
>
> That sounds sensible. You got a patch ready?

Would be nice, but that's not that simple, as currently there's no way to distinguish r/w MMIO from r/o MMIO.

Additionally, we may need to override the guest's page table attributes, as at least Linux generally maps I/O memory writable no matter whether any writes will actually happen - which to me seems an ugly hack (but I can't think of better alternatives).

Furthermore, I'm not really clear how an HVM guest's writes to the MSI-X table get processed - clearly they can't be let through to the actual MMIO region. I'd guess they somehow get redirected to qemu (since pciback doesn't seem to be handling this), but I don't know this process very well...

Jan
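To illustrate the second point above: a typical Linux driver maps its whole BAR writable regardless of how it uses it, so the hypervisor cannot rely on the guest ever requesting a read-only mapping. A Linux-style fragment (the device and BAR index are placeholders, not from the thread):

#include <linux/pci.h>
#include <linux/io.h>

/*
 * A driver commonly does this once for the whole BAR - table, PBA and
 * all - and never asks for a read-only view, so any write protection
 * has to be imposed from outside the guest.
 */
static void __iomem *map_whole_bar(struct pci_dev *pdev, int bar)
{
    return ioremap(pci_resource_start(pdev, bar),
                   pci_resource_len(pdev, bar));
}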
Stefano Stabellini
2010-Jul-07 16:49 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
On Wed, 7 Jul 2010, Jan Beulich wrote:
> [...]
> Furthermore, I'm not really clear how an HVM guest's writes to the
> MSI-X table get processed - clearly they can't be let through to the
> actual MMIO region. I'd guess they somehow get redirected to qemu
> (since pciback doesn't seem to be handling this), but I don't know
> this process very well...

The guest gets a fake MSI-X table, emulated by qemu; have a look at:

hw/pt-msi.c:pt_msi_setup
hw/pt-msi.c:pt_msi_update
hw/pass-through.c:pt_msgctrl_reg_write
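In outline, the approach those functions implement is a shadow table: the guest reads and writes an emulated copy, and only translated values reach the device. A sketch with hypothetical names (this is not qemu's actual code):

#include <stdint.h>

struct shadow_msix_entry {
    uint32_t addr_lo, addr_hi, data, vector_ctrl;
};

/* A guest write to its virtual MSI-X table page lands here rather
 * than on the device. */
static void virt_msix_table_write(struct shadow_msix_entry *shadow,
                                  unsigned int entry, unsigned int offset,
                                  uint32_t val)
{
    ((uint32_t *)&shadow[entry])[offset / 4] = val;

    /*
     * Only at this point would the emulator program the real table,
     * e.g. translating the guest's address/data pair into the host's
     * values.  As Jan observes below, qemu currently also propagates
     * the guest's mask bit straight to hardware, which can override a
     * mask Xen itself set.
     */
}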
Jan Beulich
2010-Jul-09 09:54 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
>>> On 07.07.10 at 18:49, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> [...]
>
> The guest gets a fake MSI-X table, emulated by qemu; have a look at:
>
> hw/pt-msi.c:pt_msi_setup
> hw/pt-msi.c:pt_msi_update
> hw/pass-through.c:pt_msgctrl_reg_write

Hmm, okay, this means that the HVM case looks safe but really isn't: generally, access to the underlying I/O memory range is disallowed, but there is a (small) time window during which the guest can directly see that range. What's worse is that qemu writes the mask flag (as coming from the guest) directly to hardware, thus overriding whatever Xen may want there.

With the revert mentioned above being about PV guests, I wonder what problems PV guests had with not being able to access the table and PBA ranges. Linux doesn't (so far) use the PBA at all, and according to my analysis of drivers/pci/msi-xen.c it also doesn't access the table in any way, even when being run privileged. Keir, with the description of c/s 20171 being rather brief - what was the problem PV guests were having? Or was that more specifically with stubdom wanting to do the above-mentioned writes on behalf of the serviced guest?
Finally, removing the range (rounded to page boundaries) completely for HVM guests will break any guest OS that wants to make use of the PBA if, for a particular device, the PBA and the table (partially) share a page (which the spec explicitly permits).

Jan
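For concreteness, the sharing situation described above is easy to test for: the table and PBA locations follow from the BAR bases plus the capability's offset registers, and per the spec the table occupies 16 bytes per entry while the PBA occupies one bit per entry, rounded up to 8-byte words. A minimal sketch (names are illustrative):

#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* tbl_addr/pba_addr = BAR base + (offset register & ~7). */
static int msix_table_pba_share_page(uint64_t tbl_addr, uint64_t pba_addr,
                                     unsigned int nr_entries)
{
    uint64_t tbl_first = tbl_addr / PAGE_SIZE_4K;
    uint64_t tbl_last  = (tbl_addr + nr_entries * 16ULL - 1) / PAGE_SIZE_4K;
    uint64_t pba_first = pba_addr / PAGE_SIZE_4K;
    uint64_t pba_last  = (pba_addr + ((nr_entries + 63) / 64) * 8ULL - 1)
                         / PAGE_SIZE_4K;

    /* They share a page iff their page ranges intersect. */
    return tbl_first <= pba_last && pba_first <= tbl_last;
}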
Keir Fraser
2010-Jul-09 09:59 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
On 09/07/2010 10:54, "Jan Beulich" <JBeulich@novell.com> wrote:
> With the revert mentioned above being about PV guests, I wonder what
> problems PV guests had with not being able to access the table and
> PBA ranges. [...] what was the problem PV guests were having?

I was trying to pass through an Intel ixgbe VF to a PV guest. The guest driver attempts to map some resource range (I think indicated via a BAR), and that fails because one page in the range is disallowed, corresponding to an MSI-X iomem page. Reverting this patch made things work. :-) Perhaps a dummy page should get silently mapped in or something, but I didn't investigate further myself.

-- Keir
Jan Beulich
2010-Jul-12 09:55 UTC
Re: [Xen-devel] granting access to MSI-X table and pending bit array
>>> On 07.07.10 at 16:14, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> [...]
>
> That sounds sensible. You got a patch ready?

Below/attached is an untested and possibly not yet complete one (I can't really test this myself, as I don't have, with one exception, any MSI-X capable devices around, and the exceptional one doesn't have a driver making use of it).

It tracks MMIO MFNs that are required to have only read-only guest access (determined when the first MSI-X interrupt gets enabled on a device) in a global range set, and enforces the write protection as translations get established.

The changes are made under the assumption that p2m_mmio_direct will only ever be used for order-0 pages.

An open question is whether dealing with PV guests (including the IOMMU-less case) is necessary, as handling mappings a domain may already have in place at the time the first interrupt gets set up would require scanning all of the guest's page table pages.

An alternative would be to determine and insert the address ranges into mmio_ro_ranges earlier, but that would require a hook in the PCI config space writes, which is particularly problematic in case MMCONFIG accesses are being used.

A second alternative would be to require Dom0 to report all devices (or at least all MSI-X capable ones) regardless of whether they would be used by that domain, and to do so after resources got determined/assigned for them (i.e. a second notification, later than the one currently happening from the PCI bus scan, would be needed).
Jan

--- 2010-06-15.orig/xen/arch/x86/mm.c	2010-06-15 12:24:43.000000000 +0200
+++ 2010-06-15/xen/arch/x86/mm.c	2010-07-09 16:25:57.000000000 +0200
@@ -823,7 +823,13 @@ get_page_from_l1e(
             return 0;
         }
 
-        return 1;
+        if ( !(l1f & _PAGE_RW) || IS_PRIV(pg_owner) ||
+             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+            return 1;
+        dprintk(XENLOG_G_WARNING,
+                "d%d: Forcing read-only access to MFN %lx\n",
+                l1e_owner->domain_id, mfn);
+        return -1;
     }
 
     if ( unlikely(real_pg_owner != pg_owner) )
@@ -1179,9 +1185,15 @@ static int alloc_l1_table(struct page_in
 
     for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
     {
-        if ( is_guest_l1_slot(i) &&
-             unlikely(!get_page_from_l1e(pl1e[i], d, d)) )
-            goto fail;
+        if ( is_guest_l1_slot(i) )
+            switch ( get_page_from_l1e(pl1e[i], d, d) )
+            {
+            case 0:
+                goto fail;
+            case -1:
+                l1e_remove_flags(pl1e[i], _PAGE_RW);
+                break;
+            }
 
         adjust_guest_l1e(pl1e[i], d);
     }
@@ -1764,8 +1776,14 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         }
 
-        if ( unlikely(!get_page_from_l1e(nl1e, pt_dom, pg_dom)) )
+        switch ( get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        case 0:
             return 0;
+        case -1:
+            l1e_remove_flags(nl1e, _PAGE_RW);
+            break;
+        }
 
         adjust_guest_l1e(nl1e, pt_dom);
         if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -4979,8 +4997,9 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    if ( unlikely(!get_page_from_l1e(nl1e, d, d)) )
+    switch ( get_page_from_l1e(nl1e, d, d) )
     {
+    case 0:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
         {
@@ -4999,6 +5018,10 @@ static int ptwr_emulated_update(
         MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
         return X86EMUL_UNHANDLEABLE;
     }
+        break;
+    case -1:
+        l1e_remove_flags(nl1e, _PAGE_RW);
+        break;
     }
 
     adjust_guest_l1e(nl1e, d);
--- 2010-06-15.orig/xen/arch/x86/mm/hap/p2m-ept.c	2010-06-14 08:49:36.000000000 +0200
+++ 2010-06-15/xen/arch/x86/mm/hap/p2m-ept.c	2010-07-12 08:47:12.000000000 +0200
@@ -72,9 +72,13 @@ static void ept_p2m_type_to_flags(ept_en
             entry->r = entry->w = entry->x = 0;
             return;
         case p2m_ram_rw:
-        case p2m_mmio_direct:
             entry->r = entry->w = entry->x = 1;
             return;
+        case p2m_mmio_direct:
+            entry->r = entry->x = 1;
+            entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
+                                                    entry->mfn);
+            return;
         case p2m_ram_logdirty:
         case p2m_ram_ro:
         case p2m_ram_shared:
@@ -675,6 +679,9 @@ static void ept_change_entry_type_global
     if ( ept_get_asr(d) == 0 )
         return;
 
+    BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
+
     ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
 
     ept_sync_domain(d);
--- 2010-06-15.orig/xen/arch/x86/mm/p2m.c	2010-06-11 11:41:35.000000000 +0200
+++ 2010-06-15/xen/arch/x86/mm/p2m.c	2010-07-12 08:48:08.000000000 +0200
@@ -72,7 +72,7 @@ boolean_param("hap_1gb", opt_hap_1gb);
 #define SUPERPAGE_PAGES (1UL << 9)
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
-static unsigned long p2m_type_to_flags(p2m_type_t t)
+static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
 {
     unsigned long flags;
 #ifdef __x86_64__
@@ -101,7 +101,9 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_dm:
         return flags;
     case p2m_mmio_direct:
-        return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_PCD;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+            flags |= _PAGE_RW;
+        return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     case p2m_populate_on_demand:
         return flags;
     }
@@ -1301,8 +1303,10 @@ p2m_set_entry(struct domain *d, unsigned
             domain_crash(d);
             goto out;
         }
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn)
-            ? l3e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt) | _PAGE_PSE)
+            ? l3e_from_pfn(mfn_x(mfn),
+                           p2m_type_to_flags(p2mt, mfn) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
         paging_write_p2m_entry(d, gfn, p2m_entry, table_mfn, entry_content, 3);
@@ -1335,7 +1339,8 @@ p2m_set_entry(struct domain *d, unsigned
         ASSERT(p2m_entry);
 
         if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
-            entry_content = l1e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt));
+            entry_content = l1e_from_pfn(mfn_x(mfn),
+                                         p2m_type_to_flags(p2mt, mfn));
         else
             entry_content = l1e_empty();
 
@@ -1358,9 +1363,11 @@ p2m_set_entry(struct domain *d, unsigned
             goto out;
         }
 
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_is_magic(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt) | _PAGE_PSE);
+                                       p2m_type_to_flags(p2mt, mfn) |
+                                       _PAGE_PSE);
         else
             l2e_content = l2e_empty();
 
@@ -2424,6 +2431,7 @@ void p2m_change_type_global(struct domai
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
     if ( !paging_mode_translate(d) )
         return;
@@ -2465,7 +2473,7 @@ void p2m_change_type_global(struct domai
                     continue;
                 mfn = l3e_get_pfn(l3e[i3]);
                 gfn = get_gpfn_from_mfn(mfn);
-                flags = p2m_type_to_flags(nt);
+                flags = p2m_type_to_flags(nt, _mfn(mfn));
                 l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                 paging_write_p2m_entry(d, gfn, (l1_pgentry_t *)&l3e[i3],
                                        l3mfn, l1e_content, 3);
@@ -2495,7 +2503,7 @@ void p2m_change_type_global(struct domai
 #endif
                                        )
                                * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES;
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                     paging_write_p2m_entry(d, gfn, (l1_pgentry_t *)&l2e[i2],
                                            l2mfn, l1e_content, 2);
@@ -2518,7 +2526,7 @@ void p2m_change_type_global(struct domai
                                        )
                                * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES;
                     /* create a new 1le entry with the new type */
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags);
                     paging_write_p2m_entry(d, gfn, &l1e[i1],
                                            l1mfn, l1e_content, 1);
--- 2010-06-15.orig/xen/arch/x86/mm/shadow/multi.c	2010-05-28 13:59:16.000000000 +0200
+++ 2010-06-15/xen/arch/x86/mm/shadow/multi.c	2010-07-09 17:11:07.000000000 +0200
@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) )
+    if ( p2m_is_readonly(p2mt) ||
+         (p2mt == p2m_mmio_direct &&
+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
         sflags &= ~_PAGE_RW;
 
     // protect guest page tables
@@ -1204,15 +1206,19 @@ static int shadow_set_l1e(struct vcpu *v
         /* About to install a new reference */
         if ( shadow_mode_refcounts(d) ) {
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            if ( shadow_get_page_from_l1e(new_sl1e, d, new_type) == 0 )
+            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
+            case 0:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
-            }
-            else
-            {
+                break;
+            case -1:
+                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+                /* fall through */
+            default:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
+                break;
             }
         }
     }
--- 2010-06-15.orig/xen/arch/x86/msi.c	2010-07-09 15:19:02.000000000 +0200
+++ 2010-06-15/xen/arch/x86/msi.c	2010-07-12 09:23:01.000000000 +0200
@@ -16,12 +16,14 @@
 #include <xen/errno.h>
 #include <xen/pci.h>
 #include <xen/pci_regs.h>
+#include <xen/iocap.h>
 #include <xen/keyhandler.h>
 #include <asm/io.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
 #include <asm/msi.h>
 #include <asm/fixmap.h>
+#include <asm/p2m.h>
 #include <mach_apic.h>
 #include <io_ports.h>
 #include <public/physdev.h>
@@ -520,6 +522,43 @@ static int msi_capability_init(struct pc
     return 0;
 }
 
+static u64 read_pci_mem_bar(u8 bus, u8 slot, u8 func, u8 bir)
+{
+    u8 limit;
+    u32 addr;
+
+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
+    {
+    case PCI_HEADER_TYPE_NORMAL:
+        limit = 6;
+        break;
+    case PCI_HEADER_TYPE_BRIDGE:
+        limit = 2;
+        break;
+    case PCI_HEADER_TYPE_CARDBUS:
+        limit = 1;
+        break;
+    default:
+        return 0;
+    }
+
+    if ( bir >= limit )
+        return 0;
+    addr = pci_conf_read32(bus, slot, func, PCI_BASE_ADDRESS_0 + bir * 4);
+    if ( (addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
+        return 0;
+    if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        if ( ++bir >= limit )
+            return 0;
+        return addr |
+               ((u64)pci_conf_read32(bus, slot, func,
+                                     PCI_BASE_ADDRESS_0 + bir * 4) << 32);
+    }
+    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+}
+
 /**
  * msix_capability_init - configure device's MSI-X capability
  * @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -532,7 +571,8 @@ static int msi_capability_init(struct pc
  **/
 static int msix_capability_init(struct pci_dev *dev,
                                 struct msi_info *msi,
-                                struct msi_desc **desc)
+                                struct msi_desc **desc,
+                                unsigned int nr_entries)
 {
     struct msi_desc *entry;
     int pos;
@@ -587,6 +627,70 @@ static int msix_capability_init(struct p
 
     list_add_tail(&entry->list, &dev->msi_list);
 
+    if ( !dev->msix_nr_entries )
+    {
+        u64 pba_paddr;
+        u32 pba_offset;
+        unsigned long pfn;
+
+        ASSERT(!dev->msix_used_entries);
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+
+        dev->msix_nr_entries = nr_entries;
+        dev->msix_table.first = pfn = PFN_DOWN(table_paddr);
+        dev->msix_table.last = PFN_DOWN(table_paddr +
+                                        nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
+        for ( ; pfn <= dev->msix_table.last; ++pfn )
+            WARN_ON(rangeset_contains_singleton(mmio_ro_ranges, pfn));
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
+                                dev->msix_table.last) )
+            WARN();
+
+        pba_offset = pci_conf_read32(bus, slot, func,
+                                     msix_pba_offset_reg(pos));
+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
+        WARN_ON(!pba_paddr);
+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
+
+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
+                                      BITS_TO_LONGS(nr_entries) - 1);
+        for ( pfn = dev->msix_pba.first; pfn <= dev->msix_pba.last; ++pfn )
+            if ( pfn < dev->msix_table.first || pfn > dev->msix_table.last )
+                WARN_ON(rangeset_contains_singleton(mmio_ro_ranges, pfn));
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
+                                dev->msix_pba.last) )
+            WARN();
+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func,
+       dev->msix_table.first, dev->msix_table.last,
+       dev->msix_pba.first, dev->msix_pba.last);//temp
+
+        if ( dev->domain )
+            p2m_change_entry_type_global(dev->domain, p2m_mmio_direct,
+                                         p2m_mmio_direct);
+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
+        {
+            struct domain *d = dev->domain;
+
+            if ( !d )
+                for_each_domain(d)
+                    if ( !paging_mode_translate(d) &&
+                         (iomem_access_permitted(d, dev->msix_table.first,
+                                                 dev->msix_table.last) ||
+                          iomem_access_permitted(d, dev->msix_pba.first,
+                                                 dev->msix_pba.last)) )
+                        break;
+            if ( d )
+            {
+                /* XXX How to deal with existing mappings? */
+            }
+        }
+    }
+    WARN_ON(dev->msix_nr_entries != nr_entries);
+    WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));
+    ++dev->msix_used_entries;
+
     /* Mask interrupt here */
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
@@ -707,7 +811,7 @@ static int __pci_enable_msix(struct msi_
         return 0;
     }
 
-    status = msix_capability_init(pdev, msi, desc);
+    status = msix_capability_init(pdev, msi, desc, nr_entries);
     return status;
 }
 
@@ -732,6 +836,16 @@ static void __pci_disable_msix(struct ms
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
     pci_conf_write16(bus, slot, func, msix_control_reg(pos), control);
+
+    if ( !--dev->msix_used_entries )
+    {
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_table.first,
+                                   dev->msix_table.last) )
+            WARN();
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_pba.first,
+                                   dev->msix_pba.last) )
+            WARN();
+    }
 }
 
 /*
--- 2010-06-15.orig/xen/drivers/passthrough/io.c	2010-04-22 14:43:25.000000000 +0200
+++ 2010-06-15/xen/drivers/passthrough/io.c	2010-07-09 16:02:23.000000000 +0200
@@ -26,6 +26,8 @@
 #include <xen/hvm/irq.h>
 #include <xen/tasklet.h>
 
+struct rangeset *__read_mostly mmio_ro_ranges;
+
 static void hvm_dirq_assist(unsigned long _d);
 
 bool_t pt_irq_need_timer(uint32_t flags)
@@ -565,3 +567,11 @@ void hvm_dpci_eoi(struct domain *d, unsi
 unlock:
     spin_unlock(&d->event_lock);
 }
+
+static int __init setup_mmio_ro_ranges(void)
+{
+    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
+                                  RANGESETF_prettyprint_hex);
+    return 0;
+}
+__initcall(setup_mmio_ro_ranges);
--- 2010-06-15.orig/xen/include/xen/iommu.h	2010-06-01 13:39:57.000000000 +0200
+++ 2010-06-15/xen/include/xen/iommu.h	2010-07-09 15:55:52.000000000 +0200
@@ -31,6 +31,8 @@ extern bool_t force_iommu, iommu_verbose
 extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
 
+extern struct rangeset *mmio_ro_ranges;
+
 #define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
 
 #define MAX_IOMMUS 32
--- 2010-06-15.orig/xen/include/xen/pci.h	2010-01-21 15:36:53.000000000 +0100
+++ 2010-06-15/xen/include/xen/pci.h	2010-07-09 15:49:32.000000000 +0200
@@ -45,6 +45,10 @@ struct pci_dev {
     struct list_head domain_list;
 
     struct list_head msi_list;
+    unsigned int msix_nr_entries, msix_used_entries;
+    struct {
+        unsigned long first, last;
+    } msix_table, msix_pba;
    int msix_table_refcnt[MAX_MSIX_TABLE_PAGES];
    int msix_table_idx[MAX_MSIX_TABLE_PAGES];
    spinlock_t msix_table_lock;
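The core mechanism of the patch - a single global set of read-only frames, populated when MSI-X is first enabled on a device and consulted whenever a mapping is established - can be modelled in a few lines. A self-contained toy (this is not Xen's rangeset implementation; it only mirrors the calls the diff makes):

#include <stdbool.h>
#include <stdio.h>

#define MAX_RANGES 16

struct range { unsigned long first, last; bool used; };
static struct range ro_set[MAX_RANGES];   /* stands in for mmio_ro_ranges */

static int ro_add(unsigned long first, unsigned long last)
{
    for (int i = 0; i < MAX_RANGES; i++)
        if (!ro_set[i].used) {
            ro_set[i] = (struct range){ first, last, true };
            return 0;
        }
    return -1;                             /* set full */
}

static void ro_remove(unsigned long first, unsigned long last)
{
    for (int i = 0; i < MAX_RANGES; i++)
        if (ro_set[i].used && ro_set[i].first == first &&
            ro_set[i].last == last)
            ro_set[i].used = false;
}

static bool ro_contains(unsigned long mfn)
{
    for (int i = 0; i < MAX_RANGES; i++)
        if (ro_set[i].used && mfn >= ro_set[i].first &&
            mfn <= ro_set[i].last)
            return true;
    return false;
}

#define PTE_RW 0x2UL

/* Mapping path: allow the mapping but strip write permission -
 * the "return -1 / remove _PAGE_RW" case in the diff above. */
static unsigned long map_mmio_frame(unsigned long mfn, unsigned long flags)
{
    if ((flags & PTE_RW) && ro_contains(mfn))
        flags &= ~PTE_RW;
    return flags;
}

int main(void)
{
    ro_add(0x1000, 0x1001);   /* pretend: a device's table + PBA frames */
    printf("flags for 0x1000: %#lx\n", map_mmio_frame(0x1000, PTE_RW));
    printf("flags for 0x2000: %#lx\n", map_mmio_frame(0x2000, PTE_RW));
    ro_remove(0x1000, 0x1001);
    return 0;
}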
Jiang, Yunhong
2010-Jul-15 08:22 UTC
RE: [Xen-devel] granting access to MSI-X table and pending bit array
>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>Sent: Monday, July 12, 2010 5:55 PM
>To: Konrad Rzeszutek Wilk
>Cc: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] granting access to MSI-X table and pending bit array
>
>Below/attached is an untested and possibly not yet complete one
>(can't really test this myself as I don't have, with one exception, any
>MSI-X capable devices around, and the exceptional one doesn't have
>a driver making use of it).
>[...]
>+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func, >+ dev->msix_table.first, dev->msix_table.last, >+ dev->msix_pba.first, dev->msix_pba.last);//temp >+ >+ if ( dev->domain ) >+ p2m_change_entry_type_global(dev->domain, p2m_mmio_direct, >+ p2m_mmio_direct); >+ if ( !dev->domain || !paging_mode_translate(dev->domain) ) >+ { >+ struct domain *d = dev->domain; >+ >+ if ( !d ) >+ for_each_domain(d) >+ if ( !paging_mode_translate(d) && >+ (iomem_access_permitted(d, dev->msix_table.first, >+ dev->msix_table.last) >|| >+ iomem_access_permitted(d, dev->msix_pba.first, >+ dev->msix_pba.last)) ) >+ break; >+ if ( d ) >+ { >+ /* XXX How to deal with existing mappings? */ >+ } >+ } >+ } >+ WARN_ON(dev->msix_nr_entries != nr_entries); >+ WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT)); >+ ++dev->msix_used_entries; >+ > /* Mask interrupt here */ > writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); > >@@ -707,7 +811,7 @@ static int __pci_enable_msix(struct msi_ > return 0; > } > >- status = msix_capability_init(pdev, msi, desc); >+ status = msix_capability_init(pdev, msi, desc, nr_entries); > return status; > } > >@@ -732,6 +836,16 @@ static void __pci_disable_msix(struct ms > writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); > > pci_conf_write16(bus, slot, func, msix_control_reg(pos), control); >+ >+ if ( !--dev->msix_used_entries ) >+ { >+ if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_table.first, >+ dev->msix_table.last) ) >+ WARN(); >+ if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_pba.first, >+ dev->msix_pba.last) ) >+ WARN(); >+ } > } > > /* >--- 2010-06-15.orig/xen/drivers/passthrough/io.c 2010-04-22 14:43:25.000000000 >+0200 >+++ 2010-06-15/xen/drivers/passthrough/io.c 2010-07-09 16:02:23.000000000 >+0200 >@@ -26,6 +26,8 @@ > #include <xen/hvm/irq.h> > #include <xen/tasklet.h> > >+struct rangeset *__read_mostly mmio_ro_ranges; >+ > static void hvm_dirq_assist(unsigned long _d); > > bool_t pt_irq_need_timer(uint32_t flags) >@@ -565,3 +567,11 @@ void hvm_dpci_eoi(struct domain *d, unsi > unlock: > spin_unlock(&d->event_lock); > } >+ >+static int __init setup_mmio_ro_ranges(void) >+{ >+ mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges", >+ RANGESETF_prettyprint_hex); >+ return 0; >+} >+__initcall(setup_mmio_ro_ranges); >--- 2010-06-15.orig/xen/include/xen/iommu.h 2010-06-01 13:39:57.000000000 >+0200 >+++ 2010-06-15/xen/include/xen/iommu.h 2010-07-09 15:55:52.000000000 +0200 >@@ -31,6 +31,8 @@ extern bool_t force_iommu, iommu_verbose > extern bool_t iommu_workaround_bios_bug, iommu_passthrough; > extern bool_t iommu_snoop, iommu_qinval, iommu_intremap; > >+extern struct rangeset *mmio_ro_ranges; >+ > #define domain_hvm_iommu(d) (&d->arch.hvm_domain.hvm_iommu) > > #define MAX_IOMMUS 32 >--- 2010-06-15.orig/xen/include/xen/pci.h 2010-01-21 15:36:53.000000000 +0100 >+++ 2010-06-15/xen/include/xen/pci.h 2010-07-09 15:49:32.000000000 +0200 >@@ -45,6 +45,10 @@ struct pci_dev { > struct list_head domain_list; > > struct list_head msi_list; >+ unsigned int msix_nr_entries, msix_used_entries; >+ struct { >+ unsigned long first, last; >+ } msix_table, msix_pba; > int msix_table_refcnt[MAX_MSIX_TABLE_PAGES]; > int msix_table_idx[MAX_MSIX_TABLE_PAGES]; > spinlock_t msix_table_lock;Will the small window for qemu exists still? QEMU may setup the mapping before the msix_capability_init(). BTW, I''m not sure if mapping as RO this will solve the sharing issue (i.e. the PBA or table entry share with other resource). 
Thanks
--jyh
Jan Beulich
2010-Aug-05 17:44 UTC
RE: [Xen-devel] granting access to MSI-X table and pending bit array
>>> "Jiang, Yunhong" 07/15/10 10:24 AM >>> >Will the small window for qemu exists still? QEMU may setup the mapping before the msix_capability_init().Yes, I think so, as Dom0 still is permitted to map this area writable (revoking this requires a qemu fix to be committed at the same time, or a prerequisite fix to make qemu map this area read-only, which I think doesn''t fit well within the MMIO model used there).>BTW, I''m not sure if mapping as RO this will solve the sharing issue (i.e. the PBA or table entry share with other resource).Not if the device meets the standard''s requirement of these two tables being allowed to share a page between them, but not with any other resource. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel