Hi, Jan, As you may already notice the bug 1732, ( http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is c/s 22182. I see the following attached code in your patch. It is pointless to check msi->table_base against the value read from physical device if this function is a virtual function of SR-IOV device. VFs are required to have BARs zeroed by specifications. And for VFs, unless you can read these values from corresponding PF, you will have to trust the "table_base" passed from dom0 via hypercall. Actually, this parameter is specifically introduced for enabling SR-IOV. I am not familiar with this patch and hence its story. But I think it would be very simple for you to fix this up? BTW: I vaguely recall that MSI-X table base might not be the first page of the corresponding BAR register. Shan Haitao + if ( !dev->msix_nr_entries ) + { + u64 pba_paddr; + u32 pba_offset; + + ASSERT(!dev->msix_used_entries); + WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir)); + + dev->msix_nr_entries = nr_entries; + dev->msix_table.first = PFN_DOWN(table_paddr); + dev->msix_table.last = PFN_DOWN(table_paddr + + nr_entries * PCI_MSIX_ENTRY_SIZE - 1); + WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_table.first, + dev->msix_table.last)); + + pba_offset = pci_conf_read32(bus, slot, func, + msix_pba_offset_reg(pos)); + bir = (u8)(pba_offset & PCI_MSIX_BIRMASK); + pba_paddr = read_pci_mem_bar(bus, slot, func, bir); + WARN_ON(!pba_paddr); + pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK; + + dev->msix_pba.first = PFN_DOWN(pba_paddr); + dev->msix_pba.last = PFN_DOWN(pba_paddr + + BITS_TO_LONGS(nr_entries) - 1); + WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first, + dev->msix_pba.last)); + + if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first, + dev->msix_table.last) ) + WARN(); + if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first, + dev->msix_pba.last) ) + WARN(); + + if ( dev->domain ) + p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain), + p2m_mmio_direct, p2m_mmio_direct); + if ( !dev->domain || !paging_mode_translate(dev->domain) ) + { + struct domain *d = dev->domain; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: > Hi, Jan, > > As you may already notice the bug 1732, ( > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is > c/s 22182. > > I see the following attached code in your patch. It is pointless to check > msi->table_base against the value read from physical device if this function > is a virtual function of SR-IOV device. VFs are required to have BARs zeroed > by specifications. And for VFs, unless you can read these values from > corresponding PF, you will have to trust the "table_base" passed from dom0 > via hypercall. Actually, this parameter is specifically introduced for > enabling SR-IOV.Quite possible, which would perhaps just require removing some or all of the warnings. In doing so, it must however be avoided to introduce new ways for things to go bad silently.> I am not familiar with this patch and hence its story. But I think it would > be very simple for you to fix this up?Not really, no. I had posted this patch as a draft after there was no reaction on the part of the original implementers of the MSI and pass-through code to address the security problem we''re dealing with here (and afaict the issue still wasn''t completely addressed, as I don''t recall having seen corresponding adjustments to qemu). I never had hardware to test this with, and hence had to rely on Yunhong''s testing and ack-ing of the patch.> BTW: I vaguely recall that MSI-X table base might not be the first page of > the corresponding BAR register.While I agree that the code is lacking the use of msix_table_offset_reg(), I would question what else would be in the range supplied by the BAR, as the specification allows only MSI-X table and PBA to share a BAR. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Not really, no. I had posted this patch as a draft after there was >no reaction on the part of the original implementers of the MSI and >pass-through code to address the security problem we''re dealing >with here (and afaict the issue still wasn''t completely addressed, >as I don''t recall having seen corresponding adjustments to qemu). >I never had hardware to test this with, and hence had to rely on >Yunhong''s testing and ack-ing of the patch.Yes, my bad. I verified it, but didn''t use the SR-IOV at that time :$ -jyh _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
2011/1/31 Jan Beulich <JBeulich@novell.com>> >>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: > > Hi, Jan, > > > > As you may already notice the bug 1732, ( > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the > culprit is > > c/s 22182. > > > > I see the following attached code in your patch. It is pointless to check > > msi->table_base against the value read from physical device if this > function > > is a virtual function of SR-IOV device. VFs are required to have BARs > zeroed > > by specifications. And for VFs, unless you can read these values from > > corresponding PF, you will have to trust the "table_base" passed from > dom0 > > via hypercall. Actually, this parameter is specifically introduced for > > enabling SR-IOV. > > Quite possible, which would perhaps just require removing some > or all of the warnings. In doing so, it must however be avoided to > introduce new ways for things to go bad silently. > > > I am not familiar with this patch and hence its story. But I think it > would > > be very simple for you to fix this up? > > Not really, no. I had posted this patch as a draft after there was > no reaction on the part of the original implementers of the MSI and > pass-through code to address the security problem we''re dealing > with here (and afaict the issue still wasn''t completely addressed, > as I don''t recall having seen corresponding adjustments to qemu). > I never had hardware to test this with, and hence had to rely on > Yunhong''s testing and ack-ing of the patch. > > > BTW: I vaguely recall that MSI-X table base might not be the first page > of > > the corresponding BAR register. > > While I agree that the code is lacking the use of > msix_table_offset_reg(), I would question what else would be > in the range supplied by the BAR, as the specification allows > only MSI-X table and PBA to share a BAR. >This is what I copied from PCI spec 3.0. I don''t see that the spec only allows the two to be shared. -----------------------------PCI---------- To enable system software to map MSI-X structures onto different processor pages for improved access control, it is recommended that a function dedicate separate Base Address registers for the MSI-X Table and MSI-X PBA, or else provide more than the minimum required isolation with address ranges. If dedicated separate Base Address registers is not feasible, it is recommended that a function dedicate a single Base Address register for the MSI-X Table and MSI-X PBA. If a dedicated Base Address register is not feasible, it is recommended that a function isolate the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges rather than the mandatory aligned 4 KB ranges. --------------------------spec---------------> > Jan > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
We are heading Chinese New Year, which typically take 1.5 weeks. I am afraid we don''t have time slot to make a fix for this. Probably Keir can make a judgement on the impact of this bug? And decide whether a fix before 4.1 release is needed. Shan Haitao 2011/1/31 Haitao Shan <maillists.shan@gmail.com>> > > 2011/1/31 Jan Beulich <JBeulich@novell.com> > > >>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: >> > Hi, Jan, >> > >> > As you may already notice the bug 1732, ( >> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the >> culprit is >> > c/s 22182. >> > >> > I see the following attached code in your patch. It is pointless to >> check >> > msi->table_base against the value read from physical device if this >> function >> > is a virtual function of SR-IOV device. VFs are required to have BARs >> zeroed >> > by specifications. And for VFs, unless you can read these values from >> > corresponding PF, you will have to trust the "table_base" passed from >> dom0 >> > via hypercall. Actually, this parameter is specifically introduced for >> > enabling SR-IOV. >> >> Quite possible, which would perhaps just require removing some >> or all of the warnings. In doing so, it must however be avoided to >> introduce new ways for things to go bad silently. >> >> > I am not familiar with this patch and hence its story. But I think it >> would >> > be very simple for you to fix this up? >> >> Not really, no. I had posted this patch as a draft after there was >> no reaction on the part of the original implementers of the MSI and >> pass-through code to address the security problem we''re dealing >> with here (and afaict the issue still wasn''t completely addressed, >> as I don''t recall having seen corresponding adjustments to qemu). >> I never had hardware to test this with, and hence had to rely on >> Yunhong''s testing and ack-ing of the patch. >> >> > BTW: I vaguely recall that MSI-X table base might not be the first page >> of >> > the corresponding BAR register. >> >> While I agree that the code is lacking the use of >> msix_table_offset_reg(), I would question what else would be >> in the range supplied by the BAR, as the specification allows >> only MSI-X table and PBA to share a BAR. >> > > This is what I copied from PCI spec 3.0. I don''t see that the spec only > allows the two to be shared. > -----------------------------PCI---------- > To enable system software to map MSI-X structures onto different processor > pages for > improved access control, it is recommended that a function dedicate > separate Base Address > registers for the MSI-X Table and MSI-X PBA, or else provide more than the > minimum > required isolation with address ranges. > If dedicated separate Base Address registers is not feasible, it is > recommended that a > function dedicate a single Base Address register for the MSI-X Table and > MSI-X PBA. > If a dedicated Base Address register is not feasible, it is recommended > that a function isolate > the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges > rather than > the mandatory aligned 4 KB ranges. > --------------------------spec--------------- > >> >> Jan >> >> >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 31.01.11 at 09:52, Haitao Shan <maillists.shan@gmail.com> wrote: >> > BTW: I vaguely recall that MSI-X table base might not be the first page >> of >> > the corresponding BAR register. >> >> While I agree that the code is lacking the use of >> msix_table_offset_reg(), I would question what else would be >> in the range supplied by the BAR, as the specification allows >> only MSI-X table and PBA to share a BAR. >> > > This is what I copied from PCI spec 3.0. I don''t see that the spec only > allows the two to be shared. > -----------------------------PCI---------- > To enable system software to map MSI-X structures onto different processor > pages for > improved access control, it is recommended that a function dedicate separate > Base Address > registers for the MSI-X Table and MSI-X PBA, or else provide more than the > minimum > required isolation with address ranges. > If dedicated separate Base Address registers is not feasible, it is > recommended that a > function dedicate a single Base Address register for the MSI-X Table and > MSI-X PBA. > If a dedicated Base Address register is not feasible, it is recommended that > a function isolate > the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges > rather than > the mandatory aligned 4 KB ranges. > --------------------------spec---------------Sorry, it should have been *page* instead of *BAR*. I certainly can propose a fix to the not-at-offset-zero part of the problem, but the VF (SR-IOV) specific part (i.e. the determination which of the warnings can be dropped safely) should be done by someone more familiar with all aspects of it. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
If the primary symptom is the large number of call traces, then simply removing the ''offending'' WARN_ON() for 4.1 might be a suitable fix. Even if the root cause is really elsewhere (but we do not have resources to fix it in time)? -- Keir On 31/01/2011 10:02, "Haitao Shan" <maillists.shan@gmail.com> wrote:> We are heading Chinese New Year, which typically take 1.5 weeks. I am afraid > we don''t have time slot to make a fix for this. > Probably Keir can make a judgement on the impact of this bug? And decide > whether a fix before 4.1 release is needed. > > Shan Haitao > > 2011/1/31 Haitao Shan <maillists.shan@gmail.com> >> >> >> 2011/1/31 Jan Beulich <JBeulich@novell.com> >> >>>>>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: >>>> Hi, Jan, >>>> >>>> As you may already notice the bug 1732, ( >>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit >>>> is >>>> c/s 22182. >>>> >>>> I see the following attached code in your patch. It is pointless to check >>>> msi->table_base against the value read from physical device if this >>>> function >>>> is a virtual function of SR-IOV device. VFs are required to have BARs >>>> zeroed >>>> by specifications. And for VFs, unless you can read these values from >>>> corresponding PF, you will have to trust the "table_base" passed from dom0 >>>> via hypercall. Actually, this parameter is specifically introduced for >>>> enabling SR-IOV. >>> >>> Quite possible, which would perhaps just require removing some >>> or all of the warnings. In doing so, it must however be avoided to >>> introduce new ways for things to go bad silently. >>> >>>> I am not familiar with this patch and hence its story. But I think it would >>>> be very simple for you to fix this up? >>> >>> Not really, no. I had posted this patch as a draft after there was >>> no reaction on the part of the original implementers of the MSI and >>> pass-through code to address the security problem we''re dealing >>> with here (and afaict the issue still wasn''t completely addressed, >>> as I don''t recall having seen corresponding adjustments to qemu). >>> I never had hardware to test this with, and hence had to rely on >>> Yunhong''s testing and ack-ing of the patch. >>> >>>> BTW: I vaguely recall that MSI-X table base might not be the first page of >>>> the corresponding BAR register. >>> >>> While I agree that the code is lacking the use of >>> msix_table_offset_reg(), I would question what else would be >>> in the range supplied by the BAR, as the specification allows >>> only MSI-X table and PBA to share a BAR. >> >> This is what I copied from PCI spec 3.0. I don''t see that the spec only >> allows the two to be shared. >> -----------------------------PCI---------- >> To enable system software to map MSI-X structures onto different processor >> pages for >> improved access control, it is recommended that a function dedicate separate >> Base Address >> registers for the MSI-X Table and MSI-X PBA, or else provide more than the >> minimum >> required isolation with address ranges. >> If dedicated separate Base Address registers is not feasible, it is >> recommended that a >> function dedicate a single Base Address register for the MSI-X Table and >> MSI-X PBA. >> If a dedicated Base Address register is not feasible, it is recommended that >> a function isolate >> the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges >> rather than >> the mandatory aligned 4 KB ranges. >> --------------------------spec--------------- >>> >>> Jan >>> >> > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
I am wondering whether we can trust msi->table_base for the moment? Simply removing the warnings can of course suppress the symptom. But assuming MSIX table at pfn 0 and changing its p2m type are a concern to me. But probably this won''t trigger anything bad, since this pfn is seldom used? Shan Haitao 2011/1/31 Keir Fraser <keir@xen.org>> If the primary symptom is the large number of call traces, then simply > removing the ''offending'' WARN_ON() for 4.1 might be a suitable fix. Even if > the root cause is really elsewhere (but we do not have resources to fix it > in time)? > > -- Keir > > On 31/01/2011 10:02, "Haitao Shan" <maillists.shan@gmail.com> wrote: > > > We are heading Chinese New Year, which typically take 1.5 weeks. I am > afraid > > we don''t have time slot to make a fix for this. > > Probably Keir can make a judgement on the impact of this bug? And decide > > whether a fix before 4.1 release is needed. > > > > Shan Haitao > > > > 2011/1/31 Haitao Shan <maillists.shan@gmail.com> > >> > >> > >> 2011/1/31 Jan Beulich <JBeulich@novell.com> > >> > >>>>>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: > >>>> Hi, Jan, > >>>> > >>>> As you may already notice the bug 1732, ( > >>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the > culprit > >>>> is > >>>> c/s 22182. > >>>> > >>>> I see the following attached code in your patch. It is pointless to > check > >>>> msi->table_base against the value read from physical device if this > >>>> function > >>>> is a virtual function of SR-IOV device. VFs are required to have BARs > >>>> zeroed > >>>> by specifications. And for VFs, unless you can read these values from > >>>> corresponding PF, you will have to trust the "table_base" passed from > dom0 > >>>> via hypercall. Actually, this parameter is specifically introduced for > >>>> enabling SR-IOV. > >>> > >>> Quite possible, which would perhaps just require removing some > >>> or all of the warnings. In doing so, it must however be avoided to > >>> introduce new ways for things to go bad silently. > >>> > >>>> I am not familiar with this patch and hence its story. But I think it > would > >>>> be very simple for you to fix this up? > >>> > >>> Not really, no. I had posted this patch as a draft after there was > >>> no reaction on the part of the original implementers of the MSI and > >>> pass-through code to address the security problem we''re dealing > >>> with here (and afaict the issue still wasn''t completely addressed, > >>> as I don''t recall having seen corresponding adjustments to qemu). > >>> I never had hardware to test this with, and hence had to rely on > >>> Yunhong''s testing and ack-ing of the patch. > >>> > >>>> BTW: I vaguely recall that MSI-X table base might not be the first > page of > >>>> the corresponding BAR register. > >>> > >>> While I agree that the code is lacking the use of > >>> msix_table_offset_reg(), I would question what else would be > >>> in the range supplied by the BAR, as the specification allows > >>> only MSI-X table and PBA to share a BAR. > >> > >> This is what I copied from PCI spec 3.0. I don''t see that the spec only > >> allows the two to be shared. > >> -----------------------------PCI---------- > >> To enable system software to map MSI-X structures onto different > processor > >> pages for > >> improved access control, it is recommended that a function dedicate > separate > >> Base Address > >> registers for the MSI-X Table and MSI-X PBA, or else provide more than > the > >> minimum > >> required isolation with address ranges. > >> If dedicated separate Base Address registers is not feasible, it is > >> recommended that a > >> function dedicate a single Base Address register for the MSI-X Table and > >> MSI-X PBA. > >> If a dedicated Base Address register is not feasible, it is recommended > that > >> a function isolate > >> the MSI-X structures from the non-MSI-X structures with aligned 8 KB > ranges > >> rather than > >> the mandatory aligned 4 KB ranges. > >> --------------------------spec--------------- > >>> > >>> Jan > >>> > >> > > > > > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 31.01.11 at 13:02, Haitao Shan <maillists.shan@gmail.com> wrote: > I am wondering whether we can trust msi->table_base for the moment?But that''s already the case. A warning is issued if it doesn''t match what we read from the BAR, but what comes from Dom0 is being used. The PBA, which doesn''t gets passed up from Dom0, gets determined by reading the device''s config space (and is going to be zero or close to it when the BAR is blank). Xen itself doesn''t use it for anything but trying to write-protect the corresponding MFN.> Simply removing the warnings can of course suppress the symptom. But > assuming MSIX table at pfn 0 and changing its p2m type are a concern to me. > But probably this won''t trigger anything bad, since this pfn is seldom used?We should not use MFN zero here (and that really is what I intended the warnings to catch). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote:After taking a closer look:> As you may already notice the bug 1732, ( > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is > c/s 22182.The warnings are a result of the c/s, but if there are functionality problems, they shouldn''t be caused by this: The MSI-X table''s base address was always determined from the value passed from Dom0 (the raw address found in the BAR) plus the table offset as found in the MSI-X capability structure.> I see the following attached code in your patch. It is pointless to check > msi->table_base against the value read from physical device if this function > is a virtual function of SR-IOV device. VFs are required to have BARs zeroed > by specifications. And for VFs, unless you can read these values from > corresponding PF, you will have to trust the "table_base" passed from dom0 > via hypercall. Actually, this parameter is specifically introduced for > enabling SR-IOV.One important question then is whether there''s a way for Xen to determine the PF for the VF and the correct BAR to use without additional help from Dom0. If that''s not possible, passing down the BAR contents needed for the PBA base address calculation on a VF would be necessary, which would require a new sub-hypercall. The only exception to this would be if both use the same BAR (and really if that''s a common case, a simple initial fix could be to use the passed down table_base value also for pba_paddr if the two BIRs match). In any case I am of the opinion that all of the warnings make sense currently, with the sole exception of the VF case of the msi->table_base != read_pci_mem_bar() one (avoiding this would require Xen to at least have a way to recognize a given <bus>:<dev>.<func> is a VF).> BTW: I vaguely recall that MSI-X table base might not be the first page of > the corresponding BAR register.Indeed - that''s what is being accounted for using table_offset (read from MSI-X capability structure + msix_table_offset_reg()). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Mon, 31 Jan 2011, Haitao Shan wrote:> We are heading Chinese New Year, which typically take 1.5 weeks.Oh we didn''t think about it. Considering that we are very keen on getting your fixes, we''ll be slipping the code freeze by at least two weeks. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, Jan, My carelessness, the patch did not cause any malfunction besides these additional warning messages. :) What I am thinking about here is: 1> Given a BDF, how can Xen determine whether it is a VFs? 2> If it is really a VF, how can Xen find its PF? For example, if a VF looks like 07:03:0, its PF might be 07:00.0 or 06:00.0 ... Perhaps a little assist from Dom0 would be very good. Another question: What would the purpose of your patch be? I mean, you are trying to remove MSIX table access right for DomUs, or you are also aiming at removing msi->table_base from the trust chain so that guests cannot pass arbitrary address down to Xen? I guess you patch already addressed the former. The latter does need a reasonable solution but it won''t be a blocker for Xen 4.1 release, right? Shan Haitao 2011/1/31 Jan Beulich <JBeulich@novell.com>> >>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: > After taking a closer look: > > > As you may already notice the bug 1732, ( > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the > culprit is > > c/s 22182. > > The warnings are a result of the c/s, but if there are functionality > problems, they shouldn''t be caused by this: The MSI-X table''s base > address was always determined from the value passed from Dom0 > (the raw address found in the BAR) plus the table offset as found > in the MSI-X capability structure. > > > I see the following attached code in your patch. It is pointless to check > > msi->table_base against the value read from physical device if this > function > > is a virtual function of SR-IOV device. VFs are required to have BARs > zeroed > > by specifications. And for VFs, unless you can read these values from > > corresponding PF, you will have to trust the "table_base" passed from > dom0 > > via hypercall. Actually, this parameter is specifically introduced for > > enabling SR-IOV. > > One important question then is whether there''s a way for Xen to > determine the PF for the VF and the correct BAR to use without > additional help from Dom0. If that''s not possible, passing down the > BAR contents needed for the PBA base address calculation on a > VF would be necessary, which would require a new sub-hypercall. > > The only exception to this would be if both use the same BAR (and > really if that''s a common case, a simple initial fix could be to use > the passed down table_base value also for pba_paddr if the two > BIRs match). > > In any case I am of the opinion that all of the warnings make > sense currently, with the sole exception of the VF case of the > msi->table_base != read_pci_mem_bar() one (avoiding this > would require Xen to at least have a way to recognize a given > <bus>:<dev>.<func> is a VF). > > > BTW: I vaguely recall that MSI-X table base might not be the first page > of > > the corresponding BAR register. > > Indeed - that''s what is being accounted for using table_offset (read > from MSI-X capability structure + msix_table_offset_reg()). > > Jan > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 01.02.11 at 06:48, Haitao Shan <maillists.shan@gmail.com> wrote: > What I am thinking about here is: > 1> Given a BDF, how can Xen determine whether it is a VFs? > 2> If it is really a VF, how can Xen find its PF? For example, if a VF looks > like 07:03:0, its PF might be 07:00.0 or 06:00.0 ... > Perhaps a little assist from Dom0 would be very good.Getting away without assistance from Dom0 would be much preferred, but input from your SR-IOV engineers would be needed on how (if at all) this may be achieved.> Another question: What would the purpose of your patch be? I mean, you are > trying to remove MSIX table access right for DomUs, or you are also aiming > at removing msi->table_base from the trust chain so that guests cannot pass > arbitrary address down to Xen?No, passing down the address from Dom0 is fine, but given that we have to use an alternative approach (reading config space) to determine the PBA address, it seemed rather reasonable (proven by what we now see) to verify the value as passed up by Dom0 matches what we would calculate on our own (to provide a hint at whether the PBA address determination needs any fixing, which we now know it does). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> > > Another question: What would the purpose of your patch be? I mean, you > are > > trying to remove MSIX table access right for DomUs, or you are also > aiming > > at removing msi->table_base from the trust chain so that guests cannot > pass > > arbitrary address down to Xen? > > No, passing down the address from Dom0 is fine, but given that > we have to use an alternative approach (reading config space) to > determine the PBA address, it seemed rather reasonable (proven > by what we now see) to verify the value as passed up by Dom0 > matches what we would calculate on our own (to provide a hint > at whether the PBA address determination needs any fixing, > which we now know it does). >It would be nice that the value passed by dom0 could be verified, I agree. But the reality is that it is quite complex to calculate this value if the function is a VF. Per my investigation, the process should be: 1. Determine the corresponding PF of an given VF. This is already in Xen. But such kind of information is passed to Xen by dom0. Again, do we trust dom0 for this? 2. Look through SRIOV capability in PF to find the actual MEM Bar for this VF. This would require Xen access extended PCI configuration space. The mechanism is not available for 32bit Xen. Is this justified to add this to 32bit Xen? I see this as something overkilling. At minimum, I don''t see any value of adding so many code to Xen 4.1 instead of removing the unnecessary checking and warning (or necessary but not implementation friendly). Shan Haitao> > Jan > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 11.02.11 at 07:00, Haitao Shan <maillists.shan@gmail.com> wrote: >> >> > Another question: What would the purpose of your patch be? I mean, you >> are >> > trying to remove MSIX table access right for DomUs, or you are also >> aiming >> > at removing msi->table_base from the trust chain so that guests cannot >> pass >> > arbitrary address down to Xen? >> >> No, passing down the address from Dom0 is fine, but given that >> we have to use an alternative approach (reading config space) to >> determine the PBA address, it seemed rather reasonable (proven >> by what we now see) to verify the value as passed up by Dom0 >> matches what we would calculate on our own (to provide a hint >> at whether the PBA address determination needs any fixing, >> which we now know it does). >> > It would be nice that the value passed by dom0 could be verified, I agree. > But the reality is that it is quite complex to calculate this value if the > function is a VF. > Per my investigation, the process should be: > 1. Determine the corresponding PF of an given VF. > This is already in Xen. But such kind of information is passed to Xen by > dom0. Again, do we trust dom0 for this?I think trusting Dom0 is the right thing to do - it''s being trusted in other places too.> 2. Look through SRIOV capability in PF to find the actual MEM Bar for this > VF. > This would require Xen access extended PCI configuration space. The > mechanism is not available for 32bit Xen. Is this justified to add this to > 32bit Xen? I see this as something overkilling. At minimum, I don''t see any > value of adding so many code to Xen 4.1 instead of removing the unnecessary > checking and warning (or necessary but not implementation friendly).I think we can leave aside 32-bit Xen, as it''ll go away soon anyway. There''s other code requiring extended cfg space accesses, so no reason not to add this instance. As to "unnecessary checking and warning" I can only repeat that we know by now that the checking and warning *is* appropriate, as we currently don''t calculate the PBA address correctly. Once we''re sure this is done right for all cases, the warning certainly can go away (there''s no question of checking here, as the value doesn''t get passed from Dom0). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> > Per my investigation, the process should be: > > 1. Determine the corresponding PF of an given VF. > > This is already in Xen. But such kind of information is passed to Xen by > > dom0. Again, do we trust dom0 for this? > > I think trusting Dom0 is the right thing to do - it''s being trusted in > other places too.Yes, dom0 is trusted! Thanks! -Xin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Mon, 2011-01-31 at 13:18 +0000, Jan Beulich wrote:> >>> On 31.01.11 at 05:54, Haitao Shan <maillists.shan@gmail.com> wrote: > After taking a closer look: > > > As you may already notice the bug 1732, ( > > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is > > c/s 22182. > > The warnings are a result of the c/s, but if there are functionality > problems, they shouldn''t be caused by this: The MSI-X table''s base > address was always determined from the value passed from Dom0 > (the raw address found in the BAR) plus the table offset as found > in the MSI-X capability structure.Actually I have some functionality problems which coincide with these WARN()''s> > I see the following attached code in your patch. It is pointless to check > > msi->table_base against the value read from physical device if this function > > is a virtual function of SR-IOV device. VFs are required to have BARs zeroed > > by specifications. And for VFs, unless you can read these values from > > corresponding PF, you will have to trust the "table_base" passed from dom0 > > via hypercall. Actually, this parameter is specifically introduced for > > enabling SR-IOV. > > One important question then is whether there''s a way for Xen to > determine the PF for the VF and the correct BAR to use without > additional help from Dom0. If that''s not possible, passing down the > BAR contents needed for the PBA base address calculation on a > VF would be necessary, which would require a new sub-hypercall.In my case (HVM) it looks like qemu has figured out the correct base address for the PBA.> The only exception to this would be if both use the same BAR (and > really if that''s a common case, a simple initial fix could be to use > the passed down table_base value also for pba_paddr if the two > BIRs match). > > In any case I am of the opinion that all of the warnings make > sense currently, with the sole exception of the VF case of the > msi->table_base != read_pci_mem_bar() one (avoiding this > would require Xen to at least have a way to recognize a given > <bus>:<dev>.<func> is a VF).I see> > BTW: I vaguely recall that MSI-X table base might not be the first page of > > the corresponding BAR register. > > Indeed - that''s what is being accounted for using table_offset (read > from MSI-X capability structure + msix_table_offset_reg()).In my case the device is ixgbe and yes, it seems to follow the 8KB aligning recommendation. The actual symptom I am having is a lot of stuff like this in the guest with VF passed-through: ixgbevf: eth: ixgbevf_reset: PF still resetting ixgbevf: eth: ixgbevf_open: Unable to start - perhaps the PFDriver isn''t up yet ixgbevf: eth: ixgbevf_check_tx_hang: Detected Tx Unit Hang Tx Queue <0> TDH, TDT <0>, <1> next_to_use <1> next_to_clean <0> tx_buffer_info[next_to_clean] time_stamp <fffd2f6b> jiffies <fffd3db4> ixgbevf: eth: ixgbevf_clean_tx_irq: tx hang 3 detected, resetting adapter ixgbevf: eth: ixgbevf_watchdog_task: NIC Link is Up 10 Gbps And correspondingly no Tx or Rx traffic at all. It all seems very much like a lack of interrupts, but /proc/interrupts shows good numbers: 201: 146 PCI-MSI-X eth-rx-0 209: 140 PCI-MSI-X eth-tx-0 217: 8 PCI-MSI-X eth:mbx Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to be a regression. One other notable change is the assignments of the MSI-X vectors that I see when hitting the Q debug key: On 3.4: (XEN) 04:10.0 - dom 1 - MSIs < 66 74 82 > On 4.1: (XEN) 04:10.1 - dom 0 - MSIs < 117 118 119 > However qemu seems happy with it all in either case: Mar 15 18:00:30 localhost qemu.1[10344]: pt_register_regions: IO region registered (size=0x00004000 base_addr=0xdd700004) Mar 15 18:00:30 localhost qemu.1[10344]: pt_register_regions: IO region registered (size=0x00004000 base_addr=0xdd800004) Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: get MSI-X table bar base dd800000 Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: table_off = 0, total_entries = 3 Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: errno = 2 Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: mapping physical MSI-X table to b5d91000 Mar 15 18:00:30 localhost qemu.1[10344]: register_real_device: Real physical device 04:10.1 registered successfuly! IRQ type = INTx ... Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 0 with pirq 56 gvec b9 Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 1 with pirq 55 gvec c1 Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 2 with pirq 54 gvec c9 Any ideas? Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.03.11 at 19:30, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > ixgbevf: eth: ixgbevf_reset: PF still resettingAnd nothing interesting in Dom0''s logs?> And correspondingly no Tx or Rx traffic at all. It all seems very much > like a lack of interrupts, but /proc/interrupts shows good numbers: > > 201: 146 PCI-MSI-X eth-rx-0 > 209: 140 PCI-MSI-X eth-tx-0 > 217: 8 PCI-MSI-X eth:mbxWith the above, I''d guess more towards a PF <-> VF communication problem (which I can say nothing about).> Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to > be a regression. One other notable change is the assignments of the > MSI-X vectors that I see when hitting the Q debug key: > > On 3.4: > (XEN) 04:10.0 - dom 1 - MSIs < 66 74 82 > > > On 4.1: > (XEN) 04:10.1 - dom 0 - MSIs < 117 118 119 >dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn''t look like a 1:1 comparison to me.> Any ideas?Not really. Despite me not thinking that the change in question (that introduced the WARN_ON()s) has any functionality impact (it''s really only about trying to write protect certain MMIO ranges, with the WARN_ON()s reporting that this didn''t work as expected) - did you try reverting it (and its follow-up fixes)? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, 2011-03-16 at 08:22 +0000, Jan Beulich wrote:> >>> On 15.03.11 at 19:30, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > > ixgbevf: eth: ixgbevf_reset: PF still resetting > > And nothing interesting in Dom0''s logs?Nothing of interest in the kernel messages no.> > And correspondingly no Tx or Rx traffic at all. It all seems very much > > like a lack of interrupts, but /proc/interrupts shows good numbers: > > > > 201: 146 PCI-MSI-X eth-rx-0 > > 209: 140 PCI-MSI-X eth-tx-0 > > 217: 8 PCI-MSI-X eth:mbx > > With the above, I''d guess more towards a PF <-> VF communication > problem (which I can say nothing about).Actually, I do get this from the dom0 kernel: ixgbe: eth5: ixgbe_rcv_msg_from_vf: Set MAC msg received from vf 0> > Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to > > be a regression. One other notable change is the assignments of the > > MSI-X vectors that I see when hitting the Q debug key: > > > > On 3.4: > > (XEN) 04:10.0 - dom 1 - MSIs < 66 74 82 > > > > > On 4.1: > > (XEN) 04:10.1 - dom 0 - MSIs < 117 118 119 > > > dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn''t > look like a 1:1 comparison to me.Yeah they are different machines with the same SR-IOV NIC (similar enough hardware wise). But the point is the different assigned domains, bear in mind that in both cases the function in question is assigned to a guest at the time the debug key was pressed.> > Any ideas? > > Not really. Despite me not thinking that the change in question > (that introduced the WARN_ON()s) has any functionality impact > (it''s really only about trying to write protect certain MMIO > ranges, with the WARN_ON()s reporting that this didn''t work as > expected) - did you try reverting it (and its follow-up fixes)?No change. Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 16.03.11 at 14:50, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > On Wed, 2011-03-16 at 08:22 +0000, Jan Beulich wrote: >> >>> On 15.03.11 at 19:30, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: >> > Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to >> > be a regression. One other notable change is the assignments of the >> > MSI-X vectors that I see when hitting the Q debug key: >> > >> > On 3.4: >> > (XEN) 04:10.0 - dom 1 - MSIs < 66 74 82 > >> > >> > On 4.1: >> > (XEN) 04:10.1 - dom 0 - MSIs < 117 118 119 > >> >> dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn''t >> look like a 1:1 comparison to me. > > Yeah they are different machines with the same SR-IOV NIC (similar > enough hardware wise). But the point is the different assigned domains, > bear in mind that in both cases the function in question is assigned to > a guest at the time the debug key was pressed.And even iommu=verbose doesn''t produce anything more informative? Something must be going wrong during the assignment... Are the kernels in host and guest exactly the same in both the 3.4 and the 4.1 cases? Using pciback or pci-stub?>> > Any ideas? >> >> Not really. Despite me not thinking that the change in question >> (that introduced the WARN_ON()s) has any functionality impact >> (it''s really only about trying to write protect certain MMIO >> ranges, with the WARN_ON()s reporting that this didn''t work as >> expected) - did you try reverting it (and its follow-up fixes)? > > No change.With that, the regression then clearly must be elsewhere, and I''m afraid we''re having to hope that Intel folks will take a look. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, 2011-03-17 at 07:48 +0000, Jan Beulich wrote:> >>> On 16.03.11 at 14:50, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > > On Wed, 2011-03-16 at 08:22 +0000, Jan Beulich wrote: > >> >>> On 15.03.11 at 19:30, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > >> > Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to > >> > be a regression. One other notable change is the assignments of the > >> > MSI-X vectors that I see when hitting the Q debug key: > >> > > >> > On 3.4: > >> > (XEN) 04:10.0 - dom 1 - MSIs < 66 74 82 > > >> > > >> > On 4.1: > >> > (XEN) 04:10.1 - dom 0 - MSIs < 117 118 119 > > >> > >> dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn''t > >> look like a 1:1 comparison to me. > > > > Yeah they are different machines with the same SR-IOV NIC (similar > > enough hardware wise). But the point is the different assigned domains, > > bear in mind that in both cases the function in question is assigned to > > a guest at the time the debug key was pressed. > > And even iommu=verbose doesn''t produce anything more > informative? Something must be going wrong during the > assignment...Just a bunch of stuff like this: (XEN) [VT-D]iommu.c:1363: d0:PCIe: map bdf = c:10.1 (XEN) PCI add Virtual Function 0c:10.1> Are the kernels in host and guest exactly the same in both the > 3.4 and the 4.1 cases? Using pciback or pci-stub?Well I am currently working on getting a repro on two identical boxes differing only by hypervisor versions, will let you know.> >> > Any ideas? > >> > >> Not really. Despite me not thinking that the change in question > >> (that introduced the WARN_ON()s) has any functionality impact > >> (it''s really only about trying to write protect certain MMIO > >> ranges, with the WARN_ON()s reporting that this didn''t work as > >> expected) - did you try reverting it (and its follow-up fixes)? > > > > No change. > > With that, the regression then clearly must be elsewhere, and > I''m afraid we''re having to hope that Intel folks will take a look.*nods* Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel