Jan Beulich
2010-Oct-22 10:13 UTC
[Xen-devel] VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
Weidong, in this patch you removed a bus/devfn check around an invocation of domain_context_mapping_one() avoiding the attempt to call the function again if it was already called for this very device. This removal, however, conflicts with the context_present() check at the top of domain_context_mapping_one() - in particular, pdev->domain isn''t set to the new owner yet, and hence the function fails. The question now is whether some similar check should be restored, or whether pdev->domain should get updated earlier. This may need some additional consideration, since from looking at the code I would say that reassign_device_ownership() needs some error handling improvements too: Currently, partial failure isn''t being handled properly at all (respective devices are left in a half way state - no longer properly assigned to Dom0, but also not yet assigned to DomU). I also wonder what guarantee there is for a device to exist at <secbus>:00.0 (since if there is none, the same context_present() check could at least theoretically again lead to problems as it checks for pci_get_pdev() returning non-NULL). Finally, isn''t it inconsistent that only the original device gets its ->domain set to the new owner and gets moved to that domain''s device list, but neither the upstream bridge nor that bridge''s <secbus>:00.0 get handled the same way? What if below that bridge a device gets hot-added? Wouldn''t that device incorrectly end up in Dom0, with no failures because the bridge still appears to be owned by Dom0 while it really isn''t? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Oct-25 07:05 UTC
[Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>-----Original Message----- >From: Jan Beulich [mailto:JBeulich@novell.com] >Sent: Friday, October 22, 2010 6:13 PM >To: Han, Weidong >Cc: Jiang, Yunhong; xen-devel@lists.xensource.com >Subject: VT-d device assignment may fail (regression from Xen c/s >19805:2f1fa2215e60) > >Weidong, > >in this patch you removed a bus/devfn check around an invocation of >domain_context_mapping_one() avoiding the attempt to call the >function again if it was already called for this very device. This >removal, however, conflicts with the context_present() check at the >top of domain_context_mapping_one() - in particular, pdev->domain >isn''t set to the new owner yet, and hence the function fails.Weidong is on travel, and he may give more comments when is back tomorrow. What''s the removed bus/devfn check you mean? I didn''t catch it in the patch.> >The question now is whether some similar check should be restored, >or whether pdev->domain should get updated earlier. This may >need some additional consideration, since from looking at the code >I would say that reassign_device_ownership() needs some error >handling improvements too: Currently, partial failure isn''t being >handled properly at all (respective devices are left in a half way >state - no longer properly assigned to Dom0, but also not yet >assigned to DomU). > >I also wonder what guarantee there is for a device to exist at ><secbus>:00.0 (since if there is none, the same context_present() >check could at least theoretically again lead to problems as it >checks for pci_get_pdev() returning non-NULL).Hmm, the function 0 should always exists, but I didn''t find in spec that device 0 should always be populated.> >Finally, isn''t it inconsistent that only the original device gets its >->domain set to the new owner and gets moved to that domain''s >device list, but neither the upstream bridge nor that bridge''s ><secbus>:00.0 get handled the same way? What if below thatPer my understanding, the bridge and the <secbus>:00.0 is only for PCI device because all PCI device behind the same pcie2pci bridge should be assigned to the same domain. So if a device is assigned to a domain, the bridge and the <secbus>:00.0 should be the same, so it is not that neccessary to keep that information for the bridge and <secbus>:0.0 . But seems current implementation missed something, Weidong, correct me please, if I''m wrong. 1) Currently Xen hypervisor does not gurantee the "atomic" assignment of device. I assume this is done by tools currently. But if tools does not guard this, it may cause problem in xen hypervisor. For example, if tools assign PCI device A to domain A, and then it try to assign PCI device B (in the same bus as device A) to domain B, the second assignment (to domain B) will fail because the assign to the pci bridge fail, and thus leave the device B in half way, As Jan stated above. 2) If a device is hot-added, the hot-added device is owned by domain0 by default, that may cause issue.>bridge a device gets hot-added? Wouldn''t that device >incorrectly end up in Dom0, with no failures because the bridge >still appears to be owned by Dom0 while it really isn''t?Yes, that has trouble. The device should be hide to dom0. Thanks --jyh> >Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Oct-25 08:58 UTC
[Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>>> On 25.10.10 at 09:05, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: >>From: Jan Beulich [mailto:JBeulich@novell.com] > What''s the removed bus/devfn check you mean? I didn''t catch it in the patch.There used to be if ( (secbus != bus) && (secdevfn != 0) ) guarding the second call to domain_context_mapping_one().>>Finally, isn''t it inconsistent that only the original device gets its >>->domain set to the new owner and gets moved to that domain''s >>device list, but neither the upstream bridge nor that bridge''s >><secbus>:00.0 get handled the same way? What if below that > > Per my understanding, the bridge and the <secbus>:00.0 is only for PCI device > because all PCI device behind the same pcie2pci bridge should be assigned to > the same domain. So if a device is assigned to a domain, the bridge and the > <secbus>:00.0 should be the same, so it is not that neccessary to keep that > information for the bridge and <secbus>:0.0 .Hmm, having the hypervisor rely that much on the tools behaving well doesn''t seem too good an idea to me. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Oct-25 15:31 UTC
[Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
CC Allen for more input. Thanks --jyh>-----Original Message----- >From: Jan Beulich [mailto:JBeulich@novell.com] >Sent: Monday, October 25, 2010 4:59 PM >To: Han, Weidong; Jiang, Yunhong >Cc: xen-devel@lists.xensource.com >Subject: RE: VT-d device assignment may fail (regression from Xen c/s >19805:2f1fa2215e60) > >>>> On 25.10.10 at 09:05, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: >>>From: Jan Beulich [mailto:JBeulich@novell.com] >> What''s the removed bus/devfn check you mean? I didn''t catch it in the patch. > >There used to be > > if ( (secbus != bus) && (secdevfn != 0) ) > >guarding the second call to domain_context_mapping_one(). > >>>Finally, isn''t it inconsistent that only the original device gets its >>>->domain set to the new owner and gets moved to that domain''s >>>device list, but neither the upstream bridge nor that bridge''s >>><secbus>:00.0 get handled the same way? What if below that >> >> Per my understanding, the bridge and the <secbus>:00.0 is only for PCI device >> because all PCI device behind the same pcie2pci bridge should be assigned to >> the same domain. So if a device is assigned to a domain, the bridge and the >> <secbus>:00.0 should be the same, so it is not that neccessary to keep that >> information for the bridge and <secbus>:0.0 . > >Hmm, having the hypervisor rely that much on the tools behaving >well doesn''t seem too good an idea to me. > >Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Oct-25 15:52 UTC
[Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>-----Original Message----- >From: Jan Beulich [mailto:JBeulich@novell.com] >Sent: Monday, October 25, 2010 4:59 PM >To: Han, Weidong; Jiang, Yunhong >Cc: xen-devel@lists.xensource.com >Subject: RE: VT-d device assignment may fail (regression from Xen c/s >19805:2f1fa2215e60) > >>>> On 25.10.10 at 09:05, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: >>>From: Jan Beulich [mailto:JBeulich@novell.com] >> What''s the removed bus/devfn check you mean? I didn''t catch it in the patch. > >There used to be > > if ( (secbus != bus) && (secdevfn != 0) ) > >guarding the second call to domain_context_mapping_one().I didn''t understand quite clearly of the oriignal check (the secbus!= bus && secdevfn != 0). But seems the new code should be ok, we only need setup the devfn=0 for the pcie2pci bridge, at least according to the vt-d spec. --jyh _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Oct-25 16:07 UTC
[Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>>> On 25.10.10 at 17:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> >>-----Original Message----- >>From: Jan Beulich [mailto:JBeulich@novell.com] >>Sent: Monday, October 25, 2010 4:59 PM >>To: Han, Weidong; Jiang, Yunhong >>Cc: xen-devel@lists.xensource.com >>Subject: RE: VT-d device assignment may fail (regression from Xen c/s >>19805:2f1fa2215e60) >> >>>>> On 25.10.10 at 09:05, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: >>>>From: Jan Beulich [mailto:JBeulich@novell.com] >>> What''s the removed bus/devfn check you mean? I didn''t catch it in the patch. >> >>There used to be >> >> if ( (secbus != bus) && (secdevfn != 0) ) >> >>guarding the second call to domain_context_mapping_one(). > > I didn''t understand quite clearly of the oriignal check (the secbus!= bus && > secdevfn != 0). But seems the new code should be ok, we only need setup the > devfn=0 for the pcie2pci bridge, at least according to the vt-d spec.It obviously isn''t okay in the case where the original device is the one at <secbus>:00.0 (and avoiding the attempt to map the device a second time was - afaict - the intention of that check). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Alfred Song
2010-Oct-26 03:48 UTC
RE: [Xen-devel] RE: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
Hi yulong and guys, I am just interested in the topic and I have the same question about hot-plug as Yunhong said in the old mail of this list: I think the conflict is when we assigned a pci bridge to a domain U, according to the vt-d spec, all the devices under the bridge should have been directly assigned to the domain U too. So the point may be in the current code, the whole bridge(including the current devices under the bridge) will not be allowed to be assigned to the same domain U again. Assume that a real pci device hot-attached to the HW platform where its upstream bridge directly assigned to the above domain U( if the case is possible), the device, according to the vt-d spec, should be assigned to the same domain U, but whether the hot-plug device is still not allowed to be assigned? By the way, I think if the real device which is hot-plugged to hw platform was assigned to domain 0 firstly and then let the tools deal with it, like notifying qemu for a new device plugged, the process makes sense, if it is automatically and we don''t need some extra work for an administrator. -- zhuo>>> On 25.10.10 at 17:52, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> >>-----Original Message----- >>From: Jan Beulich [mailto:JBeulich@novell.com] >>Sent: Monday, October 25, 2010 4:59 PM >>To: Han, Weidong; Jiang, Yunhong >>Cc: xen-devel@lists.xensource.com >>Subject: RE: VT-d device assignment may fail (regression from Xen c/s >>19805:2f1fa2215e60) >> >>>>> On 25.10.10 at 09:05, "Jiang, Yunhong" <yunhong.jiang@intel.com>wrote:>>>>From: Jan Beulich [mailto:JBeulich@novell.com] >>> What''s the removed bus/devfn check you mean? I didn''t catch it in thepatch.>> >>There used to be >> >> if ( (secbus != bus) && (secdevfn != 0) ) >> >>guarding the second call to domain_context_mapping_one(). > > I didn''t understand quite clearly of the oriignal check (the secbus!= bus&&> secdevfn != 0). But seems the new code should be ok, we only need setupthe> devfn=0 for the pcie2pci bridge, at least according to the vt-d spec.It obviously isn''t okay in the case where the original device is the one at <secbus>:00.0 (and avoiding the attempt to map the device a second time was - afaict - the intention of that check). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Weidong Han
2010-Oct-27 05:43 UTC
[Xen-devel] Re: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
Jan Beulich wrote:> Weidong, > > in this patch you removed a bus/devfn check around an invocation of > domain_context_mapping_one() avoiding the attempt to call the > function again if it was already called for this very device. This > removal, however, conflicts with the context_present() check at the > top of domain_context_mapping_one() - in particular, pdev->domain > isn''t set to the new owner yet, and hence the function fails. > > The question now is whether some similar check should be restored, > or whether pdev->domain should get updated earlier. This may >I prefer to add the check.> need some additional consideration, since from looking at the code > I would say that reassign_device_ownership() needs some error > handling improvements too: Currently, partial failure isn''t being > handled properly at all (respective devices are left in a half way > state - no longer properly assigned to Dom0, but also not yet > assigned to DomU). >Agree. The assignment should guarantee "done" or "none".> I also wonder what guarantee there is for a device to exist at > <secbus>:00.0 (since if there is none, the same context_present() > check could at least theoretically again lead to problems as it > checks for pci_get_pdev() returning non-NULL > > Finally, isn''t it inconsistent that only the original device gets its > ->domain set to the new owner and gets moved to that domain''s > device list, but neither the upstream bridge nor that bridge''s > <secbus>:00.0 get handled the same way? What if below that >Yes, it''s better to do the same for the upstream bridge.> bridge a device gets hot-added? Wouldn''t that device > incorrectly end up in Dom0, with no failures because the bridge > still appears to be owned by Dom0 while it really isn''t? >Do you want some error message for this case? Regards, Weidong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Oct-27 09:55 UTC
[Xen-devel] Re: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>>> On 27.10.10 at 07:43, Weidong Han <weidong.han@intel.com> wrote: > Jan Beulich wrote: >> need some additional consideration, since from looking at the code >> I would say that reassign_device_ownership() needs some error >> handling improvements too: Currently, partial failure isn''t being >> handled properly at all (respective devices are left in a half way >> state - no longer properly assigned to Dom0, but also not yet >> assigned to DomU). >> > Agree. The assignment should guarantee "done" or "none".Are you going to work on this?>> I also wonder what guarantee there is for a device to exist at >> <secbus>:00.0 (since if there is none, the same context_present() >> check could at least theoretically again lead to problems as it >> checks for pci_get_pdev() returning non-NULL >> >> Finally, isn''t it inconsistent that only the original device gets its >> ->domain set to the new owner and gets moved to that domain''s >> device list, but neither the upstream bridge nor that bridge''s >> <secbus>:00.0 get handled the same way? What if below that >> > Yes, it''s better to do the same for the upstream bridge.And this?>> bridge a device gets hot-added? Wouldn''t that device >> incorrectly end up in Dom0, with no failures because the bridge >> still appears to be owned by Dom0 while it really isn''t? >> > Do you want some error message for this case?First of all I''d want the case to be handled correctly. Only if it really can''t be handled, I''d want an error message, yes, as silent failure leading to later mysterious misbehavior is very hard to diagnose. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Oct-27 11:34 UTC
[Xen-devel] Re: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
>>> On 27.10.10 at 07:43, Weidong Han <weidong.han@intel.com> wrote: > Jan Beulich wrote: >> The question now is whether some similar check should be restored, >> or whether pdev->domain should get updated earlier. This may >> > I prefer to add the check.Like this (not tested yet, simplifying the code a little at once): --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1371,23 +1371,16 @@ static int domain_context_mapping(struct if ( find_upstream_bridge(&bus, &devfn, &secbus) < 1 ) break; - /* PCIe to PCI/PCIx bridge */ - if ( pdev_type(bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) - { - ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); - if ( ret ) - return ret; + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); - /* - * Devices behind PCIe-to-PCI/PCIx bridge may generate - * different requester-id. It may originate from devfn=0 - * on the secondary bus behind the bridge. Map that id - * as well. - */ + /* + * Devices behind PCIe-to-PCI/PCIx bridge may generate different + * requester-id. It may originate from devfn=0 on the secondary bus + * behind the bridge. Map that id as well if we didn''t already. + */ + if ( !ret && pdev_type(bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && + (secbus != pdev->bus || pdev->devfn != 0) ) ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0); - } - else /* Legacy PCI bridge */ - ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); break; Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Weidong Han
2010-Oct-28 00:26 UTC
[Xen-devel] Re: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
Jan Beulich wrote:>>>> On 27.10.10 at 07:43, Weidong Han <weidong.han@intel.com> wrote: >>>> >> Jan Beulich wrote: >> >>> need some additional consideration, since from looking at the code >>> I would say that reassign_device_ownership() needs some error >>> handling improvements too: Currently, partial failure isn''t being >>> handled properly at all (respective devices are left in a half way >>> state - no longer properly assigned to Dom0, but also not yet >>> assigned to DomU). >>> >>> >> Agree. The assignment should guarantee "done" or "none". >> > > Are you going to work on this? >Not yet. I''m working on other priority tasks.> >>> I also wonder what guarantee there is for a device to exist at >>> <secbus>:00.0 (since if there is none, the same context_present() >>> check could at least theoretically again lead to problems as it >>> checks for pci_get_pdev() returning non-NULL >>> >>> Finally, isn''t it inconsistent that only the original device gets its >>> ->domain set to the new owner and gets moved to that domain''s >>> device list, but neither the upstream bridge nor that bridge''s >>> <secbus>:00.0 get handled the same way? What if below that >>> >>> >> Yes, it''s better to do the same for the upstream bridge. >> > > And this? > > >>> bridge a device gets hot-added? Wouldn''t that device >>> incorrectly end up in Dom0, with no failures because the bridge >>> still appears to be owned by Dom0 while it really isn''t? >>> >>> >> Do you want some error message for this case? >> > > First of all I''d want the case to be handled correctly. Only if it > really can''t be handled, I''d want an error message, yes, as > silent failure leading to later mysterious misbehavior is very > hard to diagnose. >Agree! Regards, Weidong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Weidong Han
2010-Oct-28 00:31 UTC
[Xen-devel] Re: VT-d device assignment may fail (regression from Xen c/s 19805:2f1fa2215e60)
Jan Beulich wrote:>>>> On 27.10.10 at 07:43, Weidong Han <weidong.han@intel.com> wrote: >>>> >> Jan Beulich wrote: >> >>> The question now is whether some similar check should be restored, >>> or whether pdev->domain should get updated earlier. This may >>> >>> >> I prefer to add the check. >> > > Like this (not tested yet, simplifying the code a little at once): > > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1371,23 +1371,16 @@ static int domain_context_mapping(struct > if ( find_upstream_bridge(&bus, &devfn, &secbus) < 1 ) > break; > > - /* PCIe to PCI/PCIx bridge */ > - if ( pdev_type(bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) > - { > - ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); > - if ( ret ) > - return ret; > + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); > > - /* > - * Devices behind PCIe-to-PCI/PCIx bridge may generate > - * different requester-id. It may originate from devfn=0 > - * on the secondary bus behind the bridge. Map that id > - * as well. > - */ > + /* > + * Devices behind PCIe-to-PCI/PCIx bridge may generate different > + * requester-id. It may originate from devfn=0 on the secondary bus > + * behind the bridge. Map that id as well if we didn''t already. > + */ > + if ( !ret && pdev_type(bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && > + (secbus != pdev->bus || pdev->devfn != 0) ) > ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0); > - } > - else /* Legacy PCI bridge */ > - ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn); > > break; > > > >Looks good. Regards, Weidong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel