Cui, Dexuan
2009-Jan-15 03:09 UTC
[Xen-devel] Move some of the PCI device manage/control into pciback?
1) In Xen VT-d today, the FLR-related logic (can the device(s) be statically/dynamically assigned to a guest? how should the device(s) be FLR-ed?) lives in xend. The diff of the Python patch is ~700 lines. We may consider moving this logic to pciback. With it in pciback, though, I'm afraid we'll have less flexibility: a small adjustment (e.g., some people would like to relax the co-assignment constraint) or a bug fix requires reloading pciback, or rebooting the host if pciback is built into the Dom0 kernel. There are some other issues: a) porting all the Python logic to C in pciback is a big effort, and some people may not like the amount of added code; b) we may need to add an interface between pciback and the control panel so that xend can invoke these FLR-related functions in pciback.

2) The PCI config space virtualization for PV and HVM guests is not the same, and there is some duplicated code in pciback and ioemu. Currently the ioemu in Dom0 accesses device config space via libpci (i.e. /sys); maybe ioemu can talk to pciback directly? In the stubdomain case, it looks like libpci is implemented via pcifront. If ioemu can talk to pciback directly, I think we can eliminate the duplicated code in ioemu and get consistency between PV and HVM. As for the PCI passthrough related hypercalls invoked by ioemu in the de-privileged stubdomain, I think ioemu can ask pciback to invoke the hypercall on its behalf, but this requires adding an interface to pciback.

All of this requires re-architecting the current code. Will it bring compatibility issues? I remember it was said that Xen 3.4 will be released in March; is now a suitable time to consider these changes?

PS: in the long run (how long?), will ioemu be removed from Dom0, leaving the stubdomain as the only place for ioemu?

Any comments are appreciated!
Thanks, -- Dexuan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
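The co-assignment question in point 1 can be illustrated with a small sketch. It is not the xend patch: it assumes, purely for illustration, that a device without FLR has to be reset through its bus, so every function on that bus must go to the same guest. The names `PciDev`, `coassign_group` and `can_assign` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciDev:
    bdf: str        # "bus:dev.fn", e.g. "01:00.0"
    has_flr: bool   # device advertises Function Level Reset

    @property
    def bus(self) -> str:
        return self.bdf.split(":")[0]


def coassign_group(dev, all_devs):
    """Return the BDFs that must be assigned together with `dev`.

    Assumption: a device with FLR can be reset on its own; a device
    without FLR is reset via its bus, which hits every function on it.
    """
    if dev.has_flr:
        return {dev.bdf}
    return {d.bdf for d in all_devs if d.bus == dev.bus}


def can_assign(requested, all_devs):
    """True if the requested BDFs satisfy the co-assignment constraint."""
    by_bdf = {d.bdf: d for d in all_devs}
    wanted = set(requested)
    return all(coassign_group(by_bdf[bdf], all_devs) <= wanted
               for bdf in wanted)


devs = [
    PciDev("01:00.0", has_flr=False),
    PciDev("01:00.1", has_flr=False),
    PciDev("02:00.0", has_flr=True),
]
```

Relaxing the constraint, as some users apparently want, would amount to changing `coassign_group`; that is a one-line edit in a toolstack script but a module reload or host reboot if the same rule is compiled into pciback.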
Shohei Fujiwara
2009-Jan-15 10:17 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On Thu, 15 Jan 2009 11:09:21 +0800
"Cui, Dexuan" <dexuan.cui@intel.com> wrote:

> 1) Now in Xen VT-d, the FLR related things (can the device(s) be
> statically/dynamically assigned to a guest? how should the device(s)
> be FLR-ed?) are done in xend. The diff of the python patch is ~700
> lines. We may consider moving these things to pciback. Certainly,
> with these things in pciback, I'm afraid we'll have less flexibility
> -- a small adjustment (e.g., some people would like to relax the
> co-assignment constraint) or a bug fix requires a reload of pciback
> or a reboot of host (if pciback is built into Dom0 kernel). And we
> have some other issues: a) moving all the python logic into the
> pciback using C needs a big effort so maybe somebody doesn't like
> the big number of the line of code; b) we may need to add an
> interface between pciback and control panel so that xend can invoke
> these FLR related functions of pciback
>
> 2) Now the pci config space virtualizations of PV and HVM guests are
> not the same and there are some duplicated codes in pciback and
> ioemu. Now the ioemu of Dom0 accesses device config space via libpci
> (the /sys); maybe ioemu can talk to pciback directly? In the case
> of stubdomain, looks the libpci is implemented via pcifront -- if
> ioemu can talk to pciback directly, I think we can eliminate the
> duplicated codes in ioemu and we'll have a consistency between PV
> and HVM.

I agree with you that there is similar code in both pciback and ioemu. But I would not be happy to see the code removed from ioemu.

For an HVM domain with a stub domain, I'm considering direct access from ioemu to configuration space. We can achieve this by mapping a subset of MMCFG into the stub domain. This will improve the scalability of PCI pass-through and reduce the responsibility of dom0.

My model is the following.

1. The PCI back driver resets the device and sets it up.
2. The PCI back driver passes responsibility for the device's configuration space to ioemu.
3. Ioemu reads/writes the device's configuration space in response to the guest OS.
4. When ioemu exits, the PCI back driver takes back responsibility for the device's configuration space.
5. The PCI back driver resets the device (and puts it into the D3hot state if possible).

As you know, xend currently reads/writes configuration space. If xend doesn't read/write it, the architecture becomes simpler.

What do you think about this?

Thanks,
--
Shohei Fujiwara
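The five-step model above is essentially an ownership handoff for the device's configuration space. A minimal sketch of that handoff follows; the class and method names (`PassthroughDevice`, `start_guest`, `ioemu_exit`) are hypothetical and nothing here corresponds to real pciback or ioemu code.

```python
from enum import Enum


class Owner(Enum):
    PCIBACK = "pciback"
    IOEMU = "ioemu"


class PassthroughDevice:
    """Toy model of who owns config space at each step of the proposal."""

    def __init__(self):
        self.owner = Owner.PCIBACK
        self.log = []

    def start_guest(self):
        # Steps 1-2: pciback resets and sets up the device, then hands
        # responsibility for config space to ioemu.
        if self.owner is not Owner.PCIBACK:
            raise RuntimeError("device already handed to ioemu")
        self.log += ["reset", "setup"]
        self.owner = Owner.IOEMU

    def config_access(self, who):
        # Step 3: only the current owner may touch config space.
        if who is not self.owner:
            raise PermissionError(f"{who.value} does not own config space")
        self.log.append(f"access:{who.value}")

    def ioemu_exit(self):
        # Steps 4-5: pciback takes ownership back, resets the device and
        # (if possible) puts it into D3hot.
        self.owner = Owner.PCIBACK
        self.log += ["reset", "d3hot"]
```

The open question in the rest of the thread is whether step 3 can be made safe when the owner is an untrusted stub domain.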
Keir Fraser
2009-Jan-15 11:04 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On 15/01/2009 10:17, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote:

> In case of HVM domain with stub domain, I'm considering direct access
> from ioemu to configuration space. We can achieve this by mapping the
> subset of MMCFG to stub domain. This will improve the scalability of PCI
> pass-through and reduce the responsibility of dom0.
>
> My model is the following.
>
> 1. PCI back driver resets the device and sets it up.
> 2. PCI back driver passes the responsibility of configuration
>    space of device to ioemu.
> 3. Ioemu reads/writes configuration space of the device,
>    responding guest OS.
> 4. When ioemu exits, pci back driver gets the responsibility of
>    configuration space of device.
> 5. PCI back driver resets device (and put D3hot state if possible)
>
> As you know, current xend reads/writes configuration space. If xend
> doesn't read/write, the architecture becomes simpler.
>
> What do you think about this?

I'd rather have all accesses mediated through pciback. I don't think PCI config accesses should be on any data path anyway, and you've already taken the hit of trapping to qemu in that case.

-- Keir
Jiang, Yunhong
2009-Jan-16 03:26 UTC
RE: [Xen-devel] Move some of the PCI device manage/control into pciback?
xen-devel-bounces@lists.xensource.com <> wrote:
> On Thu, 15 Jan 2009 11:09:21 +0800
> "Cui, Dexuan" <dexuan.cui@intel.com> wrote:
>
>> 1) Now in Xen VT-d, the FLR related things (can the device(s) be
>> statically/dynamically assigned to a guest? how should the device(s)
>> be FLR-ed?) are done in xend. The diff of the python patch is ~700
>> lines. We may consider moving these things to pciback. Certainly,
>> with these things in pciback, I'm afraid we'll have less flexibility
>> -- a small adjustment (e.g., some people would like to relax the
>> co-assignment constraint) or a bug fix requires a reload of pciback
>> or a reboot of host (if pciback is built into Dom0 kernel). And we
>> have some other issues: a) moving all the python logic into the
>> pciback using C needs a big effort so maybe somebody doesn't like
>> the big number of the line of code; b) we may need to add an
>> interface between pciback and control panel so that xend can invoke
>> these FLR related functions of pciback

I'm still not sure whether we really need such flexibility in a production environment.

>> 2) Now the pci config space virtualizations of PV and HVM guests are
>> not the same and there are some duplicated codes in pciback and
>> ioemu. Now the ioemu of Dom0 accesses device config space via libpci
>> (the /sys); maybe ioemu can talk to pciback directly? In the case
>> of stubdomain, looks the libpci is implemented via pcifront -- if
>> ioemu can talk to pciback directly, I think we can eliminate the
>> duplicated codes in ioemu and we'll have a consistency between PV
>> and HVM.

So you mean ioemu would issue xen_pci_op directly to pciback?

> I agree with you that there are two similar codes in pciback and
> ioemu. But I'm not happy if the code is removed from ioemu.
>
> In case of HVM domain with stub domain, I'm considering direct access
> from ioemu to configuration space. We can achieve this by mapping the
> subset of MMCFG to stub domain. This will improve the scalability of
> PCI pass-through and reduce the responsibility of dom0.
>
> My model is the following.
>
> 1. PCI back driver resets the device and sets it up.
> 2. PCI back driver passes the responsibility of configuration
>    space of device to ioemu.
> 3. Ioemu reads/writes configuration space of the device,
>    responding guest OS.
> 4. When ioemu exits, pci back driver gets the responsibility of
>    configuration space of device.
> 5. PCI back driver resets device (and put D3hot state if possible)
>
> As you know, current xend reads/writes configuration space. If xend
> doesn't read/write, the architecture becomes simpler.
>
> What do you think about this?

Shohei, I think this model may have some issues.

a) The stubdomain/qemu is not trusted, so a user may use a fake stub domain and try to program some sensitive config space (like MSI).

b) If there is no mmcfg support, synchronizing access to cf8/cfc will be difficult. So do you mean we would have different implementations for the mmcfg and cf8 methods?
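Point b) follows from how the legacy mechanism works: a config access is a write of the target address to port 0xCF8 followed by a data access on port 0xCFC, so two agents interleaving those steps corrupt each other's accesses. The sketch below models the address encoding and the lock a single kernel would hold around the pair; the `Cf8Port` class and its dictionary-backed "hardware" are purely illustrative.

```python
import threading


def cf8_address(bus, dev, fn, reg):
    """Encode a CONFIG_ADDRESS value for the 0xCF8 port."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 256):
        raise ValueError("bus/dev/fn/reg out of range")
    # Bit 31 is the enable bit; the register offset is dword-aligned.
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


class Cf8Port:
    """Toy model of the CF8/CFC pair, serialized by one lock."""

    def __init__(self, config_space):
        self._lock = threading.Lock()
        self._latched = 0
        self._space = config_space  # maps encoded address -> 32-bit value

    def read32(self, bus, dev, fn, reg):
        with self._lock:
            # "out 0xCF8": latch the address ...
            self._latched = cf8_address(bus, dev, fn, reg)
            # ... "in 0xCFC": read whatever the latch currently points at.
            return self._space.get(self._latched, 0xFFFFFFFF)


port = Cf8Port({cf8_address(1, 2, 3, 0x10): 0xF0000000})
```

A lock like this only works inside one kernel. Dom0 and a stub domain share no such lock, which is presumably why direct access was proposed only for the MMCFG case.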
Shohei Fujiwara
2009-Jan-16 05:47 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On Fri, 16 Jan 2009 11:26:10 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> > I agree with you that there are two similar codes in pciback and
> > ioemu. But I'm not happy if the code is removed from ioemu.
> >
> > In case of HVM domain with stub domain, I'm considering direct access
> > from ioemu to configuration space. We can achieve this by mapping the
> > subset of MMCFG to stub domain. This will improve the scalability of
> > PCI pass-through and reduce the responsibility of dom0.
> >
> > My model is the following.
> >
> > 1. PCI back driver resets the device and sets it up.
> > 2. PCI back driver passes the responsibility of configuration
> >    space of device to ioemu.
> > 3. Ioemu reads/writes configuration space of the device,
> >    responding guest OS.
> > 4. When ioemu exits, pci back driver gets the responsibility of
> >    configuration space of device.
> > 5. PCI back driver resets device (and put D3hot state if possible)
> >
> > As you know, current xend reads/writes configuration space. If xend
> > doesn't read/write, the architecture becomes simpler.
> >
> > What do you think about this?
>
> Shohei, I think this model may have some issue.
> a) The stubdomain/qemu is not trustable, so user may use a fake stub
> domain and try to program some sensitive config space (like MSI).

My idea is to call XEN_DOMCTL_iomem_permission from domain 0, so it doesn't open a new hole. In addition, VT-d interrupt remapping can block invalid MSIs.

> b) If there is no mmcfg support, to sync access to cf8/cfc will be
> difficult. So you mean we have different implementation for
> mmcfg/cf8 method?

If there is no mmcfg support, I'd like to use the existing mechanism (pciback in dom0 and pcifront in the stub domain).

If there is mmcfg support, I'd like to allow the stub domain to access configuration space directly.

Thanks,
--
Shohei Fujiwara
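Mapping "a subset of MMCFG" per device is plausible because, in the PCIe memory-mapped configuration layout, each function occupies its own 4 KiB page. The sketch below computes that page; the base address `0xE0000000` is a made-up example, and the assumption that the permission call works on page frames is ours rather than something stated in the thread.

```python
PAGE_SHIFT = 12  # 4 KiB pages


def ecam_offset(bus, dev, fn, reg=0):
    """Byte offset of a function's config space inside the MMCFG window."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 4096):
        raise ValueError("bus/dev/fn/reg out of range")
    return (bus << 20) | (dev << 15) | (fn << 12) | reg


def function_frame(mmcfg_base, bus, dev, fn):
    """Page frame number of the single page that holds one function."""
    return (mmcfg_base + ecam_offset(bus, dev, fn)) >> PAGE_SHIFT


MMCFG_BASE = 0xE0000000  # hypothetical; the real base comes from ACPI MCFG
```

Granting a stub domain exactly one such frame would expose one function's entire config space, MSI capability included, which is what the security objections later in the thread are about.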
Jiang, Yunhong
2009-Jan-16 06:16 UTC
RE: [Xen-devel] Move some of the PCI device manage/control into pciback?
Shohei Fujiwara <mailto:fujiwara-sxa@necst.nec.co.jp> wrote:
> On Fri, 16 Jan 2009 11:26:10 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>> I agree with you that there are two similar codes in pciback and
>>> ioemu. But I'm not happy if the code is removed from ioemu.
>>>
>>> In case of HVM domain with stub domain, I'm considering direct access
>>> from ioemu to configuration space. We can achieve this by mapping the
>>> subset of MMCFG to stub domain. This will improve the scalability of
>>> PCI pass-through and reduce the responsibility of dom0.
>>>
>>> My model is the following.
>>>
>>> 1. PCI back driver resets the device and sets it up.
>>> 2. PCI back driver passes the responsibility of configuration
>>>    space of device to ioemu.
>>> 3. Ioemu reads/writes configuration space of the device, responding
>>>    guest OS.
>>> 4. When ioemu exits, pci back driver gets the responsibility of
>>>    configuration space of device.
>>> 5. PCI back driver resets device (and put D3hot state if possible)
>>>
>>> As you know, current xend reads/writes configuration space. If xend
>>> doesn't read/write, the architecture becomes simpler.
>>>
>>> What do you think about this?
>>
>> Shohei, I think this model may have some issue.
>> a) The stubdomain/qemu is not trustable, so user may use a fake stub
>> domain and try to program some sensitive config space (like MSI).
>
> My idea is to call XEN_DOMCTL_iomem_permission from domain 0.
> So my idea doesn't open a new hole.
> In addition to this, interrupt remapping of VT-d can block invalid MSI.

I suspect that feature is not enabled on all systems.

Also, what will happen if the guest tries to change a BAR value? Will that also be passed to hardware? I'm not sure what will happen if two devices under the same bus have the same BAR value. It might then be possible for one guest to write to the MMIO of another device.

>> b) If there is no mmcfg support, to sync access to cf8/cfc will be
>> difficult. So you mean we have different implementation for
>> mmcfg/cf8 method?
>
> If there is no mmcfg support, I'd like to use existing
> mechanism (pciback in dom0 and pcifront in stub domain).
>
> If there is mmcfg support, I'd like to allow stub domain to access directly.

I'm not sure how different these two implementations would be, or whether we really want to keep this implementation. Mostly I think it is OK, since it should not be on the data path (or will some special device do that?).

But there is one thing we really need to consider: the mask bit for MSI/MSI-X. The guest may try to mask/unmask the interrupt, so maybe we need to translate that operation into a mask/unmask of the virtual interrupt.
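The BAR concern above comes down to overlapping address ranges: if a guest-supplied BAR value reached hardware, two devices could end up decoding the same MMIO window. A minimal overlap check is sketched below; `bars_overlap` and `find_conflicts` are hypothetical helper names and the addresses are invented.

```python
def bars_overlap(a, b):
    """True if two (base, size) MMIO ranges share at least one byte."""
    a_base, a_size = a
    b_base, b_size = b
    return a_base < b_base + b_size and b_base < a_base + a_size


def find_conflicts(bars):
    """Return sorted pairs of device names whose BAR ranges overlap.

    `bars` maps a device name (e.g. a BDF string) to a (base, size) tuple.
    """
    names = sorted(bars)
    return [(x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
            if bars_overlap(bars[x], bars[y])]


example = {
    "01:00.0": (0xF0000000, 0x1000),
    "01:00.1": (0xF0000000, 0x1000),   # same value as 01:00.0
    "01:01.0": (0xF0001000, 0x1000),   # adjacent, not overlapping
}
```

Whatever component owns real BAR writes (pciback today) would need a check of this kind, or would simply never forward guest-written values.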
Jiang, Yunhong
2009-Jan-16 06:18 UTC
RE: [Xen-devel] Move some of the PCI device manage/control into pciback?
xen-devel-bounces@lists.xensource.com <> wrote:
> On 15/01/2009 10:17, "Shohei Fujiwara"
> <fujiwara-sxa@necst.nec.co.jp> wrote:
>
>> In case of HVM domain with stub domain, I'm considering direct access
>> from ioemu to configuration space. We can achieve this by mapping the
>> subset of MMCFG to stub domain. This will improve the scalability of PCI
>> pass-through and reduce the responsibility of dom0.
>>
>> My model is the following.
>>
>> 1. PCI back driver resets the device and sets it up.
>> 2. PCI back driver passes the responsibility of configuration
>>    space of device to ioemu.
>> 3. Ioemu reads/writes configuration space of the device,
>>    responding guest OS.
>> 4. When ioemu exits, pci back driver gets the responsibility of
>>    configuration space of device.
>> 5. PCI back driver resets device (and put D3hot state if possible)
>>
>> As you know, current xend reads/writes configuration space. If xend
>> doesn't read/write, the architecture becomes simpler.
>>
>> What do you think about this?
>
> I'd rather have all accesses mediated through pciback. I don't think PCI
> config accesses should be on any data path anyway, and you've already taken
> the hit of trapping to qemu in that case.

There is one exception: the mask bit for MSI/MSI-X. Maybe we need to add some mechanism for an HVM domain to mask/unmask the virtual interrupt directly, like what DomU does for evtchn. But that will be tricky.

Thanks
Yunhong Jiang
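The mask/unmask behaviour being discussed can be pictured as a mask bit plus a pending bit on the virtual interrupt, similar in spirit to how event channels behave. The model below is illustrative only: the `VirtualIrq` class is hypothetical and says nothing about where (qemu, pciback, or Xen) such state would actually live.

```python
class VirtualIrq:
    """Toy virtual interrupt with mask and pending bits."""

    def __init__(self, deliver):
        self.masked = False
        self.pending = False
        self._deliver = deliver  # callback that injects the IRQ into the guest

    def raise_irq(self):
        # While masked, remember the interrupt instead of delivering it.
        if self.masked:
            self.pending = True
        else:
            self._deliver()

    def mask(self):
        self.masked = True

    def unmask(self):
        # Unmasking delivers an interrupt that arrived while masked.
        self.masked = False
        if self.pending:
            self.pending = False
            self._deliver()


delivered = []
irq = VirtualIrq(lambda: delivered.append("irq"))
```

If a guest masks and unmasks on every interrupt, each `mask()`/`unmask()` pair is a trap, which is why the cost of the path those calls take matters.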
Keir Fraser
2009-Jan-16 08:07 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On 16/01/2009 06:18, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> I'd rather have all accesses mediated through pciback. I don't think PCI
>> config accesses should be on any data path anyway, and you've already taken
>> the hit of trapping to qemu in that case.
>
> There is one exception: The mask bit for MSI/MSI-X. Maybe we need add some
> mechanism for HVM domain to mask/unmask the virtual interrupt directly, like
> what DomU did for evtchn. But that will be tricky.

Yes, that did occur to me. We already have plenty of special emulation code for MSI/MSI-X. I guess we may explicitly paravirtualise that aspect in a different way which would allow ioemu to interact directly with Xen. Actually, if mask/unmask happens on every IRQ, we may need to push support for the PCI MSI registers right down into Xen itself to get decent speed, because going to qemu with any great frequency is not very high performance.

-- Keir
Shohei Fujiwara
2009-Jan-16 08:55 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On Fri, 16 Jan 2009 14:16:08 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Shohei Fujiwara <mailto:fujiwara-sxa@necst.nec.co.jp> wrote:
> > On Fri, 16 Jan 2009 11:26:10 +0800
> > "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> >
> >>> I agree with you that there are two similar codes in pciback and
> >>> ioemu. But I'm not happy if the code is removed from ioemu.
> >>>
> >>> In case of HVM domain with stub domain, I'm considering direct access
> >>> from ioemu to configuration space. We can achieve this by mapping the
> >>> subset of MMCFG to stub domain. This will improve the scalability of
> >>> PCI pass-through and reduce the responsibility of dom0.
> >>>
> >>> My model is the following.
> >>>
> >>> 1. PCI back driver resets the device and sets it up.
> >>> 2. PCI back driver passes the responsibility of configuration
> >>>    space of device to ioemu.
> >>> 3. Ioemu reads/writes configuration space of the device, responding
> >>>    guest OS.
> >>> 4. When ioemu exits, pci back driver gets the responsibility of
> >>>    configuration space of device.
> >>> 5. PCI back driver resets device (and put D3hot state if possible)
> >>>
> >>> As you know, current xend reads/writes configuration space. If xend
> >>> doesn't read/write, the architecture becomes simpler.
> >>>
> >>> What do you think about this?
> >>
> >> Shohei, I think this model may have some issue.
> >> a) The stubdomain/qemu is not trustable, so user may use a fake stub
> >> domain and try to program some sensitive config space (like MSI).
> >
> > My idea is to call XEN_DOMCTL_iomem_permission from domain 0.
> > So my idea doesn't open a new hole.
> > In addition to this, interrupt remapping of VT-d can block invalid MSI.
>
> I suspect that feature is not enabled in all system.
>
> Also what will happen if guest try to change the BAR value? Will be
> passed to hardware also? I'm not sure what will happen if two device
> under the same bus has the same BAR value. Maybe then it is possible
> one guest can write MMIO of another device.

This is the figure of my idea.

If mmcfg and interrupt remapping are supported:

    guest domain      | stub domain
    ------------------+------------------------------------------
    guest software -> | ioemu -> libpci(pcifront) -> mmcfg(subset)

If mmcfg or interrupt remapping is not supported:

    guest domain      | stub domain                  | domain 0
    ------------------+------------------------------+---------------------
    guest software -> | ioemu -> libpci(pcifront) -> | pciback -> mmcfg/cf8

    * This is the same as the current implementation.

The BAR is virtualized by ioemu. A BAR value written by guest software isn't passed to hardware.

If the stub domain is hijacked, it is possible to set an invalid BAR value.

> >> b) If there is no mmcfg support, to sync access to cf8/cfc will be
> >> difficult. So you mean we have different implementation for
> >> mmcfg/cf8 method?
> >
> > If there is no mmcfg support, I'd like to use existing
> > mechanism (pciback in dom0 and pcifront in stub domain).
> >
> > If there is mmcfg support, I'd like to allow stub domain to access directly.
>
> I'm not sure how difference between these two implementation and if
> we really want keep this implementation. Mostly I think it is ok
> since it should not be on data path (Or any special device will do
> that??)
> But there is really one thing we need consider: The mask bit for
> MSI/MSI-X. Because guest may try to mask/unmask the interrupt. Maybe
> we need translate that operation to the mask/unmask of the virtual
> interrupt.

As mentioned above, my idea keeps PCI configuration virtualization in ioemu in the stub domain. So the MSI mask bit in config space and the MSI-X mask bits in memory space will work fine.

Thanks,
--
Shohei Fujiwara
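The two figures in the message above amount to a single decision on two platform features. As a sketch, with a hypothetical function name and the hop labels taken from the figures:

```python
def config_access_path(has_mmcfg, has_intr_remap):
    """Return the hops a guest config access takes under the proposal."""
    common = ["guest software", "ioemu", "libpci(pcifront)"]
    if has_mmcfg and has_intr_remap:
        # Direct path: the stub domain touches its mapped MMCFG subset.
        return common + ["mmcfg(subset)"]
    # Fallback: same as the current implementation, via dom0.
    return common + ["pciback", "mmcfg/cf8"]
```

Only the case where both features are present bypasses dom0; every other combination keeps pciback in the path.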
Espen Skoglund
2009-Jan-16 13:54 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
[Keir Fraser]
> On 16/01/2009 06:18, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>> I'd rather have all accesses mediated through pciback. I don't
>>> think PCI config accesses should be on any data path anyway, and
>>> you've already taken the hit of trapping to qemu in that case.
>>
>> There is one exception: The mask bit for MSI/MSI-X. Maybe we need
>> add some mechanism for HVM domain to mask/unmask the virtual
>> interrupt directly, like what DomU did for evtchn. But that will be
>> tricky.

> Yes, that did occur to me. We already have plenty of special
> emulation code for MSI/MSI-x. I guess we may explicitly
> paravirtualise that aspect in a different way which would allow
> ioemu to interact direct with Xen. Actually if mask/unmask happens
> on every IRQ, we may need to push support for the PCI MSI registers
> right down into Xen itself to get decent speed? Because going to
> qemu with any great frequency is not very high performance.

Last time I checked, current Linux code does not update the MSI/MSI-X mask bits frequently (as in on every IRQ). Doing so would require device interaction and could result in quite some overhead. I don't know how other systems (e.g., Windows) handle the mask bits.

I don't think we need to optimize for frequent mask bit updates. Updates due to enabling/disabling interrupts, or masking due to interrupt storms, would be OK to channel through a slower code path. Please do tell if the above assumption is wrong.

	eSk
Espen Skoglund
2009-Jan-16 14:19 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
[Shohei Fujiwara]
> On Fri, 16 Jan 2009 14:16:08 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>> Shohei Fujiwara <mailto:fujiwara-sxa@necst.nec.co.jp> wrote:
>>> My idea is to call XEN_DOMCTL_iomem_permission from domain 0. So
>>> my idea doesn't open a new hole. In addition to this, interrupt
>>> remapping of VT-d can block invalid MSI.
>>
>> I suspect that feature is not enabled in all system.
>>
>> Also what will happen if guest try to change the BAR value? Will be
>> passed to hardware also? I'm not sure what will happen if two
>> device under the same bus has the same BAR value. Maybe then it is
>> possible one guest can write MMIO of another device.

> This is the figure of my idea.

> If mmcfg and interrupt remapping are supported:

> guest domain      | stub domain
> ------------------+------------------------------------------
> guest software -> | ioemu -> libpci(pcifront) -> mmcfg(subset)

> If mmcfg or interrupt remapping are not supported:

> guest domain      | stub domain                  | domain 0
> ------------------+------------------------------+---------------------
> guest software -> | ioemu -> libpci(pcifront) -> | pciback -> mmcfg/cf8

> * This is the same with current implementation.

> BAR is virtualized by ioemu. BAR value written by guest software
> isn't passed to hardware.

> If stub domain is hijacked, it is possible to set invalid BAR value.

I still don't understand what you're trying to achieve by avoiding pciback. As Keir said, PCI config accesses should not be on the data path. Nor should config accesses be required for regular device operation. It is, after all, called "configuration space", not "control space". PCI config space accesses are kind of bound to have some overhead. For example, Itanium requires them to go through a SAL call.

Is there a real problem you're trying to solve by pushing this to the stub domain? Also, if this is to be handled in the stub domain, I would very much like to be able to configure certain devices so that their config space accesses are still tunneled through pciback.

	eSk
Espen Skoglund
2009-Jan-16 14:41 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
[Shohei Fujiwara]
> On Fri, 16 Jan 2009 11:26:10 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Shohei, I think this model may have some issue.
>> a) The stubdomain/qemu is not trustable, so user may use a fake stub
>> domain and try to program some sensitive config space (like MSI).

> My idea is to call XEN_DOMCTL_iomem_permission from domain 0. So my
> idea doesn't open a new hole.

> In addition to this, interrupt remapping of VT-d can block invalid
> MSI.

Except that the MSI entry must be programmed to deliver interrupts in a special remappable format. The stub domain cannot be allowed to write arbitrary contents into the MSI entry.

	eSk
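To make the "remappable format" point concrete, here is a sketch of the kind of check a trusted mediator would apply to an MSI address written on behalf of an untrusted domain. The bit layout (0xFEE in bits 31:20, format bit 4, handle in bits 19:5 plus bit 2) is our reading of the VT-d interrupt remapping specification and should be verified against it; the function names and the `allowed_handles` policy are hypothetical.

```python
MSI_ADDR_PREFIX = 0xFEE  # bits 31:20 of any x86 MSI address


def is_remappable_msi_address(addr_lo):
    """True if the low 32 address bits use the remappable format.

    Assumed layout: bit 4 is the interrupt-format bit (1 = remappable).
    """
    has_prefix = ((addr_lo >> 20) & 0xFFF) == MSI_ADDR_PREFIX
    return has_prefix and bool(addr_lo & (1 << 4))


def msi_handle(addr_lo):
    """Extract the 16-bit remapping-table handle (bits 19:5 and bit 2)."""
    return ((addr_lo >> 5) & 0x7FFF) | (((addr_lo >> 2) & 1) << 15)


def msi_write_allowed(addr_lo, allowed_handles):
    """Accept only remappable-format writes that use a handle we issued."""
    return (is_remappable_msi_address(addr_lo)
            and msi_handle(addr_lo) in allowed_handles)


# Remappable address carrying handle 5.
EXAMPLE_ADDR = 0xFEE00000 | (5 << 5) | (1 << 4)
```

With raw access to the function's config page, nothing forces the stub domain's writes through a check like `msi_write_allowed`, which is the objection raised above.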
Shohei Fujiwara
2009-Jan-19 07:05 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On Fri, 16 Jan 2009 14:19:12 +0000
Espen Skoglund <espen.skoglund@netronome.com> wrote:

> [Shohei Fujiwara]
> > On Fri, 16 Jan 2009 14:16:08 +0800
> > "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
> >> Shohei Fujiwara <mailto:fujiwara-sxa@necst.nec.co.jp> wrote:
> >>> My idea is to call XEN_DOMCTL_iomem_permission from domain 0. So
> >>> my idea doesn't open a new hole. In addition to this, interrupt
> >>> remapping of VT-d can block invalid MSI.
> >>
> >> I suspect that feature is not enabled in all system.
> >>
> >> Also what will happen if guest try to change the BAR value? Will be
> >> passed to hardware also? I'm not sure what will happen if two
> >> device under the same bus has the same BAR value. Maybe then it is
> >> possible one guest can write MMIO of another device.
>
> > This is the figure of my idea.
>
> > If mmcfg and interrupt remapping are supported:
>
> > guest domain      | stub domain
> > ------------------+------------------------------------------
> > guest software -> | ioemu -> libpci(pcifront) -> mmcfg(subset)
>
> > If mmcfg or interrupt remapping are not supported:
>
> > guest domain      | stub domain                  | domain 0
> > ------------------+------------------------------+---------------------
> > guest software -> | ioemu -> libpci(pcifront) -> | pciback -> mmcfg/cf8
>
> > * This is the same with current implementation.
>
> > BAR is virtualized by ioemu. BAR value written by guest software
> > isn't passed to hardware.
>
> > If stub domain is hijacked, it is possible to set invalid BAR value.
>
> I still don't understand what you're trying to achieve by avoiding to
> go through pciback. As Keir said, PCI config accesses should not be
> taken on the data path. Config accesses should neither be required
> for regular device operation. It is after all called "configuration
> space", not "control space". PCI config space accesses are kind of
> bound to have some overhead. For example, Itanium requires them to go
> through a SAL call.

Domain 0 is a SPOF (single point of failure). If domain 0 panics, the whole system stops. So I'd like to remove this function from domain 0, provided we can preserve security. This reduces the chance of a domain 0 panic.

In the future, it would be great if domain 0 could reboot while guest domains keep running. This avoids the SPOF. To achieve this, we have to solve many problems. For networking, we need to emulate link down during the reboot. For PCI passthrough, it is difficult to block configuration access during the reboot. If the stub domain can access configuration space directly, we don't need to block configuration access.

What do you think?

> Is there a real problem you're trying to solve by pushing this to the
> stub domain? Also, if this is to be handled in the stub domain I
> would very much like to be able to configure certain devices so that
> their config space accesses are still tunneled through pciback.

There is no real problem with making this configurable. I also think config space access should work whether it is tunneled or not.

Thanks,
--
Shohei Fujiwara
Keir Fraser
2009-Jan-19 08:30 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On 19/01/2009 07:05, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote:

>> I still don't understand what you're trying to achieve by avoiding to
>> go through pciback. As Keir said, PCI config accesses should not be
>> taken on the data path. Config accesses should neither be required
>> for regular device operation. It is after all called "configuration
>> space", not "control space". PCI config space accesses are kind of
>> bound to have some overhead. For example, Itanium requires them to go
>> through a SAL call.
>
> Domain 0 is SPOF(Single Point of Failure). If domain 0 panics, whole
> system stops. So, I'd like to remove the function from domain 0, if we
> can keep security. This reduces possibility of panic of domain 0.
>
> In the future, it is great if domain 0 can reboot while guest domain
> are working. This avoids SPOF.
> To achieve this, we have to solve many problems. In case
> of network, emulating link down during rebooting is needed. In case of
> PCI passthrough, it is difficult to block configuration access during
> rebooting. If stub domain can access to configuration space directly,
> we don't need to block configuration access.
>
> What do you think?

I think what you want to do sounds pretty hard. PCI accesses should definitely go through pciback by default. If you need other modes for the more extensive rearchitecting you are doing, they belong in your dom0-can-reboot branch, or in the main tree as a configurable option.

-- Keir
Hi,

Any further update on SR-IOV support for Xen? Are we going to include this feature soon?

Thanks,
-Wei
On Mon, Jan 19, 2009 at 11:13:37AM -0600, Wei Huang wrote:
> Hi,
>
> Any further update on SR-IOV support for Xen? Are we going to include
> this feature soon?

I am also interested in this and would be willing to do some work on breaking out the patches to address Jan Beulich's concerns[1] about accreditation, and to bring the patches up to date with the most recent Linux patches. The main problem that I see with the latter is that 2.6.18.8's PCI stack is now quite old, so using that as a target would be more work than, for instance, using the paravirt ops work that Jeremy Fitzhardinge has been working on.

[1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00965.html

--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
Jiang, Yunhong
2009-Jan-20 06:08 UTC
RE: [Xen-devel] Move some of the PCI device manage/control into pciback?
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:
> On 16/01/2009 06:18, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>> I'd rather have all accesses mediated through pciback. I don't think PCI
>>> config accesses should be on any data path anyway, and you've already
>>> taken the hit of trapping to qemu in that case.
>>
>> There is one exception: The mask bit for MSI/MSI-X. Maybe we need add some
>> mechanism for HVM domain to mask/unmask the virtual interrupt directly,
>> like what DomU did for evtchn. But that will be tricky.
>
> Yes, that did occur to me. We already have plenty of special emulation code
> for MSI/MSI-x. I guess we may explicitly paravirtualise that aspect in a
> different way which would allow ioemu to interact direct with Xen. Actually
> if mask/unmask happens on every IRQ, we may need to push support for the PCI
> MSI registers right down into Xen itself to get decent speed? Because going
> to qemu with any great frequency is not very high performance.

We plan to do this for MSI-X first, since qemu currently does not expose mask support for MSI interrupts. And we do see this issue with some OSes (at least those based on kernel 2.6.18).

>
> -- Keir
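[Editorial sketch, not part of the original message.] The cost discussed above comes from guests that mask and unmask a vector around every interrupt. For MSI-X, the per-vector mask is bit 0 of the Vector Control dword in each 16-byte table entry, and that table lives in a device memory BAR rather than in config space. The Python model below only illustrates which bit is being toggled; the `bytearray` stands in for the table, and the function names are hypothetical, not Xen, qemu, or pciback APIs.

```python
import struct

MSIX_ENTRY_SIZE = 16          # bytes per MSI-X table entry
MSIX_VECTOR_CTRL_OFFSET = 12  # Vector Control dword within an entry
MSIX_CTRL_MASKBIT = 0x1       # bit 0 of Vector Control: 1 = vector masked


def _ctrl_offset(index):
    """Byte offset of the Vector Control dword of entry `index`."""
    return index * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL_OFFSET


def set_mask(table, index, masked):
    """Set or clear the per-vector mask bit (hypothetical helper)."""
    off = _ctrl_offset(index)
    (ctrl,) = struct.unpack_from("<I", table, off)
    if masked:
        ctrl |= MSIX_CTRL_MASKBIT
    else:
        ctrl &= ~MSIX_CTRL_MASKBIT & 0xFFFFFFFF
    struct.pack_into("<I", table, off, ctrl)


def is_masked(table, index):
    """Return True if entry `index` currently has its mask bit set."""
    (ctrl,) = struct.unpack_from("<I", table, _ctrl_offset(index))
    return bool(ctrl & MSIX_CTRL_MASKBIT)
```

In an HVM guest each such write is a trapped access, which is why a mask/unmask pair on every IRQ becomes expensive if it has to be handled in qemu.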
Shohei Fujiwara
2009-Jan-20 09:26 UTC
Re: [Xen-devel] Move some of the PCI device manage/control into pciback?
On Mon, 19 Jan 2009 08:30:18 +0000 Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 19/01/2009 07:05, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote:
>
> >> I still don't understand what you're trying to achieve by avoiding to
> >> go through pciback. As Keir said, PCI config accesses should not be
> >> taken on the data path. Config accesses should neither be required
> >> for regular device operation. It is after all called "configuration
> >> space", not "control space". PCI config space accesses are kind of
> >> bound to have some overhead. For example, Itanium requires them to go
> >> through a SAL call.
> >
> > Domain 0 is SPOF(Single Point of Failure). If domain 0 panics, whole
> > system stops. So, I'd like to remove the function from domain 0, if we
> > can keep security. This reduces possibility of panic of domain 0.
> >
> > In the future, it is great if domain 0 can reboot while guest domain
> > are working. This avoids SPOF.
> > To achieve this, we have to solve many problems. In case
> > of network, emulating link down during rebooting is needed. In case of
> > PCI passthrough, it is difficult to block configuration access during
> > rebooting. If stub domain can access to configuration space directly,
> > we don't need to block configuration access.
> >
> > What do you think?
>
> I think what you want to do sounds pretty hard. PCI accesses should
> definitely go through pciback by default. If you need other modes for more
> extensive rearchitecting you are doing, they belong in your dom0-can-reboot
> branch, or in the main tree as a configurable option.

I understand PCI accesses should go through pciback by default. Direct access to MMCFG from the stub domain should be configurable. I'd like to keep developing in the main tree as long as it is possible. For now, I am trying to enable PCI passthrough with a stub domain, keeping it de-privileged. I hope the new patch can be applied to the main tree, because it will be useful for other developers and users.
Thanks,
--
Shohei Fujiwara
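[Editorial sketch, not part of the original message.] "Direct access to MMCFG" refers to the memory-mapped PCI Express config mechanism, where each function's 4 KiB config space sits at a fixed offset from a platform-provided base address. The Python below shows that offset calculation and a default-to-pciback choice of access path. The base address in the usage note and the `direct_mmcfg` option are assumptions for illustration only; on real hardware the base comes from the ACPI MCFG table, and no such option name appears in the thread.

```python
def mmcfg_offset(bus, dev, fn, reg):
    """Offset of a config register inside the MMCFG window.

    Layout: bus in bits 27:20, device in 19:15, function in 14:12,
    register offset in 11:0 (4 KiB of config space per function).
    """
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8
            and 0 <= reg < 4096):
        raise ValueError("bus/dev/fn/reg out of range")
    return (bus << 20) | (dev << 15) | (fn << 12) | reg


def config_access_path(direct_mmcfg=False):
    """Pick the access path; pciback stays the default.

    `direct_mmcfg` is a hypothetical per-device option.
    """
    return "mmcfg" if direct_mmcfg else "pciback"
```

For example, with an assumed window base of 0xE0000000, register 0x10 of device 01:00.0 would be at `0xE0000000 + mmcfg_offset(1, 0, 0, 0x10)`.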
It is better not to duplicate the efforts if Yu Zhao is working on it. Otherwise, I would be interested in it too.

-Wei

Simon Horman wrote:
> On Mon, Jan 19, 2009 at 11:13:37AM -0600, Wei Huang wrote:
> > Hi,
> >
> > Any further update on SR-IOV support for Xen? Are we going to include
> > this feature soon?
>
> I am also interested in this and would be willing to do some work
> on breaking out the patches to address Jan Beulich's concerns[1]
> about accreditation. And to bring the patches up to date with the most
> recent Linux patches. The main problem that I see with the latter
> that 2.6.18.8's PCI stack is now quite old, so using that as
> a target would be more work than for instance using the paravirt ops
> work that Jeremy Fitzhardinge has been working on.
>
> [1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00965.html
>
> --
> Simon Horman
> VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
> H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
I'll respin those patches and resubmit them later. One thing is that SR-IOV depends on MSI/MSI-x but they are disabled as Keir said in thread 'x86: Disable MSI as it seems to be triggering ASSERT at irq.c:269'.

Wei Huang wrote:
> It is better not to duplicate the efforts if Yu Zhao is working on it.
> Otherwise, I would be interested on it too.
>
> -Wei
>
> Simon Horman wrote:
>> On Mon, Jan 19, 2009 at 11:13:37AM -0600, Wei Huang wrote:
>> > Hi,
>> >
>> > Any further update on SR-IOV support for Xen? Are we going to include
>> > this feature soon?
>>
>> I am also interested in this and would be willing to do some work
>> on breaking out the patches to address Jan Beulich's concerns[1]
>> about accreditation. And to bring the patches up to date with the most
>> recent Linux patches. The main problem that I see with the latter
>> that 2.6.18.8's PCI stack is now quite old, so using that as
>> a target would be more work than for instance using the paravirt ops
>> work that Jeremy Fitzhardinge has been working on.
>>
>> [1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00965.html
>>
>> --
>> Simon Horman
>> VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
>> H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
Sorry, I wasn't as clear as I should have been. I was offering to help Yu.

On Tue, Jan 20, 2009 at 08:43:29AM -0600, Wei Huang wrote:
> It is better not to duplicate the efforts if Yu Zhao is working on it.
> Otherwise, I would be interested on it too.
>
> -Wei
>
> Simon Horman wrote:
>> On Mon, Jan 19, 2009 at 11:13:37AM -0600, Wei Huang wrote:
>> > Hi,
>> >
>> > Any further update on SR-IOV support for Xen? Are we going to include
>> > this feature soon?
>>
>> I am also interested in this and would be willing to do some work
>> on breaking out the patches to address Jan Beulich's concerns[1]
>> about accreditation. And to bring the patches up to date with the most
>> recent Linux patches. The main problem that I see with the latter
>> that 2.6.18.8's PCI stack is now quite old, so using that as
>> a target would be more work than for instance using the paravirt ops
>> work that Jeremy Fitzhardinge has been working on.
>>
>> [1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00965.html
>>
>> --
>> Simon Horman
>> VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
>> H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en

--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
Hopefully I will have MSI support fixed and checked in by tomorrow. The issue was not with Intel's locking changes to pci/msi, but with Jan Beulich's changes to ACKTYPE_NONE/ACKTYPE_EOI handling of MSIs. I'm just doing some testing of my fix now.

-- Keir

On 20/01/2009 15:20, "Zhao, Yu" <yu.zhao@intel.com> wrote:
> I'll respin those patches and resubmit them later. One thing is that
> SR-IOV depends on MSI/MSI-x but they are disabled as Keir said in thread
> 'x86: Disable MSI as it seems to be triggering ASSERT at irq.c:269'.
>
> Wei Huang wrote:
>> It is better not to duplicate the efforts if Yu Zhao is working on it.
>> Otherwise, I would be interested on it too.
>>
>> -Wei
>>
>> Simon Horman wrote:
>>> On Mon, Jan 19, 2009 at 11:13:37AM -0600, Wei Huang wrote:
>>>> Hi,
>>>>
>>>> Any further update on SR-IOV support for Xen? Are we going to include
>>>> this feature soon?
>>>
>>> I am also interested in this and would be willing to do some work
>>> on breaking out the patches to address Jan Beulich's concerns[1]
>>> about accreditation. And to bring the patches up to date with the most
>>> recent Linux patches. The main problem that I see with the latter
>>> that 2.6.18.8's PCI stack is now quite old, so using that as
>>> a target would be more work than for instance using the paravirt ops
>>> work that Jeremy Fitzhardinge has been working on.
>>>
>>> [1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00965.html
>>>
>>> --
>>> Simon Horman
>>> VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
>>> H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en