Hi,

If we assign a device to an HVM guest, the HVM guest can do DMA
transfers directly. But how about accessing the PCI configuration
registers of the passed-through device? Do we still access the
PCI configuration registers through qemu?

Best Regards,

Akio Takebe

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
pradeep singh rautela
2008-Sep-11 11:00 UTC
Re: [Xen-devel] [Q] Is qemu used when we use VTd?
On Thu, Sep 11, 2008 at 4:16 PM, Akio Takebe <takebe_akio@jp.fujitsu.com> wrote:
> Hi,
>
> If we assign a device to an HVM guest, the HVM guest can do DMA
> transfers directly. But how about accessing the PCI configuration
> registers of the passed-through device? Do we still access the
> PCI configuration registers through qemu?

AFAIK yes. Keir, Eddie and others are best placed to answer this, I guess.

Thanks,
--Pradeep

--
Pradeep Singh Rautela
http://eagain.wordpress.com
http://emptydomain.googlepages.com
>From: Akio Takebe
>Sent: September 11, 2008 18:46
>
>If we assign a device to an HVM guest, the HVM guest can do DMA
>transfers directly. But how about accessing the PCI configuration
>registers of the passed-through device? Do we still access the
>PCI configuration registers through qemu?

Yes. Even when we say that a given PCI config field can be directly
accessed by the HVM guest, the access is still trapped to qemu for
routing.

Thanks,
Kevin
Hi,

Thank you.

And MMIO is trapped by the hypervisor, and then the hypervisor
performs the MMIO on behalf of the HVM guest, right?

Best Regards,

Akio Takebe

Tian, Kevin wrote:
> Yes. Even when we say that a given PCI config field can be directly
> accessed by the HVM guest, the access is still trapped to qemu for
> routing.
On Thu, Sep 11, 2008 at 11:46 AM, Akio Takebe <takebe_akio@jp.fujitsu.com> wrote:
> If we assign a device to an HVM guest, the HVM guest can do DMA
> transfers directly. But how about accessing the PCI configuration
> registers of the passed-through device? Do we still access the
> PCI configuration registers through qemu?

Yes. You should have a look at the function pt_pci_read_config in
ioemu/hw/pass-through.c.

Regards,
--
Jean Guyader
Jean Guyader wrote:
> Yes. You should have a look at the function pt_pci_read_config in
> ioemu/hw/pass-through.c.

Thanks. So the HVM guest uses the PCI BIOS of rombios to access the
PCI configuration registers, the accesses are trapped by qemu, and
qemu performs the accesses on the guest's behalf, right?

Can we directly use MMCONFIG for a PCI Express card passed through
with VT-d on HVM?

Best Regards,

Akio Takebe
>From: Akio Takebe [mailto:takebe_akio@jp.fujitsu.com]
>Sent: September 11, 2008 19:07
>
>And MMIO is trapped by the hypervisor, and then the hypervisor
>performs the MMIO on behalf of the HVM guest, right?

Which MMIO are you actually talking about? If you mean the MMIO
ranges in the PCI BARs of a passthrough device, they can be accessed
directly by the HVM guest with the p2m entries set up accordingly,
provided the MMIO range falls on 4K page granularity.

Thanks,
Kevin
>From: Akio Takebe
>Sent: September 11, 2008 19:48
>
>Can we directly use MMCONFIG for a PCI Express card passed through
>with VT-d on HVM?

You can't. MMCONFIG in no sense differs from the old I/O port style;
it is just another means of accessing PCI configuration space. Note
that PCI configuration space accesses have to be trapped, as they
determine which devices, and which config space contents, are
observed by the HVM guest. Normally you need some virtualization of
the config space content, such as the BARs. Thus even if you present
the MMCONFIG capability to the HVM guest, you still need to trap the
accesses falling in that range.

Thanks,
Kevin
Hi, Kevin

Tian, Kevin wrote:
> Which MMIO are you actually talking about? If you mean the MMIO
> ranges in the PCI BARs of a passthrough device, they can be accessed
> directly by the HVM guest with the p2m entries set up accordingly,
> provided the MMIO range falls on 4K page granularity.

Thank you very much. I was talking about the MMIO ranges in the PCI
BARs of a device passed through with VT-d.

Does the hypervisor trap only the MMIO from APIC accesses?

Best Regards,

Akio Takebe
>From: Akio Takebe [mailto:takebe_akio@jp.fujitsu.com]
>Sent: September 12, 2008 11:00
>
>Thank you very much. I was talking about the MMIO ranges in the PCI
>BARs of a device passed through with VT-d.
>
>Does the hypervisor trap only the MMIO from APIC accesses?

The hypervisor traps all MMIO which, if passed through to the HVM
guest, would generate unexpected results. In other words, you may
simply consider that all MMIO outside of passthrough devices
(including MMIO of emulated devices, MMIO of the emulated chipset,
MMIO from emulated CPUs, etc.) is trapped by the hypervisor for
virtualization. :-)

Thanks,
Kevin
Tian, Kevin wrote:
> You can't. MMCONFIG in no sense differs from the old I/O port style;
> it is just another means of accessing PCI configuration space. Note
> that PCI configuration space accesses have to be trapped, as they
> determine which devices, and which config space contents, are
> observed by the HVM guest. Normally you need some virtualization of
> the config space content, such as the BARs. Thus even if you present
> the MMCONFIG capability to the HVM guest, you still need to trap the
> accesses falling in that range.

I suspect MMCONFIG is needed to access the extended PCI configuration
space of PCI Express. A PCI Express card can probably work without
those accesses, but we can't use the full feature set of PCI Express
cards, right?

Best Regards,

Akio Takebe
Tian, Kevin wrote:
> The hypervisor traps all MMIO which, if passed through to the HVM
> guest, would generate unexpected results. In other words, you may
> simply consider that all MMIO outside of passthrough devices
> (including MMIO of emulated devices, MMIO of the emulated chipset,
> MMIO from emulated CPUs, etc.) is trapped by the hypervisor for
> virtualization. :-)

Sorry, I was asking only about the passthrough devices of the HVM
guest. Thank you for the information.

Best Regards,

Akio Takebe
> I suspect MMCONFIG is needed to access the extended PCI
> configuration space of PCI Express. A PCI Express card can probably
> work without those accesses, but we can't use the full feature set
> of PCI Express cards, right?

Yes. For example, guest software can't use AER (Advanced Error
Reporting).

To use the full feature set of PCI Express, we need to implement a
Root Port emulator in qemu, in addition to MMCONFIG.

Thanks,

--
Yuji Shimada
Hi, Yuji

Yuji Shimada wrote:
> Yes. For example, guest software can't use AER (Advanced Error
> Reporting).
>
> To use the full feature set of PCI Express, we need to implement a
> Root Port emulator in qemu, in addition to MMCONFIG.

Thank you very much.

Best Regards,

Akio Takebe
Yuji Shimada, what do you think about restricting AER to the host
side? I.e., when an uncorrectable error happens, we kill the guest
and FLR the device, to avoid the whole system being destroyed, at
least as a first step.

As for PCI Express features on the guest side, I think it may be
complex; in particular, I'm not sure how to handle PCI Express
features that require operations on the whole hierarchy. For example,
VC/TC may require changes to the switch and root port settings, and
AER handling in Linux may require a link reset from the root port.
IMO it is complex to handle such features.

Thanks
Yunhong Jiang

xen-devel-bounces@lists.xensource.com <> wrote:
> Yes. For example, guest software can't use AER (Advanced Error
> Reporting).
>
> To use the full feature set of PCI Express, we need to implement a
> Root Port emulator in qemu, in addition to MMCONFIG.
On Mon, 22 Sep 2008 21:47:11 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Yuji Shimada, what do you think about restricting AER to the host
> side? I.e., when an uncorrectable error happens, we kill the guest
> and FLR the device, to avoid the whole system being destroyed, at
> least as a first step.

I think restricting AER to the host side is a good idea as a first
step, because the implementation will be simple.

By the way, can we recover from the error condition with only an FLR?
Resetting the link from the root port is needed for some errors,
isn't it?

> As for PCI Express features on the guest side, I think it may be
> complex; in particular, I'm not sure how to handle PCI Express
> features that require operations on the whole hierarchy. For
> example, VC/TC may require changes to the switch and root port
> settings, and AER handling in Linux may require a link reset from
> the root port. IMO it is complex to handle such features.

I agree with you that implementing the full PCI Express feature set
on the guest side will be complex. I don't think VC/TC is needed on
the guest side. But AER on the guest side is required in the long
term, because the guest OS will then be able to handle AER and
recover from error conditions.

Thanks,

--
Yuji Shimada
Yuji Shimada <mailto:shimada-yxb@necst.nec.co.jp> wrote:
> I think restricting AER to the host side is a good idea as a first
> step, because the implementation will be simple.
>
> By the way, can we recover from the error condition with only an
> FLR? Resetting the link from the root port is needed for some
> errors, isn't it?

Yes, a root port link reset is needed on the host side; I mean FLR
just for the guest-specific part. What I'm considering is adding
error handling to pciback, so that when the host resets the
hierarchy, pciback's error handler will be invoked and notify the
control panel. But I'm still not sure whether any mechanism exists
for that notification (otherwise we need a Xen-specific mechanism).
I'm also not sure whether the long latency is acceptable for error
handling, especially as it may finish after the link reset.

> I agree with you that implementing the full PCI Express feature set
> on the guest side will be complex. I don't think VC/TC is needed on
> the guest side.

I remember I saw a document saying that Windows has VC/TC support for
HD Audio, although I'm not sure how it is implemented. Is VC/TC
needed for communication usage?

> But AER on the guest side is required in the long term, because the
> guest OS will then be able to handle AER and recover from error
> conditions.

Yes, I agree that if the guest can do AER, it will enhance
reliability and availability. But a more elegant design is needed.
For example, if the guest decides that the AER condition needs a root
port link reset (a switch link reset should be OK unless SR-IOV is in
use), what should the host do? If the host acts according to the
guest's suggestion, that may not be safe, I suspect.

BTW, do you know what the recovery action will usually be? I didn't
find much documentation on it, and the PCI Express spec doesn't give
much of a clue either.

Thanks
Yunhong Jiang
On Wed, 24 Sep 2008 16:57:21 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Yes, a root port link reset is needed on the host side; I mean FLR
> just for the guest-specific part. What I'm considering is adding
> error handling to pciback, so that when the host resets the
> hierarchy, pciback's error handler will be invoked and notify the
> control panel. But I'm still not sure whether any mechanism exists
> for that notification (otherwise we need a Xen-specific mechanism).

We can use the "error_detected" interface between the AER driver and
the pciback driver, can't we? Actually, there is no AER driver in
linux-2.6.18-xen.hg. We have to wait for the dom0 functionality to be
merged into upstream Linux.

The interface between pciback and xend is a Xen-specific mechanism.

> I'm also not sure whether the long latency is acceptable for error
> handling, especially as it may finish after the link reset.

I'm not sure either.

> I remember I saw a document saying that Windows has VC/TC support
> for HD Audio, although I'm not sure how it is implemented. Is VC/TC
> needed for communication usage?

I do NOT think VC/TC is needed on the guest side.

> Yes, I agree that if the guest can do AER, it will enhance
> reliability and availability. But a more elegant design is needed.
> For example, if the guest decides that the AER condition needs a
> root port link reset (a switch link reset should be OK unless
> SR-IOV is in use), what should the host do? If the host acts
> according to the guest's suggestion, that may not be safe, I
> suspect.

I agree with you. The host should NOT act according to the guest's
suggestion. I think the host should recover from the error condition
with dom0 Linux's AER driver. AER emulation for the guest is needed
to make the guest survive.

> BTW, do you know what the recovery action will usually be? I didn't
> find much documentation on it, and the PCI Express spec doesn't
> give much of a clue either.

Linux's AER driver will help us understand the recovery action. The
following function is the main logic:

drivers/pci/pcie/aer/aerdrv_core.c:do_recovery

Thanks,

--
Yuji Shimada
xen-devel-bounces@lists.xensource.com <> wrote:
> We can use the "error_detected" interface between the AER driver and
> the pciback driver, can't we? Actually, there is no AER driver in
> linux-2.6.18-xen.hg. We have to wait for the dom0 functionality to
> be merged into upstream Linux.

Yes, we tried to use the PCI error recovery mechanism in our internal
discussion. Merging the AER driver into dom0 is simple, since the AER
driver was merged in 2.6.18 as well. Ke Liping did some experiments
before, and there was no conflict at all. But I'm not sure whether
the backport will be accepted by upstream Xen.

What I considered is, for a PV domain, pciback can act as a
stub/proxy: pass the callback from AER to the guest side and wait for
the guest's return value, like PCI_ERS_RESULT_NEED_RESET etc. I
didn't find many issues with this method, except that pciback needs
some guards to make sure there is no timeout and the feedback is
valid. Some mechanism is also needed for pciback to notify pcifront
(currently there are only requests from pcifront to pciback, per my
understanding).

But for an HVM domain, maybe we can't support it unless we have
virtual AER support in the virtual HVM platform. Even if we have
that, it is much more complex to translate the physical AER event to
the guest side, and to parse the guest side's action to decide how to
act on the host side. We are still considering this. Do you have any
ideas on it?

Also, another point: have you considered how to handle a
multi-function device whose functions are assigned to multiple
domains, when one function has an error? Or devices under the same
switch assigned to different domains?

> The interface between pciback and xend is a Xen-specific mechanism.
>
> I'm not sure either.

Yes, that is one point we need to investigate. From the
documentation, the error_corrected callback can do anything,
including scheduling, but not access the device, so it seems OK; but
we need to verify that it has no side effects.

> I do NOT think VC/TC is needed on the guest side.
>
> I agree with you. The host should NOT act according to the guest's
> suggestion. I think the host should recover from the error
> condition with dom0 Linux's AER driver. AER emulation for the guest
> is needed to make the guest survive.

Have you considered implementing just a virtual root port in qemu,
not the whole Root Complex? I'm not sure what the effort/function
difference between these two methods would be.

> Linux's AER driver will help us understand the recovery action. The
> following function is the main logic:
>
> drivers/pci/pcie/aer/aerdrv_core.c:do_recovery

I'm not sure whether a slot reset is guaranteed to always resolve the
issue; I have never hit AER on my platform. :(

Thanks
Yunhong Jiang
On Fri, 26 Sep 2008 12:36:21 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> But for an HVM domain, maybe we can't support it unless we have
> virtual AER support in the virtual HVM platform. Even if we have
> that, it is much more complex to translate the physical AER event
> to the guest side, and to parse the guest side's action to decide
> how to act on the host side. We are still considering this. Do you
> have any ideas on it?
>
> Also, another point: have you considered how to handle a
> multi-function device whose functions are assigned to multiple
> domains, when one function has an error? Or devices under the same
> switch assigned to different domains?

We have to solve many difficulties to keep the guest domains running.
How about the following idea as a first step?

Non-fatal error on an I/O device:
- Kill the domain owning the error-source function.
- Reset the function.

Non-fatal error on a PCI-PCI bridge:
- Kill all domains owning functions under the PCI-PCI bridge.
- Reset the PCI-PCI bridge and the secondary bus.

Fatal error:
- Kill all domains owning functions under the same root port.
- Reset the link (secondary bus reset on the root port).

Note: we have to consider how to prevent the device from destroying
other domains' memory.

> Have you considered implementing just a virtual root port in qemu,
> not the whole Root Complex? I'm not sure what the effort/function
> difference between these two methods would be.

I think it is good to implement just a virtual root port in qemu. The
reasons are the following:

- The OS does not control the chipset-specific devices in the Root
  Complex much, so we don't need to provide chipset-specific devices
  to the guest OS.

- Firmware controls the chipset-specific devices in the Root Complex,
  but the guest firmware is Xen-specific, so we don't need to provide
  chipset-specific devices to the guest firmware either.

Note: for the first step, we don't need to implement a virtual root
port in qemu, because we kill the guest domain.

Thanks,

--
Yuji Shimada
Jiang, Yunhong
2008-Oct-06 02:28 UTC
RE: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?
Yuji Shimada <mailto:shimada-yxb@necst.nec.co.jp> wrote:
> On Fri, 26 Sep 2008 12:36:21 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

I changed the subject to reflect what's being discussed.

> We have to solve many difficulties to keep the guest domains
> running. How about the following idea as a first step?

Yes, agreed.

> Non-fatal error on an I/O device:
> - Kill the domain owning the error-source function.
> - Reset the function.

From the following statement in PCI Express 2.0 section 6.6.2, "Note
that Port state machines associated with Link functionality including
those in the Physical and Data Link Layers are not reset by FLR", I'm
not sure whether FLR is the right method to handle the error
situation. That's the reason I asked how to handle multi-function
devices.

> Non-fatal error on a PCI-PCI bridge:
> - Kill all domains owning functions under the PCI-PCI bridge.
> - Reset the PCI-PCI bridge and the secondary bus.
>
> Fatal error:
> - Kill all domains owning functions under the same root port.
> - Reset the link (secondary bus reset on the root port).

Agreed. Basically I think the action of "reset the PCI-PCI bridge and
the secondary bus" or "reset the link" is done by the AER core
already. What we need to define is pciback's error handler. In the
first step, the error handler will trigger a domain reset; in the
future, more elegant actions can be defined and implemented. Any
ideas?

> Note: we have to consider how to prevent the device from destroying
> other domains' memory.

Why should we consider the device destroying another domain's memory?
I think VT-d should guarantee against this.

> I think it is good to implement just a virtual root port in qemu.
>
> Note: for the first step, we don't need to implement a virtual root
> port in qemu, because we kill the guest domain.

Agreed.

Thanks
Yunhong Jiang
Yuji Shimada
2008-Oct-14 08:33 UTC
Re: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?
On Mon, 6 Oct 2008 10:28:26 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> From the following statement in PCI Express 2.0 section 6.6.2,
> "Note that Port state machines associated with Link functionality
> including those in the Physical and Data Link Layers are not reset
> by FLR", I'm not sure whether FLR is the right method to handle the
> error situation. That's the reason I asked how to handle
> multi-function devices.

I think a non-fatal error is a transaction-level error and does not
require resetting the lower layers. But I am not sure.

> Agreed. Basically I think the action of "reset the PCI-PCI bridge
> and the secondary bus" or "reset the link" is done by the AER core
> already. What we need to define is pciback's error handler. In the
> first step, the error handler will trigger a domain reset; in the
> future, more elegant actions can be defined and implemented. Any
> ideas?

I agree with you basically. The current AER core does not reset the
PCI-PCI bridge and the secondary bus when a non-fatal error occurs on
a PCI-PCI bridge. We need to implement resetting the PCI-PCI bridge
and the secondary bus.

> Why should we consider the device destroying another domain's
> memory? I think VT-d should guarantee against this.

The device is re-assigned to dom0 when the HVM domain is destroyed.
If we destroy the domain before resetting the device, the I/O device
can write to dom0's memory. On the other hand, we have to stop the
guest software before resetting the device, to prevent the guest
software from accessing the device.

By the way, do you have any plans to implement these functions? I can
provide the ideas, but I can't provide the code.

Thanks,

--
Yuji Shimada
Jiang, Yunhong
2008-Oct-16 07:32 UTC
RE: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?
Yuji Shimada <shimada-yxb@necst.nec.co.jp> wrote:
> On Mon, 6 Oct 2008 10:28:26 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> Yuji Shimada <shimada-yxb@necst.nec.co.jp> wrote:
>>> On Fri, 26 Sep 2008 12:36:21 +0800
>>> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>
>> I changed the subject to reflect what's discussed.
>>
>>> We have to solve many difficulties to keep the guest domain running.
>>>
>>> How about the following idea for a first step?
>>
>> Yes, agree.
>>
>>> Non-fatal error on I/O device:
>>> - kill the domain with the error source function.
>>> - reset the function.
>>
>> From the following statement in PCI-E 2.0 section 6.6.2: "Note that Port
>> state machines associated with Link functionality including those
>> in the Physical and Data Link Layers are not reset by FLR", I'm not
>> sure if FLR is the right method to handle the error situation. That's
>> the reason I asked how to handle multi-function devices.
>
> I think a non-fatal error is a transaction-layer error and does not
> require resetting the lower layer. But I am not sure.

By default, a data link layer error is fatal, but the result depends on
how the driver sets it up. We can trap accesses to the AER registers and
make sure a data link layer error is always reported as fatal. That is
easy to implement.

>>> Non-fatal error on PCI-PCI bridge:
>>> - kill all domains with functions under the PCI-PCI bridge.
>>> - reset the PCI-PCI bridge and secondary bus.
>>>
>>> Fatal error:
>>> - kill all domains with functions under the same root port.
>>> - reset the link (secondary bus reset on the root port).
>>
>> Agree. Basically I think the action of "reset PCI-PCI bridge and
>> secondary bus" or "reset the link" is done by the AER core
>> already. What we need to define is PCI back's error handler. In a first
>> step, the error handler will trigger a domain reset; in the future, more
>> elegant actions can be defined/implemented. Any ideas?
>
> I basically agree with you.
>
> Currently the AER core does not reset the PCI-PCI bridge and secondary bus
> when a non-fatal error occurs on a PCI-PCI bridge. We need to implement
> resetting the PCI-PCI bridge and secondary bus.

I'd keep the AER core as-is unless there is some special reason.
For example, why should we kill all domains under the same root port and
reset the root port's secondary link? Currently it will do so only if
the impacted device has no AER service registered.

Also I am not sure we need to reset the link for a non-fatal error if the
AER core does not do that. Is there any special difference between the
virtualization and native situations?

>>> Note: we have to consider how to prevent the device from destroying
>>> other domains' memory.
>>
>> Why should we consider destroying other domains' memory? I think VT-d
>> should guarantee this.
>
> The device is re-assigned to dom0 when the HVM domain is destroyed. If we
> destroy the domain before resetting the device, the I/O device can write
> to dom0's memory. On the other hand, we have to stop guest software
> before resetting the device to prevent the guest software from accessing
> the device.

That should be the same as the normal VT-d situation: we need an FLR
before we re-assign the device to dom0 (if it does not currently work
like this, it is a bug). Also, stopping guest software before resetting
the device may be helpful, but maybe not so important. Do you think the
guest's second access will impact the host? After all, even in a native
environment this is not guaranteed unless the platform supports it. (It
is said PPC has such support.)

BTW, you stated "We have to solve many difficulties to keep guest domain
running"; can you give some of the difficulties in detail (it may be
difficult for HVM, but I am not sure about the PV side)?

> By the way, do you have any plan to implement these functions?
> I can provide the idea, but I can't provide the code.

Yes, we will try to work on it, but we may not have enough environment to
test all types of errors. Also, although the AER code can be backported
easily, some required ACPI fixes are more challenging.

> Thanks,
> --
> Yuji Shimada
Yuji Shimada
2008-Oct-20 07:13 UTC
Re: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?
On Thu, 16 Oct 2008 15:32:40 +0800
"Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> >>> Non-fatal error on I/O device:
> >>> - kill the domain with the error source function.
> >>> - reset the function.
> >>
> >> From the following statement in PCI-E 2.0 section 6.6.2: "Note that Port
> >> state machines associated with Link functionality including those
> >> in the Physical and Data Link Layers are not reset by FLR", I'm not
> >> sure if FLR is the right method to handle the error situation. That's
> >> the reason I asked how to handle multi-function devices.
> >
> > I think a non-fatal error is a transaction-layer error and does not
> > require resetting the lower layer. But I am not sure.
>
> By default, a data link layer error is fatal, but the result depends on
> how the driver sets it up. We can trap accesses to the AER registers and
> make sure a data link layer error is always reported as fatal. That is
> easy to implement.

That means a non-fatal error is a transaction-layer error with the
default setting. When a non-fatal error occurs on an I/O device, FLR
seems able to recover it.

> >>> Non-fatal error on PCI-PCI bridge:
> >>> - kill all domains with functions under the PCI-PCI bridge.
> >>> - reset the PCI-PCI bridge and secondary bus.
> >>>
> >>> Fatal error:
> >>> - kill all domains with functions under the same root port.
> >>> - reset the link (secondary bus reset on the root port).
> >>
> >> Agree. Basically I think the action of "reset PCI-PCI bridge and
> >> secondary bus" or "reset the link" is done by the AER core
> >> already. What we need to define is PCI back's error handler. In a first
> >> step, the error handler will trigger a domain reset; in the future, more
> >> elegant actions can be defined/implemented. Any ideas?
> >
> > I basically agree with you.
> >
> > Currently the AER core does not reset the PCI-PCI bridge and secondary bus,
> > when a non-fatal error occurs on a PCI-PCI bridge. We need to implement
> > resetting the PCI-PCI bridge and secondary bus.
>
> I'd keep the AER core as-is unless there is some special reason. For
> example, why should we kill all domains under the same root port and
> reset the root port's secondary link? Currently it will do so only if
> the impacted device has no AER service registered.

On Linux 2.6.27 there is an AER driver which binds to the root port, but
there is no AER driver for other devices. So when a fatal error occurs,
Linux resets the root port's secondary link:

drivers/pci/pcie/aer/aerdrv.c:aer_root_reset

> Also I am not sure we need to reset the link for a non-fatal error if the
> AER core does not do that. Is there any special difference between the
> virtualization and native situations?

No, there is no difference. I agree with you to keep the AER core as-is.

> >>> Note: we have to consider how to prevent the device from destroying
> >>> other domains' memory.
> >>
> >> Why should we consider destroying other domains' memory? I think VT-d
> >> should guarantee this.
> >
> > The device is re-assigned to dom0 when the HVM domain is destroyed. If we
> > destroy the domain before resetting the device, the I/O device can write
> > to dom0's memory. On the other hand, we have to stop guest software
> > before resetting the device to prevent the guest software from accessing
> > the device.
>
> That should be the same as the normal VT-d situation: we need an FLR
> before we re-assign the device to dom0 (if it does not currently work
> like this, it is a bug). Also, stopping guest software before resetting
> the device may be helpful, but maybe not so important. Do you think the
> guest's second access will impact the host? After all, even in a native
> environment this is not guaranteed unless the platform supports it. (It
> is said PPC has such support.)
>
> BTW, you stated "We have to solve many difficulties to keep guest
> domain running"; can you give some of the difficulties in detail (it may
> be difficult for HVM, but I am not sure about the PV side)?

- HVM
  * Implementing a root port emulator in ioemu.
  * Implementing a memory-mapped configuration access mechanism for the
    guest OS.
  * Enhancing the guest AML to allow the guest OS to handle AER.
  * Mapping host errors to guest errors.
  * Interaction between ioemu and the PCI back driver.
  * Handling the case when the guest does not work fine.

- PV
  * Notifying pcifront from pciback.
  * Handling the case when the guest does not work fine.

> > By the way, do you have any plan to implement these functions?
> > I can provide the idea, but I can't provide the code.
>
> Yes, we will try to work on it, but we may not have enough environment to
> test all types of errors. Also, although the AER code can be backported
> easily, some required ACPI fixes are more challenging.

I'm not sure backporting is good. In the long term, dom0 Linux will be
based on a newer Linux. How/when can we developers switch it to a newer
one? I'd like other developers' comments.

Thanks,
--
Yuji Shimada
Jiang, Yunhong
2008-Oct-23 00:21 UTC
RE: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu used when we use VTd?
Yuji Shimada <shimada-yxb@necst.nec.co.jp> wrote:
> On Thu, 16 Oct 2008 15:32:40 +0800
> "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>>>> Non-fatal error on I/O device:
>>>>> - kill the domain with the error source function.
>>>>> - reset the function.
>>>>
>>>> From the following statement in PCI-E 2.0 section 6.6.2: "Note that Port
>>>> state machines associated with Link functionality including those
>>>> in the Physical and Data Link Layers are not reset by FLR", I'm not
>>>> sure if FLR is the right method to handle the error situation. That's
>>>> the reason I asked how to handle multi-function devices.
>>>
>>> I think a non-fatal error is a transaction-layer error and does not
>>> require resetting the lower layer. But I am not sure.
>>
>> By default, a data link layer error is fatal, but the result depends on
>> how the driver sets it up. We can trap accesses to the AER registers and
>> make sure a data link layer error is always reported as fatal. That is
>> easy to implement.
>
> That means a non-fatal error is a transaction-layer error with the
> default setting. When a non-fatal error occurs on an I/O device, FLR
> seems able to recover it.
>
>>>>> Non-fatal error on PCI-PCI bridge:
>>>>> - kill all domains with functions under the PCI-PCI bridge.
>>>>> - reset the PCI-PCI bridge and secondary bus.
>>>>>
>>>>> Fatal error:
>>>>> - kill all domains with functions under the same root port.
>>>>> - reset the link (secondary bus reset on the root port).
>>>>
>>>> Agree. Basically I think the action of "reset PCI-PCI bridge and
>>>> secondary bus" or "reset the link" is done by the AER core
>>>> already. What we need to define is PCI back's error handler. In a first
>>>> step, the error handler will trigger a domain reset; in the future, more
>>>> elegant actions can be defined/implemented. Any ideas?
>>>
>>> I basically agree with you.
>>>
>>> Currently the AER core does not reset the PCI-PCI bridge and secondary bus
>>> when a non-fatal error occurs on a PCI-PCI bridge. We need to implement
>>> resetting the PCI-PCI bridge and secondary bus.
>>
>> I'd keep the AER core as-is unless there is some special reason. For
>> example, why should we kill all domains under the same root port and
>> reset the root port's secondary link? Currently it will do so only if
>> the impacted device has no AER service registered.
>
> On Linux 2.6.27 there is an AER driver which binds to the root port, but
> there is no AER driver for other devices. So when a fatal error occurs,
> Linux resets the root port's secondary link:
>
> drivers/pci/pcie/aer/aerdrv.c:aer_root_reset
>
>> Also I am not sure we need to reset the link for a non-fatal error if the
>> AER core does not do that. Is there any special difference between the
>> virtualization and native situations?
>
> No, there is no difference. I agree with you to keep the AER core as-is.
>
>>>>> Note: we have to consider how to prevent the device from destroying
>>>>> other domains' memory.
>>>>
>>>> Why should we consider destroying other domains' memory? I think VT-d
>>>> should guarantee this.
>>>
>>> The device is re-assigned to dom0 when the HVM domain is destroyed. If we
>>> destroy the domain before resetting the device, the I/O device can write
>>> to dom0's memory. On the other hand, we have to stop guest software
>>> before resetting the device to prevent the guest software from accessing
>>> the device.
>>
>> That should be the same as the normal VT-d situation: we need an FLR
>> before we re-assign the device to dom0 (if it does not currently work
>> like this, it is a bug). Also, stopping guest software before resetting
>> the device may be helpful, but maybe not so important. Do you think the
>> guest's second access will impact the host? After all, even in a native
>> environment this is not guaranteed unless the platform supports it. (It
>> is said PPC has such support.)
>>
>> BTW, you stated "We have to solve many difficulties to keep guest domain
>> running"; can you give some of the difficulties in detail (it may be
>> difficult for HVM, but I am not sure about the PV side)?
>
> - HVM

For HVM, yes, it is tricky, and we have no plan for it till now.

>   * Implementing a root port emulator in ioemu.
>   * Implementing a memory-mapped configuration access mechanism for the
>     guest OS.
>   * Enhancing the guest AML to allow the guest OS to handle AER.
>   * Mapping host errors to guest errors.

This should be the tricky one, considering:
1) How to translate the TLP for the header log register?
2) The need to map the Source Identification register.

>   * Interaction between ioemu and the PCI back driver.

The main difficulty is mapping the error_handler callbacks to AER
register operations.

>   * Handling the case when the guest does not work fine.
>
> - PV
>   * Notifying pcifront from pciback.
>   * Handling the case when the guest does not work fine.

We are working on this now.

>>> By the way, do you have any plan to implement these functions?
>>> I can provide the idea, but I can't provide the code.
>>
>> Yes, we will try to work on it, but we may not have enough environment to
>> test all types of errors. Also, although the AER code can be backported
>> easily, some required ACPI fixes are more challenging.
>
> I'm not sure backporting is good. In the long term, dom0 Linux will be
> based on a newer Linux. How/when can we developers switch it to a newer
> one? I'd like other developers' comments.

At least we need to do that for internal testing. Not sure when the
kernel update will happen.

> Thanks,
> --
> Yuji Shimada