Cui, Dexuan
2008-Jun-19 05:13 UTC
[Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
Currently, when creating or destroying an HVM guest with assigned devices, we perform FLR for the devices in the hypervisor: xen/drivers/passthrough/vtd/utils.c: pdev_flr().
The logic is:
a) if the device is a PCIe endpoint and it supports FLR, use that;
b) in all other cases, use a D3hot/D0 transition in place of FLR.

There are some issues:

1) It looks like few PCIe devices support FLR now, so currently almost all PCIe devices and all PCI devices use the D3hot/D0 method. However, a D-state transition is not guaranteed to properly clear the device state;

2) In case a), the current implementation is actually buggy: Transaction_Pending_bit==0 doesn't mean that FLR has completed; checking it is just a way to ensure there is no pending transaction when we're about to issue FLR (so we can be sure there is no data corruption). And according to the PCIe spec, after issuing FLR we should wait at least 100ms, but "mdelay(100)" is not acceptable in Xen...

To resolve the issues, I propose to change the FLR logic to:

1) If the device is a PCIe endpoint and supports PCIe FLR, use that;
2) Else, if the device is a PCIe endpoint and all functions on the device are assigned to the same guest, use the immediate parent bus's "Secondary Bus Reset" to reset all functions of the device (this is why we require all functions of the device to be assigned to the same guest);
3) Else, if the device is a PCI endpoint on a host bus (e.g. an integrated device) and it supports PCI "Advanced Capabilities", use that for FLR;
4) Else, if the device is a vendor-integrated PCI device with a "known" vendor/device ID, use the vendor-defined method of issuing FLR. For instance, for VendorID=0x8086 we can use the method defined in the Intel ICH9 Datasheet to perform FLR;
5) Else, use the "Secondary Bus Reset" (we ensure all the PCI devices behind a bridge are assigned to the same guest).

And I propose to move the FLR logic to Control Panel.
The benefits are:
1) It's natural, and keeps the hypervisor thin;
2) The 100ms delay can be implemented easily in Control Panel, but not easily in the hypervisor;
3) Some logic, like looking up a device's parent BDF from its own BDF, can be done more easily in Control Panel.

Comments are appreciated.

Thanks,
-- Dexuan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
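The five-step selection order proposed above can be read as a small decision function. What follows is only a minimal sketch, not code from Xen or xend: the `Device` fields, the `ResetMethod` names and the `KNOWN_VENDOR_RESET` table are hypothetical, and returning `None` when no step applies is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ResetMethod(Enum):
    PCIE_FLR = auto()             # step 1
    SECONDARY_BUS_RESET = auto()  # steps 2 and 5
    PCI_AF_FLR = auto()           # step 3
    VENDOR_SPECIFIC = auto()      # step 4


@dataclass
class Device:
    # Hypothetical summary of one assigned function; a real tool would
    # derive these flags from config space and the assignment state.
    is_pcie_endpoint: bool
    has_pcie_flr: bool = False
    has_pci_af_flr: bool = False
    on_host_bus: bool = False
    all_functions_same_guest: bool = False
    bus_siblings_same_guest: bool = False
    vendor_device: tuple = (0, 0)  # (vendor_id, device_id)


# Placeholder: (vendor_id, device_id) pairs with a documented vendor reset.
KNOWN_VENDOR_RESET = frozenset()


def choose_reset_method(dev, known=KNOWN_VENDOR_RESET):
    """Return the first applicable method in the proposed order, else None."""
    if dev.is_pcie_endpoint and dev.has_pcie_flr:
        return ResetMethod.PCIE_FLR
    if dev.is_pcie_endpoint and dev.all_functions_same_guest:
        return ResetMethod.SECONDARY_BUS_RESET
    if not dev.is_pcie_endpoint and dev.on_host_bus and dev.has_pci_af_flr:
        return ResetMethod.PCI_AF_FLR
    if dev.vendor_device in known:
        return ResetMethod.VENDOR_SPECIFIC
    if dev.bus_siblings_same_guest:
        return ResetMethod.SECONDARY_BUS_RESET
    # Assumption: no safe reset exists, so the caller should refuse assignment.
    return None
```

For example, `choose_reset_method(Device(is_pcie_endpoint=True, has_pcie_flr=True))` selects `ResetMethod.PCIE_FLR`, while a plain PCI device that shares its bus with devices of another guest falls through to `None`.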
Cui, Dexuan
2008-Jun-20 03:19 UTC
RE: [Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
Hi, Keir and all,
Do you think the improvement to the FLR logic is OK? And moving it to Control Panel?
I'm going to make a patch based on this.

Thanks,
-- Dexuan
Yosuke Iwamatsu
2008-Jun-20 04:17 UTC
Re: [Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
Hi,

The term 'Control Panel' is rather unfamiliar to me. Does it mean qemu-dm for HVM guests?

I think pciback in the dom0 kernel would be the right place to do FLR, because it is commonly used as the holder of pass-through PCI devices for both PV and HVM guests. The drawback is that communication between pciback and the dom0 userspace tools may become complicated. But in general, it seems good to let the dom0 kernel control PCI devices.

Regards,
-- Yosuke
Cui, Dexuan
2008-Jun-20 04:41 UTC
RE: [Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
Thanks for the comments.

qemu-dm is the Device Model. I think Control Panel means Xend/libxc (and other necessary scripts).

I think pciback in Dom0 may not be the best place. Besides the drawback you mentioned, for the "Secondary Bus Reset", pciback doesn't own the bridge.

In Control Panel, a Python script can access PCI config space easily via the sys filesystem (sysfs).

Thanks,
-- Dexuan
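To make the sysfs idea concrete, here is a minimal sketch of how a Python script might walk the capability list and issue a PCIe FLR with the 100ms wait discussed earlier in the thread. It is an illustration under assumptions, not xend code: the helper names are invented, the retry count is arbitrary, and giving up when transactions stay pending is a design choice of this sketch. The register offsets and bit positions are the standard PCI/PCIe ones. The functions take any seekable binary file object, so they can be exercised on an in-memory buffer before touching hardware.

```python
import io
import struct
import time

PCI_STATUS = 0x06              # 16-bit Status register
PCI_STATUS_CAP_LIST = 0x10     # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34     # pointer to the first capability
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04          # Device Capabilities (32-bit)
PCI_EXP_DEVCAP_FLR = 1 << 28   # Function Level Reset capable
PCI_EXP_DEVCTL = 0x08          # Device Control (16-bit)
PCI_EXP_DEVCTL_FLR = 1 << 15   # Initiate Function Level Reset
PCI_EXP_DEVSTA = 0x0A          # Device Status (16-bit)
PCI_EXP_DEVSTA_TRPND = 1 << 5  # Transactions Pending


def read_cfg(f, offset, fmt):
    """Read one little-endian value (struct format fmt) at offset."""
    f.seek(offset)
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def write_cfg(f, offset, fmt, value):
    """Write one little-endian value (struct format fmt) at offset."""
    f.seek(offset)
    f.write(struct.pack(fmt, value))
    f.flush()


def find_capability(f, cap_id):
    """Return the config-space offset of capability cap_id, or 0."""
    if not (read_cfg(f, PCI_STATUS, "<H") & PCI_STATUS_CAP_LIST):
        return 0
    pos = read_cfg(f, PCI_CAPABILITY_LIST, "<B") & 0xFC
    for _ in range(48):  # bound the walk in case the list loops
        if pos < 0x40:
            return 0
        if read_cfg(f, pos, "<B") == cap_id:
            return pos
        pos = read_cfg(f, pos + 1, "<B") & 0xFC
    return 0


def pcie_flr(f, sleep=time.sleep):
    """Issue a PCIe FLR if supported; return True if it was issued."""
    cap = find_capability(f, PCI_CAP_ID_EXP)
    if not cap:
        return False
    if not (read_cfg(f, cap + PCI_EXP_DEVCAP, "<I") & PCI_EXP_DEVCAP_FLR):
        return False
    # Make sure no transaction is pending before issuing the reset.
    for _ in range(4):
        if not (read_cfg(f, cap + PCI_EXP_DEVSTA, "<H") & PCI_EXP_DEVSTA_TRPND):
            break
        sleep(0.1)
    else:
        return False  # assumption: give up rather than risk corruption
    ctl = read_cfg(f, cap + PCI_EXP_DEVCTL, "<H")
    write_cfg(f, cap + PCI_EXP_DEVCTL, "<H", ctl | PCI_EXP_DEVCTL_FLR)
    sleep(0.1)  # wait 100ms after issuing FLR, as the PCIe spec requires
    return True
```

On a real dom0 the file object would presumably come from something like `open("/sys/bus/pci/devices/0000:01:00.0/config", "r+b", buffering=0)`, where the BDF is a made-up example; this needs root, and unprivileged readers typically see only the first 64 bytes of config space.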
Yosuke Iwamatsu
2008-Jun-20 05:45 UTC
Re: [Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
Cui, Dexuan wrote:
> In Control Panel, Python script can access PCI config space easily via the sys filesystem.

I vaguely feel uneasy about allowing Python scripts to write to PCI config space directly. Is that a common approach?

-- Yosuke
Keir Fraser
2008-Jun-20 09:10 UTC
Re: [Xen-devel] IOMMU: improve the FLR logic and move it from hypervisor to Control Panel?
On 20/6/08 06:45, "Yosuke Iwamatsu" <y-iwamatsu@ab.jp.nec.com> wrote:
> I vaguely feel uneasy about allowing python scripts to write in PCI
> config space directly. Is it a common way?

It has the benefit of being easy and flexible and not needing much re-architecting of the code.

pciback would indeed be a sensible place for this functionality, and in general a good place for all guest PCI config space accesses to pass through (whether HVM or PV). It gives us more consistency in implementation between PV and HVM, and it is a more stable, stateful and controlled environment than any user-space daemon, let alone a Python script! But personally I'm happy doing it in userspace, for now at least.

-- Keir