Ke, Liping
2008-Nov-14 07:33 UTC
[Xen-devel] [RFC][patch 0/7] Enable PCIE-AER support for XEN
The following 7 patches add PCIe AER (Advanced Error Reporting) support to Xen.
---------------------------------------------------------------------------
Patches 1~4 are backported from the Linux kernel and enable kernel support for AER.

These patches give dom0 the ability to handle PCIe errors. When a device sends a PCIe error message to the root port, the root port raises an interrupt. The IRQ handler collects the root error status register and then schedules work to process the error according to its type (correctable/non-fatal/fatal).

For correctable errors, the error status register of the device is cleared.
For non-fatal errors, the callback functions of the endpoint's driver are called. For a bridge, the error is broadcast to the downstream ports. In dom0, this means the pciback driver is called accordingly.
For fatal errors, the process is the same as for non-fatal errors, with the additional step of resetting the PCIe link.
----------------------------------------------------------------------------
Patches 5~7: AER error handler implementation in pciback and pcifront. This is the main work we have done.

As mentioned above, the pciback PCI error handler is scheduled by the root port AER service. Pciback then asks pcifront to call the end-device driver, which completes the related PCI error handling.

We noticed there might be race conditions between pciback ops (such as the PCI error handling we are working on now, or other configuration ops) and pci-hotplug. Those issues will be solved before sending the patch.
---------------------------------------------------------------------------
Test:
We have tested the patches on an IPF Hitachi machine, where an Unsupported Request non-fatal AER error can be triggered by reading or writing a non-existent function on a PCI device that supports AER. (The end device, the intermediate bridge, and the root port must all support AER.)
We also tested on x86 to make sure the patches do not break the current code path.
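For readers who prefer code, the severity-based dispatch described for patches 1~4 above can be sketched roughly as follows. This is only a userspace illustration of the three cases, not code from the patches or from the kernel; every name in it (aer_sketch_dispatch, struct aer_sketch_actions, the AER_SKETCH_* values) is made up for the example, and the order of link reset versus driver callbacks is not something the description specifies.

```c
#include <assert.h>

/* Hypothetical names: none of these identifiers come from the patches. */
enum aer_sketch_severity {
    AER_SKETCH_CORRECTABLE,
    AER_SKETCH_NONFATAL,
    AER_SKETCH_FATAL
};

/* Records which actions the description associates with a severity. */
struct aer_sketch_actions {
    int status_cleared;   /* device error status register cleared */
    int driver_notified;  /* endpoint driver callbacks (pciback in dom0) called */
    int link_reset;       /* PCIe link reset performed */
};

struct aer_sketch_actions aer_sketch_dispatch(enum aer_sketch_severity sev)
{
    struct aer_sketch_actions a = { 0, 0, 0 };

    switch (sev) {
    case AER_SKETCH_CORRECTABLE:
        /* Correctable: only clear the device's error status. */
        a.status_cleared = 1;
        break;
    case AER_SKETCH_FATAL:
        /* Fatal: same as non-fatal, plus a link reset. */
        a.link_reset = 1;
        /* fall through */
    case AER_SKETCH_NONFATAL:
        /* Non-fatal: hand the error to the endpoint driver's callbacks. */
        a.driver_notified = 1;
        break;
    default:
        assert(0 && "unknown severity");
        break;
    }
    return a;
}
```

Calling aer_sketch_dispatch(AER_SKETCH_FATAL) sets both link_reset and driver_notified, which mirrors the statement that the fatal path is the non-fatal path plus a link reset.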
---------------------------------------------------------------------------
An example workflow, which might be helpful:
1) Assign an AER-capable network device to a PV driver domain (VT-d is not supported on the Hitachi machine).
2) Install a network device driver that supports PCI error handling in the PV guest.
3) If no device driver is installed in the PV guest, or the driver does not support the PCI error recovery functions, the guest is killed directly (the devices are FLRed). An HVM guest is simply killed.
4) Trigger an AER error with a test driver; an interrupt is generated and caught by the root port.
5) The AER service driver below the root port in dom0 performs the recovery steps in the bottom half of the AER interrupt context.
For each recovery step (error_detected, mmio_enabled, slot_reset, error_resume), the AER core cooperates with every device below it that has registered pci_error_handlers to complete the step. For details, please see the related kernel documentation (attached aer_doc.patch).
6) pciback_error_handler is then called by the AER core for each of the four steps above. Pciback sends the notification to pcifront, which then tries to call the corresponding device driver if that driver has the pci_error_handler.
If every recovery step succeeds, the PCIe error should have been fixed and recovered from. Otherwise, the impacted guest is killed.

Thanks & Regards,
Criping
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
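As an illustration of steps 3, 5 and 6 of the workflow in the message above, here is a minimal userspace sketch of the four-step recovery sequence. It is a simplified model under stated assumptions, not the pciback/pcifront implementation: each step is reduced to a callback returning 1 (success) or 0 (failure), whereas the kernel's real callbacks return richer result codes, and the sketch assumes all four callbacks must be present. All names (sketch_run_recovery, struct sketch_error_handlers, SKETCH_*) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model: a recovery step returns 1 on success, 0 on failure. */
typedef int (*sketch_step_fn)(void *dev);

/* Stand-in for a guest driver's error handlers, one per recovery step. */
struct sketch_error_handlers {
    sketch_step_fn error_detected;
    sketch_step_fn mmio_enabled;
    sketch_step_fn slot_reset;
    sketch_step_fn error_resume;
};

enum sketch_outcome { SKETCH_RECOVERED, SKETCH_KILL_GUEST };

/*
 * Run the four steps in order. A guest without handlers, with a missing
 * callback (an assumption of this sketch), or with a failing step is
 * reported as SKETCH_KILL_GUEST.
 */
enum sketch_outcome sketch_run_recovery(const struct sketch_error_handlers *h,
                                        void *dev)
{
    if (h == NULL)
        return SKETCH_KILL_GUEST;   /* no driver, or no recovery support */

    const sketch_step_fn steps[] = {
        h->error_detected, h->mmio_enabled, h->slot_reset, h->error_resume
    };

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        if (steps[i] == NULL || !steps[i](dev))
            return SKETCH_KILL_GUEST;
    }
    return SKETCH_RECOVERED;
}

/* Trivial callbacks for trying the sketch out. */
int sketch_step_ok(void *dev)   { (void)dev; return 1; }
int sketch_step_fail(void *dev) { (void)dev; return 0; }
```

With all four callbacks set to sketch_step_ok the function returns SKETCH_RECOVERED; replacing any one of them with sketch_step_fail, or passing NULL handlers, gives SKETCH_KILL_GUEST, matching the "otherwise, the impacted guest is killed" rule.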
Keir Fraser
2008-Nov-14 14:47 UTC
[Xen-devel] Re: [RFC][patch 0/7] Enable PCIE-AER support for XEN
Need proper changeset comments and signed-off-by lines from you for all these patches (especially the new ones, which don't have any upstream comments or sign-offs).

Putting the domAction node in /local/domain/x/ is dubious: it doesn't need to be accessible outside of dom0. How about sticking it in pciback's directory, and having the watch set up from pciif.py?

 -- Keir
Ke, Liping
2008-Nov-16 13:39 UTC
[Xen-devel] RE: [RFC][patch 0/7] Enable PCIE-AER support for XEN
Hi Keir,

Thanks a lot for the suggestions on the node location and the watch setup. They are nice. We'll add comments and sign-offs when we send the final refined patch.

Regards,
Criping