Ke, Liping
2008-Nov-21 05:49 UTC
[Xen-devel] [patch 0/7][PCIE-AER]Enable PCIE-AER support for XEN
The following 7 patches add PCIe AER (Advanced Error Reporting) support to Xen.

---------------------------------------------------------------------------
Patches 1~4: backports from the Linux kernel that enable AER support in the DOM0 kernel.

These patches give DOM0 PCIe error handling capability. When a device sends a PCIe error message to the root port, an interrupt is triggered. The AER IRQ handler reads the root error status register and schedules a DPC to deal with the error according to its type (correctable, non-fatal, or fatal).

For correctable errors, it clears the error status register of the device.
For uncorrectable errors (fatal and non-fatal), it calls the callback functions of the endpoint's driver. For a bridge, it broadcasts the error to the downstream ports. For DOM0, this means the pciback driver is called accordingly. (A fatal error requires some additional work, such as resetting the PCIe link.)

---------------------------------------------------------------------------
Patches 5~7: AER error handler implementation in pciback and pcifront. This is the main work we have done.

As mentioned above, the pciback PCI error handler is scheduled by the root port AER service. Pciback then asks pcifront to call into the end-device driver's error handling support, completing the related PCI error handling. Please see the detailed workflow/policy below.

---------------------------------------------------------------------------
The following workflow/policy illustration might be helpful:

1) Assign an AER-capable network device to a PV driver domain.
2) Install a network device driver that supports PCI error handling in the PV guest.
3) If no device driver is installed in the PV guest, or the driver does not register a PCI error handler, the guest is killed directly (and the device is FLRed). An HVM guest is currently killed directly.
4) Trigger AER with the test driver; an interrupt is generated and caught by the root port.
5) The AER service driver for the root port in DOM0 carries out the recovery steps. For each recovery step (error_detected, mmio_enabled, slot_reset, error_resume), the AER core cooperates with each device below the root port that registers pci_error_handlers. For details, please see the related kernel documentation (patch 1, aer_doc.patch).
6) pciback_error_handler is then called by the AER core for each of the four steps above. Pciback sends a service request to pcifront for each step. Pcifront then tries to call the corresponding device driver, provided the driver has a pci_error_handler.

If every recovery step succeeds, the PCIe error should have been recovered successfully. Otherwise, the affected guest is killed and the PCIe device is FLRed.

---------------------------------------------------------------------------
Test environment:

We have tested the patches on an IPF Hitachi machine, where an Unsupported Request non-fatal AER error can be triggered by reading or writing a non-existent function on a PCI device that supports AER. (The whole path must support AER as well: the end device, the bridges, and the root port.)
We also tested on x86 to make sure the patches do not break the current code path.

---------------------------------------------------------------------------
If you have any questions, just let me know.
Thanks a lot for your help!

Regards,
Criping

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
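
Appendix (illustrative only): the policy in steps 5) and 6) can be sketched as a small user-space model. This is not the patch code. The names Guest, pcifront_call and pciback_recover are hypothetical, each handler is reduced to a callable that returns True on success, and the sketch assumes that a step with no handler counts as a failure.

```python
# Hypothetical user-space model of the pciback/pcifront recovery policy.
# It is not kernel code; all class and function names are illustrative.

# The four recovery steps named in the workflow description.
RECOVERY_STEPS = ("error_detected", "mmio_enabled", "slot_reset", "error_resume")


class Guest:
    """A PV guest that owns one passed-through device (model only)."""

    def __init__(self, handlers=None):
        # handlers maps a step name to a callable returning True on success.
        # None models a guest whose driver registered no PCI error handlers.
        self.handlers = handlers
        self.killed = False
        self.device_flr_done = False

    def kill_and_flr(self):
        # Policy from the workflow: kill the guest and FLR the device.
        self.killed = True
        self.device_flr_done = True


def pcifront_call(guest, step):
    """Model of pcifront invoking the guest driver's handler for one step."""
    handler = guest.handlers.get(step)
    # Assumption of this sketch: a missing handler is treated as a failure.
    return handler is not None and bool(handler())


def pciback_recover(guest):
    """Model of pciback forwarding each recovery step to pcifront.

    Returns True if every step succeeded, False if the guest was killed.
    """
    if not guest.handlers:
        # No driver, or no registered error handlers: kill the guest directly.
        guest.kill_and_flr()
        return False
    for step in RECOVERY_STEPS:
        if not pcifront_call(guest, step):
            guest.kill_and_flr()
            return False
    return True
```

For example, a guest whose driver succeeds at every step, created with Guest({step: (lambda: True) for step in RECOVERY_STEPS}), is left running by pciback_recover, while Guest(None) is killed and its device marked as FLRed.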