Hi, IOMMU maintainers,

What should Xen do when an IOMMU fault happens?  As far as I can see
both the AMD and Intel code clear the error in the IOMMU and carry on,
but I suspect some more vigorous action is appropriate.  I've seen
traces from an Intel machine that seemed to be livelocked on IOMMU
faults from a passed-through VGA card, until it was killed by the
watchdog.  I think I can see two things that contribute to that:

 - The Intel IOMMU fault handler prints quite a lot of info in
   interrupt context, making it easier to livelock.  Still, I think the
   general problem applies on AMD too.
 - Domain destruction re-assigns passed-through cards to dom0, but the
   cards don't seem to get reset.  So there's nothing to stop a card
   battering away at DMA in the meantime.  That seems like a problem
   independent of livelock, actually.

In any case, it seems like it would be a good idea to stop a
broken/malicious/deassigned card from flooding Xen with IOMMU faults.

I was considering just writing 0 to the faulting card's PCI command
register, but I'm told that's not always enough to properly deactivate
a card, and it might be a little over-zealous to do it on the first
offence.

Ideas?

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
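One way to avoid acting "on the first offence" is to count faults per device and only intervene once a threshold is crossed within a time window. The following is a minimal sketch of that idea in plain C, not code from Xen: `struct fault_tracker`, `fault_tracker_record()`, and the window and threshold values are all hypothetical, and the caller is assumed to supply a millisecond timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs; these values are not taken from Xen. */
#define FAULT_WINDOW_MS  1000u  /* length of one counting window */
#define FAULT_THRESHOLD  16u    /* faults tolerated per window */

/* Per-device bookkeeping (hypothetical structure). */
struct fault_tracker {
    uint64_t window_start_ms;   /* when the current window began */
    unsigned int count;         /* faults seen in the current window */
    bool disabled;              /* set once the device has been stopped */
};

/*
 * Record one fault seen at time now_ms.  Returns true exactly once: on
 * the fault that pushes the device over the threshold, which is the
 * point where the caller should stop the device.
 */
static bool fault_tracker_record(struct fault_tracker *t, uint64_t now_ms)
{
    assert(t != NULL);

    if (t->disabled)
        return false;           /* already dealt with */

    /* Start a fresh window once the old one has expired. */
    if (now_ms - t->window_start_ms >= FAULT_WINDOW_MS) {
        t->window_start_ms = now_ms;
        t->count = 0;
    }

    if (++t->count > FAULT_THRESHOLD) {
        t->disabled = true;
        return true;
    }
    return false;
}
```

A fault handler would call `fault_tracker_record()` for the faulting device and, on a `true` return, take whichever quiescing action is chosen (the thread goes on to discuss clearing bus mastering and resetting the device).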
On 16/06 10:25, Tim Deegan wrote:
> Hi, IOMMU maintainers,
>
> What should Xen do when an IOMMU fault happens?  As far as I can see
> both the AMD and Intel code clear the error in the IOMMU and carry on,
> but I suspect some more vigorous action is appropriate.  I've seen
> traces from an Intel machine that seemed to be livelocked on IOMMU
> faults from a passed-through VGA card, until it was killed by the
> watchdog.  I think I can see two things that contribute to that:
>
>  - The Intel IOMMU fault handler prints quite a lot of info in
>    interrupt context, making it easier to livelock.  Still, I think the
>    general problem applies on AMD too.
>  - Domain destruction re-assigns passed-through cards to dom0, but the
>    cards don't seem to get reset.  So there's nothing to stop a card
>    battering away at DMA in the meantime.  That seems like a problem
>    independent of livelock, actually.
>
> In any case, it seems like it would be a good idea to stop a
> broken/malicious/deassigned card from flooding Xen with IOMMU faults.
>
> I was considering just writing 0 to the faulting card's PCI command
> register, but I'm told that's not always enough to properly deactivate
> a card, and it might be a little over-zealous to do it on the first
> offence.
>
> Ideas?
>

Hi Tim,

We have seen such behavior when we were testing GPU assignment,
especially with the Intel GPU.  The problem is that domain destruction
in Xen is asynchronous and right now the PCI device reset is done in
dom0 with some help from the toolstack.

In the Intel GPU case we need to make sure that the guest memory and
the IOMMU are still in place while we perform the reset, otherwise the
device drifts into an unstable state.

There are probably other, cleaner ways to do that, but we decided to
move the PCI reset code into Xen, so we are sure we perform the reset
while the device is in a known (functioning) state.

Attached is the patch we have in XenClient that moves the PCI reset
into Xen.  The modifications we have made to the VT-d code should go in
the generic IOMMU section.

I apologise, but this patch is based on Xen 3.4.  If we think this is
the right way to do it, I can submit a proper patch against unstable
and 4.1.

Regards,
Jean
Hi Jean,

At 10:47 +0100 on 16 Jun (1308221277), Jean Guyader wrote:
> We have seen such behavior when we were testing GPU assignment,
> especially with the Intel GPU.  The problem is that domain destruction
> in Xen is asynchronous and right now the PCI device reset is done in
> dom0 with some help from the toolstack.
>
> In the Intel GPU case we need to make sure that the guest memory and
> the IOMMU are still in place while we perform the reset, otherwise the
> device drifts into an unstable state.
>
> There are probably other, cleaner ways to do that, but we decided to
> move the PCI reset code into Xen, so we are sure we perform the reset
> while the device is in a known (functioning) state.
>
> Attached is the patch we have in XenClient that moves the PCI reset
> into Xen.

Thanks, Jean.  This sounds like a good idea to me, though I'd like to
hear Wei and Allen's opinions.

The patch is incomplete (missing the new pci_reset.[ch] files) but I get
the general idea.  A few questions:

 - Why the special handling for one graphics device on each domain?
   (And if one, why not all?)
 - Why not reset when the target is dom0?  It seems like it can do no
   harm and should protect dom0 from assigning itself an active PCI
   card.

Of course, even with this patch, my original question still stands:
should Xen do something more assertive in the IOMMU fault handler?

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
On 16/06 11:07, Tim Deegan wrote:
> Hi Jean,
>

Reply below,

> At 10:47 +0100 on 16 Jun (1308221277), Jean Guyader wrote:
> > We have seen such behavior when we were testing GPU assignment,
> > especially with the Intel GPU.  The problem is that domain destruction
> > in Xen is asynchronous and right now the PCI device reset is done in
> > dom0 with some help from the toolstack.
> >
> > In the Intel GPU case we need to make sure that the guest memory and
> > the IOMMU are still in place while we perform the reset, otherwise the
> > device drifts into an unstable state.
> >
> > There are probably other, cleaner ways to do that, but we decided to
> > move the PCI reset code into Xen, so we are sure we perform the reset
> > while the device is in a known (functioning) state.
> >
> > Attached is the patch we have in XenClient that moves the PCI reset
> > into Xen.
>
> Thanks, Jean.  This sounds like a good idea to me, though I'd like to
> hear Wei and Allen's opinions.
>
> The patch is incomplete (missing the new pci_reset.[ch] files) but I get
> the general idea.  A few questions:

Reattached the full patch.

>
>  - Why the special handling for one graphics device on each domain?
>    (And if one, why not all?)

No good reason really, just a limitation of the patch; we can trivially
get rid of the limitation.

>  - Why not reset when the target is dom0?  It seems like it can do no
>    harm and should protect dom0 from assigning itself an active PCI
>    card.

A reset can be quite expensive (a couple of seconds sometimes).  We did
that to avoid a double reset on domain reboot.  I agree that we should
remove that, or extend the IOMMU API so we can reassign from domU to
domU without going through dom0.

>
> Of course, even with this patch, my original question still stands:
> should Xen do something more assertive in the IOMMU fault handler?
>

What we really want to achieve here is to stop DMA on this device.
One way of doing it is to perform a proper PCI reset (FLR, secondary
bus reset, ...) when that happens.

Jean
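For reference, a Function Level Reset is requested through the PCI Express capability: bit 28 of Device Capabilities advertises support and bit 15 of Device Control initiates the reset. The sketch below shows only that register sequence against a simulated config space. `struct fake_pci_dev`, the `cfg_*` helpers and `try_flr()` are hypothetical names, this is not the XenClient patch, and it leaves out the capability-list walk, the wait for pending transactions, and the delay (the function has up to 100 ms to complete the reset) that real code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register layout from the PCI Express specification. */
#define PCI_EXP_DEVCAP          0x04         /* Device Capabilities (32 bit) */
#define PCI_EXP_DEVCAP_FLR      0x10000000u  /* bit 28: FLR supported */
#define PCI_EXP_DEVCTL          0x08         /* Device Control (16 bit) */
#define PCI_EXP_DEVCTL_BCR_FLR  0x8000u      /* bit 15: initiate FLR */

/* Hypothetical stand-in for one function's config space (little-endian). */
struct fake_pci_dev {
    uint8_t cfg[256];
};

static uint16_t cfg_read16(const struct fake_pci_dev *d, unsigned int reg)
{
    return (uint16_t)(d->cfg[reg] | (d->cfg[reg + 1] << 8));
}

static uint32_t cfg_read32(const struct fake_pci_dev *d, unsigned int reg)
{
    return (uint32_t)d->cfg[reg] |
           ((uint32_t)d->cfg[reg + 1] << 8) |
           ((uint32_t)d->cfg[reg + 2] << 16) |
           ((uint32_t)d->cfg[reg + 3] << 24);
}

static void cfg_write16(struct fake_pci_dev *d, unsigned int reg, uint16_t val)
{
    d->cfg[reg] = (uint8_t)(val & 0xff);
    d->cfg[reg + 1] = (uint8_t)(val >> 8);
}

/*
 * Request an FLR.  pcie_cap is the offset of the function's PCI Express
 * capability.  Returns false, leaving the device untouched, if FLR is
 * not advertised; the caller then needs another reset method.
 */
static bool try_flr(struct fake_pci_dev *d, unsigned int pcie_cap)
{
    uint16_t ctl;

    assert(d != NULL);
    assert(pcie_cap + PCI_EXP_DEVCTL + 2 <= sizeof(d->cfg));

    if (!(cfg_read32(d, pcie_cap + PCI_EXP_DEVCAP) & PCI_EXP_DEVCAP_FLR))
        return false;

    ctl = cfg_read16(d, pcie_cap + PCI_EXP_DEVCTL);
    cfg_write16(d, pcie_cap + PCI_EXP_DEVCTL,
                (uint16_t)(ctl | PCI_EXP_DEVCTL_BCR_FLR));

    /*
     * On real hardware the bit reads back as 0 and the caller must now
     * wait before touching the device; the simulation just stores it.
     */
    return true;
}
```

Returning `false` when FLR is not advertised keeps the fallback decision (secondary bus reset or another method) with the caller.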
On Thursday 16 June 2011 11:25:09 Tim Deegan wrote:
> Hi, IOMMU maintainers,
>
> What should Xen do when an IOMMU fault happens?  As far as I can see
> both the AMD and Intel code clear the error in the IOMMU and carry on,
> but I suspect some more vigorous action is appropriate.  I've seen
> traces from an Intel machine that seemed to be livelocked on IOMMU
> faults from a passed-through VGA card, until it was killed by the
> watchdog.  I think I can see two things that contribute to that:
>
>  - The Intel IOMMU fault handler prints quite a lot of info in
>    interrupt context, making it easier to livelock.  Still, I think the
>    general problem applies on AMD too.

This info could still be useful for debugging, but we should only
enable it in debug builds.

>  - Domain destruction re-assigns passed-through cards to dom0, but the
>    cards don't seem to get reset.  So there's nothing to stop a card
>    battering away at DMA in the meantime.  That seems like a problem
>    independent of livelock, actually.

There should be some FLR code in the tools (both xm and xl).  But this
might not work well with some devices...

> In any case, it seems like it would be a good idea to stop a
> broken/malicious/deassigned card from flooding Xen with IOMMU faults.

Yes, agreed.  Actually I saw that a lot could be improved in the fault
handler.  When IOMMU faults come from a DMA error, we should either
stop the device from doing DMA or inject errors into the guest if the
guest driver is able to handle I/O page faults.

> I was considering just writing 0 to the faulting card's PCI command
> register, but I'm told that's not always enough to properly deactivate
> a card, and it might be a little over-zealous to do it on the first
> offence.
> Ideas?

It seems difficult to find a generic approach to stop a device without
knowing more device-specific details...

Thanks,
Wei

> Tim.
> > I was considering just writing 0 to the faulting card's PCI command
> > register, but I'm told that's not always enough to properly deactivate
> > a card, and it might be a little over-zealous to do it on the first
> > offence.
> > Ideas?
> It seems difficult to find a generic approach to stop a device without
> knowing more device-specific details...

Perhaps make something similar to the MCE fault interrupts?  As in, when
the error happens, dom0 is notified of the offending BDF and pursues
whatever action it thinks is necessary.  The action would be to tell
the device driver to turn itself off.  But how would it interact with
the driver?  Well, how does Linux deal with this today?  Is there an
extension to the device driver API (similar to the power) to notify the
driver that it has done bad things and to shut itself off?  Perhaps
similar to the PCIe AER handling?
>  - The Intel IOMMU fault handler prints quite a lot of info in
>    interrupt context, making it easier to livelock.  Still, I think the
>    general problem applies on AMD too.

Someone at Intel looked into implementing measured rate printing in the
VT-d fault handler.  He encountered some complications.  I remember it
had to do with measured rate printing not being enabled by default (?).
For now, I think having it print out only in the debug case sounds
simple enough.  I will submit a patch for it.

>  - Domain destruction re-assigns passed-through cards to dom0, but the
>    cards don't seem to get reset.  So there's nothing to stop a card
>    battering away at DMA in the meantime.  That seems like a problem
>    independent of livelock, actually.

From reading the code in libxl, it seems libxl__device_pci_reset() is
called by both libxl__device_pci_add() and do_pci_remove().  Isn't
do_pci_remove() called when the pass-through device is reassigned to
dom0 during a domain teardown?

Allen

-----Original Message-----
From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
Sent: Thursday, June 16, 2011 2:25 AM
To: Kay, Allen M; Wei Wang
Cc: xen-devel@lists.xensource.com; Jean Guyader
Subject: IOMMU faults

Hi, IOMMU maintainers,

What should Xen do when an IOMMU fault happens?  As far as I can see
both the AMD and Intel code clear the error in the IOMMU and carry on,
but I suspect some more vigorous action is appropriate.  I've seen
traces from an Intel machine that seemed to be livelocked on IOMMU
faults from a passed-through VGA card, until it was killed by the
watchdog.  I think I can see two things that contribute to that:

 - The Intel IOMMU fault handler prints quite a lot of info in
   interrupt context, making it easier to livelock.  Still, I think the
   general problem applies on AMD too.
 - Domain destruction re-assigns passed-through cards to dom0, but the
   cards don't seem to get reset.  So there's nothing to stop a card
   battering away at DMA in the meantime.  That seems like a problem
   independent of livelock, actually.

In any case, it seems like it would be a good idea to stop a
broken/malicious/deassigned card from flooding Xen with IOMMU faults.

I was considering just writing 0 to the faulting card's PCI command
register, but I'm told that's not always enough to properly deactivate
a card, and it might be a little over-zealous to do it on the first
offence.

Ideas?

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
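The "measured rate printing" mentioned above is read here as rate-limited logging, which is an assumption about what was meant. A minimal sketch of such a limiter in plain C follows: it lets a small burst of messages through per interval and counts the rest, so that a summary can be printed later outside the hot path. `struct print_limiter` and `print_allowed()` are hypothetical names and none of this comes from the VT-d code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter state; the caller fills in interval_ms and burst. */
struct print_limiter {
    uint64_t interval_ms;      /* length of one window */
    unsigned int burst;        /* messages allowed per window */
    uint64_t window_start_ms;  /* when the current window began */
    unsigned int printed;      /* messages let through this window */
    unsigned int suppressed;   /* messages dropped since creation */
};

/*
 * Decide whether a message arriving at now_ms may be printed.  Dropped
 * messages are only counted, which keeps the cost of a fault storm in
 * interrupt context close to constant.
 */
static bool print_allowed(struct print_limiter *l, uint64_t now_ms)
{
    assert(l != NULL);

    /* Open a new window once the old one has expired. */
    if (now_ms - l->window_start_ms >= l->interval_ms) {
        l->window_start_ms = now_ms;
        l->printed = 0;
    }

    if (l->printed < l->burst) {
        l->printed++;
        return true;
    }

    l->suppressed++;
    return false;
}
```

The fault handler would wrap its verbose output in `if (print_allowed(...))`. Printing only in debug builds, as proposed in the message above, is the simpler alternative and needs no state at all.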
At 12:21 -0700 on 16 Jun (1308226873), Kay, Allen M wrote:
> >  - The Intel IOMMU fault handler prints quite a lot of info in
> >    interrupt context, making it easier to livelock.  Still, I think the
> >    general problem applies on AMD too.
>
> Someone at Intel looked into implementing measured rate printing in the
> VT-d fault handler.  He encountered some complications.  I remember it
> had to do with measured rate printing not being enabled by default (?).
> For now, I think having it print out only in the debug case sounds
> simple enough.  I will submit a patch for it.
>

That's great, thanks.

> >  - Domain destruction re-assigns passed-through cards to dom0, but the
> >    cards don't seem to get reset.  So there's nothing to stop a card
> >    battering away at DMA in the meantime.  That seems like a problem
> >    independent of livelock, actually.
>
> From reading the code in libxl, it seems libxl__device_pci_reset() is
> called by both libxl__device_pci_add() and do_pci_remove().  Isn't
> do_pci_remove() called when the pass-through device is reassigned to
> dom0 during a domain teardown?

Libxl could be too late, though.  When a domain is destroyed, its IOMMU
tables get torn down in Xen.  So if it has active devices:
 - they can start raising IOMMU faults immediately, and in some
   circumstances libxl might never get to run.
 - since deassign is implemented as "assign to dom0" they might start
   DMAing over dom0 memory.

If we can rely on the dom0 tools always completely resetting a domain's
devices before calling domctl_destroydomain, that should never happen.
That seems a bit fragile, though I guess dom0 can shoot itself in the
foot in enough other ways.

I prefer Jean's reset-in-Xen approach; it's only a few hundred lines of
code and we could reuse some of it for resetting badly-behaved cards
from the IOMMU fault handler.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
At 10:47 -0400 on 16 Jun (1308221250), Konrad Rzeszutek Wilk wrote:
> Perhaps make something similar to the MCE fault interrupts?  As in, when
> the error happens, dom0 is notified of the offending BDF and pursues
> whatever action it thinks is necessary.  The action would be to tell
> the device driver to turn itself off.  But how would it interact with
> the driver?  Well, how does Linux deal with this today?  Is there an
> extension to the device driver API (similar to the power) to notify the
> driver that it has done bad things and to shut itself off?

That sort of interface might be nice too, but I was worried more about
badly-behaved guests or devices.  In the livelock case the guest might
never get to run, so it can't do anything, and a malicious guest would
just ignore the message anyway.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Hi,

At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote:
> > Of course, even with this patch, my original question still stands:
> > should Xen do something more assertive in the IOMMU fault handler?
>
> What we really want to achieve here is to stop DMA on this device.
> One way of doing it is to perform a proper PCI reset (FLR, secondary
> bus reset, ...) when that happens.

I think that's more or less a consensus then, that we should try to
stop the device from the IOMMU fault handler.

Looking at your patch in a bit more detail, I see two things that worry
me.  The first is that the new pci_reset_device() function does nothing
at all if the device isn't one of the particular graphics cards it knows
about!

The second is this comment:

> + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */
> + pci_conf_write16(bus, d, f, PCI_COMMAND, 2);

which implies that my current approach of just disabling the card might
have pretty bad consequences.  Can you expand on that?  Would it be
better just to mask out PCI_COMMAND_MASTER?  And if I do that, do I need
to try and issue a reset as well (i.e. are there cards that are known to
ignore this bit)?

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
At 14:32 +0100 on 24 Jun (1308925934), Tim Deegan wrote:
> At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote:
> > > Of course, even with this patch, my original question still stands:
> > > should Xen do something more assertive in the IOMMU fault handler?
> >
> > What we really want to achieve here is to stop DMA on this device.
> > One way of doing it is to perform a proper PCI reset (FLR, secondary
> > bus reset, ...) when that happens.
>
> I think that's more or less a consensus then, that we should try to
> stop the device from the IOMMU fault handler.
>
> Looking at your patch in a bit more detail, I see two things that worry
> me.  The first is that the new pci_reset_device() function does nothing
> at all if the device isn't one of the particular graphics cards it knows
> about!
>
> The second is this comment:
>
> > + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */
> > + pci_conf_write16(bus, d, f, PCI_COMMAND, 2);
>
> which implies that my current approach of just disabling the card might
> have pretty bad consequences.  Can you expand on that?  Would it be
> better just to mask out PCI_COMMAND_MASTER?  And if I do that, do I need
> to try and issue a reset as well (i.e. are there cards that are known to
> ignore this bit)?

Ping?

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
On 30/06 11:08, Tim Deegan wrote:
> At 14:32 +0100 on 24 Jun (1308925934), Tim Deegan wrote:
> > At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote:
> > > > Of course, even with this patch, my original question still stands:
> > > > should Xen do something more assertive in the IOMMU fault handler?
> > >
> > > What we really want to achieve here is to stop DMA on this device.
> > > One way of doing it is to perform a proper PCI reset (FLR, secondary
> > > bus reset, ...) when that happens.
> >
> > I think that's more or less a consensus then, that we should try to
> > stop the device from the IOMMU fault handler.
> >
> > Looking at your patch in a bit more detail, I see two things that worry
> > me.  The first is that the new pci_reset_device() function does nothing
> > at all if the device isn't one of the particular graphics cards it knows
> > about!
> >

In our case the reset of other devices is done in the classic Xen
toolstack way, in dom0.  But the code could easily be changed to do a
proper reset on all types of devices.

> > The second is this comment:
> >
> > > + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */
> > > + pci_conf_write16(bus, d, f, PCI_COMMAND, 2);
> >
> > which implies that my current approach of just disabling the card might
> > have pretty bad consequences.  Can you expand on that?  Would it be
> > better just to mask out PCI_COMMAND_MASTER?  And if I do that, do I need
> > to try and issue a reset as well (i.e. are there cards that are known to
> > ignore this bit)?

Agreed, masking out PCI_COMMAND_MASTER should be enough; Linux only
does that.

Jean
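The difference between the three options discussed (writing 0, writing 2, and masking out PCI_COMMAND_MASTER) comes down to which command register bits survive. The sketch below illustrates the last option as a read-modify-write that clears only the bus-master bit. It runs against a simulated config space rather than Xen's pci_conf_* accessors; `struct fake_pci_dev`, the `cfg_*` helpers and `quiesce_dma()` are hypothetical names, while the bit values are those defined by the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command register layout from the PCI specification. */
#define PCI_COMMAND         0x04    /* offset of the 16-bit command register */
#define PCI_COMMAND_IO      0x0001  /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x0002  /* respond to memory space accesses */
#define PCI_COMMAND_MASTER  0x0004  /* allowed to initiate DMA */

/* Hypothetical stand-in for one function's config space (little-endian). */
struct fake_pci_dev {
    uint8_t cfg[256];
};

static uint16_t cfg_read16(const struct fake_pci_dev *d, unsigned int reg)
{
    return (uint16_t)(d->cfg[reg] | (d->cfg[reg + 1] << 8));
}

static void cfg_write16(struct fake_pci_dev *d, unsigned int reg, uint16_t val)
{
    d->cfg[reg] = (uint8_t)(val & 0xff);
    d->cfg[reg + 1] = (uint8_t)(val >> 8);
}

/*
 * Clear only the bus-master bit so the device may no longer start DMA,
 * leaving I/O and memory decoding (and every other bit) as they were.
 * Writing 0 would also turn off memory decoding, which the quoted patch
 * comment warns about; writing 2 would drop every bit except memory.
 */
static void quiesce_dma(struct fake_pci_dev *d)
{
    uint16_t cmd;

    assert(d != NULL);
    cmd = cfg_read16(d, PCI_COMMAND);
    cfg_write16(d, PCI_COMMAND, (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
}
```

Whether every card honours the bit is the open question raised earlier in the thread; the sketch does not settle that.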