Ryan
2006-Apr-24 13:52 UTC
[Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
Some of the Linux PCI functions called by the virtual configuration space
handlers were making calls into ACPI code which uses semaphores. Since
semaphores cannot be acquired in atomic context (because acquiring them may
sleep), I changed the way the PCI backend responds to requests from the
frontend.

Previously, the virtual configuration space handlers ran in the same context
as the event channel interrupt handler (which was often, if not always,
atomic). Now the interrupt handler schedules a callback function (a bottom
half) in the system work queue (keventd) that will get called in process
context at a slightly later time. This allows the handlers in the virtual
configuration space to run in process context and to call any core PCI
function regardless of whether it will sleep or not.

This patch was previously submitted as part of a larger patch. There have
been no significant changes; I just renamed a variable and made this code
independent of the other patch.

Signed-off-by: Ryan Wilson <hap9@epoch.ncsc.mil>
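For readers unfamiliar with the pattern, the following is a minimal sketch of
the defer-to-keventd approach described above, using the pre-2.6.20 workqueue
API (three-argument INIT_WORK). The structure and names (pciback_device,
pciback_handle_event, and so on) are assumptions for illustration, not the
code in the actual patch.

    #include <linux/workqueue.h>
    #include <linux/interrupt.h>

    /* Hypothetical per-device state; field names are illustrative only. */
    struct pciback_device {
            struct work_struct op_work;  /* bottom half for config-space ops */
            /* ... shared request structure, event channel, etc. ... */
    };

    /*
     * Bottom half, run by keventd in process context, so the virtual
     * config-space handlers may call core PCI/ACPI code that sleeps.
     */
    static void pciback_handle_event(void *data)
    {
            /* data is the struct pciback_device passed to INIT_WORK() below.
             * Here we would read the op from shared memory, dispatch to the
             * config-space handlers, write back the result and notify the
             * frontend. */
    }

    /* Event-channel interrupt handler: just defer to the system work queue. */
    static irqreturn_t pciback_handle_event_irq(int irq, void *dev_id,
                                                struct pt_regs *regs)
    {
            struct pciback_device *pdev = dev_id;

            schedule_work(&pdev->op_work);  /* queued onto keventd */
            return IRQ_HANDLED;
    }

    /* During device setup: bind the work item to its handler and device. */
    static void pciback_init_op_work(struct pciback_device *pdev)
    {
            INIT_WORK(&pdev->op_work, pciback_handle_event, pdev);
    }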
Keir Fraser
2006-Apr-25 08:15 UTC
Re: [Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
On 24 Apr 2006, at 14:52, Ryan wrote:

> Previously, the virtual configuration space handlers ran in the same
> context as the event channel interrupt handler (which was often, if not
> always, atomic). Now the interrupt handler schedules a callback function
> (a bottom half) in the system work queue (keventd) that will get called
> in process context at a slightly later time. This allows the handlers in
> the virtual configuration space to run in process context and to call
> any core PCI function regardless of whether it will sleep or not.

This is okay in principle, but I found the op_in_progress counter rather
confusing and I'm not sure why it's needed. If it's to prevent a double
schedule_work() call on a single PCI request, then I'm not sure that it's
watertight. Does it need to be?

 -- Keir
Keir Fraser
2006-Apr-25 09:17 UTC
Re: [Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
On 25 Apr 2006, at 09:15, Keir Fraser wrote:

>> Previously, the virtual configuration space handlers ran in the same
>> context as the event channel interrupt handler (which was often, if not
>> always, atomic). Now the interrupt handler schedules a callback function
>> (a bottom half) in the system work queue (keventd) that will get called
>> in process context at a slightly later time. This allows the handlers in
>> the virtual configuration space to run in process context and to call
>> any core PCI function regardless of whether it will sleep or not.
>
> This is okay in principle, but I found the op_in_progress counter rather
> confusing and I'm not sure why it's needed. If it's to prevent a double
> schedule_work() call on a single PCI request, then I'm not sure that it's
> watertight. Does it need to be?

Let me be a bit more specific here: I think that if an interrupt is
delivered after the work function has incremented op_in_progress, but
before it clears _PCIF_active, then work can be scheduled erroneously
because the IRQ handler will see atomic_dec_and_test() return TRUE.

If serialised execution of PCI requests is important, and it looks like it
is, I think the simplest solution is simply to create your own
single-threaded workqueue and queue_work() onto that. Personally I get
worried about using the shared workqueues anyway, as they're another shared
resource to deadlock on.

 -- Keir
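For comparison, here is a minimal sketch of the dedicated single-threaded
workqueue Keir suggests, again using the pre-2.6.20 API. The queue name and
the pciback_device/op_work fields are assumptions carried over from the
earlier sketch, not code from the driver.

    #include <linux/workqueue.h>
    #include <linux/errno.h>

    /* Hypothetical per-device state, as in the earlier sketch. */
    struct pciback_device {
            struct work_struct op_work;
    };

    /* Driver-private queue backed by a single kernel thread. */
    static struct workqueue_struct *pciback_wq;

    static int pciback_wq_init(void)
    {
            /* Work queued here runs strictly one item at a time, in order. */
            pciback_wq = create_singlethread_workqueue("pciback");
            return pciback_wq ? 0 : -ENOMEM;
    }

    /* In the event-channel interrupt handler, instead of schedule_work(): */
    static void pciback_kick(struct pciback_device *pdev)
    {
            queue_work(pciback_wq, &pdev->op_work);
    }

    static void pciback_wq_exit(void)
    {
            flush_workqueue(pciback_wq);
            destroy_workqueue(pciback_wq);
    }

The trade-off is one extra kernel thread in exchange for serialised request
handling and isolation from whatever else keventd happens to be running.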
Ryan
2006-Apr-25 13:32 UTC
Re: [Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
On Tue, 2006-04-25 at 10:17 +0100, Keir Fraser wrote:

> On 25 Apr 2006, at 09:15, Keir Fraser wrote:
>
> > This is okay in principle, but I found the op_in_progress counter rather
> > confusing and I'm not sure why it's needed. If it's to prevent a double
> > schedule_work() call on a single PCI request, then I'm not sure that
> > it's watertight. Does it need to be?
>
> Let me be a bit more specific here: I think that if an interrupt is
> delivered after the work function has incremented op_in_progress, but
> before it clears _PCIF_active, then work can be scheduled erroneously
> because the IRQ handler will see atomic_dec_and_test() return TRUE.

Maybe you're seeing something that I'm missing, so let me explain what I
think about this section of code. Please point out where you think I'm
mistaken or have made an incorrect assumption.

I added the counter to act as a lock, to guarantee that requests from the
frontend complete one at a time. This way, the virtual configuration space
handlers are guaranteed never to run concurrently on the same device, and
they never should, because there is only one op structure in the shared
memory.

When I do the atomic_inc after completing the configuration space access, I
don't believe it matters if the frontend triggers an interrupt again (before
_PCIF_active is cleared). In that case, the backend would process another
request and overwrite the return value from the previous op. I don't think
work would be scheduled erroneously here because, by triggering an interrupt
with the _PCIF_active bit still set, the frontend is essentially placing a
second request (and is only affecting itself and its device by having the
request executed again).

I can't put the atomic_inc after I clear the _PCIF_active bit, as then the
frontend could immediately turn around and send a second legitimate request
that would fail because op_in_progress may not have been decremented yet.

> If serialised execution of PCI requests is important, and it looks like it
> is, I think the simplest solution is simply to create your own
> single-threaded workqueue and queue_work() onto that. Personally I get
> worried about using the shared workqueues anyway, as they're another
> shared resource to deadlock on.

Can you give me an example of how you think I could be deadlocking on the
workqueue? I believe I'm interacting with it correctly, and a quick glance
at the kernel code for workqueues shows only a single lock per workqueue.
Granted, other driver or kernel code running in the workqueue could deadlock
that thread, but I don't see how this is any different from other kernel
bugs that could take down the kernel (I hope that someday the PCI backend
will run isolated in its own driver domain).

It seemed like overkill to me to create my own work queue and kernel thread
that would sit and wait for PCI configuration space requests when they will
most likely be few and far between for most devices.

Ryan
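To make the scheme under discussion concrete, here is a rough reconstruction
of the counter-based logic as it reads from the thread. It is a guess at the
shape of the code, not the actual patch: the flag bit number, the sh_flags
plumbing and the function names are invented for illustration. The counter
starts at 1 (idle); the IRQ handler claims the pending request by
decrementing it to 0, and the work function increments it back after
servicing the request.

    #include <linux/workqueue.h>
    #include <linux/interrupt.h>
    #include <asm/atomic.h>
    #include <asm/bitops.h>

    #define _XEN_PCIF_active 0              /* illustrative bit number */

    struct pciback_device {
            struct work_struct op_work;
            atomic_t op_in_progress;        /* 1 = idle, 0 = request being handled */
            unsigned long *sh_flags;        /* stand-in for the shared-page flags word */
    };

    static irqreturn_t pciback_handle_event_irq(int irq, void *dev_id,
                                                struct pt_regs *regs)
    {
            struct pciback_device *pdev = dev_id;

            /* Schedule only if a request is pending and nothing is in flight. */
            if (test_bit(_XEN_PCIF_active, pdev->sh_flags) &&
                atomic_dec_and_test(&pdev->op_in_progress))
                    schedule_work(&pdev->op_work);

            return IRQ_HANDLED;
    }

    static void pciback_handle_event(void *data)
    {
            struct pciback_device *pdev = data;

            /* ... perform the config-space access and fill in the result ... */

            atomic_inc(&pdev->op_in_progress);      /* back to "idle" */
            /*
             * The window Keir points at: an interrupt landing here sees the
             * counter back at 1 while _XEN_PCIF_active is still set, so the
             * same request can be queued and executed a second time.
             */
            clear_bit(_XEN_PCIF_active, pdev->sh_flags);
            /* ... notify the frontend via the event channel ... */
    }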
Keir Fraser
2006-Apr-25 14:09 UTC
Re: [Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
On 25 Apr 2006, at 14:32, Ryan wrote:

> I can't put the atomic_inc after I clear the _PCIF_active bit, as then the
> frontend could immediately turn around and send a second legitimate
> request that would fail because op_in_progress may not have been
> decremented yet.

Okay, I think I see what you mean. But I think the code would be clearer
*and* robust against spurious interrupts (event-channel callbacks) by doing
the following:

 * turn op_in_progress into a proper boolean, but with type 'unsigned long'.

 * in the IRQ handler do:

       if (test_bit(_XEN_PCIF_active, &pdev->...) &&
           !test_and_set_bit(0, &pdev->op_in_progress))
               schedule_work(...);

 * at the very end of pciback_handle_event() do:

       smp_wmb();   /* finish the response /then/ flag the IRQ handler */
       pdev->op_in_progress = 0;
       smp_mb();    /* flag the IRQ handler /then/ check for more work */
       if (test_bit(_XEN_PCIF_active, &pdev->...) &&
           !test_and_set_bit(0, &pdev->op_in_progress))
               schedule_work(...);

I think this is better than the current implementation of op_in_progress as
a counter with zero meaning in-progress, as it gives clear one-shot-setting
semantics for op_in_progress for each queued PCI request.

> > If serialised execution of PCI requests is important, and it looks like
> > it is, I think the simplest solution is simply to create your own
> > single-threaded workqueue and queue_work() onto that. Personally I get
> > worried about using the shared workqueues anyway, as they're another
> > shared resource to deadlock on.
>
> Can you give me an example of how you think I could be deadlocking on the
> workqueue?

No, it was more an aside really. I think the keventd workqueues are totally
fine in this case, but I have achieved deadlock loops involving the shared
workqueues in my own code in the past (arguably due to stupid use of
semaphores, though).

 -- Keir
Ryan
2006-Apr-25 17:02 UTC
Re: [Xen-devel] [PATCH] pciback: deferred handling of pci configuration space accesses
On Tue, 2006-04-25 at 15:09 +0100, Keir Fraser wrote:

> On 25 Apr 2006, at 14:32, Ryan wrote:
>
> > I can't put the atomic_inc after I clear the _PCIF_active bit, as then
> > the frontend could immediately turn around and send a second legitimate
> > request that would fail because op_in_progress may not have been
> > decremented yet.
>
> Okay, I think I see what you mean. But I think the code would be clearer
> *and* robust against spurious interrupts (event-channel callbacks) by
> doing the following:
>
>  * turn op_in_progress into a proper boolean, but with type 'unsigned
>    long'.
>
>  * in the IRQ handler do:
>
>        if (test_bit(_XEN_PCIF_active, &pdev->...) &&
>            !test_and_set_bit(0, &pdev->op_in_progress))
>                schedule_work(...);
>
>  * at the very end of pciback_handle_event() do:
>
>        smp_wmb();   /* finish the response /then/ flag the IRQ handler */
>        pdev->op_in_progress = 0;
>        smp_mb();    /* flag the IRQ handler /then/ check for more work */
>        if (test_bit(_XEN_PCIF_active, &pdev->...) &&
>            !test_and_set_bit(0, &pdev->op_in_progress))
>                schedule_work(...);
>
> I think this is better than the current implementation of op_in_progress
> as a counter with zero meaning in-progress, as it gives clear
> one-shot-setting semantics for op_in_progress for each queued PCI request.

I agree: setting and clearing a bit is definitely clearer than the atomic
type I was using. I've fixed that in the attached patch.

I also see how your method is an improvement over mine in that the
_XEN_PCIF_active flag isn't cleared while the backend is still processing.
I was willing to accept that as a consequence of the frontend misbehaving,
but your method is more robust and guarantees that the request will only run
once and that the flag will be set and cleared appropriately.

I've fixed up my code and am attaching a patch. I also added a comment about
the need to defer (to avoid sleeping in atomic context).

--
Some of the Linux PCI functions called by the virtual configuration space
handlers were making calls into ACPI code which uses semaphores. Since
semaphores cannot be acquired in atomic context (because acquiring them may
sleep), I changed the way the PCI backend responds to requests from the
frontend.

Previously, the virtual configuration space handlers ran in the same context
as the event channel interrupt handler (which was often, if not always,
atomic). Now the interrupt handler schedules a callback function (a bottom
half) in the system work queue (keventd) that will get called in process
context at a slightly later time. This allows the handlers in the virtual
configuration space to run in process context and to call any core PCI
function regardless of whether it will sleep or not.

Signed-off-by: Ryan Wilson <hap9@epoch.ncsc.mil>
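For completeness, here is a sketch of how the revised handlers might fit
together once the bit-based scheme above is folded in. It is assembled from
the snippets in this thread for illustration; the pciback_device fields, the
flag bit number and the helper names are assumptions, not the code in the
patch that was actually attached.

    #include <linux/workqueue.h>
    #include <linux/interrupt.h>
    #include <asm/bitops.h>
    #include <asm/system.h>             /* smp_wmb()/smp_mb() on 2.6-era kernels */

    #define _XEN_PCIF_active 0          /* illustrative bit number */

    struct pciback_device {
            struct work_struct op_work;
            unsigned long op_in_progress;   /* bit 0 set while a request is queued/running */
            unsigned long *sh_flags;        /* stand-in for the shared-page flags word */
    };

    /* One-shot: only the caller that flips op_in_progress 0->1 queues the work. */
    static void pciback_check_and_schedule(struct pciback_device *pdev)
    {
            if (test_bit(_XEN_PCIF_active, pdev->sh_flags) &&
                !test_and_set_bit(0, &pdev->op_in_progress))
                    schedule_work(&pdev->op_work);
    }

    static irqreturn_t pciback_handle_event_irq(int irq, void *dev_id,
                                                struct pt_regs *regs)
    {
            pciback_check_and_schedule(dev_id);
            return IRQ_HANDLED;
    }

    static void pciback_handle_event(void *data)
    {
            struct pciback_device *pdev = data;

            /*
             * The config-space access runs here, in process context, because
             * the core PCI/ACPI code it calls may sleep; that is the reason
             * the work is deferred out of the interrupt handler at all.
             */
            /* ... service the request and respond to the frontend ... */

            smp_wmb();  /* finish the response /then/ flag the IRQ handler */
            pdev->op_in_progress = 0;
            smp_mb();   /* flag the IRQ handler /then/ check for more work */
            pciback_check_and_schedule(pdev);
    }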