Kouya Shimura
2009-Mar-06 02:55 UTC
[Xen-devel] [PATCH][RFC] fix some spinlock issues in vmsi
Hi,

This patch fixes some spinlock issues related to changesets:
19246:9bc5799566be "passthrough MSI-X mask bit acceleration"
19263:9c5b4efc934d "hvm: passthrough MSI-X: fix ia64 link and MSI-X clean up"

Without this patch, I got:
(XEN) Assertion '_spin_is_locked(&pcidevs_lock)' failed at vmsi.c:389
or
(XEN) Xen BUG at spinlock.c:25

I'm not sure this fix is right. Please review it.

- d->arch.hvm_domain.msixtbl_list_lock is redundant.
  irq_desc->lock covers it, thus remove the spinlock.

- ASSERT(spin_is_locked(&pcidevs_lock)) is not needed.
  At first, I intended to add the enclosure of pcidevs_lock.
  But this assertion seems pointless: pcidevs_lock is for
  alldevs_list, and the msixtbl_pt_xxx functions never refer to it.

- In a "debug=y" environment, xmalloc must not be called from both the
  irq-enabled and the irq-disabled state. Otherwise, an assertion
  failure occurs in check_lock(). So, in msixtbl_pt_register, xmalloc
  is called beforehand. I think this is ugly, though.

Thanks,
Kouya

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Qing He
2009-Mar-06 05:31 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On Fri, 2009-03-06 at 10:55 +0800, Kouya Shimura wrote:
> Hi,
>
> This patch fixes some spinlock issues related to changesets:
> 19246:9bc5799566be "passthrough MSI-X mask bit acceleration"
> 19263:9c5b4efc934d "hvm: passthrough MSI-X: fix ia64 link and MSI-X clean up"
>
> Without this patch, I got:
> (XEN) Assertion '_spin_is_locked(&pcidevs_lock)' failed at vmsi.c:389
> or
> (XEN) Xen BUG at spinlock.c:25
>
> I'm not sure this fix is right. Please review it.
>
> - d->arch.hvm_domain.msixtbl_list_lock is redundant.
>   irq_desc->lock covers it, thus remove the spinlock.

No, it's not redundant. msixtbl_list_lock is there to protect msixtbl_list, which may contain entries for multiple irqs. Consider two pt_bind_irq calls happening simultaneously: irq_desc->lock wouldn't be enough to protect the list.

> - ASSERT(spin_is_locked(&pcidevs_lock)) is not needed.
>   At first, I intended to add the enclosure of pcidevs_lock.
>   But this assertion seems pointless. pcidevs_lock is
>   for alldevs_list and the msixtbl_pt_xxx functions never refer to it.

I was thinking that pcidevs_lock was already held for msixtbl_pt_xxx, but obviously I was wrong :-) However, it's not that pointless: msixtbl_pt_xxx refers to the contents of pci_dev, and the use of pcidevs_lock is intended to protect them against destructive modification, like an improper PHYSDEVOP_manage_pci_remove. It may not be necessary for a well-written device model, but that's something we can't rely on.

A similar problem also exists in msixtbl_write/msixtbl_read. Currently, these functions access the mapped MSI-X table using the pointer stored in pci_dev, and when a pci_dev is removed without the guest's knowledge, the pci_dev along with the fixmaps will be removed, which will definitely frustrate these functions. Unfortunately, adding spinlocks to msixtbl_{read,write} seems quite expensive, especially for multiple guest threads; that's why an RCU-based lock is used here.

For a completely secure handling of this, maybe we need a less brutal, asynchronous pci_remove hypercall or something.

> - In a "debug=y" environment, xmalloc must not be called from both the
>   irq-enabled and the irq-disabled state. Otherwise, an assertion
>   failure occurs in check_lock(). So, in msixtbl_pt_register, xmalloc
>   is called beforehand. I think this is ugly, though.

This is fine with me, thank you for pointing it out.

Thanks,
Qing
Kouya Shimura
2009-Mar-06 07:31 UTC
Re: [Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
Hi Qing,

Thank you for the review and elaboration.

Qing He writes:
> > - d->arch.hvm_domain.msixtbl_list_lock is redundant.
> >   irq_desc->lock covers it, thus remove the spinlock.
>
> No, it's not redundant. msixtbl_list_lock is there to protect
> msixtbl_list, which may contain entries for multiple irqs. Consider
> two pt_bind_irq calls happening simultaneously: irq_desc->lock
> wouldn't be enough to protect the list.

I see, although two pt_bind_irq calls are hardly ever simultaneous.

> > - ASSERT(spin_is_locked(&pcidevs_lock)) is not needed.
> >   At first, I intended to add the enclosure of pcidevs_lock.
> >   But this assertion seems pointless. pcidevs_lock is
> >   for alldevs_list and the msixtbl_pt_xxx functions never refer to it.
>
> I was thinking that pcidevs_lock was already held for msixtbl_pt_xxx,
> but obviously I was wrong :-) However, it's not that pointless:
> msixtbl_pt_xxx refers to the contents of pci_dev, and the use of
> pcidevs_lock is intended to protect them against destructive
> modification, like an improper PHYSDEVOP_manage_pci_remove. It may not
> be necessary for a well-written device model, but that's something we
> can't rely on.

Okay, one more question. If ASSERT(spin_is_locked(&pcidevs_lock)) is needed, does msixtbl_list_lock become redundant? (i.e. pcidevs_lock must cover it.) Anyway, msixtbl_list_lock had better remain for the future...

Thanks,
Kouya
Keir Fraser
2009-Mar-06 08:28 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On 06/03/2009 05:31, "Qing He" <qing.he@intel.com> wrote:
> A similar problem also exists in msixtbl_write/msixtbl_read. Currently,
> these functions access the mapped MSI-X table using the pointer stored
> in pci_dev, and when a pci_dev is removed without the guest's
> knowledge, the pci_dev along with the fixmaps will be removed, which
> will definitely frustrate these functions. Unfortunately, adding
> spinlocks to msixtbl_{read,write} seems quite expensive, especially for
> multiple guest threads; that's why an RCU-based lock is used here. For
> a completely secure handling of this, maybe we need a less brutal,
> asynchronous pci_remove hypercall or something.

Add the locks, making it correct. Then optimise later as necessary, using RCU or (perhaps easier) just smaller locks: per MSI-X, per pci_dev, or whatever. Throwing in racy code condemns us to subtle races forever more.

 -- Keir
Qing He
2009-Mar-06 09:21 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On Fri, 2009-03-06 at 16:28 +0800, Keir Fraser wrote:
> Add the locks, making it correct. Then optimise later as necessary,
> using RCU or (perhaps easier) just smaller locks: per MSI-X, per
> pci_dev, or whatever. Throwing in racy code condemns us to subtle
> races forever more.

Ok, I'll make a patch to use the locks.

Do you think it's OK to assume that the guest has already freed the device when remove_pci_device is called on hotplug?

Thanks,
Qing
Keir Fraser
2009-Mar-06 12:50 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On 06/03/2009 09:21, "Qing He" <qing.he@intel.com> wrote:
>> Add the locks, making it correct. Then optimise later as necessary,
>> using RCU or (perhaps easier) just smaller locks: per MSI-X, per
>> pci_dev, or whatever. Throwing in racy code condemns us to subtle
>> races forever more.
>
> Ok, I'll make a patch to use the locks.
>
> Do you think it's OK to assume that the guest has already freed the
> device when remove_pci_device is called on hotplug?

Do you mean deinitialised and unloaded the device driver?

 -- Keir
Qing He
2009-Mar-06 17:43 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On Fri, 2009-03-06 at 20:50 +0800, Keir Fraser wrote:
> On 06/03/2009 09:21, "Qing He" <qing.he@intel.com> wrote:
> > Ok, I'll make a patch to use the locks.
> >
> > Do you think it's OK to assume that the guest has already freed the
> > device when remove_pci_device is called on hotplug?
>
> Do you mean deinitialised and unloaded the device driver?

Yes, the race we just talked about happens here. Consider the following case: when remove_pci_device is called, the device data structure is removed by the hypervisor. But there is no guarantee in the hypervisor that the guest already knows about this; for example, the guest may continue to access the device's resources. It's in this case that msixtbl_{read,write} has a small race.

I don't think this scenario is desirable, but does it happen in the current logic?

Thanks,
Qing
Keir Fraser
2009-Mar-06 18:12 UTC
[Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On 06/03/2009 17:43, "Qing He" <qing.he@intel.com> wrote:
>> Do you mean deinitialised and unloaded the device driver?
>
> Yes, the race we just talked about happens here. Consider the
> following case: when remove_pci_device is called, the device data
> structure is removed by the hypervisor. But there is no guarantee in
> the hypervisor that the guest already knows about this; for example,
> the guest may continue to access the device's resources. It's in this
> case that msixtbl_{read,write} has a small race.
>
> I don't think this scenario is desirable, but does it happen in the
> current logic?

Uh, well, since dom0 is in fact mostly in charge of PCI access, there's not much we can do in Xen to enforce this. At that level we kind of have to assume that dom0 has sequenced hotplug/hot-unplug sensibly.

 -- Keir
Jiang, Yunhong
2009-Mar-09 03:19 UTC
[Xen-devel] RE: [PATCH][RFC] fix some spinlock issues in vmsi
I was also confused by this when I tried to work on the pci_dev lock. The current implementation doesn't assume the device is not owned by a guest; that is, the IOMMU and MSI may both still be enabled for a guest when dom0 invokes the PHYSDEVOP_manage_pci_remove hypercall. This causes a lot of race conditions and makes PCI pass-through very complex.

Maybe we can simplify things in two steps:

a) Instead of letting the Xen HV handle the situation where the device
   is still owned by another guest, we can have pciback do that before
   this hypercall is invoked. For a PV driver domain this will be
   simple, since dom0's kernel should invoke pciback's remove()
   function. For an HVM domain, I think it should be OK in a
   stub-domain environment. This will make the HV simpler and more
   robust, and doesn't make things worse.

b) Find some mechanism by which dom0 can trigger a virtual hotplug
   event in the guest, so that hotplug on the host side will not cause
   trouble for the guest. There is virtual hotplug for PV guests, but
   it is an async method; I'm not sure whether that can be integrated
   with dom0's hotplug event handling. Not sure if this is feasible.

CC'ing Esp and Winston also.

Thanks,
Yunhong Jiang

xen-devel-bounces@lists.xensource.com <> wrote:
> On 06/03/2009 17:43, "Qing He" <qing.he@intel.com> wrote:
> > I don't think this scenario is desirable, but does it happen in the
> > current logic?
>
> Uh, well, since dom0 is in fact mostly in charge of PCI access,
> there's not much we can do in Xen to enforce this. At that level we
> kind of have to assume that dom0 has sequenced hotplug/hot-unplug
> sensibly.
>
> -- Keir
Qing He
2009-Mar-10 01:11 UTC
Re: [Xen-devel] Re: [PATCH][RFC] fix some spinlock issues in vmsi
On Fri, 2009-03-06 at 15:31 +0800, Kouya Shimura wrote:
> Qing He writes:
> > No, it's not redundant. msixtbl_list_lock is there to protect
> > msixtbl_list, which may contain entries for multiple irqs. Consider
> > two pt_bind_irq calls happening simultaneously: irq_desc->lock
> > wouldn't be enough to protect the list.
>
> I see, although two pt_bind_irq calls are hardly ever simultaneous.

The problem here, as in many other places, is whether we can safely assume that dom0's operations are completely within expectation.

> > I was thinking that pcidevs_lock was already held for msixtbl_pt_xxx,
> > but obviously I was wrong :-) However, it's not that pointless:
> > msixtbl_pt_xxx refers to the contents of pci_dev, and the use of
> > pcidevs_lock is intended to protect them against destructive
> > modification, like an improper PHYSDEVOP_manage_pci_remove. It may
> > not be necessary for a well-written device model, but that's
> > something we can't rely on.
>
> Okay, one more question. If ASSERT(spin_is_locked(&pcidevs_lock)) is
> needed, does msixtbl_list_lock become redundant? (i.e. pcidevs_lock
> must cover it.) Anyway, msixtbl_list_lock had better remain for the
> future...

I'm OK with that, though I think msixtbl_list_lock is closer to the list. Are you going to use something named like this?

Thanks,
Qing