Michael S. Tsirkin
2019-Sep-26 15:46 UTC
[PATCH V2 5/8] mdev: introduce device specific ops
On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote:> On Wed, 25 Sep 2019 10:11:00 -0400 > Rob Miller <rob.miller at broadcom.com> wrote: > > > > On Tue, 24 Sep 2019 21:53:29 +0800 > > > > Jason Wang <jasowang at redhat.com> wrote: > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c > > > > b/drivers/vfio/mdev/vfio_mdev.c > > > > > index 891cf83a2d9a..95efa054442f 100644 > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c > > > > > @@ -14,6 +14,7 @@ > > > > > #include <linux/slab.h> > > > > > #include <linux/vfio.h> > > > > > #include <linux/mdev.h> > > > > > +#include <linux/vfio_mdev.h> > > > > > > > > > > #include "mdev_private.h" > > > > > > > > > > @@ -24,16 +25,16 @@ > > > > > static int vfio_mdev_open(void *device_data) > > > > > { > > > > > struct mdev_device *mdev = device_data; > > > > > - struct mdev_parent *parent = mdev->parent; > > > > > + const struct vfio_mdev_device_ops *ops = > > > > mdev_get_dev_ops(mdev); > > > > > int ret; > > > > > > > > > > - if (unlikely(!parent->ops->open)) > > > > > + if (unlikely(!ops->open)) > > > > > return -EINVAL; > > > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > > return -ENODEV; > > > > > > > RJM>] My understanding lately is that this call to > > try_module_get(THIS_MODULE) is no longer needed as is considered as a > > latent bug. > > Quote from > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put > > : > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel > > source but most if not all of them are latent bugs that should be cleaned > > up. > > This use seems to fall exactly into the case where it is necessary, the > open here is not a direct VFS call, it's an internal interface between > modules. The user is interacting with filesystem objects from the vfio > module and the module reference we're trying to acquire here is to the > vfio-mdev module. Thanks, > > AlexI think the latent bug refers not to module get per se, but to the module_put tied to it. E.g.: static void vfio_mdev_release(void *device_data) { struct mdev_device *mdev = device_data; struct mdev_parent *parent = mdev->parent; if (likely(parent->ops->release)) parent->ops->release(mdev); module_put(THIS_MODULE); Does anything prevent the module from unloading at this point? if not then ... } it looks like the implicit return (with instructions for argument pop and functuon return) here can get overwritten on module unload, causing a crash when executed. IOW there's generally no way for module to keep a reference to itself: it can take a reference but it needs someone else to keep it and put. -- MST
On Thu, 26 Sep 2019 11:46:55 -0400 "Michael S. Tsirkin" <mst at redhat.com> wrote:> On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote: > > On Wed, 25 Sep 2019 10:11:00 -0400 > > Rob Miller <rob.miller at broadcom.com> wrote: > > > > > On Tue, 24 Sep 2019 21:53:29 +0800 > > > > > Jason Wang <jasowang at redhat.com> wrote: > > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c > > > > > b/drivers/vfio/mdev/vfio_mdev.c > > > > > > index 891cf83a2d9a..95efa054442f 100644 > > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c > > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c > > > > > > @@ -14,6 +14,7 @@ > > > > > > #include <linux/slab.h> > > > > > > #include <linux/vfio.h> > > > > > > #include <linux/mdev.h> > > > > > > +#include <linux/vfio_mdev.h> > > > > > > > > > > > > #include "mdev_private.h" > > > > > > > > > > > > @@ -24,16 +25,16 @@ > > > > > > static int vfio_mdev_open(void *device_data) > > > > > > { > > > > > > struct mdev_device *mdev = device_data; > > > > > > - struct mdev_parent *parent = mdev->parent; > > > > > > + const struct vfio_mdev_device_ops *ops = > > > > > mdev_get_dev_ops(mdev); > > > > > > int ret; > > > > > > > > > > > > - if (unlikely(!parent->ops->open)) > > > > > > + if (unlikely(!ops->open)) > > > > > > return -EINVAL; > > > > > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > > > return -ENODEV; > > > > > > > > > > RJM>] My understanding lately is that this call to > > > try_module_get(THIS_MODULE) is no longer needed as is considered as a > > > latent bug. > > > Quote from > > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put > > > : > > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel > > > source but most if not all of them are latent bugs that should be cleaned > > > up. > > > > This use seems to fall exactly into the case where it is necessary, the > > open here is not a direct VFS call, it's an internal interface between > > modules. The user is interacting with filesystem objects from the vfio > > module and the module reference we're trying to acquire here is to the > > vfio-mdev module. Thanks, > > > > Alex > > > I think the latent bug refers not to module get per se, > but to the module_put tied to it. E.g.: > > static void vfio_mdev_release(void *device_data) > { > struct mdev_device *mdev = device_data; > struct mdev_parent *parent = mdev->parent; > > if (likely(parent->ops->release)) > parent->ops->release(mdev); > > module_put(THIS_MODULE); > > Does anything prevent the module from unloading at this point? > if not then ... > > > } > > it looks like the implicit return (with instructions for argument pop > and functuon return) here can get overwritten on module > unload, causing a crash when executed. > > IOW there's generally no way for module to keep a reference > to itself: it can take a reference but it needs someone else > to keep it and put.I'd always assumed this would exit cleanly, but perhaps there is a latent race there. In any case, taking a module reference within the module in this case is better than not doing so, as the latter would potentially allow the module to be removed at any point in time, while the former only seems to expose acquire and release gaps. Add it to the todo list. Thanks, Alex
Michael S. Tsirkin
2019-Sep-26 16:34 UTC
[PATCH V2 5/8] mdev: introduce device specific ops
On Thu, Sep 26, 2019 at 10:26:08AM -0600, Alex Williamson wrote:> On Thu, 26 Sep 2019 11:46:55 -0400 > "Michael S. Tsirkin" <mst at redhat.com> wrote: > > > On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote: > > > On Wed, 25 Sep 2019 10:11:00 -0400 > > > Rob Miller <rob.miller at broadcom.com> wrote: > > > > > > On Tue, 24 Sep 2019 21:53:29 +0800 > > > > > > Jason Wang <jasowang at redhat.com> wrote: > > > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c > > > > > > b/drivers/vfio/mdev/vfio_mdev.c > > > > > > > index 891cf83a2d9a..95efa054442f 100644 > > > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c > > > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c > > > > > > > @@ -14,6 +14,7 @@ > > > > > > > #include <linux/slab.h> > > > > > > > #include <linux/vfio.h> > > > > > > > #include <linux/mdev.h> > > > > > > > +#include <linux/vfio_mdev.h> > > > > > > > > > > > > > > #include "mdev_private.h" > > > > > > > > > > > > > > @@ -24,16 +25,16 @@ > > > > > > > static int vfio_mdev_open(void *device_data) > > > > > > > { > > > > > > > struct mdev_device *mdev = device_data; > > > > > > > - struct mdev_parent *parent = mdev->parent; > > > > > > > + const struct vfio_mdev_device_ops *ops = > > > > > > mdev_get_dev_ops(mdev); > > > > > > > int ret; > > > > > > > > > > > > > > - if (unlikely(!parent->ops->open)) > > > > > > > + if (unlikely(!ops->open)) > > > > > > > return -EINVAL; > > > > > > > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > > > > return -ENODEV; > > > > > > > > > > > > > RJM>] My understanding lately is that this call to > > > > try_module_get(THIS_MODULE) is no longer needed as is considered as a > > > > latent bug. > > > > Quote from > > > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put > > > > : > > > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel > > > > source but most if not all of them are latent bugs that should be cleaned > > > > up. > > > > > > This use seems to fall exactly into the case where it is necessary, the > > > open here is not a direct VFS call, it's an internal interface between > > > modules. The user is interacting with filesystem objects from the vfio > > > module and the module reference we're trying to acquire here is to the > > > vfio-mdev module. Thanks, > > > > > > Alex > > > > > > I think the latent bug refers not to module get per se, > > but to the module_put tied to it. E.g.: > > > > static void vfio_mdev_release(void *device_data) > > { > > struct mdev_device *mdev = device_data; > > struct mdev_parent *parent = mdev->parent; > > > > if (likely(parent->ops->release)) > > parent->ops->release(mdev); > > > > module_put(THIS_MODULE); > > > > Does anything prevent the module from unloading at this point? > > if not then ... > > > > > > } > > > > it looks like the implicit return (with instructions for argument pop > > and functuon return) here can get overwritten on module > > unload, causing a crash when executed. > > > > IOW there's generally no way for module to keep a reference > > to itself: it can take a reference but it needs someone else > > to keep it and put. > > I'd always assumed this would exit cleanly, but perhaps there is a > latent race there. In any case, taking a module reference within the > module in this case is better than not doing so, as the latter would > potentially allow the module to be removed at any point in time, while > the former only seems to expose acquire and release gaps. Add it to > the todo list. Thanks, > > AlexRight. I agree with the stack overflow quote: as this example seems to show this is a latent bug. But I also agree that just removing the reference isn't the right way to clean it up. -- MST