Greg KH
2022-Oct-21 08:36 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On Fri, Oct 21, 2022 at 04:24:23PM +0800, Yang Yingliang wrote:> > On 2022/10/21 13:37, Greg KH wrote: > > On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: > > > On 2022-10-20 22:20, Yang Yingliang wrote: > > > > The previous discussion link: > > > > https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211194 at huawei.com/T/ > > > The very first discussion on this was here: > > > > > > https://www.spinics.net/lists/dri-devel/msg368077.html > > > > > > Please use this link, and not the that one up there you which quoted above, > > > and whose commit description is taken verbatim from the this link. > > > > > > > kset_register() is currently used in some places without calling > > > > kset_put() in error path, because the callers think it should be > > > > kset internal thing to do, but the driver core can not know what > > > > caller doing with that memory at times. The memory could be freed > > > > both in kset_put() and error path of caller, if it is called in > > > > kset_register(). > > > As I explained in the link above, the reason there's > > > a memory leak is that one cannot call kset_register() without > > > the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, > > > in this case, i.e. kset_register() fails with -EINVAL. > > > > > > Thus, the most common usage is something like this: > > > > > > kobj_set_name(&kset->kobj, format, ...); > > > kset->kobj.kset = parent_kset; > > > kset->kobj.ktype = ktype; > > > res = kset_register(kset); > > > > > > So, what is being leaked, is the memory allocated in kobj_set_name(), > > > by the common idiom shown above. This needs to be mentioned in > > > the documentation, at least, in case, in the future this is absolved > > > in kset_register() redesign, etc. > > Based on this, can kset_register() just clean up from itself when an > > error happens? Ideally that would be the case, as the odds of a kset > > being embedded in a larger structure is probably slim, but we would have > > to search the tree to make sure. > I have search the whole tree, the kset used in bus_register() - patch #3, > kset_create_and_add() - patch #4 > __class_register() - patch #5,? fw_cfg_build_symlink() - patch #6 and > amdgpu_discovery.c - patch #10 > is embedded in a larger structure. In these cases, we can not call > kset_put() in error path in kset_register()Yes you can as the kobject in the kset should NOT be controling the lifespan of those larger objects. If it is, please point out the call chain here as I don't think that should be possible. Note all of this is a mess because the kobject name stuff was added much later, after the driver model had been created and running for a while. We missed this error path when adding the dynamic kobject name logic, thank for looking into this. If you could test the patch posted with your error injection systems, that could make this all much simpler to solve. thanks, greg k-h
Luben Tuikov
2022-Oct-21 08:52 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022-10-21 04:36, Greg KH wrote:> On Fri, Oct 21, 2022 at 04:24:23PM +0800, Yang Yingliang wrote: >> >> On 2022/10/21 13:37, Greg KH wrote: >>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>> The previous discussion link: >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7Ca8206f9348e04b13e3da08dab33f4f53%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019381738406942%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HqvF1p4ejF6%2BYS5u0pe15ZdDgUAIVP%2BB1xQXICWjNwY%3D&reserved=0 >>>> The very first discussion on this was here: >>>> >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7Ca8206f9348e04b13e3da08dab33f4f53%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019381738406942%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LmRgWUSQgK6wJvMdfBgjO4CiaKQ2TBoeW836r0UbcjU%3D&reserved=0 >>>> >>>> Please use this link, and not the that one up there you which quoted above, >>>> and whose commit description is taken verbatim from the this link. >>>> >>>>> kset_register() is currently used in some places without calling >>>>> kset_put() in error path, because the callers think it should be >>>>> kset internal thing to do, but the driver core can not know what >>>>> caller doing with that memory at times. The memory could be freed >>>>> both in kset_put() and error path of caller, if it is called in >>>>> kset_register(). >>>> As I explained in the link above, the reason there's >>>> a memory leak is that one cannot call kset_register() without >>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>> in this case, i.e. kset_register() fails with -EINVAL. >>>> >>>> Thus, the most common usage is something like this: >>>> >>>> kobj_set_name(&kset->kobj, format, ...); >>>> kset->kobj.kset = parent_kset; >>>> kset->kobj.ktype = ktype; >>>> res = kset_register(kset); >>>> >>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>> by the common idiom shown above. This needs to be mentioned in >>>> the documentation, at least, in case, in the future this is absolved >>>> in kset_register() redesign, etc. >>> Based on this, can kset_register() just clean up from itself when an >>> error happens? Ideally that would be the case, as the odds of a kset >>> being embedded in a larger structure is probably slim, but we would have >>> to search the tree to make sure. >> I have search the whole tree, the kset used in bus_register() - patch #3, >> kset_create_and_add() - patch #4 >> __class_register() - patch #5,? fw_cfg_build_symlink() - patch #6 and >> amdgpu_discovery.c - patch #10 >> is embedded in a larger structure. In these cases, we can not call >> kset_put() in error path in kset_register() > > Yes you can as the kobject in the kset should NOT be controling the > lifespan of those larger objects. > > If it is, please point out the call chain here as I don't think that > should be possible.WLOG, I believe it is something like this: x = kzalloc(); kobject_set_name(&x->kset.kobj, format, ...); x->kset.kobj.kset = parent_kset; x->kset.kobj.ktype = this_ktype; /* this has a .release which frees x */ res = kset_register(&x->kset); if (res) { kset_put(&x->kset); /* calls this_ktype->release() which frees x */ kfree(x); /* <-- double free */ } And since kref is set to 1, in kset_register(), then we'd double free. This is why I don't have kset_put() in that error path in amdgpu. Regards, Luben
Yang Yingliang
2022-Oct-21 08:59 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022/10/21 16:36, Greg KH wrote:> On Fri, Oct 21, 2022 at 04:24:23PM +0800, Yang Yingliang wrote: >> On 2022/10/21 13:37, Greg KH wrote: >>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>> The previous discussion link: >>>>> https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211194 at huawei.com/T/ >>>> The very first discussion on this was here: >>>> >>>> https://www.spinics.net/lists/dri-devel/msg368077.html >>>> >>>> Please use this link, and not the that one up there you which quoted above, >>>> and whose commit description is taken verbatim from the this link. >>>> >>>>> kset_register() is currently used in some places without calling >>>>> kset_put() in error path, because the callers think it should be >>>>> kset internal thing to do, but the driver core can not know what >>>>> caller doing with that memory at times. The memory could be freed >>>>> both in kset_put() and error path of caller, if it is called in >>>>> kset_register(). >>>> As I explained in the link above, the reason there's >>>> a memory leak is that one cannot call kset_register() without >>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>> in this case, i.e. kset_register() fails with -EINVAL. >>>> >>>> Thus, the most common usage is something like this: >>>> >>>> kobj_set_name(&kset->kobj, format, ...); >>>> kset->kobj.kset = parent_kset; >>>> kset->kobj.ktype = ktype; >>>> res = kset_register(kset); >>>> >>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>> by the common idiom shown above. This needs to be mentioned in >>>> the documentation, at least, in case, in the future this is absolved >>>> in kset_register() redesign, etc. >>> Based on this, can kset_register() just clean up from itself when an >>> error happens? Ideally that would be the case, as the odds of a kset >>> being embedded in a larger structure is probably slim, but we would have >>> to search the tree to make sure. >> I have search the whole tree, the kset used in bus_register() - patch #3, >> kset_create_and_add() - patch #4 >> __class_register() - patch #5,? fw_cfg_build_symlink() - patch #6 and >> amdgpu_discovery.c - patch #10 >> is embedded in a larger structure. In these cases, we can not call >> kset_put() in error path in kset_register() > Yes you can as the kobject in the kset should NOT be controling the > lifespan of those larger objects. > > If it is, please point out the call chain here as I don't think that > should be possible. > > Note all of this is a mess because the kobject name stuff was added much > later, after the driver model had been created and running for a while. > We missed this error path when adding the dynamic kobject name logic, > thank for looking into this. > > If you could test the patch posted with your error injection systems, > that could make this all much simpler to solve.The patch posted by Luben will cause double free in some cases. From 71e0a22801c0699f67ea40ed96e0a7d7d9e0f318 Mon Sep 17 00:00:00 2001 From: Luben Tuikov <luben.tuikov at amd.com> Date: Fri, 21 Oct 2022 03:34:21 -0400 Subject: [PATCH] kobject: Add kset_put() if kset_register() fails X-check-string-leak: v1.0 If kset_register() fails, we call kset_put() before returning the error. This makes sure that we free memory allocated by kobj_set_name() for the kset, since kset_register() cannot be called unless the kset has a name, usually gotten via kobj_set_name(&kset->kobj, format, ...); Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org> Cc: Rafael J. Wysocki <rafael at kernel.org> Cc: Yang Yingliang <yangyingliang at huawei.com> Cc: Linux Kernel Mailing List <linux-kernel at vger.kernel.org> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com> --- ?lib/kobject.c | 4 +++- ?1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/kobject.c b/lib/kobject.c index a0b2dbfcfa2334..c122b979f2b75e 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -844,8 +844,10 @@ int kset_register(struct kset *k) ???? kset_init(k); ???? err = kobject_add_internal(&k->kobj); -??? if (err) +??? if (err) { +??? ??? kset_put(k); ???? ??? return err; +??? } ???? kobject_uevent(&k->kobj, KOBJ_ADD); ???? return 0; ?} -- 2.38.0-rc2> > thanks, > > greg k-h > .
Yang Yingliang
2022-Oct-21 09:12 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022/10/21 16:36, Greg KH wrote:> On Fri, Oct 21, 2022 at 04:24:23PM +0800, Yang Yingliang wrote: >> On 2022/10/21 13:37, Greg KH wrote: >>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>> The previous discussion link: >>>>> https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211194 at huawei.com/T/ >>>> The very first discussion on this was here: >>>> >>>> https://www.spinics.net/lists/dri-devel/msg368077.html >>>> >>>> Please use this link, and not the that one up there you which quoted above, >>>> and whose commit description is taken verbatim from the this link. >>>> >>>>> kset_register() is currently used in some places without calling >>>>> kset_put() in error path, because the callers think it should be >>>>> kset internal thing to do, but the driver core can not know what >>>>> caller doing with that memory at times. The memory could be freed >>>>> both in kset_put() and error path of caller, if it is called in >>>>> kset_register(). >>>> As I explained in the link above, the reason there's >>>> a memory leak is that one cannot call kset_register() without >>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>> in this case, i.e. kset_register() fails with -EINVAL. >>>> >>>> Thus, the most common usage is something like this: >>>> >>>> kobj_set_name(&kset->kobj, format, ...); >>>> kset->kobj.kset = parent_kset; >>>> kset->kobj.ktype = ktype; >>>> res = kset_register(kset); >>>> >>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>> by the common idiom shown above. This needs to be mentioned in >>>> the documentation, at least, in case, in the future this is absolved >>>> in kset_register() redesign, etc. >>> Based on this, can kset_register() just clean up from itself when an >>> error happens? Ideally that would be the case, as the odds of a kset >>> being embedded in a larger structure is probably slim, but we would have >>> to search the tree to make sure. >> I have search the whole tree, the kset used in bus_register() - patch #3, >> kset_create_and_add() - patch #4 >> __class_register() - patch #5,? fw_cfg_build_symlink() - patch #6 and >> amdgpu_discovery.c - patch #10 >> is embedded in a larger structure. In these cases, we can not call >> kset_put() in error path in kset_register() > Yes you can as the kobject in the kset should NOT be controling the > lifespan of those larger objects.Read through the code the only leak in this case is the name, so can we free it directly in kset_register(): --- a/lib/kobject.c +++ b/lib/kobject.c @@ -844,8 +844,11 @@ int kset_register(struct kset *k) ??????? kset_init(k); ??????? err = kobject_add_internal(&k->kobj); -?????? if (err) +?????? if (err) { +?????????????? kfree_const(k->kobj.name); +?????????????? k->kobj.name = NULL; ??????????????? return err; +?????? } ??????? kobject_uevent(&k->kobj, KOBJ_ADD); ??????? return 0; ?} or unset ktype of kobject, then call kset_put(): --- a/lib/kobject.c +++ b/lib/kobject.c @@ -844,8 +844,11 @@ int kset_register(struct kset *k) ??????? kset_init(k); ??????? err = kobject_add_internal(&k->kobj); -?????? if (err) +?????? if (err) { +?????????????? k->kobj.ktype = NULL; +?????????????? kset_put(k); ??????????????? return err; +?????? } ??????? kobject_uevent(&k->kobj, KOBJ_ADD); ??????? return 0; ?}> > If it is, please point out the call chain here as I don't think that > should be possible. > > Note all of this is a mess because the kobject name stuff was added much > later, after the driver model had been created and running for a while. > We missed this error path when adding the dynamic kobject name logic, > thank for looking into this. > > If you could test the patch posted with your error injection systems, > that could make this all much simpler to solve. > > thanks, > > greg k-h > .