Luben Tuikov
2022-Oct-21 08:24 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022-10-21 04:18, Greg KH wrote:> On Fri, Oct 21, 2022 at 03:55:18AM -0400, Luben Tuikov wrote: >> On 2022-10-21 01:37, Greg KH wrote: >>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>> The previous discussion link: >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=C%2Bj1THkHpzVGks5eqB%2Fm%2FPAkMRohR7CYvRnOCqUqdcM%3D&reserved=0 >>>> >>>> The very first discussion on this was here: >>>> >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pSR10abmK8nAMvKSezqWC0SPUBL4qEwtCCizyIKW7Dc%3D&reserved=0 >>>> >>>> Please use this link, and not the that one up there you which quoted above, >>>> and whose commit description is taken verbatim from the this link. >>>> >>>>> >>>>> kset_register() is currently used in some places without calling >>>>> kset_put() in error path, because the callers think it should be >>>>> kset internal thing to do, but the driver core can not know what >>>>> caller doing with that memory at times. The memory could be freed >>>>> both in kset_put() and error path of caller, if it is called in >>>>> kset_register(). >>>> >>>> As I explained in the link above, the reason there's >>>> a memory leak is that one cannot call kset_register() without >>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>> in this case, i.e. kset_register() fails with -EINVAL. >>>> >>>> Thus, the most common usage is something like this: >>>> >>>> kobj_set_name(&kset->kobj, format, ...); >>>> kset->kobj.kset = parent_kset; >>>> kset->kobj.ktype = ktype; >>>> res = kset_register(kset); >>>> >>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>> by the common idiom shown above. This needs to be mentioned in >>>> the documentation, at least, in case, in the future this is absolved >>>> in kset_register() redesign, etc. >>> >>> Based on this, can kset_register() just clean up from itself when an >>> error happens? Ideally that would be the case, as the odds of a kset >>> being embedded in a larger structure is probably slim, but we would have >>> to search the tree to make sure. >> >> Looking at kset_register(), we can add kset_put() in the error path, >> when kobject_add_internal(&kset->kobj) fails. >> >> See the attached patch. It needs to be tested with the same error injection >> as Yang has been doing. >> >> Now, struct kset is being embedded in larger structs--see amdgpu_discovery.c >> starting at line 575. If you're on an AMD system, it gets you the tree >> structure you'll see when you run "tree /sys/class/drm/card0/device/ip_discovery/". >> That shouldn't be a problem though. > > Yes, that shouldn't be an issue as the kobject embedded in a kset is > ONLY for that kset itself, the kset structure should not be controling > the lifespan of the object it is embedded in, right?Yes, and it doesn't. It only does a kobject_get(parent) and kobject_put(parent). So that's fine and natural. Yang, do you want to try the patch in my previous email in this thread, since you've got the error injection set up already? Regards, Luben
Luben Tuikov
2022-Oct-21 08:41 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022-10-21 04:24, Luben Tuikov wrote:> On 2022-10-21 04:18, Greg KH wrote: >> On Fri, Oct 21, 2022 at 03:55:18AM -0400, Luben Tuikov wrote: >>> On 2022-10-21 01:37, Greg KH wrote: >>>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>>> The previous discussion link: >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=C%2Bj1THkHpzVGks5eqB%2Fm%2FPAkMRohR7CYvRnOCqUqdcM%3D&reserved=0 >>>>> >>>>> The very first discussion on this was here: >>>>> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pSR10abmK8nAMvKSezqWC0SPUBL4qEwtCCizyIKW7Dc%3D&reserved=0 >>>>> >>>>> Please use this link, and not the that one up there you which quoted above, >>>>> and whose commit description is taken verbatim from the this link. >>>>> >>>>>> >>>>>> kset_register() is currently used in some places without calling >>>>>> kset_put() in error path, because the callers think it should be >>>>>> kset internal thing to do, but the driver core can not know what >>>>>> caller doing with that memory at times. The memory could be freed >>>>>> both in kset_put() and error path of caller, if it is called in >>>>>> kset_register(). >>>>> >>>>> As I explained in the link above, the reason there's >>>>> a memory leak is that one cannot call kset_register() without >>>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>>> in this case, i.e. kset_register() fails with -EINVAL. >>>>> >>>>> Thus, the most common usage is something like this: >>>>> >>>>> kobj_set_name(&kset->kobj, format, ...); >>>>> kset->kobj.kset = parent_kset; >>>>> kset->kobj.ktype = ktype; >>>>> res = kset_register(kset); >>>>> >>>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>>> by the common idiom shown above. This needs to be mentioned in >>>>> the documentation, at least, in case, in the future this is absolved >>>>> in kset_register() redesign, etc. >>>> >>>> Based on this, can kset_register() just clean up from itself when an >>>> error happens? Ideally that would be the case, as the odds of a kset >>>> being embedded in a larger structure is probably slim, but we would have >>>> to search the tree to make sure. >>> >>> Looking at kset_register(), we can add kset_put() in the error path, >>> when kobject_add_internal(&kset->kobj) fails. >>> >>> See the attached patch. It needs to be tested with the same error injection >>> as Yang has been doing. >>> >>> Now, struct kset is being embedded in larger structs--see amdgpu_discovery.c >>> starting at line 575. If you're on an AMD system, it gets you the tree >>> structure you'll see when you run "tree /sys/class/drm/card0/device/ip_discovery/". >>> That shouldn't be a problem though. >> >> Yes, that shouldn't be an issue as the kobject embedded in a kset is >> ONLY for that kset itself, the kset structure should not be controling >> the lifespan of the object it is embedded in, right? > > Yes, and it doesn't. It only does a kobject_get(parent) and kobject_put(parent). > So that's fine and natural. > > Yang, do you want to try the patch in my previous email in this thread, since you've > got the error injection set up already?I spoke too soon. I believe you're onto something, because looking at the idiom: kobj_set_name(&kset->kobj, format, ...); kset->kobj.kset = parent_kset; kset->kobj.ktype = ktype; res = kset_register(kset); The ktype defines a release method, which frees the larger struct the kset is embedded in. And this would result in double free, for instance in the amdgpu_discovery.c code, if kset_put() is called after kset_register() fails, since we kzalloc the larger object just before and kfree it on error just after. Ideally, we'd only "revert" the actions done by kobj_set_name(), as there's some error recovery on create_dir() in kobject_add_internal(). So, we cannot do this business with the kset_put() on error from kset_register(), after all. Not sure how this wasn't caught in Yang's testing--the kernel should've complained. Regards, Luben
Yang Yingliang
2022-Oct-21 09:23 UTC
[Ocfs2-devel] [PATCH 00/11] fix memory leak while kset_register() fails
On 2022/10/21 16:41, Luben Tuikov wrote:> On 2022-10-21 04:24, Luben Tuikov wrote: >> On 2022-10-21 04:18, Greg KH wrote: >>> On Fri, Oct 21, 2022 at 03:55:18AM -0400, Luben Tuikov wrote: >>>> On 2022-10-21 01:37, Greg KH wrote: >>>>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote: >>>>>> On 2022-10-20 22:20, Yang Yingliang wrote: >>>>>>> The previous discussion link: >>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=C%2Bj1THkHpzVGks5eqB%2Fm%2FPAkMRohR7CYvRnOCqUqdcM%3D&reserved=0 >>>>>> The very first discussion on this was here: >>>>>> >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pSR10abmK8nAMvKSezqWC0SPUBL4qEwtCCizyIKW7Dc%3D&reserved=0 >>>>>> >>>>>> Please use this link, and not the that one up there you which quoted above, >>>>>> and whose commit description is taken verbatim from the this link. >>>>>> >>>>>>> kset_register() is currently used in some places without calling >>>>>>> kset_put() in error path, because the callers think it should be >>>>>>> kset internal thing to do, but the driver core can not know what >>>>>>> caller doing with that memory at times. The memory could be freed >>>>>>> both in kset_put() and error path of caller, if it is called in >>>>>>> kset_register(). >>>>>> As I explained in the link above, the reason there's >>>>>> a memory leak is that one cannot call kset_register() without >>>>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL, >>>>>> in this case, i.e. kset_register() fails with -EINVAL. >>>>>> >>>>>> Thus, the most common usage is something like this: >>>>>> >>>>>> kobj_set_name(&kset->kobj, format, ...); >>>>>> kset->kobj.kset = parent_kset; >>>>>> kset->kobj.ktype = ktype; >>>>>> res = kset_register(kset); >>>>>> >>>>>> So, what is being leaked, is the memory allocated in kobj_set_name(), >>>>>> by the common idiom shown above. This needs to be mentioned in >>>>>> the documentation, at least, in case, in the future this is absolved >>>>>> in kset_register() redesign, etc. >>>>> Based on this, can kset_register() just clean up from itself when an >>>>> error happens? Ideally that would be the case, as the odds of a kset >>>>> being embedded in a larger structure is probably slim, but we would have >>>>> to search the tree to make sure. >>>> Looking at kset_register(), we can add kset_put() in the error path, >>>> when kobject_add_internal(&kset->kobj) fails. >>>> >>>> See the attached patch. It needs to be tested with the same error injection >>>> as Yang has been doing. >>>> >>>> Now, struct kset is being embedded in larger structs--see amdgpu_discovery.c >>>> starting at line 575. If you're on an AMD system, it gets you the tree >>>> structure you'll see when you run "tree /sys/class/drm/card0/device/ip_discovery/". >>>> That shouldn't be a problem though. >>> Yes, that shouldn't be an issue as the kobject embedded in a kset is >>> ONLY for that kset itself, the kset structure should not be controling >>> the lifespan of the object it is embedded in, right? >> Yes, and it doesn't. It only does a kobject_get(parent) and kobject_put(parent). >> So that's fine and natural. >> >> Yang, do you want to try the patch in my previous email in this thread, since you've >> got the error injection set up already? > I spoke too soon. I believe you're onto something, because looking at the idiom: > > kobj_set_name(&kset->kobj, format, ...); > kset->kobj.kset = parent_kset; > kset->kobj.ktype = ktype; > res = kset_register(kset); > > The ktype defines a release method, which frees the larger struct the kset is embedded in. > And this would result in double free, for instance in the amdgpu_discovery.c code, if > kset_put() is called after kset_register() fails, since we kzalloc the larger object > just before and kfree it on error just after. Ideally, we'd only "revert" the actions > done by kobj_set_name(), as there's some error recovery on create_dir() in kobject_add_internal(). > > So, we cannot do this business with the kset_put() on error from kset_register(), after all. > Not sure how this wasn't caught in Yang's testing--the kernel should've complained.I have already tried the change that in your posted patch when I was debugging this issue before sent these fixes, because it may lead double free in some cases, and I had mentioned it in this mail: https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211194 at huawei.com/T/#m68eade1993859dfc6c3d14d35f988d35a32ef837 Thanks, Yang> > Regards, > Luben > > .