Michal Hocko
2018-Apr-23 15:15 UTC
[PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon 23-04-18 10:06:08, Mikulas Patocka wrote:> > > On Sat, 21 Apr 2018, Matthew Wilcox wrote: > > > On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote: > > > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > > > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > > > No way. This is just wrong! First of all, you will explode most likely > > > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > > > > > enabled quite often. > > > > > > > > > > You're an evil person who doesn't want to fix bugs. > > > > > > > > Steady on. There's no need for that. Michal isn't evil. Please > > > > apologise. > > > > > > I see this attitude from Michal again and again. > > > > Fine; then *say that*. I also see Michal saying "No" a lot. Sometimes > > I agree with him, sometimes I don't. I think he genuinely wants the best > > code in the kernel, and saying "No" is part of it. > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > I don't remember that conversation, so I don't know whether I agree with > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > towards marking regions with memalloc_noio_save() / restore. If you do > > that, you won't need vmalloc(GFP_NOIO). > > He said the same thing a year ago. And there was small progress. 6 out of > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > infiniband and 1 in btrfs. (the whole discussion is here > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )Well this is not that easy. It requires a cooperation from maintainers. I can only do as much. I've posted patches in the past and actively bringing up this topic at LSFMM last two years...> He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > does he have veto over this part of the code? I'd much rather argue with > people who have constructive comments about fixing bugs than with him.I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. I would be much more willing to change my mind if you would back your patch by a real bug report. Hacks are acceptable when we have a real issue in hands. But if we want to fix potential issue then better make it properly. [...]> I sent the CONFIG_DEBUG_SG patch before (I wonder why he didn't repond to > it). I'll send a third version of the patch that actually randomly chooses > between kmalloc and vmalloc, because some abuses can only be detected with > kmalloc and we should test both. > > For bisecting, it is better to always fallback to vmalloc, but for general > testing, it is better to test both branches.Agreed! -- Michal Hocko SUSE Labs
Mikulas Patocka
2018-Apr-24 00:06 UTC
[PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
The kvmalloc function tries to use kmalloc and falls back to vmalloc if kmalloc fails. Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then uses DMA-API on the returned memory or frees it with kfree. Such bugs were found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific code. These bugs are hard to reproduce because kvmalloc falls back to vmalloc only if memory is fragmented. In order to detect these bugs reliably I submit this patch that changes kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer verify the addresses passed to it, and so the user will get a reliable stacktrace. Some bugs (such as buffer overflows) are better detected with kmalloc code, so we must test the kmalloc path too. Signed-off-by: Mikulas Patocka <mpatocka at redhat.com> --- mm/util.c | 10 ++++++++++ 1 file changed, 10 insertions(+) Index: linux-2.6/mm/util.c ==================================================================--- linux-2.6.orig/mm/util.c 2018-04-23 00:12:05.000000000 +0200 +++ linux-2.6/mm/util.c 2018-04-23 17:57:02.000000000 +0200 @@ -14,6 +14,7 @@ #include <linux/hugetlb.h> #include <linux/vmalloc.h> #include <linux/userfaultfd_k.h> +#include <linux/random.h> #include <asm/sections.h> #include <linux/uaccess.h> @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f */ WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); +#ifdef CONFIG_DEBUG_SG + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ + if (!(prandom_u32_max(2) & 1)) + goto do_vmalloc; +#endif + /* * We want to attempt a large physically contiguous block first because * it is less likely to fragment multiple larger blocks and therefore @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f if (ret || size <= PAGE_SIZE) return ret; +#ifdef CONFIG_DEBUG_SG +do_vmalloc: +#endif return __vmalloc_node_flags_caller(size, node, flags, __builtin_return_address(0)); }
Mikulas Patocka
2018-Apr-24 00:25 UTC
[PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon, 23 Apr 2018, Michal Hocko wrote:> On Mon 23-04-18 10:06:08, Mikulas Patocka wrote: > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > > > I don't remember that conversation, so I don't know whether I agree with > > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > > towards marking regions with memalloc_noio_save() / restore. If you do > > > that, you won't need vmalloc(GFP_NOIO). > > > > He said the same thing a year ago. And there was small progress. 6 out of > > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > > infiniband and 1 in btrfs. (the whole discussion is here > > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) > > Well this is not that easy. It requires a cooperation from maintainers. > I can only do as much. I've posted patches in the past and actively > bringing up this topic at LSFMM last two years...You're right - but you have chosen the uneasy path. Fixing __vmalloc code is easy and it doesn't require cooperation with maintainers.> > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 > > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > > does he have veto over this part of the code? I'd much rather argue with > > people who have constructive comments about fixing bugs than with him. > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. > I would be much more willing to change my mind if you would back your > patch by a real bug report. Hacks are acceptable when we have a real > issue in hands. But if we want to fix potential issue then better make > it properly.Developers should fix bugs in advance, not to wait until a crash hapens, is analyzed and reported. What's the problem with 15-line hack? Is the problem that kernel developers would feel depressed when looking the source code? Other than harming developers' feelings, I don't see what kind of damange could that piece of code do. Mikulas
Matthew Wilcox
2018-Apr-24 03:46 UTC
[PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote:> Some bugs (such as buffer overflows) are better detected > with kmalloc code, so we must test the kmalloc path too.Well now, this brings up another item for the collective TODO list -- implement redzone checks for vmalloc. Unless this is something already taken care of by kasan or similar.
Mikulas Patocka
2018-Apr-24 11:04 UTC
[PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
On Mon, 23 Apr 2018, David Rientjes wrote:> On Mon, 23 Apr 2018, Mikulas Patocka wrote: > > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > > kmalloc fails. > > > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > code. > > > > These bugs are hard to reproduce because kvmalloc falls back to vmalloc > > only if memory is fragmented. > > > > In order to detect these bugs reliably I submit this patch that changes > > kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG > > is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer > > verify the addresses passed to it, and so the user will get a reliable > > stacktrace. > > Why not just do it unconditionally? Sounds better than "50% of the time > this will catch bugs".Because kmalloc (with slub_debug) detects buffer overflows better than vmalloc. vmalloc detects buffer overflows only at a page boundary. This is intended for debugging kernels and debugging kernels should detect as many bugs as possible. Mikulas
Michal Hocko
2018-Apr-24 12:51 UTC
[PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: [...]> @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > */ > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > +#ifdef CONFIG_DEBUG_SG > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > + if (!(prandom_u32_max(2) & 1)) > + goto do_vmalloc; > +#endifI really do not think there is anything DEBUG_SG specific here. Why you simply do not follow should_failslab path or even reuse the function?> + > /* > * We want to attempt a large physically contiguous block first because > * it is less likely to fragment multiple larger blocks and therefore > @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f > if (ret || size <= PAGE_SIZE) > return ret; > > +#ifdef CONFIG_DEBUG_SG > +do_vmalloc: > +#endif > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); > }-- Michal Hocko SUSE Labs
Michal Hocko
2018-Apr-24 13:31 UTC
[PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:> > > On Mon, 23 Apr 2018, Michal Hocko wrote: > > > On Mon 23-04-18 10:06:08, Mikulas Patocka wrote: > > > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > > > > > I don't remember that conversation, so I don't know whether I agree with > > > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > > > towards marking regions with memalloc_noio_save() / restore. If you do > > > > that, you won't need vmalloc(GFP_NOIO). > > > > > > He said the same thing a year ago. And there was small progress. 6 out of > > > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > > > infiniband and 1 in btrfs. (the whole discussion is here > > > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) > > > > Well this is not that easy. It requires a cooperation from maintainers. > > I can only do as much. I've posted patches in the past and actively > > bringing up this topic at LSFMM last two years... > > You're right - but you have chosen the uneasy path.Yes.> Fixing __vmalloc code > is easy and it doesn't require cooperation with maintainers.But it is a hack against the intention of the scope api. It also alows maintainers to not care about their broken code.> > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 > > > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > > > does he have veto over this part of the code? I'd much rather argue with > > > people who have constructive comments about fixing bugs than with him. > > > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. > > I would be much more willing to change my mind if you would back your > > patch by a real bug report. Hacks are acceptable when we have a real > > issue in hands. But if we want to fix potential issue then better make > > it properly. > > Developers should fix bugs in advance, not to wait until a crash hapens, > is analyzed and reported.I agree. But are those existing users broken in the first place? I have seen so many GFP_NOFS abuses that I would dare to guess that most of those vmalloc NOFS abusers can be simply turned into GFP_KERNEL. Maybe that is the reason we haven't heard any complains in years. -- Michal Hocko SUSE Labs
Possibly Parallel Threads
- [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
- [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
- [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
- [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
- [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG