Shakeel Butt
2019-Jan-03 03:14 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
memory is already accounted to kmemcg. Do the same for ebtables. The
syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
whole system from a restricted memcg, a potential DoS.

By accounting the ebt_table_info, the memory used for ebt_table_info can
be contained within the memcg of the allocating process. However the
lifetime of ebt_table_info is independent of the allocating process and
is tied to the network namespace. So, the oom-killer will not be able to
relieve the memory pressure due to ebt_table_info memory. The memory for
ebt_table_info is allocated through vmalloc. Currently vmalloc does not
handle the oom-killed allocating process correctly and one large
allocation can bypass memcg limit enforcement. So, with this patch,
at least the small allocations will be contained. For large allocations,
we need to fix vmalloc.

Reported-by: syzbot+7713f3aa67be76b1552c at syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb at google.com>
Cc: Florian Westphal <fw at strlen.de>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Kirill Tkhai <ktkhai at virtuozzo.com>
Cc: Pablo Neira Ayuso <pablo at netfilter.org>
Cc: Jozsef Kadlecsik <kadlec at blackhole.kfki.hu>
Cc: Roopa Prabhu <roopa at cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay at cumulusnetworks.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Linux MM <linux-mm at kvack.org>
Cc: netfilter-devel at vger.kernel.org
Cc: coreteam at netfilter.org
Cc: bridge at lists.linux-foundation.org
Cc: LKML <linux-kernel at vger.kernel.org>
---
Changelog since v1:
- More descriptive commit message.

 net/bridge/netfilter/ebtables.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 491828713e0b..5e55cef0cec3 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
 	tmp.name[sizeof(tmp.name) - 1] = 0;
 
 	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
-	newinfo = vmalloc(sizeof(*newinfo) + countersize);
+	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
+			    PAGE_KERNEL);
 	if (!newinfo)
 		return -ENOMEM;
 
 	if (countersize)
 		memset(newinfo->counters, 0, countersize);
 
-	newinfo->entries = vmalloc(tmp.entries_size);
+	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
+				     PAGE_KERNEL);
 	if (!newinfo->entries) {
 		ret = -ENOMEM;
 		goto free_newinfo;
-- 
2.20.1.415.g653613c723-goog
William Kucharski
2019-Jan-03 10:14 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
> On Jan 2, 2019, at 8:14 PM, Shakeel Butt <shakeelb at google.com> wrote:
>
> 	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> +			    PAGE_KERNEL);
> 	if (!newinfo)
> 		return -ENOMEM;
>
> 	if (countersize)
> 		memset(newinfo->counters, 0, countersize);
>
> -	newinfo->entries = vmalloc(tmp.entries_size);
> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> +				     PAGE_KERNEL);
> 	if (!newinfo->entries) {
> 		ret = -ENOMEM;
> 		goto free_newinfo;
> --

Just out of curiosity, what are the actual sizes of these areas in typical use
given __vmalloc() will be allocating by the page?
Shakeel Butt
2019-Jan-03 16:18 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
On Thu, Jan 3, 2019 at 2:15 AM William Kucharski <william.kucharski at oracle.com> wrote:
>
> > On Jan 2, 2019, at 8:14 PM, Shakeel Butt <shakeelb at google.com> wrote:
> >
> > 	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> > -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> > +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> > +			    PAGE_KERNEL);
> > 	if (!newinfo)
> > 		return -ENOMEM;
> >
> > 	if (countersize)
> > 		memset(newinfo->counters, 0, countersize);
> >
> > -	newinfo->entries = vmalloc(tmp.entries_size);
> > +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> > +				     PAGE_KERNEL);
> > 	if (!newinfo->entries) {
> > 		ret = -ENOMEM;
> > 		goto free_newinfo;
> > --
>
> Just out of curiosity, what are the actual sizes of these areas in typical use
> given __vmalloc() will be allocating by the page?
>

We don't really use this in production, so, I don't have a good idea of
the size in the typical case. The size depends on the workload. The
motivation behind this patch was the system OOM triggered by a syzbot
running in a restricted memcg.

Shakeel
Kirill Tkhai
2019-Jan-06 11:00 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
On 03.01.2019 06:14, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.
>
> Reported-by: syzbot+7713f3aa67be76b1552c at syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb at google.com>
> Cc: Florian Westphal <fw at strlen.de>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Kirill Tkhai <ktkhai at virtuozzo.com>
> Cc: Pablo Neira Ayuso <pablo at netfilter.org>
> Cc: Jozsef Kadlecsik <kadlec at blackhole.kfki.hu>
> Cc: Roopa Prabhu <roopa at cumulusnetworks.com>
> Cc: Nikolay Aleksandrov <nikolay at cumulusnetworks.com>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Linux MM <linux-mm at kvack.org>
> Cc: netfilter-devel at vger.kernel.org
> Cc: coreteam at netfilter.org
> Cc: bridge at lists.linux-foundation.org
> Cc: LKML <linux-kernel at vger.kernel.org>
> ---
> Changelog since v1:
> - More descriptive commit message.

Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>

>
>  net/bridge/netfilter/ebtables.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>  	tmp.name[sizeof(tmp.name) - 1] = 0;
>
>  	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> +			    PAGE_KERNEL);
>  	if (!newinfo)
>  		return -ENOMEM;
>
>  	if (countersize)
>  		memset(newinfo->counters, 0, countersize);
>
> -	newinfo->entries = vmalloc(tmp.entries_size);
> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> +				     PAGE_KERNEL);
>  	if (!newinfo->entries) {
>  		ret = -ENOMEM;
>  		goto free_newinfo;
>
Pablo Neira Ayuso
2019-Jan-10 00:44 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
On Wed, Jan 02, 2019 at 07:14:31PM -0800, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.

Fine with this -mm?

If no objections, I'll apply this to the netfilter tree. Thanks.

> Reported-by: syzbot+7713f3aa67be76b1552c at syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb at google.com>
> Cc: Florian Westphal <fw at strlen.de>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Kirill Tkhai <ktkhai at virtuozzo.com>
> Cc: Pablo Neira Ayuso <pablo at netfilter.org>
> Cc: Jozsef Kadlecsik <kadlec at blackhole.kfki.hu>
> Cc: Roopa Prabhu <roopa at cumulusnetworks.com>
> Cc: Nikolay Aleksandrov <nikolay at cumulusnetworks.com>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Linux MM <linux-mm at kvack.org>
> Cc: netfilter-devel at vger.kernel.org
> Cc: coreteam at netfilter.org
> Cc: bridge at lists.linux-foundation.org
> Cc: LKML <linux-kernel at vger.kernel.org>
> ---
> Changelog since v1:
> - More descriptive commit message.
>
>  net/bridge/netfilter/ebtables.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>  	tmp.name[sizeof(tmp.name) - 1] = 0;
>
>  	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> +			    PAGE_KERNEL);
>  	if (!newinfo)
>  		return -ENOMEM;
>
>  	if (countersize)
>  		memset(newinfo->counters, 0, countersize);
>
> -	newinfo->entries = vmalloc(tmp.entries_size);
> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> +				     PAGE_KERNEL);
>  	if (!newinfo->entries) {
>  		ret = -ENOMEM;
>  		goto free_newinfo;
> --
> 2.20.1.415.g653613c723-goog
>
Pablo Neira Ayuso
2019-Jan-10 23:57 UTC
[Bridge] [PATCH v2] netfilter: account ebt_table_info to kmemcg
On Wed, Jan 02, 2019 at 07:14:31PM -0800, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.

OK, patch is applied, thanks.