Shakeel Butt
2018-Dec-31 03:59 UTC
[Bridge] [PATCH] netfilter: account ebt_table_info to kmemcg
On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko at kernel.org> wrote:> > On Sun 30-12-18 08:45:13, Michal Hocko wrote: > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote: > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko at kernel.org> wrote: > > > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote: > > > > > Michal Hocko <mhocko at kernel.org> wrote: > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote: > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the > > > > > > > whole system from a restricted memcg, a potential DoS. > > > > > > > > > > > > What is the lifetime of these objects? Are they bound to any process? > > > > > > > > > > No, they are not. > > > > > They are free'd only when userspace requests it or the netns is > > > > > destroyed. > > > > > > > > Then this is problematic, because the oom killer is not able to > > > > guarantee the hard limit and so the excessive memory consumption cannot > > > > be really contained. As a result the memcg will be basically useless > > > > until somebody tears down the charged objects by other means. The memcg > > > > oom killer will surely kill all the existing tasks in the cgroup and > > > > this could somehow reduce the problem. Maybe this is sufficient for > > > > some usecases but that should be properly analyzed and described in the > > > > changelog. > > > > > > > > > > Can you explain why you think the memcg hard limit will not be > > > enforced? From what I understand, the memcg oom-killer will kill the > > > allocating processes as you have mentioned. We do force charging for > > > very limited conditions but here the memcg oom-killer will take care > > > of > > > > I was talking about the force charge part. Depending on a specific > > allocation and its life time this can gradually get us over hard limit > > without any bound theoretically. > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when > the current task is killed"") there is no way to bail out from the > vmalloc allocation loop so if the request is really large then the memcg > oom will not help. Is that a problem here? >Yes, I think it will be an issue here.> Maybe it is time to revisit fatal_signal_pending check.Yes, we will need something to handle the memcg OOM. I will think more on that front or if you have any ideas, please do propose. thanks, Shakeel
Michal Hocko
2018-Dec-31 10:11 UTC
[Bridge] [PATCH] netfilter: account ebt_table_info to kmemcg
On Sun 30-12-18 19:59:53, Shakeel Butt wrote:> On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko at kernel.org> wrote: > > > > On Sun 30-12-18 08:45:13, Michal Hocko wrote: > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote: > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko at kernel.org> wrote: > > > > > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote: > > > > > > Michal Hocko <mhocko at kernel.org> wrote: > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote: > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying > > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the > > > > > > > > whole system from a restricted memcg, a potential DoS. > > > > > > > > > > > > > > What is the lifetime of these objects? Are they bound to any process? > > > > > > > > > > > > No, they are not. > > > > > > They are free'd only when userspace requests it or the netns is > > > > > > destroyed. > > > > > > > > > > Then this is problematic, because the oom killer is not able to > > > > > guarantee the hard limit and so the excessive memory consumption cannot > > > > > be really contained. As a result the memcg will be basically useless > > > > > until somebody tears down the charged objects by other means. The memcg > > > > > oom killer will surely kill all the existing tasks in the cgroup and > > > > > this could somehow reduce the problem. Maybe this is sufficient for > > > > > some usecases but that should be properly analyzed and described in the > > > > > changelog. > > > > > > > > > > > > > Can you explain why you think the memcg hard limit will not be > > > > enforced? From what I understand, the memcg oom-killer will kill the > > > > allocating processes as you have mentioned. We do force charging for > > > > very limited conditions but here the memcg oom-killer will take care > > > > of > > > > > > I was talking about the force charge part. Depending on a specific > > > allocation and its life time this can gradually get us over hard limit > > > without any bound theoretically. > > > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when > > the current task is killed"") there is no way to bail out from the > > vmalloc allocation loop so if the request is really large then the memcg > > oom will not help. Is that a problem here? > > > > Yes, I think it will be an issue here. > > > Maybe it is time to revisit fatal_signal_pending check. > > Yes, we will need something to handle the memcg OOM. I will think more > on that front or if you have any ideas, please do propose.I can see three options here: - do not force charge on memcg oom or introduce a limited charge overflow (reserves basically). - revert the revert and reintroduce the fatal_signal_pending check into vmalloc - be more specific and check tsk_is_oom_victim in vmalloc and fail -- Michal Hocko SUSE Labs
Shakeel Butt
2019-Jan-03 20:52 UTC
[Bridge] [PATCH] netfilter: account ebt_table_info to kmemcg
On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko <mhocko at kernel.org> wrote:> > On Sun 30-12-18 19:59:53, Shakeel Butt wrote: > > On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko at kernel.org> wrote: > > > > > > On Sun 30-12-18 08:45:13, Michal Hocko wrote: > > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote: > > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko at kernel.org> wrote: > > > > > > > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote: > > > > > > > Michal Hocko <mhocko at kernel.org> wrote: > > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote: > > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying > > > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The > > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the > > > > > > > > > whole system from a restricted memcg, a potential DoS. > > > > > > > > > > > > > > > > What is the lifetime of these objects? Are they bound to any process? > > > > > > > > > > > > > > No, they are not. > > > > > > > They are free'd only when userspace requests it or the netns is > > > > > > > destroyed. > > > > > > > > > > > > Then this is problematic, because the oom killer is not able to > > > > > > guarantee the hard limit and so the excessive memory consumption cannot > > > > > > be really contained. As a result the memcg will be basically useless > > > > > > until somebody tears down the charged objects by other means. The memcg > > > > > > oom killer will surely kill all the existing tasks in the cgroup and > > > > > > this could somehow reduce the problem. Maybe this is sufficient for > > > > > > some usecases but that should be properly analyzed and described in the > > > > > > changelog. > > > > > > > > > > > > > > > > Can you explain why you think the memcg hard limit will not be > > > > > enforced? From what I understand, the memcg oom-killer will kill the > > > > > allocating processes as you have mentioned. We do force charging for > > > > > very limited conditions but here the memcg oom-killer will take care > > > > > of > > > > > > > > I was talking about the force charge part. Depending on a specific > > > > allocation and its life time this can gradually get us over hard limit > > > > without any bound theoretically. > > > > > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when > > > the current task is killed"") there is no way to bail out from the > > > vmalloc allocation loop so if the request is really large then the memcg > > > oom will not help. Is that a problem here? > > > > > > > Yes, I think it will be an issue here. > > > > > Maybe it is time to revisit fatal_signal_pending check. > > > > Yes, we will need something to handle the memcg OOM. I will think more > > on that front or if you have any ideas, please do propose. > > I can see three options here: > - do not force charge on memcg oom or introduce a limited charge > overflow (reserves basically). > - revert the revert and reintroduce the fatal_signal_pending > check into vmalloc > - be more specific and check tsk_is_oom_victim in vmalloc and > fail >I think for the long term solution we might need something similar to memcg oom reserves (1) but for quick fix I think we can do the combination of (2) and (3). Shakeel