On 03/23/2016 09:19 AM, Cole Robinson wrote:
> On 03/23/2016 12:10 PM, Peter Steele wrote:
>> Has anyone seen this issue? We're running containers under CentOS 7.2 and some
>> of these containers are reporting incorrect memory allocation in
>> /proc/meminfo. The output below comes from a system with 32G of memory and
>> 84GB of swap. The values reported are completely wrong.
>>
> There was a meminfo bug here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1300781
>
> The initial report is fixed in git, however the reporter also mentioned the
> issue you are seeing. I suspect something is going wacky with the memory
> values we are getting from host cgroups after some period of time. If you can
> reproduce with Fedora (or RHEL) try filing a bug there
>
> - Cole
>

It's interesting that the value I see on my containers (9007199254740991) is the exact same value that's reported in this Red Hat bug. Clearly that is not a coincidence. We did not see this problem in 7.1, so apparently it is something introduced in 7.2. For the immediate term it looks like we'll have to roll back to 7.1.

I'll look into getting it reproduced in Fedora or RHEL.

Peter

On 03/23/2016 01:41 PM, Peter Steele wrote:
> On 03/23/2016 09:19 AM, Cole Robinson wrote:
>> On 03/23/2016 12:10 PM, Peter Steele wrote:
>>> Has anyone seen this issue? We're running containers under CentOS 7.2 and some
>>> of these containers are reporting incorrect memory allocation in
>>> /proc/meminfo. The output below comes from a system with 32G of memory and
>>> 84GB of swap. The values reported are completely wrong.
>>>
>> There was a meminfo bug here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1300781
>>
>> The initial report is fixed in git, however the reporter also mentioned the
>> issue you are seeing. I suspect something is going wacky with the memory
>> values we are getting from host cgroups after some period of time. If you can
>> reproduce with Fedora (or RHEL) try filing a bug there
>>
>> - Cole
>>
> It's interesting that the value I see on my containers (9007199254740991) is
> the exact same value that's reported in this Red Hat bug. Clearly that is not
> a coincidence. We did not see this problem in 7.1 so apparently it is
> something introduced in 7.2. For the immediate term it looks like we'll have
> to roll back to 7.1. I'll look into getting it reproduced in Fedora or RHEL.
>

Even if you don't reproduce on RHEL/Fedora, it would be useful if you could figure out exactly what steps it takes to reproduce: is it random, does it happen after a set amount of time or after the machine suspends, or is there some manual way to tickle the issue?

Thanks,
Cole

On 03/23/2016 11:26 AM, Cole Robinson wrote:
> Even if you don't reproduce on RHEL/Fedora, it would be useful if you figure
> out exactly what steps it takes to reproduce: is it random, or after a set
> amount of time, or after machine suspends, or some manual way to tickle the issue.
>
> Thanks,
> Cole
>

It's readily reproducible in our environment, although whatever the root cause is, it is part of a larger workflow. I'll see if I can come up with a minimal set of steps independent of our workflow that reproduces this behavior.

Peter
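
While narrowing down the reproduction steps, a small check run inside a container could help pin down the moment the bogus value first appears. The following is only a minimal sketch, not anything taken from libvirt or from the bug report: the function names are hypothetical, and it assumes the usual "Field: value kB" layout of /proc/meminfo. The only value it looks for is the one reported in this thread, 9007199254740991, which happens to equal 2^53 - 1, or equivalently 2^63 - 1 bytes expressed in kB. Whether that corresponds to an "unlimited" cgroup limit leaking through is an assumption, not something confirmed here.

```python
import re

# Value reported in this thread. Note that it equals (2**63 - 1) // 1024,
# i.e. 2**53 - 1; treating it as a sentinel is an assumption of this sketch.
SUSPECT_KB = 9007199254740991

# Matches lines such as "MemTotal:       32780128 kB" or "HugePages_Total: 0".
_LINE = re.compile(r"^([^:\s]+):\s+(\d+)")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: int_value} dict."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def suspicious_fields(fields, sentinel=SUSPECT_KB):
    """Return the sorted names of fields whose value equals the sentinel."""
    return sorted(name for name, value in fields.items() if value == sentinel)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the meminfo file at the given path."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling `suspicious_fields(read_meminfo())` periodically inside an affected container, for example from a cron job or a loop with a timestamped log line, would show which fields carry the value and roughly when they start to.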