Ask Bjørn Hansen
2019-Feb-15 05:39 UTC
[nsd-users] nsd: remote.c:2307: daemon_remote_process_stats: Assertion `s->in_stats_list' failed.
Hi everyone,

Recently some of my nsd installations started crashing every so many days (or weeks?) with:

nsd: remote.c:2307: daemon_remote_process_stats: Assertion `s->in_stats_list' failed.

I am using the 4.1.24-2.el7 build on CentOS 7.

I haven't been able to reproduce it. The servers are doing pretty light duty (<1000 qps) and rarely reloading configs. "nsd-control stats_noreset" runs once or twice a minute (from a prometheus exporter).

Is this a known problem in 4.1.24 or am I doing something goofy?

Ask
Wouter Wijngaards
2019-Feb-15 07:12 UTC
[nsd-users] nsd: remote.c:2307: daemon_remote_process_stats: Assertion `s->in_stats_list' failed.
Hi Ask,

On 15/02/2019 06:39, Ask Bjørn Hansen wrote:
> Recently some of my nsd installations started crashing every so many days (or weeks?) with:
>
> nsd: remote.c:2307: daemon_remote_process_stats: Assertion `s->in_stats_list' failed.

This failure was reported to me before, and I could not find the cause then. I could not reproduce it, and code inspection shows it should not really be possible. By that I mean that the code paths that lead to that assertion all have the boolean turned on. When the item is removed from the list, the boolean is turned off.

The only explanations I have left are random heap corruption, or that the list management somehow does not work right. The latter could happen if the item was in the list twice. But that is also not possible, both because of this boolean and because there is only one command per nsd-control stream. So I am stymied by the issue; the code paths that would leave that linked list mismanaged all look impossible. Some runs in a memory checker like valgrind also show no issues.

And that is what I would ask for: some way to reproduce, or runs in a memory checker like valgrind or gcc's libasan (that might catch the issue before it becomes an assertion failure, e.g. a deleted element still in use or something).

If that is not possible, I guess I could try to make debug versions of the code for people that have the problem; some sort of lighter-weight debug code to get more information on this issue, lighter than valgrind's slowdown. But I am not sure what it would need to check for.

Best regards,
Wouter
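
For readers following the thread, the invariant described above (the flag is set whenever the item is linked into the list and cleared on removal) can be sketched roughly as follows. This is not the NSD source: the struct and function names are hypothetical, and only the flag name in_stats_list is taken from the assertion message. The last function is one guess at what a lighter-weight debug check might look at, namely a bounded walk that notices a linked entry with a cleared flag, or a list that is longer than expected (in a singly linked list, an entry that is linked twice shows up as a cycle).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a control connection waiting for stats.
 * Only the flag name in_stats_list comes from the assertion message. */
struct stat_waiter {
    struct stat_waiter *next;
    int in_stats_list;   /* 1 while linked into the list, 0 otherwise */
};

/* Link w at the head of the list unless the flag says it is already linked. */
static void stats_list_add(struct stat_waiter **head, struct stat_waiter *w)
{
    if (w->in_stats_list)
        return;
    w->next = *head;
    *head = w;
    w->in_stats_list = 1;
}

/* Unlink w and clear its flag; a no-op if the flag is not set. */
static void stats_list_remove(struct stat_waiter **head, struct stat_waiter *w)
{
    struct stat_waiter **pp;

    if (!w->in_stats_list)
        return;
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == w) {
            *pp = w->next;
            break;
        }
    }
    w->next = NULL;
    w->in_stats_list = 0;
}

/* Same shape as the failing check: an entry being processed must be flagged. */
static void stats_process(struct stat_waiter **head, struct stat_waiter *w)
{
    assert(w->in_stats_list);
    stats_list_remove(head, w);
}

/* Debug check (an assumption about what could be useful): returns 1 if every
 * linked entry has its flag set and the walk ends within max_entries steps,
 * 0 otherwise. A walk that runs past max_entries suggests a cycle. */
static int stats_list_consistent(const struct stat_waiter *head, size_t max_entries)
{
    size_t seen = 0;

    for (; head != NULL; head = head->next) {
        if (!head->in_stats_list)
            return 0;
        if (++seen > max_entries)
            return 0;
    }
    return 1;
}
```

With helpers like these, a debug build could call stats_list_consistent() before each add and remove and log the first point at which it returns 0, which would be earlier than the assertion itself. It would not catch heap corruption that leaves the list looking intact.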