On 08/07/2012 19:50, Rücker Thomas wrote:
> It sees far less testing and review than mainline.
> The other day we had someone for whom Icecast suddenly started crashing.
> Turned out it was a kh build.

I'll pipe in.

The company I work for has 1000+ audio (and some video) streams, serving tens of thousands of listeners per stream, running on both icecast and icecast-kh, and _both_ have _huge_ race conditions. We experience many crashes per day on the 50+ servers we have in production.

Last year, Laurent Defert, a colleague of mine, opened several trac tickets with patches to fix race conditions found using Helgrind (a close relative of Memcheck, the main Valgrind tool).

I can only encourage the Icecast community to look at those patches (though they were done for the "regular" icecast, the same exercise can and _should_ be done for icecast-kh).

Cheers,
Rémi

PS: If anyone has any questions, I'm almost always available on FreeNode as remi` (with the backtick) or as remi|work.
On 08/07/12 21:40, Rémi Cardona wrote:
> On 08/07/2012 19:50, Rücker Thomas wrote:
>> It sees far less testing and review than mainline.
>> The other day we had someone for whom Icecast suddenly started crashing.
>> Turned out it was a kh build.
> I'll pipe in.
>
> The company I work for has 1000+ audio (and some video) streams, serving
> tens of thousands of listeners per stream, running on both icecast and
> icecast-kh and _both_ have _huge_ race conditions. We experience many
> crashes per day on the 50+ servers we have in production.

Why have you not filed bugs about it, then, outlining the gravity of the problem? Or at least brought it up on IRC?

> Last year, Laurent Defert, a colleague of mine had opened several trac
> tickets with patches to fix race conditions found using Helgrind (a
> close relative of memcheck, the main Valgrind tool).

I suspect you mean https://trac.xiph.org/ticket/1810

The ticket does not suggest that those are in any way proven and grave issues. It merely describes them as "potential" issues. As such, it didn't make it into 2.3.3 and was deferred to 2.4. If you have proof that those race conditions occur on your systems, it would have been beneficial to know that earlier. We are of course interested in Icecast being as stable as possible.

> I can only encourage the Icecast community to look at those patches
> (though they were done for the "regular" icecast, the same exercise can
> and _should_ be done for icecast-kh).

As we are currently actively merging things into trunk, #1810 was on my list anyway.

Thanks again for your work and feedback, it's highly appreciated.

Cheers
Thomas
On 08/07/2012 21:19, Rücker Thomas wrote:
> Why have you not filed bugs about it then? Outlining the gravity of the
> problem?
> Or at least brought it up on IRC.

The biggest reason is that the network ops people at my company don't report many of the crashes/bugs to us (I'm on the dev side). They install monit [1] on all the servers and have it restart icecast as soon as something goes wrong. So until recently, I didn't realize the situation was as bad as it really is.

The second reason is that this race-condition work was done outside our regular assignments, so we had no authorization to put it on production servers...

> I suspect you mean
> https://trac.xiph.org/ticket/1810

Indeed.

> The ticket does not suggest that those are in any way proven and grave
> issues.
> It merely suggests them as 'potential' issues.
[...]
> As such it didn't make it into 2.3.3 and was deferred for 2.4.
> If you have proof that those race conditions occur on your systems it
> would have been beneficial to know that earlier. We are of course
> interested in Icecast being as stable as possible.

The thing is, finding proof and factual evidence of race conditions is one of the hardest things one can do in any programming language.

So far, the most common "proof" I've seen on production servers is double free() calls (which trigger an abort() on Linux), listener counters not incrementing/decrementing properly, and socket leaks. All of those issues are almost impossible to reproduce, which is what led us to the race-condition theory and to running icecast under Helgrind.

Helgrind is a godsend: it emulates a CPU, records (almost) all memory operations, and analyzes if/how locks are taken around them. So apart from a few false positives (though I haven't seen any while working on icecast), every error reported by Helgrind is indeed a race condition waiting to happen.

FTR, Laurent and I only tested those patches on our own machines. Those patches could impact icecast's performance significantly. Consider them a work in progress.

Cheers,
Rémi

[1] http://mmonit.com/monit/