Matt Garman
2011-May-20 18:34 UTC
[CentOS] scheduling differences between CentOS 4 and CentOS 5?
We have several latency-sensitive "pipeline"-style programs that have a measurable performance degradation when run on CentOS 5.x versus CentOS 4.x.

By "pipeline" program, I mean one that has multiple threads. The multiple threads work on shared data. Between each thread, there is a queue. So thread A gets data, pushes into Qab, thread B pulls from Qab, does some processing, then pushes into Qbc, thread C pulls from Qbc, etc. The initial data is from the network (generated by a 3rd party).

We basically measure the time from when the data is received to when the last thread performs its task. In our application, we see an increase of anywhere from 20 to 50 microseconds when moving from CentOS 4 to CentOS 5.

I have used a few methods of profiling our application, and determined that the added latency on CentOS 5 comes from queue operations (in particular, popping). However, I can improve performance on CentOS 5 (to be the same as CentOS 4) by using taskset to bind the program to a subset of the available cores.

So it appears to me that, between CentOS 4 and 5, there was some change (presumably to the kernel) that caused threads to be scheduled differently (and this difference is suboptimal for our application). While I can "solve" this problem with taskset, my preference is not to have to do this. I'm hoping there's some kind of kernel tunable (or maybe a collection of tunables) whose default was changed between versions.

Anyone have any experience with this? Perhaps some more areas to investigate?

Thanks,
Matt
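For reference, an alternative to running under taskset is to set the affinity from inside the program itself. This is only a minimal sketch, assuming glibc's pthread_setaffinity_np (a GNU extension) and a hypothetical two-stage pipeline; the queue code and most error handling are omitted:

    /* Sketch: pin each pipeline thread to its own core from inside
     * the program, which is roughly what taskset does externally.
     * Assumes pthread_setaffinity_np (GNU extension); build with
     * gcc -pthread. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        /* Restrict the calling thread to the single core given. */
        int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        if (rc != 0) {
            fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
            exit(1);
        }
    }

    static void *stage_b(void *arg)
    {
        pin_to_core(1);      /* keep thread B on core 1 */
        /* ... pull from Qab, process, push into Qbc ... */
        return NULL;
    }

    int main(void)
    {
        pin_to_core(0);      /* thread A (main) stays on core 0 */
        pthread_t b;
        pthread_create(&b, NULL, stage_b, NULL);
        /* ... receive data from the network, push into Qab ... */
        pthread_join(b, NULL);
        return 0;
    }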
R P Herrold
2011-May-20 18:46 UTC
[CentOS] scheduling differences between CentOS 4 and CentOS 5?
On Fri, 20 May 2011, Matt Garman wrote:

> We have several latency-sensitive "pipeline"-style programs that have
> a measurable performance degradation when run on CentOS 5.x versus
> CentOS 4.x.
>
> By "pipeline" program, I mean one that has multiple threads. The
> multiple threads work on shared data. Between each thread, there is a
> queue. So thread A gets data, pushes into Qab, thread B pulls from
> Qab, does some processing, then pushes into Qbc, thread C pulls from
> Qbc, etc. The initial data is from the network (generated by a 3rd
> party).
>
> We basically measure the time from when the data is received to when
> the last thread performs its task. In our application, we see an
> increase of anywhere from 20 to 50 microseconds when moving from
> CentOS 4 to CentOS 5.
>
> Anyone have any experience with this? Perhaps some more areas to investigate?

We do processing similar to this with financial markets datastreams.

You do not say, but I assume you are blocking on a select, rather than polling [polling is bad here]. Also, you do not say if all threads are under a common process' ownership. If not, modulo the added complexity of debugging the threading, you may want to do so.

I say this because in our testing (both with all threads housed in a single process, and when using co-processes fed through an anonymous pipe), we will occasionally get hit with a context or process switch, which messes up the latencies something fierce. An 'at' or 'cron' job firing off can ruin the day as well.

Also, system calls are to be avoided, as the timing on when (and if, and in what order) one gets returned to is not something controllable in userspace.

Average latencies are not so meaningful here ... collection of all dispatch and return data and explaining the outliers is probably a good place to continue after addressing the foregoing. graphviz and gnuplot are lovely for doing this kind of visualization.

-- Russ herrold
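To illustrate the "block on select, don't poll" point above, here is a minimal sketch. The descriptor name 'sock' and the helper are hypothetical, and error paths are abbreviated; the idea is simply to sleep in the kernel until data arrives rather than spinning:

    /* Sketch: block in select() on the network socket instead of
     * busy-polling, so the thread sleeps until data is ready. */
    #include <sys/select.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>

    ssize_t wait_and_read(int sock, char *buf, size_t len)
    {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);

        /* NULL timeout: sleep until the socket is readable, rather
         * than burning a core (and inviting context switches) in a
         * poll loop. */
        int rc = select(sock + 1, &rfds, NULL, NULL, NULL);
        if (rc < 0) {
            perror("select");
            return -1;
        }
        return read(sock, buf, len);
    }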