Matthew J. Roth
2007-May-25 11:49 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
List users,

Using Asterisk in an inbound call center environment has led us to push the limits of vertical scaling. To treat each caller fairly and to utilize our agents as efficiently as possible, it is desirable to configure each client as a single queue. As far as I know, Asterisk's queues cannot be distributed across servers, so the size of the largest queue we service sets our vertical scaling goal. In our case, that queue must be able to hold in excess of 300 calls regardless of their makeup (i.e., the number of calls waiting in queue vs. the number of calls connected to an agent). In reality, we service more than one client on our server, so on busy days the total number of calls we handle exceeds 300.

Recently, we were pushing our server to almost full CPU utilization. Since we've observed that Asterisk is CPU bound, we upgraded from a PowerEdge 6850 with four single-core Intel Xeon CPUs running at 3.16GHz to a PowerEdge 6850 with four dual-core Intel Xeon CPUs running at 3.00GHz. The installed software is identical, and a kernel build benchmark yielded promising results: the new dual-core server ran roughly 80% faster, which is about what we expected.

As far as Asterisk is concerned, at low call volumes the dual-core server outperforms the single-core server by a similar margin. I'm working on a follow-up post that will demonstrate this with benchmarks for a small number of calls in various scenarios on each machine. However, to our surprise, as the number of concurrent calls increases the performance gains begin to flatten out. In fact, it seems that somewhere between 200 and 300 calls, the two servers start to exhibit similar idle times despite one of them having twice as many cores. Once I collect the data, I will add a second follow-up post with a performance curve tracking the full range of call volumes we experience. Unfortunately, from day to day there are variables that I'm sure affect performance, such as the number of agents logged in and the makeup of the calls. I'll do my best to choose a sample size that smooths out these bumps.

In the meantime, I'm looking for insights as to what would cause Asterisk (or any other process) to idle at the same value despite having similar workloads and twice as many CPUs available to it. I'll be benchmarking Asterisk from very low to very high call volumes, so any suggestions or tips, such as how to generate a large number of calls or what statistics I should gather, would also be appreciated.

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer
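[For reference, one lightweight way to generate a large number of test calls is to drop Asterisk call files into the spool directory of a separate originating box (SIPp is another commonly used tool). The sketch below is only illustrative: the peer name "benchserver" and extension 100 are hypothetical placeholders for whatever points at the machine under test, and the far end's dialplan decides whether each call lands in Playback() or Queue().]

    #!/bin/sh
    # Sketch: originate N test calls via Asterisk call files from a
    # separate load-generation box. "benchserver" and extension 100
    # are placeholders. Wait(600) holds the originating leg up while
    # the server under test does the real work.
    N=200
    for i in $(seq 1 $N); do
        f=/var/spool/asterisk/bench-$i.call
        printf 'Channel: SIP/benchserver/100\n' >  $f
        printf 'MaxRetries: 0\n'                >> $f
        printf 'Application: Wait\n'            >> $f
        printf 'Data: 600\n'                    >> $f
        # mv into the spool last, so Asterisk never reads a partial file
        mv $f /var/spool/asterisk/outgoing/
    done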
Sean M. Pappalardo
2007-May-25 12:28 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
Hi there.

Just curious if you've checked out Linux clustering software such as OpenSSI ( http://www.openssi.org/ ) and run Asterisk on it? It features a multi-threaded, cluster-aware shell (and custom kernel) that will automatically cluster-ize any regular Linux executable (such as the main Asterisk process). If it works as advertised, it should just be a matter of adding boxes to the cluster to speed up processing.

As for Asterisk itself, is it multi-threaded enough to take advantage of 4+ way systems?

Sean Pappalardo
Matthew J. Roth
2007-May-25 16:14 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes - Low volume benchmarks
List users,

This post contains the benchmarks for Asterisk at low call volumes on similar single and dual-core servers. I'd appreciate it greatly if you took the time to read and comment on it.

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer

Conclusions
-----------

I'm presenting the conclusions first, because they are the most important part of the benchmarking. If you like details and numbers, scroll down. I've drawn three conclusions from this set of benchmarks.

1. At low call volumes, the dual-core server outperforms the single-core server by the expected margin.

2. Calls bridged to an agent are more CPU intensive than calls listening to audio via the Playback() application or calls waiting in queue. This is expected, because they involve more SIP channels and more work is done on the RTP frames (bridging, recording, etc.).

3. For all call types, the majority of the CPU time is spent in the kernel (servicing system calls, etc.). I've observed this to be true at all call volumes on our production server, with the system-to-user time ratio sometimes in the range of 20 to 1. This may suggest that the popular perception that Asterisk doesn't scale well because of its extensive use of linked lists doesn't tell the whole story.

So far there are no surprises, but over the next week or so I'll be collecting data that I expect to reveal that at high call volumes (200-300 concurrent calls) the idle percentage on both machines starts to approach the same value. In the end, my goal is to break through (or, at the least, understand) this scaling issue, so I welcome all forms of critique. It's quite possible that the problem lies in my setup or that I'm missing something obvious, but I suspect it is deeper than that.

Benchmarking Methodology
------------------------

I collected each type of data as follows (a rough collection-script sketch appears at the end of this post).

- Active channel and call counts: 'asterisk -rx "show channels"' and 'asterisk -rx "sip show channels"'
- Thread counts: 'ps -eLf' and 'ps axms'
- Idle time values: 'sar 30 1'
- Average CPU utilization per call: (startIdle - endIdle) / numCalls

The servers were rebooted between tests.

Call Types
----------

I tested the following three call types.

- Incoming SIP to the Playback() application
  - 1 active SIP channel per call
  - From the originating Asterisk server to the Playback() application

- Incoming SIP to the Queue() application - In queue
  - 1 active SIP channel per call
  - From the originating Asterisk server to the Queue() application

- Incoming SIP to the Queue() application - Bridged to an agent
  - 2 active SIP channels per call
  - From the originating Asterisk server to the Queue() application
  - Bridged from the Queue() application to the agent

All calls were pure VoIP (SIP/RTP) and originated from another Asterisk server. Calls that were bridged to agents terminated at SIP hardphones (Snom 320s) and were recorded to a RAM disk via the Monitor() application. All calls used the uLaw codec, and all audio files (including the call recordings, the native MOH, and the periodic queue announcements, which played approximately every 60 seconds) were in the PCM file format. There was no transcoding, protocol bridging, or TDM hardware involved on the servers being benchmarked.

A Note on Asterisk and Threads
------------------------------

On both systems, a freshly started Asterisk process consisted of 10 threads. Some events, such as performing an 'asterisk -rx reload', triggered the creation of a new persistent thread.
The benchmarking revealed that, in general, the Asterisk process will consist of 10-15 persistent background threads plus exactly 1 additional thread per active call. This means that at even modest call volumes, Asterisk will utilize all of the CPUs in most modern PC-based servers.

Server Profiles
---------------

The servers I performed the benchmarking on are described below. Note that the CPUs support hyperthreading, but it is disabled. This is reflected in the CPU count, which is the number of physical processors available to the OS.

Short Name:   DC
Manufacturer: Dell Computer Corporation
Product Name: PowerEdge 6850
Processors:   Four Dual-Core Intel Xeon MP CPUs at 3.00GHz
CPU Count:    8
FSB Speed:    800 MHz
OS:           Fedora Core 3 - 2.6.13-ztdummy SMP x86_64 Kernel
Asterisk Ver: ABE-B.1-3

Short Name:   SC
Manufacturer: Dell Computer Corporation
Product Name: PowerEdge 6850
Processors:   Four Single-Core Intel Xeon MP CPUs at 3.16GHz
CPU Count:    4
FSB Speed:    667 MHz
OS:           Fedora Core 3 - 2.6.13-ztdummy SMP x86_64 Kernel
Asterisk Ver: ABE-B.1-3

The kernel is a vanilla 2.6.13 kernel with enhanced realtime clock support and a timer frequency of 1000 HZ (earning it the EXTRAVERSION of '-ztdummy'). I am aware that the 2.6.17 kernel introduced multi-core scheduler support, but it exhibited negligible gains in the kernel build benchmark. Nonetheless, I am open to any tips regarding kernel versions and configuration options.

At the software level, the servers are identical. They are both running the same version of Asterisk Business Edition, and the Fedora Core 3 installation was performed from the bare metal using the same install document and a local source for the update RPMs.

The Numbers
-----------

DC - Incoming SIP to the Playback() application
===============================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.01  99.98
    1   0.02     0.04     0.00  99.94
    2   0.02     0.06     0.00  99.92
    3   0.03     0.11     0.00  99.86
    4   0.04     0.13     0.00  99.83
    5   0.05     0.16     0.00  99.80
    6   0.05     0.20     0.00  99.75
    7   0.07     0.24     0.00  99.70
    8   0.07     0.25     0.00  99.67
    9   0.08     0.27     0.00  99.65
   10   0.09     0.33     0.00  99.58

Average CPU utilization per call: 0.040% (~960 MHz)

SC - Incoming SIP to the Playback() application
===============================================

calls  %user  %system  %iowait  %idle
    0   0.01     0.02     0.00  99.98
    1   0.02     0.10     0.00  99.88
    2   0.03     0.17     0.00  99.80
    3   0.06     0.21     0.00  99.73
    4   0.08     0.28     0.00  99.63
    5   0.10     0.34     0.01  99.55
    6   0.11     0.48     0.00  99.41
    7   0.14     0.49     0.00  99.37
    8   0.16     0.57     0.00  99.28
    9   0.17     0.63     0.01  99.19
   10   0.18     0.75     0.00  99.07

Average CPU utilization per call: 0.091% (~1152 MHz)

DC - Incoming SIP to the Queue() application - In queue
=======================================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.00  99.99
    1   0.01     0.03     0.00  99.96
    2   0.01     0.05     0.00  99.94
    3   0.01     0.08     0.00  99.91
    4   0.02     0.10     0.00  99.88
    5   0.03     0.12     0.00  99.84
    6   0.04     0.16     0.00  99.80
    7   0.03     0.17     0.00  99.80
    8   0.04     0.20     0.00  99.76
    9   0.03     0.22     0.00  99.75
   10   0.05     0.27     0.00  99.68

Average CPU utilization per call: 0.031% (~744 MHz)

SC - Incoming SIP to the Queue() application - In queue
=======================================================

calls  %user  %system  %iowait  %idle
    0   0.02     0.02     0.00  99.96
    1   0.03     0.07     0.00  99.91
    2   0.03     0.13     0.00  99.83
    3   0.04     0.18     0.00  99.78
    4   0.05     0.23     0.00  99.72
    5   0.06     0.27     0.00  99.67
    6   0.07     0.33     0.00  99.60
    7   0.09     0.38     0.00  99.53
    8   0.09     0.40     0.00  99.51
    9   0.11     0.46     0.01  99.43
   10   0.11     0.48     0.00  99.41

Average CPU utilization per call: 0.055% (~697 MHz)

DC - Incoming SIP to the Queue() application - Bridged to an agent
==================================================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.00  99.99
    1   0.01     0.06     0.00  99.93
    2   0.02     0.14     0.00  99.84
    3   0.03     0.16     0.00  99.81

Average CPU utilization per call: 0.060% (~1440 MHz)

SC - Incoming SIP to the Queue() application - Bridged to an agent
==================================================================

calls  %user  %system  %iowait  %idle
    0   0.01     0.02     0.00  99.98
    1   0.02     0.16     0.00  99.82
    2   0.04     0.28     0.00  99.68
    3   0.07     0.36     0.00  99.57

Average CPU utilization per call: 0.137% (~1735 MHz)
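[As referenced in the methodology section, here is a rough sketch of the collection procedure. It is a reconstruction of what is described above, not the exact script used; it assumes sysstat's sar, whose "Average:" line ends in the %idle column.]

    #!/bin/sh
    # Reconstruction of the measurement procedure described above.
    # Usage: ./measure.sh <number-of-calls>
    NUMCALLS=$1

    BASE_IDLE=$(sar 30 1 | awk '/^Average/ {print $NF}')   # idle at baseline

    echo "Bring up $NUMCALLS calls, then press Enter..."
    read dummy

    asterisk -rx "show channels"          # active channel and call counts
    asterisk -rx "sip show channels"
    ps -eLf | grep -c '[a]sterisk'        # rough thread count

    LOAD_IDLE=$(sar 30 1 | awk '/^Average/ {print $NF}')   # idle under load

    echo "$BASE_IDLE $LOAD_IDLE $NUMCALLS" | \
        awk '{printf "Average CPU utilization per call: %.3f%%\n", ($1 - $2) / $3}'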
Mark Coccimiglio
2007-May-26 01:27 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
Matthew J. Roth wrote:
> In fact, it seems that somewhere between 200 and 300 calls, the two
> servers start to exhibit similar idle times despite one of them having
> twice as many cores.

It sounds like you are running into the hardware limitations of your system's PCI bus or front-side bus (FSB), and not necessarily an issue with Asterisk. In short, there is a limited amount of bandwidth on the computer's PCI bus (33 MHz) and the FSB (100-800 MHz). One thing to remember is that ALL cores and data streams need to share the PCI bus and the FSB. Asterisk is very processor and memory intensive. At the extreme level of usage, more cores won't help if data is "stuck in the pipe". So the performance plateau you described would be expected.

Mark C.
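[If bus saturation is the suspect, it can be crudely probed before drawing conclusions. The sketch below is only a rough check (a real memory benchmark such as STREAM would be more rigorous): run one memory-heavy copy loop, then several in parallel, and see whether aggregate throughput scales with the number of cores. If per-copy throughput collapses as copies are added, that would be consistent with cores contending for shared memory/FSB bandwidth rather than CPU time.]

    #!/bin/sh
    # Crude shared-bandwidth probe: compare dd throughput for 1 copy
    # vs. 8 parallel copies. dd prints its transfer rate on stderr;
    # very old versions may not, in which case time(1) works instead.
    for N in 1 8; do
        echo "--- $N parallel copies ---"
        for i in $(seq 1 $N); do
            dd if=/dev/zero of=/dev/null bs=1M count=2000 2>&1 | tail -n 1 &
        done
        wait
    done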
JR Richardson
2007-May-26 06:54 UTC
[asterisk-users] RE: Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
> > In fact, it seems that somewhere between 200 and 300 calls, the two
> > servers start to exhibit similar idle times despite one of them
> > having twice as many cores.
>
> It sounds like you are running into the hardware limitations of your
> system's PCI bus or front-side bus (FSB), and not necessarily an issue
> with Asterisk. In short, there is a limited amount of bandwidth on the
> computer's PCI bus (33 MHz) and the FSB (100-800 MHz). One thing to
> remember is that ALL cores and data streams need to share the PCI bus
> and the FSB. Asterisk is very processor and memory intensive. At the
> extreme level of usage, more cores won't help if data is "stuck in the
> pipe". So the performance plateau you described would be expected.

This is a great point. FSB speed has always been a bottleneck relative to PCI and processor speed increases. The dual-core system you are working with must have cost a bundle, several thousand.

My approach has been to stick with single-CPU, single-core servers and add more servers to the cluster, versus building bigger, faster-processor servers. With sub-$1000 servers, I can achieve 150-200 calls per server; cluster several of them together, and for the same price as a quad-processor dual-core server you have 700-1000 calls of capacity.

Now, with that said, a cluster is harder to build and operate than a one-server Asterisk implementation and does not work well in some environments, such as with large call queues. But when you are talking straight call capacity, multiple servers will usually dominate single servers in relation to cost.

All, nice discussion, and thanks for posting your benchmark results and feedback.

JR
--
JR Richardson
Engineering for the Masses
JR Richardson
2007-May-26 07:10 UTC
[asterisk-users] RE: Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
> > > In fact, it seems that somewhere between 200 and 300 calls, the two
> > > servers start to exhibit similar idle times despite one of them having
> > > twice as many cores.

Do you get any errors at max call capacity about "too many open files"? You may try increasing your file descriptors.

----------------------------------------------------------------------------
* FILE DESCRIPTORS

Depending on the size of your system and your configuration, Asterisk can consume a large number of file descriptors. In UNIX, file descriptors are used for more than just files on disk. File descriptors are also used for handling network communication (e.g. SIP, IAX2, or H.323 calls) and hardware access (e.g. analog and digital trunk hardware). Asterisk accesses many on-disk files for everything from configuration information to voicemail storage.

Most systems limit the number of file descriptors that Asterisk can have open at one time. This can limit the number of simultaneous calls that your system can handle. For example, if the limit is set at 1024 (a common default value) Asterisk can handle approximately 150 SIP calls simultaneously. To change the number of file descriptors follow the instructions for your system below:

== PAM-based Linux System ==

If your system uses PAM (Pluggable Authentication Modules) edit /etc/security/limits.conf. Add these lines to the bottom of the file:

root soft nofile 4096
root hard nofile 8196
asterisk soft nofile 4096
asterisk hard nofile 8196

(adjust the numbers to taste). You may need to reboot the system for these changes to take effect.

== Generic UNIX System ==

If there are no instructions specifically adapted to your system above you can try adding the command "ulimit -n 8192" to the script that starts Asterisk.
----------------------------------------------------------------------------

JR
--
JR Richardson
Engineering for the Masses
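[A quick way to check whether a running system is anywhere near its descriptor limit (a sketch; it assumes a single asterisk process and requires root to read another process's /proc/<pid>/fd):]

    #!/bin/sh
    # Compare Asterisk's currently open descriptors against the limit.
    PID=$(pidof asterisk)
    echo "open file descriptors: $(ls /proc/$PID/fd | wc -l)"
    echo "soft limit in this shell: $(ulimit -n)"

[Note that the second line reports the invoking shell's limit, which only matches Asterisk's if Asterisk was started from an equivalently configured environment.]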
Matthew J. Roth
2007-May-28 11:11 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes - Low volume benchmarks - Correction
Luki wrote:
> Perhaps a naive question, but how does 0.137% CPU utilization per call
> equal 1735 MHz per call?
>
> If 1735 MHz / 0.137% = 1735 MHz / 0.00137 => 1266423 MHz at 100%
> utilization ??! Even with 4 CPUs, those would be 316 GHz CPUs.
>
> I think you meant:
> Average CPU utilization per call: 0.137% (~17 MHz)

Luki,

You are absolutely right. Thank you for pointing out and correcting my mistake. The corrected statistics are below. Note that the MHz per call statistic is calculated with the following formula, where CPUspeed is the per-CPU clock speed in MHz:

MHzPerCall = (numCPUs * CPUspeed) * (avgCPUperCall * .01)

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer

The Numbers (Corrected)
-----------------------

DC - Incoming SIP to the Playback() application
===============================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.01  99.98
    1   0.02     0.04     0.00  99.94
    2   0.02     0.06     0.00  99.92
    3   0.03     0.11     0.00  99.86
    4   0.04     0.13     0.00  99.83
    5   0.05     0.16     0.00  99.80
    6   0.05     0.20     0.00  99.75
    7   0.07     0.24     0.00  99.70
    8   0.07     0.25     0.00  99.67
    9   0.08     0.27     0.00  99.65
   10   0.09     0.33     0.00  99.58

Average CPU utilization per call: 0.040% (~9.60 MHz)

SC - Incoming SIP to the Playback() application
===============================================

calls  %user  %system  %iowait  %idle
    0   0.01     0.02     0.00  99.98
    1   0.02     0.10     0.00  99.88
    2   0.03     0.17     0.00  99.80
    3   0.06     0.21     0.00  99.73
    4   0.08     0.28     0.00  99.63
    5   0.10     0.34     0.01  99.55
    6   0.11     0.48     0.00  99.41
    7   0.14     0.49     0.00  99.37
    8   0.16     0.57     0.00  99.28
    9   0.17     0.63     0.01  99.19
   10   0.18     0.75     0.00  99.07

Average CPU utilization per call: 0.091% (~11.52 MHz)

DC - Incoming SIP to the Queue() application - In queue
=======================================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.00  99.99
    1   0.01     0.03     0.00  99.96
    2   0.01     0.05     0.00  99.94
    3   0.01     0.08     0.00  99.91
    4   0.02     0.10     0.00  99.88
    5   0.03     0.12     0.00  99.84
    6   0.04     0.16     0.00  99.80
    7   0.03     0.17     0.00  99.80
    8   0.04     0.20     0.00  99.76
    9   0.03     0.22     0.00  99.75
   10   0.05     0.27     0.00  99.68

Average CPU utilization per call: 0.031% (~7.44 MHz)

SC - Incoming SIP to the Queue() application - In queue
=======================================================

calls  %user  %system  %iowait  %idle
    0   0.02     0.02     0.00  99.96
    1   0.03     0.07     0.00  99.91
    2   0.03     0.13     0.00  99.83
    3   0.04     0.18     0.00  99.78
    4   0.05     0.23     0.00  99.72
    5   0.06     0.27     0.00  99.67
    6   0.07     0.33     0.00  99.60
    7   0.09     0.38     0.00  99.53
    8   0.09     0.40     0.00  99.51
    9   0.11     0.46     0.01  99.43
   10   0.11     0.48     0.00  99.41

Average CPU utilization per call: 0.055% (~6.97 MHz)

DC - Incoming SIP to the Queue() application - Bridged to an agent
==================================================================

calls  %user  %system  %iowait  %idle
    0   0.00     0.01     0.00  99.99
    1   0.01     0.06     0.00  99.93
    2   0.02     0.14     0.00  99.84
    3   0.03     0.16     0.00  99.81

Average CPU utilization per call: 0.060% (~14.40 MHz)

SC - Incoming SIP to the Queue() application - Bridged to an agent
==================================================================

calls  %user  %system  %iowait  %idle
    0   0.01     0.02     0.00  99.98
    1   0.02     0.16     0.00  99.82
    2   0.04     0.28     0.00  99.68
    3   0.07     0.36     0.00  99.57

Average CPU utilization per call: 0.137% (~17.35 MHz)
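[As a sanity check, the formula applied to the SC bridged-to-agent case (bc assumed available; the small difference from ~17.35 MHz comes from rounding the 3.16GHz clock to 3160 MHz):]

    # SC: 4 CPUs x ~3160 MHz each, 0.137% average CPU utilization per call
    echo '(4 * 3160) * (0.137 * .01)' | bc -l    # => 17.3168, i.e. ~17.3 MHz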
Matthew J. Roth
2007-May-28 11:23 UTC
[asterisk-users] RE: Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
JR Richardson wrote:
> Do you get any errors at max call capacity about "too many open files"?
> You may try increasing your file descriptors.

JR,

Thanks for the response, but I have the maximum number of open files available to Asterisk set to 65536.

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer
John Hughes
2007-Jun-01 07:21 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
Matthew J. Roth wrote:
> Recently, we were pushing our server to almost full CPU utilization.
> Since we've observed that Asterisk is CPU bound, we upgraded our
> server from a PowerEdge 6850 with four single-core Intel Xeon CPUs
> running at 3.16GHz, to a PowerEdge 6850 with four dual-core Intel Xeon
> CPUs running at 3.00GHz. The software installed is identical and a
> kernel build benchmark yielded promising results. The new dual-core
> server ran roughly 80% faster, which is about what we expected.
>
> As far as Asterisk is concerned, at low call volumes the dual-core
> server outperforms the single-core server at a similar rate.

Outperforms in what sense?

> I'm working on a follow-up post that will demonstrate this with some
> benchmarks for a small number of calls in various scenarios on each
> machine. However, to our surprise as the number of concurrent calls
> increases, the performance gains begin to flatten out. In fact, it
> seems that somewhere between 200 and 300 calls, the two servers start
> to exhibit similar idle times despite one of them having twice as many
> cores.

What do you mean by "idle" here?
Russell Bryant
2007-Jun-15 15:18 UTC
[asterisk-users] Scaling Asterisk: Dual-Core CPUs not yielding gains at high call volumes
Matthew J. Roth wrote:
> In the meantime, I'm looking for insights as to what would cause
> Asterisk (or any other process) to idle at the same value, despite
> having similar workloads and twice as many CPUs available to it. I'll
> be working on benchmarking Asterisk from very low to very high call
> volumes so any suggestions or tips, such as how to generate a large
> number of calls or what statistics I should gather, would also be
> appreciated.

I am very curious whether using this library on your system will help increase the load you are able to put on the dual-core system:

http://www.hoard.org/

People that are running Asterisk on Solaris have noted that using the mtmalloc library allows for much higher call density. I am hoping that Hoard will let the people running Asterisk on Linux see similar performance improvements, but I have yet to convince anyone to give it a try and let me know how it goes. :)

--
Russell Bryant
Software Engineer
Digium, Inc.
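[Hoard is a drop-in replacement allocator, so trying it does not require rebuilding Asterisk; preloading the library is enough. A minimal sketch, with the library path an assumption to adjust for wherever your Hoard build installed libhoard.so:]

    #!/bin/sh
    # Preload Hoard so Asterisk's malloc/free calls resolve to Hoard's
    # allocator instead of glibc's. The path is an assumption; adjust
    # it, and launch Asterisk however it is normally started on the box.
    export LD_PRELOAD=/usr/local/lib/libhoard.so
    /usr/sbin/asterisk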