thr3ads.net - asterisk users - [asterisk-users] Breaking news, but what happened? 11.000 channels on one server [Aug 2009]


Olle E. Johansson

2009-Aug-25 12:59 UTC

[asterisk-users] Breaking news, but what happened? 11.000 channels on one server

Hello Asterisk users around the world!

Recently, I have been working with pretty large Asterisk  
installations. 300 servers running Asterisk and Kamailio (OpenSER).  
Replacing large Nortel systems with just a few tiny boxes and other  
interesting solutions. Testing has been a large part of these  
projects. How much can we put into one Asterisk box? Calls per euro  
invested matters.

So far, we've been able to reach about 2000 channels of G.711 with
quad-core CPUs and Intel Pro/1000 network cards in IBM servers. At
that point, we see that the IRQ balancer gives up and goes to bed, all
the traffic is directed to one core and the system gives up. We've
been running these tests on several systems, with different NICs, and
have been working hard to tweak capacity. New drivers, new cards, new
stuff. But all indications told us that the problem was the CPU
context switching between handling network traffic (RTP traffic) and
Asterisk. This was also confirmed by a few different independent
software development teams.
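
One way to check the "all the traffic is directed to one core" symptom is to look at how a NIC's interrupt counters are spread across CPUs in /proc/interrupts on Linux. The following is a minimal sketch, not the tooling used in these tests: the function name, the sample text and the device name eth0 are made up for illustration, and it assumes the usual layout of a header row of CPU names followed by one row of per-CPU counters per IRQ.

```python
def irq_share_per_cpu(interrupts_text, device_substring):
    """Return each CPU's share of the interrupts on rows that mention the device."""
    lines = interrupts_text.strip().splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        if device_substring not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ label ("16:"), then one counter per CPU
        counters = fields[1:1 + len(cpus)]
        for i, value in enumerate(counters):
            if value.isdigit():
                totals[i] += int(value)
    grand_total = sum(totals)
    if grand_total == 0:
        return {cpu: 0.0 for cpu in cpus}
    return {cpu: total / grand_total for cpu, total in zip(cpus, totals)}


# Hypothetical sample in the /proc/interrupts layout (values invented).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 16:    9000000        100        100        100   IO-APIC-fasteoi   eth0
 17:         50         50         50         50   IO-APIC-fasteoi   eth1
"""

if __name__ == "__main__":
    for cpu, share in irq_share_per_cpu(SAMPLE, "eth0").items():
        print(f"{cpu}: {share:.1%}")
```

On a real machine the text would come from reading /proc/interrupts; a single CPU holding nearly all of a NIC's interrupts would match the behaviour described above.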

Imagine my surprise this Monday when I installed a plain old Asterisk
1.4 on a new HP server, a DL380 G6, and could run in circles around
the old IBM servers. Three servers looping calls between them and we
passed 10.000 channels without any issues. SIP to SIP calls, the
p2p RTP bridge, basically running a media proxy. At that point, our
cheap gigabit switch gave up, and of course the NICs. Pushing 850 Mbit
was more than enough. The CPUs (we had 16 of them with
hyperthreading) were not very stressed. Asterisk was occupying a few of
them in a nice way, but we had a majority of them idling around
looking for something to do.
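
As a rough sanity check on those numbers, the per-stream bandwidth of G.711 over RTP can be worked out from the packet layout. This is a back-of-the-envelope sketch under stated assumptions: 20 ms packetization (the post does not say which interval was used), IPv4 without options, and Ethernet header plus FCS counted but preamble and inter-frame gap ignored. The function names are illustrative only.

```python
# Per-packet overhead in bytes: RTP 12 + UDP 8 + IPv4 20 + Ethernet header 14 + FCS 4
OVERHEAD_BYTES = 12 + 8 + 20 + 14 + 4


def rtp_stream_kbps(ptime_ms=20, payload_bytes_per_ms=8, overhead_bytes=OVERHEAD_BYTES):
    """Wire bit rate of one RTP stream in one direction, in kbit/s.

    G.711 is 64 kbit/s, which is 8 payload bytes per millisecond.
    """
    payload_bytes = payload_bytes_per_ms * ptime_ms
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


def streams_per_link(link_mbps, ptime_ms=20):
    """How many one-way streams fit into the given link rate."""
    return int(link_mbps * 1000 / rtp_stream_kbps(ptime_ms))


if __name__ == "__main__":
    print("kbit/s per stream at 20 ms:", rtp_stream_kbps(20))
    print("one-way streams in 850 Mbit/s:", streams_per_link(850))
```

Under these assumptions one stream is 87.2 kbit/s, so 850 Mbit/s corresponds to a little under 10,000 one-way streams, which is in the same range as the channel count reported here. How channels map onto streams per NIC direction depends on the test topology, which the post does not spell out.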

So, please help me. I need answers to John Todd's questions while he's
treating me with really good expensive wine at Astricon. How did this
happen? Was it the Broadcom NICs? Was it the Intel 5530 Xeon CPUs? Or
a combination? Or maybe just the cheap Netgear switch...

I hope to get more access to these boxes, three of them, to run tests
with the latest code. In that version we have the new hashtables, all
the refcounters and fancy stuff that the Digium team has reworked on
the inside of Asterisk. The trunk version will probably behave much,
much better than 1.4 when it comes to heavy loads and high call setup
rates.

We're on our way to build a new generation of Asterisk, far away from  
the 1.0 platform. At the same time, the hardware guys have obviously  
not been asleep. They're giving us inexpensive hardware that makes our  
software shine. Now we need to test other things and see how the rest  
of Asterisk scales, apart from the actual calls. Manager, events,  
musiconhold, agi/fastagi... New interesting challenges.

So take one of these standard rack servers from HP and run a telco for  
a small city on one box. While you're at it, buy a spare one, hardware  
can fail ( ;-) ).
But don't say that Asterisk does not scale well. Those times are gone.

/Olle

---
* Olle E Johansson - oej at edvina.net
* Open Unified Communication - SIP & XMPP projects

Raimund Sacherer

2009-Aug-25 13:42 UTC

Hi,

What machines were the IBM servers? I would be really interested in
this as we currently have IBM hardware deployed and, well, maybe it's
time to investigate different hardware,

best

-- 
Raimund Sacherer
-
RunSolutions
     Open Source It Consulting
-
Email: rs at runsolutions.com
tel: 625 40 32 08

Parc Bit - Centro Empresarial Son Espanyol
Edificio Estel - Local 3D
07121 -  Palma de Mallorca
Baleares

Steve Totaro

2009-Aug-25 14:10 UTC

I always was a fan and recommended IBM DL380s if not 360s (dual power
supply).

I would like to see some benchmarking on the AMI.  Not sure how to do it but
that used to be a very weak link.  I wonder if, and how much it has improved
over 1.2.x

-- 
Thanks,
Steve Totaro
+18887771888 (Toll Free)
+12409381212 (Cell)
+12024369784 (Skype)

John Todd

2009-Aug-27 14:53 UTC

Your dinner awaits, along with a very nice bottle of wine. (or port,  
or whatever it is you prefer.)  But, just a few questions..  ;-)

Moving back away from the Layer 3 discussion that erupted on this thread,
let's get back to what you actually did.

It seems odd to me (though I'm hopeful!) that this would "just work"
without any changes. Especially on 1.4. So being somewhat scientific
about it, I'd say that the first thing to do is to examine your
measurements with an assumption that there is a flaw in your
observations. If you find no errors with that hypothesis, then you've
deduced that things are as they seem.  :-)

1) Are you certain that the media was actually being routed?  I know  
you said that the switch and NICs gave up because of traffic, but did  
you choose a random channel and record the media between two servers?   
In other words, are you 100% certain that there was valid RTP being  
exchanged?  (I typically make one call out of every 100 or 1000 a  
"monkeys" call, and record it instead of just routing to Echo()  
directly, then play back later to ensure media was actually happening.)

2) Were the SIP transactions completing normally?  What was the rate  
of ramp-up?

3) Can you post your dialplan and a few snippets of "core show  
channels" at peak usage?

4) What was the sampling rate for the media?  20ms? 30ms? 40ms?

5) Any summary stats on RTP packet loss, etc? (from  
"CHANNEL(rtpqos,audio,all)") on channels?

JT

---
John Todd                       email:jtodd at digium.com
Digium, Inc. | Asterisk Open Source Community Director
445 Jan Davis Drive NW -  Huntsville AL 35806  -   USA
direct: +1-256-428-6083         http://www.digium.com/

Klaus Darilion

2009-Aug-27 20:04 UTC

John Todd wrote:
> 5) Any summary stats on RTP packet loss, etc? (from
> "CHANNEL(rtpqos,audio,all)") on channels?

I wonder how to retrieve those stats:
- after Dial()?
- during Dial()? (how?)

regards
klaus

Darryl Dunkin

2009-Aug-27 22:52 UTC

After dial.

I have put this in my hangup context as:
exten => h,1,Noop(QOS=${RTPAUDIOQOS})
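
Once the QOS value shows up in the log, it can be split into fields for summary stats across many calls. The sketch below assumes the value is a semicolon-separated list of key=value pairs; the field names in the sample (lp for lost packets, rxcount, rxjitter and so on) and all the numbers are assumptions for illustration and should be checked against what your Asterisk version actually prints.

```python
def parse_rtp_qos(qos):
    """Split a 'key=value;key=value' QOS string into a dict with numeric values."""
    stats = {}
    for pair in qos.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value          # keep anything non-numeric as text
    return stats


def receive_loss_ratio(stats):
    """Lost packets divided by packets expected (received + lost).

    Assumes 'lp' is the local lost-packet count and 'rxcount' the received count.
    """
    lost = stats.get("lp", 0)
    received = stats.get("rxcount", 0)
    expected = lost + received
    return lost / expected if expected else 0.0


# Hypothetical sample value, not taken from a real call.
SAMPLE_QOS = ("ssrc=1234;themssrc=5678;lp=3;rxjitter=0.001500;rxcount=1500;"
              "txjitter=0.000800;txcount=1497;rlp=0;rtt=0.012000")

if __name__ == "__main__":
    parsed = parse_rtp_qos(SAMPLE_QOS)
    print(parsed)
    print("receive loss ratio:", receive_loss_ratio(parsed))
```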

Benny Amorsen

2009-Sep-07 20:19 UTC

"Olle E. Johansson" <oej at edvina.net> writes:
> Imagine my surprise this Monday when I installed a plain old Asterisk
> 1.4 on a new HP server, a DL380 G6, and could run in circles around
> the old IBM servers.

The G6 series is pure magic for everything I've let it touch
network-wise.

I have three guesses as to why:

1) Lots and lots of bandwidth between CPU and I/O, plus built-in memory
controller so any packet copying runs wicked fast.

2) MSI-X seems to really help, at least when combined with modern
ethernet chipsets (the original PRO/1000 is looking a bit dated now, but
more modern PRO/1000 should still be a good choice).

3) Multi-queue NIC. This should REALLY help when you have lots of cores
and CPU threads. Depends on fairly new kernels.

I'm not sure which is the answer though.

/Benny
