We purchased a new dual Xeon 3 GHz server with 2 GB RAM to replace our 3 GHz Pentium with 1 GB RAM, which has been having load issues as our company grows.

We are having problems... We use a predictive dialer that we custom-programmed in Perl. It basically drops (moves) files into the callout directory and uses queues to transfer calls to agents when someone picks up.

It has been working pretty well, except we now have 50+ dialers on the system taking calls. The system dials 2-4 lines per available agent every 3-5 seconds, based on calls ringing and available agents. We can keep them to about 8-20 seconds between calls. But the number of ringing lines is causing load issues. Hence the new server.

We put Fedora Core 4 on with no problem. We were running 2 T1s at the beginning of the day just to make sure the system was running well. We finally put it on 8 T1s and the system ran great for about 4 hours. Then the load started going up and up until the server locked up completely. I could not get much information from the server. The load went to 170+ before it locked. Asterisk was showing 99% CPU usage at the crash.

I have some information that the log had in it just before the crash. There was something about a CPU3 soft lockup and page fault messages. If someone can help, I will post the log tomorrow when I get into work. We had to switch back to the old server with the load issues.

Some other information about the servers follows:

We are running a separate slim server to stream MOH.
The predictive server is a separate PC connecting via the manager interface for agent information: available, busy, and the caller ID of the person they are talking to.
We have a Perl script running on the Asterisk server that moves the callout files into the callout directory. The files are created via a web POST through Apache; the script checks for files in a temp directory and moves them into the callout directory.

Thanks,
Kyle
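The file-moving step described above can be sketched roughly as follows. This is a minimal sketch, not the actual script from this post: it assumes the "callout files" are standard Asterisk call files, and the helper name `queue_call`, the file name, the context `dialer-answer` and the numbers are all hypothetical. The point it illustrates is writing the file completely in a staging directory and then using `rename()`, so Asterisk never picks up a half-written file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write a call file in $staging, then rename() it into $spool.
# rename() is atomic only when both directories share a filesystem.
sub queue_call {
    my ($staging, $spool, $name, %field) = @_;
    my $tmp  = File::Spec->catfile($staging, $name);
    my $dest = File::Spec->catfile($spool,   $name);

    open(my $fh, '>', $tmp) or die "cannot write $tmp: $!\n";
    # Only the fields the caller supplied are written, in a fixed order.
    for my $key (qw(Channel Callerid MaxRetries RetryTime WaitTime
                    Context Extension Priority)) {
        print {$fh} "$key: $field{$key}\n" if defined $field{$key};
    }
    close($fh) or die "cannot close $tmp: $!\n";

    rename($tmp, $dest) or die "cannot move $tmp to $dest: $!\n";
    return $dest;
}

# Demo in throwaway directories; a live system would point these at its
# own staging directory and at Asterisk's outgoing spool.
my $base    = tempdir(CLEANUP => 1);
my $staging = File::Spec->catdir($base, 'staging');
my $spool   = File::Spec->catdir($base, 'outgoing');
mkdir($staging) or die "mkdir $staging: $!\n";
mkdir($spool)   or die "mkdir $spool: $!\n";

my $queued = queue_call($staging, $spool, 'demo-0001.call',
    Channel    => 'Zap/g1/5551234',   # hypothetical trunk group and number
    Callerid   => '5550100',          # hypothetical
    MaxRetries => 0,
    WaitTime   => 30,
    Context    => 'dialer-answer',    # hypothetical dialplan context
    Extension  => 's',
    Priority   => 1,
);
print "queued $queued\n";
```

If the staging directory and the spool live on different filesystems, the rename is no longer a single atomic step, so keeping both on the same partition is the safer layout.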
Kyle Hagan wrote:
> We purchased a new Dual Xeon 3ghz, 2gb ram to upgrade our 3ghz Pentium
> 1gb ram, that has been having load issues due to our growing company.
>
> We are having problems... We use a predictive dialer that we custom
> programmed in perl. It basically drops, moves, files into the callout
> directory and uses queues to transfer to agents when someone picks up.

Oh, we are running HEAD version.

Kyle
Hi all,

I am trying to test Asterisk with a TE210 and SpanDSP. I connected the two E1 ports of the TE210 back-to-back (with an E1 crossover cable), and everything seems fine. Then I created a script that calls an extension that starts the rxfax application and connects the call to another extension that starts the txfax application. What I expect is for a fax to be sent from the first E1 and received on the other. Instead, the appropriate channels open and each extension moves normally to its next priority, so both fax sending and fax receiving start, but nothing more happens: it freezes there.

Doing a pri intense debug on the spans, I can see that the T203 counter restarts all the time. The configuration is below. Please help!

--- SCRIPT ---
#!/usr/bin/perl
use strict;
use warnings;
use Asterisk::Manager;

$|++;    # unbuffered output, so events print as they arrive

my $astman = Asterisk::Manager->new;
$astman->user('admin');
$astman->secret('secret');
$astman->host('localhost');
$astman->connect || die $astman->error . "\n";

$astman->setcallback('Hangup',  \&hangup_callback);
$astman->setcallback('DEFAULT', \&default_callback);

# Dial 'getfax' out of trunk group 1 and connect the answered call to
# sendfax in the Outgoing context. The reply is printed as-is, which is
# why the first line of the output below has no separators.
print $astman->sendcommand(
    Action   => 'Originate',
    Callerid => 'SLOT1',
    Channel  => 'Zap/g1/getfax',
    Exten    => 'sendfax',
    Context  => 'Outgoing',
    Priority => '1',
);

$astman->eventloop;
$astman->disconnect;

sub hangup_callback {
    print "hangup callback\n";
}

# Print every field of any event that has no dedicated callback.
sub default_callback {
    my (%stuff) = @_;
    foreach (keys %stuff) {
        printf("%s: %s\n", $_, $stuff{$_});
    }
    print "\n";
}

--- RESULT WHEN RUNNING THE SCRIPT ---
EventNewchannelChannelZap/3-1StateRsrvdCallerID<unknown>Uniqueid1131547210.4

CallerID: SLOT1
Event: Newcallerid
Uniqueid: 1131547210.4
Channel: Zap/3-1

CallerID: SLOT1
Event: Newcallerid
Uniqueid: 1131547210.4
Channel: Zap/3-1

CallerID: SLOT1
Event: Newstate
Uniqueid: 1131547210.4
Channel: Zap/3-1
State: Dialing

CallerID: <unknown>
Event: Newchannel
Uniqueid: 1131547210.5
Channel: Zap/34-1
State: Ring

Event: Newexten
Channel: Zap/34-1
Context: Incoming
Extension: getfax
Application: SetVar
Uniqueid: 1131547210.5
AppData: FAXFILE=/tmp/1131547210.5.tiff
Priority: 1

Event: Newexten
Channel: Zap/34-1
Context: Incoming
Extension: getfax
Uniqueid: 1131547210.5
Application: RxFAX
AppData: /tmp/1131547210.5.tiff
Priority: 2

CallerID: <unknown>
Event: Newstate
Channel: Zap/34-1
State: Up
Uniqueid: 1131547210.5

CallerID: SLOT1
Event: Newstate
Channel: Zap/3-1
State: Up
Uniqueid: 1131547210.4

Event: Newexten
Channel: Zap/3-1
Context: Outgoing
Extension: sendfax
Uniqueid: 1131547210.4
Application: SetVar
AppData: SENDFAX=/tmp/sendfax.tiff
Priority: 1

Event: Newexten
Channel: Zap/3-1
Context: Outgoing
Extension: sendfax
Uniqueid: 1131547210.4
Application: TxFAX
AppData: /tmp/sendfax.tiff|caller
Priority: 2

***** Stays there

--- ZAPATA.CONF ---
[trunkgroups]
; define any trunk groups

[channels]
switchtype=euroisdn
;pridialplan=national
signalling=pri_cpe
usecallerid=yes
hidecallerid=no
callwaiting=yes
callwaitingcallerid=yes
threewaycalling=yes
transfer=yes
cancallforward=yes
echocancel=no
rxgain=0.0
txgain=0.0
;faxdetect=both

; Span 1
context=Outgoing
group=1
;signalling=pri_net
signalling=pri_cpe
channel => 1-15
channel => 17-31

; Span 2
context=Incoming
group=2
signalling=pri_net
;signalling=pri_cpe
channel => 32-46
channel => 48-62

--- ZAPTEL.CONF ---
#
# Zaptel Configuration File
#
# This file is parsed by the Zaptel Configurator, ztcfg
#
#
# First come the span definitions, in the format
# span=<span num>,<timing>,<line build out (LBO)>,<framing>,<coding>[,yellow]
#
# The timing parameter determines the selection of primary, secondary, and
# so on sync sources. If this span should be considered a primary sync
# source, then give it a value of "1". For a secondary, use "2", and so on.
# To not use this as a sync source, just use "0"
#
# The line build-out (or LBO) is an integer, from the following table:
# 0: 0 db (CSU) / 0-133 feet (DSX-1)
# 1: 133-266 feet (DSX-1)
# 2: 266-399 feet (DSX-1)
# 3: 399-533 feet (DSX-1)
# 4: 533-655 feet (DSX-1)
# 5: -7.5db (CSU)
# 6: -15db (CSU)
# 7: -22.5db (CSU)
#
# The framing is one of "d4" or "esf" for T1 or "cas" or "ccs" for E1
#
# Note: "d4" could be referred to as "sf" or "superframe"
#
# The coding is one of "ami" or "b8zs" for T1 or "ami" or "hdb3" for E1
#
# E1's may have the additional keyword "crc4" to enable CRC4 checking
#
# If the keyword "yellow" follows, yellow alarm is transmitted when no
# channels are open.
#
#span=1,0,0,esf,b8zs
#span=2,1,0,esf,b8zs
#span=3,0,0,ccs,hdb3,crc4
#
# Next come the dynamic span definitions, in the form:
# dynamic=<driver>,<address>,<numchans>,<timing>
#
# Where <driver> is the name of the driver (e.g. eth), <address> is the
# driver specific address (like a MAC for eth), <numchans> is the number
# of channels, and <timing> is a timing priority, like for a normal span.
# use "0" to not use this as a timing source, or prioritize them as
# primary, secondard, etc. Note that you MUST have a REAL zaptel device
# if you are not using external timing.
#
# dynamic=eth,eth0/00:02:b3:35:43:9c,24,0
#
# Next come the definitions for using the channels. The format is:
# <device>=<channel list>
#
# Valid devices are:
#
# "e&m"     : Channel(s) are signalled using E&M signalling (specific
#             implementation, such as Immediate, Wink, or Feature Group D
#             are handled by the userspace library).
# "fxsls"   : Channel(s) are signalled using FXS Loopstart protocol.
# "fxsgs"   : Channel(s) are signalled using FXS Groundstart protocol.
# "fxsks"   : Channel(s) are signalled using FXS Koolstart protocol.
# "fxols"   : Channel(s) are signalled using FXO Loopstart protocol.
# "fxogs"   : Channel(s) are signalled using FXO Groundstart protocol.
# "fxoks"   : Channel(s) are signalled using FXO Koolstart protocol.
# "sf"      : Channel(s) are signalled using in-band single freq tone.
#             Syntax as follows:
#             channel# => sf:<rxfreq>,<rxbw>,<rxflag>,<txfreq>,<txlevel>,<txflag>
#             rxfreq is rx tone freq in hz, rxbw is rx notch (and decode)
#             bandwith in hz (typically 10.0), rxflag is either 'normal' or
#             'inverted', txfreq is tx tone freq in hz, txlevel is tx tone
#             level in dbm, txflag is either 'normal' or 'inverted'. Set
#             rxfreq or txfreq to 0.0 if that tone is not desired.
# "unused"  : No signalling is performed, each channel in the list remains idle
# "clear"   : Channel(s) are bundled into a single span. No conversion or
#             signalling is performed, and raw data is available on the master.
# "indclear": Like "clear" except all channels are treated individually and
#             are not bundled. "bchan" is an alias for this.
# "rawhdlc" : The zaptel driver performs HDLC encoding and decoding on the
#             bundle, and the resulting data is communicated via the master
#             device.
# "fcshdlc" : The zapdel driver performs HDLC encoding and decoding on the
#             bundle and also performs incoming and outgoing FCS insertion
#             and verification. "dchan" is an alias for this.
# "nethdlc" : The zaptel driver bundles the channels together into an
#             hdlc network device, which in turn can be configured with
#             sethdlc (available separately).
# "dacs"    : The zaptel driver cross connects the channels starting at
#             the channel number listed at the end, after a colon
# "dacsrbs" : The zaptel driver cross connects the channels starting at
#             the channel number listed at the end, after a colon and
#             also performs the DACSing of RBS bits
#
# The channel list is a comma-separated list of channels or ranges, for
# example:
#
# 1,3,5 (channels one, three, and five)
# 16-23, 29 (channels 16 through 23, as well as channel 29
#
# So, some complete examples are:
# e&m=1-12
# nethdlc=13-24
# fxsls=25,26,27,28
# fxols=29-32
#
#fxoks=1-24
#bchan=25-47
#dchan=48
#fxols=1-12
#fxols=13-24
#e&m=25-29
#nethdlc=30-33
#clear=44
#clear=45
#clear=46
#clear=47
#fcshdlc=48
#dacs=1-24:48
#dacsrbs=1-24:48
span=1,1,0,ccs,hdb3,crc4
span=2,0,0,ccs,hdb3,crc4
bchan = 1-15, 17-31
dchan = 16
bchan = 32-46,48-62
dchan = 47
#
# Finally, you can preload some tone zones, to prevent them from getting
# overwritten by other users (if you allow non-root users to open /dev/zap/*
# interfaces anyway. Also this means they won't have to be loaded at runtime.
# The format is "loadzone=<zone>" where the zone is a two letter country code.
#
# You may also specify a default zone with "defaultzone=<zone>" where zone
# is a two letter country code.
#
# An up-to-date list of the zones can be found in the file zaptel/zonedata.c
#
loadzone = gr
#loadzone = us-old
#loadzone=gr
#loadzone=it
#loadzone=fr
#loadzone=de
#loadzone=uk
#loadzone=fi
#loadzone=jp
#loadzone=sp
#loadzone=no
defaultzone=gr

--- EXTENSION.CONF ---
[Outgoing]
;exten => s,1,Dial()
;exten => s,2,Hangup()
exten => sendfax,1,SetVar(SENDFAX=/tmp/sendfax.tiff)
exten => sendfax,2,txfax(${SENDFAX},caller)
exten => sendfax,3,Hangup()

[Incoming]
;exten => s,1,Answer()
;exten => s,2,Hangup()
exten => getfax,1,SetVar(FAXFILE=/tmp/${UNIQUEID}.tiff)
exten => getfax,2,rxfax(${FAXFILE})
exten => getfax,3,Hangup()
;exten => getfax,1,Answer()
;exten => getfax,2,rxfax(/tmp/${UNIQUEID})
;exten => getfax,3,Hangup()

--- PRI INTENSE DEBUG SPAN 1 ---
*CLI> pri intense debug span 1
Enabled EXTENSIVE debugging on span 1
*CLI>
== Parsing '/etc/asterisk/manager.conf': Found
== Manager 'admin' logged on from 127.0.0.1
> [ 00 01 84 84 08 02 00 05 05 04 03 80 90 a3 18 03 a9 83 84 6c 06 21 81 31 32 33 34 70 07 a1 67 65 74 66 61 78 a1 ]
> Informational frame:
> SAPI: 00 C/R: 0 EA: 0
> TEI: 000 EA: 1
> N(S): 066 0: 0
> N(R): 066 P: 0
> 33 bytes of data
-- Restarting T203 counter
Stopping T_203 timer
Starting T_200 timer
> Protocol Discriminator: Q.931 (8) len=33
> Call Ref: len= 2 (reference 5/0x5) (Originator)
> Message type: SETUP (5)
> [04 03 80 90 a3]
> Bearer Capability (len= 5) [ Ext: 1 Q.931 Std: 0 Info transfer capability: Speech (0)
> Ext: 1 Trans mode/rate: 64kbps, circuit-mode (16)
> Ext: 1 User information layer 1: A-Law (35)
> [18 03 a9 83 84]
> Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0
> ChanSel: Reserved
> Ext: 1 Coding: 0 Number Specified Channel Type: 3
> Ext: 1 Channel: 4 ]
> [6c 06 21 81 31 32 33 34]
> Calling Number (len= 8) [ Ext: 0 TON: National Number (2) NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1)
> Presentation: Presentation permitted, user number passed network screening (1) '1234' ]
> [70 07 a1 67 65 74 66
61 78] > Called Number (len= 9) [ Ext: 1 TON: National Number (2) NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1) 'getfax' ] > [a1] > Sending Complete (len= 1) -- Accepting call from '1234' to 'getfax' on channel 0/4, span 2 -- Executing SetVar("Zap/35-1", "FAXFILE=/tmp/1131617439.7.tiff") in new stack -- Executing RxFAX("Zap/35-1", "/tmp/1131617439.7.tiff") in new stack < [ 00 01 01 86 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 067 P/F: 0 < 0 bytes of data -- ACKing all packets from 65 to (but not including) 67 -- ACKing packet 66, new txqueue is -1 (-1 means empty) -- Since there was nothing left, stopping T200 counter -- Nothing left, starting T203 counter -- Restarting T203 counter < [ 02 01 84 86 08 02 80 05 02 18 03 a9 83 84 ] < Informational frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < N(S): 066 0: 0 < N(R): 067 P: 0 < 10 bytes of data -- ACKing all packets from 66 to (but not including) 67 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter < Protocol Discriminator: Q.931 (8) len=10 < Call Ref: len= 2 (reference 32773/0x8005) (Terminator) < Message type: CALL PROCEEDING (2) < [18 03 a9 83 84] < Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0 < ChanSel: Reserved < Ext: 1 Coding: 0 Number Specified Channel Type: 3 < Ext: 1 Channel: 4 ] Sending Receiver Ready (67) > [ 02 01 01 86 ] > Supervisory frame: > SAPI: 00 C/R: 1 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 067 P/F: 0 > 0 bytes of data -- Restarting T203 counter -- Restarting T203 counter < [ 02 01 86 86 08 02 80 05 07 18 03 a9 83 84 1e 02 81 82 29 05 05 0b 0a 0c 0a ] < Informational frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < N(S): 067 0: 0 < N(R): 067 P: 0 < 21 bytes of data -- ACKing all packets from 66 to (but not including) 67 -- Since there was nothing left, 
stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter < Protocol Discriminator: Q.931 (8) len=21 < Call Ref: len= 2 (reference 32773/0x8005) (Terminator) < Message type: CONNECT (7) < [18 03 a9 83 84] < Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0 < ChanSel: Reserved < Ext: 1 Coding: 0 Number Specified Channel Type: 3 < Ext: 1 Channel: 4 ] < [1e 02 81 82] < Progress Indicator (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Private network serving the local user (1) < Ext: 1 Progress Description: Called equipment is non-ISDN. (2) ] < [29 05 05 0b 0a 0c 0a] < Time Date (len= 7) [ 05-11-10 12:10 ] > [ 00 01 86 88 08 02 00 05 0f ] > Informational frame: > SAPI: 00 C/R: 0 EA: 0 > TEI: 000 EA: 1 > N(S): 067 0: 0 > N(R): 068 P: 0 > 5 bytes of data -- Restarting T203 counter Stopping T_203 timer Starting T_200 timer > Protocol Discriminator: Q.931 (8) len=5 > Call Ref: len= 2 (reference 5/0x5) (Originator) > Message type: CONNECT ACKNOWLEDGE (15) > Channel Zap/4-1 was answered. 
-- Executing SetVar("Zap/4-1", "SENDFAX=/tmp/sendfax.tiff") in new stack -- Executing TxFAX("Zap/4-1", "/tmp/sendfax.tiff|caller") in new stack < [ 00 01 01 88 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 0 < 0 bytes of data -- ACKing all packets from 66 to (but not including) 68 -- ACKing packet 67, new txqueue is -1 (-1 means empty) -- Since there was nothing left, stopping T200 counter -- Nothing left, starting T203 counter -- Restarting T203 counter T203 counter expired, sending RR and scheduling T203 again Sending Receiver Ready (68) > [ 00 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 0 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter < [ 00 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Got RR response to our frame -- Restarting T203 counter < [ 02 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Unsolicited RR with P/F bit, responding Sending Receiver Ready (68) > [ 02 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 1 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter -- Restarting T203 counter T203 counter expired, sending RR and scheduling T203 again Sending Receiver Ready (68) > [ 00 01 01 89 ] > Supervisory frame: > 
SAPI: 00 C/R: 0 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter < [ 00 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Got RR response to our frame -- Restarting T203 counter < [ 02 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Unsolicited RR with P/F bit, responding Sending Receiver Ready (68) > [ 02 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 1 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter -- Restarting T203 counter T203 counter expired, sending RR and scheduling T203 again Sending Receiver Ready (68) > [ 00 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 0 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter < [ 00 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Got RR response to our frame -- Restarting T203 counter < [ 02 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR 
(receive ready) ] < N(R): 068 P/F: 1 T203 counter expired, sending RR and scheduling T203 again Sending Receiver Ready (68) > [ 00 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 0 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter < [ 00 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 0 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Got RR response to our frame -- Restarting T203 counter < [ 02 01 01 89 ] < Supervisory frame: < SAPI: 00 C/R: 1 EA: 0 < TEI: 000 EA: 1 < Zero: 0 S: 0 01: 1 [ RR (receive ready) ] < N(R): 068 P/F: 1 < 0 bytes of data -- ACKing all packets from 67 to (but not including) 68 -- Since there was nothing left, stopping T200 counter -- Stopping T203 counter since we got an ACK -- Nothing left, starting T203 counter -- Unsolicited RR with P/F bit, responding Sending Receiver Ready (68) > [ 02 01 01 89 ] > Supervisory frame: > SAPI: 00 C/R: 1 EA: 0 > TEI: 000 EA: 1 > Zero: 0 S: 0 01: 1 [ RR (receive ready) ] > N(R): 068 P/F: 1 > 0 bytes of data -- Restarting T203 counter -- Restarting T203 counter pri no debug span 1 Disabled debugging on span 1
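For anyone reading the trace above, the Q.921 address and control octets of each frame can be decoded by hand. The sketch below is only an illustration (the function name `decode_q921` is made up here, and it handles just the header octets, not the Q.931 payload). Decoding the repeating `[ 00 01 01 89 ]` frames shows plain RR (receive ready) polls with N(R)=68 and the P/F bit set, which looks like ordinary idle-link supervision after a T203 expiry rather than a signalling fault. If that reading is right, the D-channel is healthy and the stall is more likely in the audio path between TxFAX and RxFAX, but the trace alone does not prove that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Supervisory function bits (bits 3-2 of the first control octet).
my %SUPERVISORY = (0 => 'RR', 1 => 'RNR', 2 => 'REJ');

# Decode the Q.921 header from a hex dump such as "[ 00 01 01 89 ]".
sub decode_q921 {
    my ($dump) = @_;
    my @b = map { hex($_) } grep { /^[0-9a-fA-F]{2}$/ } split ' ', $dump;
    die "need at least 3 octets\n" if @b < 3;

    my %f = (
        sapi => $b[0] >> 2,          # service access point identifier
        cr   => ($b[0] >> 1) & 1,    # command/response bit
        tei  => $b[1] >> 1,          # terminal endpoint identifier
    );

    my $ctl = $b[2];
    if (($ctl & 0x03) == 0x03) {     # unnumbered frame: one control octet
        $f{type} = 'U';
        return \%f;
    }
    die "I and S frames need a second control octet\n" if @b < 4;

    if (($ctl & 0x01) == 0) {        # information frame
        @f{qw(type ns)} = ('I', $ctl >> 1);
    } else {                         # supervisory frame
        $f{type} = $SUPERVISORY{($ctl >> 2) & 0x03} // 'S?';
    }
    $f{nr} = $b[3] >> 1;             # next frame number expected
    $f{pf} = $b[3] & 1;              # poll/final bit
    return \%f;
}

# Three header dumps taken from the trace above.
for my $frame ('00 01 84 84', '02 01 01 86', '00 01 01 89') {
    my $f = decode_q921($frame);
    print join(' ', "[$frame]", map { "$_=$f->{$_}" } sort keys %$f), "\n";
}
```

The decoded values can be compared against the "N(S)", "N(R)" and "P/F" lines that libpri prints next to each frame.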
I would say your best bet is to change your system into a distributed dialing system. We did this with Vicidial and have installations on multiple servers with over 100 agents all working off of the same lists and campaigns. A distributed system will also allow for more redundancy and less total downtime if one server goes down.

We noticed the same kind of limitations you are seeing and now run a maximum of 40 agents per server; when we need more capacity, we just add another server.

MATT---

On 11/10/05, Kyle Hagan <info@quadrasoftware.com> wrote:
> Kyle Hagan wrote:
>
> > We purchased a new Dual Xeon 3ghz, 2gb ram to upgrade our 3ghz Pentium
> > 1gb ram, that has been having load issues due to our growing company.
> >
> > We are having problems... We use a predictive dialer that we custom
> > programmed in perl. It basically drops, moves, files into the callout
> > directory and uses queues to transfer to agents when someone picks up.
>
> Oh, we are running HEAD version.
>
> Kyle
> _______________________________________________
> --Bandwidth and Colocation sponsored by Easynews.com --
>
> Asterisk-Users mailing list
> Asterisk-Users@lists.digium.com
> http://lists.digium.com/mailman/listinfo/asterisk-users
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-users
One more point, just for troubleshooting: try running the same app without the manager connection going over the network (use the manager locally, even if that means running the app locally). This might solve the problem, as running the manager over the network is not very stable for Asterisk.

On 11/10/05, Matt Florell <astmattf@gmail.com> wrote:
> I would say your best bet is to change your system into a distributed
> dialing system. We did this with Vicidial and have installations on
> multiple servers with over 100 agents all working off of the same
> lists and campaigns. A distributed system will also allow for more
> redundancy and less total downtime if one server goes down.
>
> We noticed the same kind of limitations you are and now do a max of 40
> agents per server, and when we need more capacity we just add another
> server.
>
> MATT---
>
> On 11/10/05, Kyle Hagan <info@quadrasoftware.com> wrote:
> > Kyle Hagan wrote:
> >
> > > We purchased a new Dual Xeon 3ghz, 2gb ram to upgrade our 3ghz Pentium
> > > 1gb ram, that has been having load issues due to our growing company.
> > >
> > > We are having problems... We use a predictive dialer that we custom
> > > programmed in perl. It basically drops, moves, files into the callout
> > > directory and uses queues to transfer to agents when someone picks up.
> >
> > Oh, we are running HEAD version.
> >
> > Kyle
The problem seems to be related to the number of calls being dialed, not the number of agents logged in. Is there any configuration for the Digium cards that would make it work better? The older server is not crashing with 50+ agents logged in; the load is just high.

We were planning on getting the DS3 card Digium is supposed to release soon. But it seems the problem is dialing 100+ lines at a time. Will the DS3 card help Asterisk handle making the calls? If not, what use is the DS3 card, right?

Today we are going to try Fedora Core 1 x86_64 and Mandriva 2005 x86_64 to see if it works better...

Any ideas?

Here is the actual messages file:

9 15:18:12 xeonAsterisk kernel: zaptel Disabled echo canceller because of tone (rx) on channel 12
Nov 9 15:18:20 xeonAsterisk kernel: zaptel Disabled echo canceller because of tone (rx) on channel 39
Nov 9 15:19:12 xeonAsterisk kernel: BUG: soft lockup detected on CPU#3!
Nov 9 15:19:12 xeonAsterisk kernel:
Nov 9 15:19:12 xeonAsterisk kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core rfcomm l2cap bluetooth sunrpc pcmcia yenta_socket rsrc_nonstatic pcmcia_core dm_mod video button battery ac uhci_hcd ehci_hcd wct4xxp(U) zaptel(U) crc_ccitt shpchp tg3 floppy ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Nov 9 15:19:12 xeonAsterisk kernel: Pid: 4136, comm: asterisk Not tainted 2.6.11-1.1369_FC4smp
Nov 9 15:19:12 xeonAsterisk kernel: RIP: 0010:[<ffffffff8013f198>] <ffffffff8013f198>{__do_softirq+104}
Nov 9 15:19:12 xeonAsterisk kernel: RSP: 0018:ffff8100033b3f68 EFLAGS: 00000202
Nov 9 15:19:12 xeonAsterisk kernel: RAX: ffffffff80510800 RBX: 0000000000000002 RCX: 00000001003ac615
Nov 9 15:19:12 xeonAsterisk kernel: RDX: 0000000000000000 RSI: ffff810002c28680 RDI: 0000000000000002
Nov 9 15:19:12 xeonAsterisk kernel: RBP: ffff81007597fe18 R08: 0000000000000000 R09: 0000000000000000
Nov 9 15:19:12 xeonAsterisk kernel: R10: 0000000000000000 R11: 000000000000008c R12: ffffffff80513980
Nov 9 15:19:12 xeonAsterisk kernel: R13: 0000000000000180 R14: ffffffff80557a80 R15: 000000000000000a
Nov 9 15:19:12 xeonAsterisk kernel: FS: 0000000040aea960(0063) GS:ffffffff80510800(0000) knlGS:0000000000000000
Nov 9 15:19:12 xeonAsterisk kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 9 15:19:13 xeonAsterisk kernel: CR2: 0000000000ab4a7c CR3: 0000000077d12000 CR4: 00000000000006e0
Nov 9 15:19:13 xeonAsterisk kernel:
Nov 9 15:19:13 xeonAsterisk kernel: Call Trace: <IRQ> <ffffffff8013f260>{do_softirq+48} <ffffffff8010f419>{apic_timer_interrupt+133}
Nov 9 15:19:13 xeonAsterisk kernel: <EOI> <ffffffff80189843>{fget+83} <ffffffff80189829>{fget+57}
Nov 9 15:19:13 xeonAsterisk kernel: <ffffffff8019dd14>{sys_poll+452} <ffffffff8019d590>{__pollwait+0}
Nov 9 15:19:13 xeonAsterisk kernel: <ffffffff80111c47>{syscall_trace_leave+55} <ffffffff8010ebf6>{tracesys+209}
Nov 9 15:19:13 xeonAsterisk kernel:
Nov 9 15:19:13 xeonAsterisk kernel:
Nov 9 15:19:13 xeonAsterisk kernel: Call Trace: <IRQ> <ffffffff80162344>{softlockup_tick+132} <ffffffff80113481>{timer_interrupt+481}
Nov 9 15:19:13 xeonAsterisk kernel: <ffffffff801625ec>{handle_IRQ_event+44} <ffffffff8016271d>{__do_IRQ+253}

Kyle

Matt Florell wrote:
>I would say your best bet is to change your system into a distributed
>dialing system. We did this with Vicidial and have installations on
>multiple servers with over 100 agents all working off of the same
>lists and campaigns. A distributed system will also allow for more
>redundancy and less total downtime if one server goes down.
>
>We noticed the same kind of limitations you are and now do a max of 40
>agents per server, and when we need more capacity we just add another
>server.
>
>MATT---
>
>On 11/10/05, Kyle Hagan <info@quadrasoftware.com> wrote:
>>Kyle Hagan wrote:
>>
>>>We purchased a new Dual Xeon 3ghz, 2gb ram to upgrade our 3ghz Pentium
>>>1gb ram, that has been having load issues due to our growing company.
>>>
>>>We are having problems... We use a predictive dialer that we custom
>>>programmed in perl. It basically drops, moves, files into the callout
>>>directory and uses queues to transfer to agents when someone picks up.
>>
>>Oh, we are running HEAD version.
>>
>>Kyle
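Since the trouble seems tied to how many lines are dialed at once, one option is to cap each dialing cycle. The sketch below is a guess at how the pacing rule from the first post ("2-4 per available agent", adjusted for calls already ringing) could be written with such a cap. The function name `calls_to_place`, its parameter names and the `burst_cap` idea are assumptions made for illustration; they are not taken from the actual dialer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max min);

# Number of new calls to start in this cycle.
#   ratio         lines to keep ringing per idle agent
#   idle_agents   agents currently waiting for a call
#   ringing       calls already dialed but not yet answered
#   free_channels trunk channels not in use
#   burst_cap     hypothetical per-cycle limit, so call setups are spread out
sub calls_to_place {
    my (%s) = @_;
    my $target = int($s{ratio} * $s{idle_agents});   # lines we want ringing
    my $wanted = $target - $s{ringing};              # shortfall this cycle
    return max(0, min($wanted, $s{free_channels}, $s{burst_cap}));
}

my $n = calls_to_place(
    ratio         => 3,
    idle_agents   => 40,
    ringing       => 10,
    free_channels => 150,
    burst_cap     => 25,
);
print "place $n calls this cycle\n";
```

With these sample numbers the uncapped rule would ask for 110 new calls at once, while the cap holds the cycle to 25 and leaves the rest for the following cycles.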
steve@daviesfam.org
2005-Nov-11 04:59 UTC
[Asterisk-Users] Asterisk Crashing (high load issues)
On Wed, 9 Nov 2005, Kyle Hagan wrote:
> We purchased a new Dual Xeon 3ghz, 2gb ram to upgrade our 3ghz Pentium
> 1gb ram, that has been having load issues due to our growing company.
>
> We are having problems... We use a predictive dialer that we custom
> programmed in perl. It basically drops, moves, files into the callout
> directory and uses queues to transfer to agents when someone picks up.
>
> It has been working pretty good, except we now have 50+ dialers on the
> system taking calls. The system dials 2-4 per available agent every 3-5
> seconds based on, calls ringing and available agents. We can keep them
> to about 8-20 seconds between calls. But the number of ringing lines is
> causing load issues. Hence the new server.
>
> We put Fedora Core 4 on with now problem. We were running 2 t1's in the
> beginning of the day just to make sure the system was running good. We
> finally put it on 8 t1's and the system ran great for about 4 hours.
> Then the load started going up and up until the server just locked
> completely. I could not get much information from the server. The lead
> went to 170+ before it locked. Asterisk was showing 99% cpu usage at crash.
>
> I have some information that the log had in it just before the crash.
> There was something about cpu3 soft lockup and page fault messages. If
> someone can help I will post the log tomorrow when I get into work.
> We had to switch back to the old server with the load issues.
>
> Some other information about the servers follows:
>
> We are running a separate slim server to stream moh.
> The predictive server is a separate pc connecting via manager interface
> for agent information, available, busy and callerid of the person they
> are talking to
> We have a script (perl) running on the Asterisk server to move the
> callout files into the callout directory that are created via a web POST
> via apache, the script checks for files in a temp directory and move the
> files into the callout directory.

Hi Kyle,

I'd simply say that you have overloaded that machine.

We use boxes like that for a similar outbound dial setup. I don't think I'd attempt to go past 4 E1s (120 lines), which would be 5 T1s. If the box is running hard like that, the load average will sit around 7 or so. That still leaves a fair amount of spare CPU, but there is no way an Asterisk box will run well with the CPU anything like maxed out.

Our site has 250 agents or so, and the work is currently spread over 6 servers with 3 E1 PRIs on each. Each box makes around 3000 to 5000 call attempts per hour.

If you are getting a very high load average: are you recording calls? It would REALLY not be a good idea to use the "m" option to Monitor to mix calls on the fly; the soxmix processes will accumulate and accumulate.

Your Perl dialler also needs to be more sympathetic to the machine capacity and back off when the server is getting overloaded. Otherwise you are certain to drive it into the ground.

Steve
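The "back off when the server is getting overloaded" advice could look something like the sketch below. It is an illustration under stated assumptions, not anyone's production code: it reads the 1-minute load average from /proc/loadavg (Linux only), keeps the full dial ratio up to a soft limit of 7 (borrowed from the load figure mentioned above), and scales it down linearly to zero at a hard limit of 12, which is an arbitrary number picked for the example. The function names are made up here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the 1-minute load average from a /proc/loadavg style line.
sub parse_loadavg {
    my ($line) = @_;
    return unless defined $line && $line =~ /^\s*(\d+(?:\.\d+)?)/;
    return $1 + 0;
}

# Full ratio at or below $soft, zero at or above $hard, linear in between.
sub throttled_ratio {
    my ($ratio, $load, $soft, $hard) = @_;
    return $ratio if $load <= $soft;
    return 0      if $load >= $hard;
    return $ratio * ($hard - $load) / ($hard - $soft);
}

# Sample text in /proc/loadavg format, replaced by the live value if the
# file can be read on this machine.
my $line = '7.42 6.90 5.10 3/412 9876';
if (open(my $fh, '<', '/proc/loadavg')) {
    my $live = <$fh>;
    close($fh);
    $line = $live if defined $live;
}

my $load = parse_loadavg($line);
die "could not parse load average\n" unless defined $load;

# Base ratio of 3 lines per idle agent; soft and hard limits as described.
my $ratio = throttled_ratio(3, $load, 7, 12);
printf "load %.2f -> dial ratio %.2f\n", $load, $ratio;
```

A dialer loop would recompute the ratio every cycle, so dialing slows down as load climbs and recovers on its own once the load drops back under the soft limit.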