Hello,

I was wondering if anyone out there is successfully running Asterisk 1.2 svn w/ CentOS 4.3. I had an experience over the last two weeks that has me scratching my head and muttering strange things in the wee hours of the morning. I am going to try to be as descriptive as my brain will allow right now, but if there is something that I do not cover, please do not hesitate to ask and I'll be happy to answer.

For the last 2 years, I have been running a mixture of Tao Linux and CentOS (both RHEL derivatives) on our production boxes. Asterisk has run flawlessly on all installations. Last week, I updated one of our gateway boxes from CentOS 4.2 (under which it ran for 6 months without issue) to the new 4.3 code. Almost immediately, we began to experience problems. Asterisk would core w/ the following:

#0 0x004878ab in test_err () from /usr/lib/asterisk/modules/codec_g729a.so

The segfaults would happen under very light loads, in some cases with just a single call. Kevin was able to log in to the box and put a debugging version of codec_g729 on it. He determined that the values being returned in that routine were incorrect, i.e. something in the system was returning a non-zero value when multiplying a number by "0". Barring any other explanations, we assumed that there was a hardware issue somewhere, either in the memory or in the FPU on the CPU.

So, we replaced the box w/ a brand new system running a dual-core Pentium D 920. We loaded the 32 bit version of CentOS 4.3 onto the box and proceeded to start testing. BAM.. same problem.. the backtrace showed the failure in the same routine.

We scratched our heads, and after many hours of trying various things (backing off the kernel to 2.6.9-22, and even moving to the new development kernel 2.6.9-34.19 from the testing tree) we could do nothing to solve the issue.

Mind you, this is the exact same behavior on two different hardware platforms running the exact same distribution. We even loaded up a third box and could reproduce the behavior on it as well. Three different boxes, one common distribution.

As a test, we installed Fedora Core 5 x86_64 on the new dual-core box and ran extensive tests overnight, simulating 96 channels doing G729 to Ulaw transcoding. The box ran completely stable. No hiccups.

So, this morning, we put it back into the cluster, and it's now taking about 200 concurrent calls, doing an insane amount of transcoding, and it is working just fine. Before, it would have cored in the first couple of minutes.

I'm scratching my head here, because I generally have had excellent experiences with CentOS. However, I have NO idea what might be the issue here. Could it be the kernel? (We tried three different ones!) Could it be the libc? Maybe it is the compiler?

In any case, if anyone is having success with CentOS 4.3 (32 bit), please speak up. I'd like to get to the bottom of it. I generally do not like to run Fedora on production equipment as it is generally bleeding edge. In this case, FC5 is running 2.6.16 something..

--
Vice President of N2Net, a New Age Consulting Service, Inc. Company
http://www.n2net.net Where everything clicks into place!
KP-216-121-ST
On Mon, 2006-05-22 at 12:16 -0400, Greg Boehnlein wrote:
> In any case, if anyone is having success with Centos 4.3 (32 bit), please
> speak up. I'd like to get to the bottom of it.

Have you tried compiling statically on CentOS 4.2 and running on 4.3? I am assuming you have made sure the dist is up to date with patches.

We do not use G729, so I cannot try it out for you, but we do use CentOS. Is it only w/ SVN, or all releases of *?

-Greg
Greg Boehnlein wrote:
> Hello,
> I was wondering if anyone out there is successfully running
> Asterisk 1.2 svn w/ Centos 4.3. I had an experience over the last two
> weeks that has me scratching my head and muttering strange things in the
> wee hours of the morning. I am going to try and be as descriptive as my
> brain will allow right now, but if there is something that I do not cover,
> please do not hesitate to ask and I'll be happy to answer.

Greg,

When I upgraded to 4.3 I experienced problems with some non-Asterisk RPMs that were compiled on earlier versions of CentOS 4. Once they were recompiled on a fully updated 4.3 system they worked fine. Have you tried recompiling everything?

Andrew
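
One possible way to narrow down the kernel / libc / compiler question raised in this thread is to compare the installed package versions on a working box and a failing box. The sketch below is only an illustration, not something anyone in the thread ran: it assumes each box's package list was saved beforehand with something like `rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n'`, and the helper names and the version strings in the demo are hypothetical placeholders.

```python
def parse_package_list(text):
    """Parse lines of the form 'name version-release' into a dict.

    Assumes one package per line, as produced by an rpm query format of
    '%{NAME} %{VERSION}-%{RELEASE}\\n'. If a name appears more than once
    (for example several installed kernels), the last line wins.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, version = line.partition(" ")
        packages[name] = version.strip()
    return packages


def changed_packages(old_text, new_text, names=None):
    """Return {name: (old_version, new_version)} for packages that differ.

    A version of None means the package is missing on that box.
    If 'names' is given, only those packages are compared.
    """
    old = parse_package_list(old_text)
    new = parse_package_list(new_text)
    if names is None:
        names = sorted(set(old) | set(new))
    changes = {}
    for name in names:
        before = old.get(name)
        after = new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Placeholder data only; these are NOT real CentOS package versions.
    working_box = "gcc 1.0-1\nglibc 1.0-1\nkernel 1.0-1\n"
    failing_box = "gcc 1.0-2\nglibc 1.0-1\nkernel 1.0-2\n"
    suspects = ["gcc", "glibc", "kernel"]
    for name, (before, after) in changed_packages(
            working_box, failing_box, suspects).items():
        print(name, before, "->", after)
```

Restricting the comparison to a short list of suspects (compiler, libc, kernel) keeps the output readable; dropping the `names` argument shows every package that differs between the two boxes.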