Digging further into the FXO cpu spike vs clock issue, I removed the 18.432 MHz crystal from an FXO card and replaced it with a 20.000 MHz crystal. This of course forced the zaptel timing way off, to roughly 93% accuracy as measured by ztclock. I then modified the wcfxo.c driver source code to set the PLL divider values needed to return the DAA clock to 8 kHz. I came up with the values N1=25, M1=72 and CGM=1, and wrote the corresponding values out to registers 7, 8 & 10. This seemed to bring the clock back closer to the true 8 kHz spec, and in fact seemed to provide a slightly better clock value than the original crystal.

Before the mod, I was seeing CPU spikes once every 12 seconds while ztclock was predicting "Estimate 8 frame slips every 12.083200 seconds."

After the mod, I was seeing them once every 15 seconds while ztclock was predicting "Estimate 8 frame slips every 15.104000 seconds."

It certainly seems that there is a direct, predictable relationship here. I'd appreciate any thoughts that others may be able to contribute based upon these results or the results of their own testing with ztclock and vmstat 1 on any of the FXO/FXS hardware.

Here are my thoughts on this:

I suspect that frame slips are occurring somehow. I have not quite figured out how at this point, but it does appear (if ztclock is accurate) that the math is pointing in that direction. The predictability of the spikes seems too much to be just coincidence.

Also, assuming ztclock is accurate, it appears that for most FXO/FXS hardware the clock actually runs just a little faster than 8000 Hz. If the FXO card were moving data into a zaptel buffer at a rate slightly faster than it was being removed, then a buffer overrun condition, aka a frame slip, would be the inevitable result. I'm thinking that meshing against precisely timed VoIP data (or T1) would be one example where we could expect something like this to occur. In any event a buffer overrun would most certainly result in lost data.
I suspect this is causing the CPU spikes, and is also the reason why nobody seems to be able to reliably use data/fax applications across these types of cards.

As best I can determine, certain channels of the Asterisk PBX seem to time independently from the primary clock source. My experience tells me that in order for data to pass across a telecom network, every node must be in precise timing sync in order to avoid data loss. I would not expect Asterisk to be any different. Interestingly, by adding a TDMoE connection to a second system and configuring it to time from the first, the exact same ztclock and vmstat 1 results were obtained on the second system.

Here is some raw data on the results I obtained...


*** ztclock results before modification ***

./ztclock

ztclock - clock source accuracy test (3 passes)

Flushing input buffer...
Flush Complete.

Test is approximately 3 minutes. Please wait...

483328 samples in 60.410900 sec. (483288 sample intervals) 99.991722%
483328 samples in 60.410901 sec. (483288 sample intervals) 99.991722%
483328 samples in 60.410899 sec. (483288 sample intervals) 99.991722%

Estimate 8 frame slips every 12.083200 seconds.


*** ztclock results after modification ***

./ztclock

ztclock - clock source accuracy test (3 passes)

Flushing input buffer...
Flush Complete.

Test is approximately 3 minutes. Please wait...

483328 samples in 60.411915 sec. (483296 sample intervals) 99.993378%
483328 samples in 60.411915 sec. (483296 sample intervals) 99.993378%
483328 samples in 60.411918 sec. (483296 sample intervals) 99.993378%

Estimate 8 frame slips every 15.104000 seconds.
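
For anyone who wants to check the arithmetic behind those ztclock figures, here is a rough Python sketch. The formulas are my own inference from the printed numbers, not taken from the ztclock source, and the function names are made up for illustration. It treats the difference between "samples" and "sample intervals" as the drift over one pass, and asks how long it takes at that rate to drift by 8 samples.

```python
# Sketch: reproduce ztclock's summary figures from its raw counts.
# Assumption: these formulas are inferred from the printed output above;
# they are not copied from the ztclock source code.

NOMINAL_RATE_HZ = 8000  # nominal zaptel sample rate


def accuracy_percent(samples: int, intervals: int) -> float:
    """Smaller count divided by larger count, as a percentage."""
    return 100.0 * min(samples, intervals) / max(samples, intervals)


def seconds_per_8_slips(samples: int, intervals: int) -> float:
    """Nominal time for the two counts to drift apart by 8 samples."""
    drift = abs(samples - intervals)
    if drift == 0:
        return float("inf")  # no measurable drift in this pass
    nominal_seconds = samples / NOMINAL_RATE_HZ  # 60.416 s for 483328 samples
    return 8 * nominal_seconds / drift


# (samples, sample intervals) taken from the two ztclock runs above
runs = {
    "before mod": (483328, 483288),
    "after mod": (483328, 483296),
}

for label, (samples, intervals) in runs.items():
    print(
        f"{label}: {accuracy_percent(samples, intervals):.6f}% accurate, "
        f"8 slips every {seconds_per_8_slips(samples, intervals):.4f} s"
    )
```

Under those assumptions the two runs come out at roughly 12.0832 and 15.104 seconds, matching the estimates ztclock printed. The percentages agree with ztclock to about five decimal places; the small remaining difference is presumably floating-point rounding in the tool.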
Results of vmstat 1

   procs                     memory    swap        io     system         cpu
 r  b  w  swpd   free   buff  cache  si  so   bi   bo    in   cs  us  sy  id

 0  0  0     0  65108  13224  35340   0   0    0   24  1115  185   0  17  83
 1  0  0     0  65108  13224  35340   0   0    0    0  1113  178   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1113  180   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1113  178   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1114  178   0   0 100
 1  0  0     0  65108  13224  35340   0   0    0   12  1115  192   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1113  180   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1113  180   0   0 100
 0  0  0     0  65108  13224  35340   0   0    0    0  1113  180   0   0 100
 1  0  0     0  65108  13224  35340   0   0    0    0  1113  178   0   0 100
 0  0  0     0  65104  13228  35340   0   0    0   24  1115  190   0   0 100
 0  0  0     0  65104  13228  35340   0   0    0    0  1113  183   0   0 100
 0  0  0     0  65104  13228  35340   0   0    0    0  1113  180   0   0 100
 1  0  0     0  65104  13228  35340   0   0    0    0  1113  179   0   0 100
 0  0  0     0  65104  13228  35340   0   0    0    0  1113  180   0   0 100
 0  0  0     0  65100  13232  35340   0   0    0   36  1118  190   0  16  84
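
If you want to pull the spike spacing out of a vmstat 1 capture without eyeballing it, something along these lines should work. It is only a sketch: the threshold of 10 for the "sy" column is an arbitrary choice of mine, the function name is hypothetical, and the column index assumes the 16-column layout shown above (newer vmstat versions print different columns).

```python
# Sketch: locate CPU spikes in the "sy" column of a vmstat 1 capture.
# Assumptions: 16-column layout as in the capture above (sy is column 14,
# zero-based), and "spike" means sy >= 10, an illustrative threshold only.

SY_COLUMN = 14
SPIKE_THRESHOLD = 10

VMSTAT_ROWS = """\
0 0 0 0 65108 13224 35340 0 0 0 24 1115 185 0 17 83
1 0 0 0 65108 13224 35340 0 0 0 0 1113 178 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1113 180 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1113 178 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1114 178 0 0 100
1 0 0 0 65108 13224 35340 0 0 0 12 1115 192 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1113 180 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1113 180 0 0 100
0 0 0 0 65108 13224 35340 0 0 0 0 1113 180 0 0 100
1 0 0 0 65108 13224 35340 0 0 0 0 1113 178 0 0 100
0 0 0 0 65104 13228 35340 0 0 0 24 1115 190 0 0 100
0 0 0 0 65104 13228 35340 0 0 0 0 1113 183 0 0 100
0 0 0 0 65104 13228 35340 0 0 0 0 1113 180 0 0 100
1 0 0 0 65104 13228 35340 0 0 0 0 1113 179 0 0 100
0 0 0 0 65104 13228 35340 0 0 0 0 1113 180 0 0 100
0 0 0 0 65100 13232 35340 0 0 0 36 1118 190 0 16 84
""".splitlines()


def spike_seconds(rows, threshold=SPIKE_THRESHOLD):
    """Return the row indexes (one row per second) where sy >= threshold."""
    return [
        i for i, row in enumerate(rows)
        if int(row.split()[SY_COLUMN]) >= threshold
    ]


spikes = spike_seconds(VMSTAT_ROWS)
# Seconds between consecutive spikes
gaps = [later - earlier for earlier, later in zip(spikes, spikes[1:])]
print("spike rows:", spikes)
print("seconds between spikes:", gaps)
```

On the capture above, the only two rows with a non-zero sy value are the first and the last, 15 one-second samples apart, which is consistent with the 15.104 second estimate ztclock gave after the modification.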
Rich Adamson
2005-Jun-20 14:53 UTC
[Asterisk-Users] FXO/FXS cpu spikes, data loss and ztclock.
Certainly sounds like you're getting closer to the problem. I thought about doing something like that, but after spending soooo much time with the TDM card, decided not to mess with it.

I assume you tried a few other values for N1, M1 and CGM as well?

Do you think any of the profiling tools would be useful to isolate the Asterisk routines causing the spikes?
>Probably you are only looking for results of FXO cards, but maybe it's
>useful:

Any results that folks can supply are useful. Thank you for giving ztclock a try. This information is certainly helpful.

>483328 samples in 60.416004 sec. (483329 sample intervals) 99.999794%
>483328 samples in 60.416000 sec. (483328 sample intervals) 100.000000%
>483328 samples in 60.416004 sec. (483329 sample intervals) 99.999794%
>Estimate 8 frame slips every 483.328003 seconds.

I assume that your system is clocking from the BRI rather than from the TDM card. Is that correct? It looks to me like you have a perfect clock source that is 100% accurate. I would guess that the slight deviation of 4 microseconds in two of the results is measurement error in the test itself. In reality, I'm sure that you are not seeing any slips at all. I'll be compiling any results that I receive in order to try to improve the accuracy of the test.

>
>No CPU spikes afaics.
>
>Card:
>Module 0: Installed -- AUTO FXS/DPO
>Module 1: Installed -- AUTO FXS/DPO
>Module 2: Installed -- AUTO FXS/DPO
>Module 3: Installed -- AUTO FXS/DPO
>Found a Wildcard TDM: Wildcard TDM400P REV E/F (4 modules)

>I run a fax on it, this works. But this is FXS. I have ISDN lines
>outgoing with the junghanns quadBRI.

Do you have fax machines connected to the FXS ports, or are you running something like spandsp?