On Sunday, January 26, 2020 11:18:36 PM CET Pete Biggs wrote:
> First of all - disclaimer - I'm no network specialist, I just read and
> am interested in it. I may get things wrong!!
>
> > Both physical interfaces show the same. But does this mean it's on as in
> > "rx-checksumming: on" or off as in "tx-checksum-ipv4: off [fixed]"?
>
> As far as I understand it rx-checksum is the underlying wire
> checksumming - and from what I've read about it, disabling that
> disables the UDP checksums.

You mean layer 1 checksumming? Is there such a thing with Ethernet? I think I read something about encoding when I was trying to understand what "bandwidth" actually means in signal transmission, and I seem to remember that no checksumming was involved there: the encoding had to do with identifying signals, which is a prerequisite for transmitting anything at all.

> > Assuming that I do not receive packets with invalid UDP checksums, then
> > the packets must be somehow altered and their UDP checksums recalculated
> > to arrive here. Does bad hardware etc. do that? Why would the UDP checksums
> > just happen to get recalculated correctly but like randomly without
> > intent?
>
> I'm not sure I understand what you are asking.

It is about VOIP calls via SRTP being interrupted at irregular intervals. The intervals appear to depend on the time of day: such calls last about 5 to 25 minutes during the day and up to 1.5 hours at around 3am before being interrupted. Asterisk says that a packet is being replayed, meaning that libsrtp has already seen and processed that packet earlier. That can happen a couple of times until Asterisk reports authentication failures. The result is that the call is interrupted: I cannot hear the other end, while the other end sometimes can still hear me and sometimes not.
The interruption can last minutes, and the audio can come back after that, though usually I either hang up, or the call ends by itself before the audio is back.

IIUC, authentication failures mean that libsrtp finds that the authentication tag of an SRTP packet does not match the rest of the data in the packet. The sender computes the authentication tag with a session authentication key (by default HMAC-SHA1 truncated to 80 bits, per RFC 3711) over the RTP header and the encrypted payload. The session keys are derived as needed from master keys that sender and receiver exchanged beforehand. The key exchange can go over SIP (using TLS) when SDES is used, which it is in this case. The receiver recomputes the tag over the same data and checks that it matches the tag carried in the packet. Only when the packet has been successfully authenticated is its RTP payload decrypted. The SRTP packet seems to be the entire payload of the UDP packet, so if the SRTP data got damaged or were intentionally altered, the UDP checksum would have to be recalculated as well for the packet to pass the UDP check. Two independent Asterisk installations at physically different locations, both connecting to the same VOIP provider, are showing the same error messages. As you can imagine, this is really fun to debug ...

> But it's unlikely (very
> unlikely) that the checksums are randomly correct. But packet checksums
> are recalculated when packets are forwarded by layer 4 switches - the
> contents of the packet are inspected as part of the switching process.

Yes, I thought so. IIRC every router has to update the IP header checksum because it decrements the TTL, and NAT has to update the UDP checksum because it rewrites addresses and ports. That someone would intentionally alter the SRTP packets and recalculate the checksums seems rather unlikely, all the more so since they would need to do that at two different places.

> > Only when Asterisk (i. e. libsrtp) finally verifies the authentication tag
> > of an SRTP packet against the authenticated part of the packet (which,
> > according to RFC 3711, seems to be the entire payload of the UDP packet
> > apart from the tag itself), the verification fails.
> >
> > How is that possible?
>
> If it's SRTP checksum error, then that checksum is part of the packet
> payload at the application level - the UDP checksum is for the whole
> packet. Presumably the contents of the application payload were
> altered after the SRTP checksum was calculated but before the UDP
> packet checksum. It could be a bad layer 4 switch I suppose.

Right, or the SRTP packet has been created incorrectly by their phone system, either because it is overloaded at busy times or because it's buggy. My favorite theory is that I am sometimes suddenly receiving the wrong SRTP stream; I think that would fit the symptoms. Perhaps the VOIP provider is experiencing interesting NAT issues when their connection tracking gets messed up at times when there are more connections than they can handle. That defective hardware causes the same problem at both places at the same time seems rather unlikely. So I've been trying to figure out what the problem might be, and after learning all this, I'm sufficiently sure that the problem is on their side.

> Probably your best bet is to use wireshark to decode the packets to see
> what the raw data looks like.

Hm, I tried that, and Wireshark doesn't seem to like SRTP packets very much. Apparently it has no way to decrypt SRTP packets at all, even if I could get the initial keys. Maybe someone who is much more proficient with Wireshark could find something; to me, it has been useless so far. And even if Wireshark could handle SRTP packets, what could it possibly show other than that some packets carry a damaged payload or that the encryption keys don't fit, which I already know? If the problem were with Asterisk or libsrtp, it would be much more common.
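Pete's point that the UDP checksum and the SRTP authentication tag are independent checks at different layers can be illustrated with a small sketch. It computes the UDP checksum as RFC 768 describes (one's-complement sum over an IPv4 pseudo-header, the UDP header and the payload) and a simplified SRTP-style tag as HMAC-SHA1 truncated to 80 bits. The addresses, ports, key and payload are hypothetical, and the tag is deliberately simplified: real SRTP (RFC 3711) derives session keys from the master key and also authenticates the rollover counter, which this sketch leaves out. It shows that a payload altered on the path, with the UDP checksum recomputed afterwards, passes the UDP check but still fails authentication.

```python
import hashlib
import hmac
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum; odd-length input is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, sport: int, dport: int, payload: bytes) -> int:
    """UDP checksum over IPv4 pseudo-header + UDP header + payload (RFC 768)."""
    length = 8 + len(payload)
    pseudo = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + struct.pack("!BBH", 0, 17, length)
    header = struct.pack("!HHHH", sport, dport, length, 0)  # checksum field zeroed
    csum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as all ones


def srtp_like_tag(auth_key: bytes, authenticated_part: bytes) -> bytes:
    """Simplified SRTP-style tag: HMAC-SHA1 truncated to 80 bits (10 bytes)."""
    return hmac.new(auth_key, authenticated_part, hashlib.sha1).digest()[:10]


# Hypothetical values for illustration only.
SRC, DST, SPORT, DPORT = "192.0.2.10", "198.51.100.20", 10000, 20000
AUTH_KEY = b"\x01" * 20

rtp_part = b"\x80\x00\x00\x01" + b"header+ciphertext"  # stands in for RTP header + encrypted payload
sent = rtp_part + srtp_like_tag(AUTH_KEY, rtp_part)     # UDP payload as sent
sent_csum = udp_checksum(SRC, DST, SPORT, DPORT, sent)

# Something on the path flips one byte and then recomputes the UDP checksum.
altered = bytearray(sent)
altered[5] ^= 0xFF
altered = bytes(altered)
new_csum = udp_checksum(SRC, DST, SPORT, DPORT, altered)


def receiver_checks(payload: bytes, carried_csum: int) -> tuple:
    """Return (udp_checksum_ok, srtp_tag_ok) as a receiver would evaluate them."""
    udp_ok = udp_checksum(SRC, DST, SPORT, DPORT, payload) == carried_csum
    body, tag = payload[:-10], payload[-10:]
    auth_ok = hmac.compare_digest(srtp_like_tag(AUTH_KEY, body), tag)
    return udp_ok, auth_ok


print(receiver_checks(sent, sent_csum))    # (True, True)
print(receiver_checks(altered, new_csum))  # (True, False): UDP passes, SRTP fails
```

Because the tag depends on a key that a box in the middle does not have, such a box can make the UDP checksum consistent again but cannot forge a matching tag. A stream encrypted with different keys (for example, someone else's stream) would fail in the same way.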
On 1/26/20 5:44 PM, hw wrote:
> It is about VOIP calls via SRTP being interrupted at irregular intervals. The
> intervals appear to depend on the time of day: Such phone calls can last for
> a duration of about 5-25 minutes during the day to up to 1.5 hours at around
> 3am before being interrupted.

My sense is that you may be starting at too low a level in trying to debug this. I have seen the same kind of problems with my VOIP service when there is a problem with my Internet connection. When this happens, I also see high retransmission rates for TCP connections and other signs of network problems. If I check the modem for my Internet connection, it reports issues with the signal levels and high error rates. If you believe your Internet connection is reliable and you run managed switches, check your switch logs for any reported errors.

You could try tools like iperf to check for problems on your internal network. You could run some of the basic tools for testing the VOIP performance of your Internet connection and, if necessary, run iperf to a cloud-hosted system. I think it is highly unlikely that you are only having issues with SRTP packets, and I would look at the broader picture first to try to isolate some other problem in your network or Internet connection.

Nataraj
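One way to watch for the TCP retransmissions Nataraj mentions, and for UDP checksum errors seen by the local stack, on a Linux host such as CentOS 7 is to sample the kernel counters in /proc/net/snmp (the same counters `netstat -s` summarizes). The sketch below parses the file's alternating header/value lines and reports how much a few counters grew between two samples. The counter names (OutSegs, RetransSegs, InErrors, InCsumErrors) are assumptions about the running kernel; a field that does not exist is treated as 0. The `watch` helper is hypothetical, not an existing tool.

```python
import time
from typing import Dict, List


def parse_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/snmp-style text: a header line per protocol, then its value line."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    stats: Dict[str, Dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):  # assumes strict alternation
        proto = header[0].rstrip(":")
        stats[proto] = {name: int(value) for name, value in zip(header[1:], values[1:])}
    return stats


def deltas(before, after, proto: str, fields: List[str]) -> Dict[str, int]:
    """Counter growth between two samples; fields missing on this kernel count as 0."""
    old, new = before.get(proto, {}), after.get(proto, {})
    return {f: new.get(f, 0) - old.get(f, 0) for f in fields}


def retransmission_rate(before, after) -> float:
    """Fraction of TCP segments sent between the samples that were retransmissions."""
    d = deltas(before, after, "Tcp", ["OutSegs", "RetransSegs"])
    return d["RetransSegs"] / d["OutSegs"] if d["OutSegs"] else 0.0


def sample(path: str = "/proc/net/snmp") -> Dict[str, Dict[str, int]]:
    with open(path) as fh:
        return parse_snmp(fh.read())


def watch(interval: float = 10.0, path: str = "/proc/net/snmp") -> None:
    """Print host-wide TCP retransmission and UDP error growth over one interval (Linux only)."""
    first = sample(path)
    time.sleep(interval)
    second = sample(path)
    tcp = deltas(first, second, "Tcp", ["OutSegs", "RetransSegs"])
    udp = deltas(first, second, "Udp", ["InDatagrams", "InErrors", "InCsumErrors"])
    print(f"TCP: {tcp['RetransSegs']} of {tcp['OutSegs']} segments retransmitted "
          f"({retransmission_rate(first, second):.2%})")
    print(f"UDP: {udp}")
```

Running something like `watch(60)` on the Asterisk host during a busy period and again at night might show whether retransmissions or UDP errors follow the same time-of-day pattern as the call interruptions. The counters are host-wide, so they say nothing about a single call.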
Stephen John Smoogen
2020-Jan-28 12:50 UTC
[CentOS] Centos 7: UPD packet checksum verification?
On Sun, 26 Jan 2020 at 20:45, hw <hw at gc-24.de> wrote:
>> > I'm not sure I understand what you are asking.
>
> It is about VOIP calls via SRTP being interrupted at irregular intervals. The
> intervals appear to depend on the time of day: Such phone calls can last for
> a duration of about 5-25 minutes during the day to up to 1.5 hours at around
> 3am before being interrupted.
>

UDP (User Datagram Protocol) is often joked to be the "Unreliable Datagram Protocol", and for good reason. Packets can be dropped at all kinds of places between the two users, depending on how busy the routers and firewalls between them are. Packets can also arrive out of order, or suffer a dozen other things, and it is then up to the application layer to put things back in 'order'. For voice, that usually means a drop or other ugliness, because it is assumed that if the quality is too bad, people will just call each other again. For the most part this works pretty well, but all it takes is a firewall getting busy with something else, and you have a bunch of UDP packets out of order and people's calls dropping.

-- 
Stephen J Smoogen.
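Because UDP itself gives no ordering guarantee, RTP carries a 16-bit sequence number (RFC 3550) that the receiver uses to detect loss and reordering. The sketch below, using made-up sequence numbers, classifies arrivals as advancing, late (reordered) or duplicate, and counts gaps that were never filled. It handles the wrap from 65535 back to 0 by extending each sequence number relative to the highest one seen so far. It is a simplified illustration, not how Asterisk or libsrtp track sequence numbers internally.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a for 16-bit RTP sequence numbers (handles wraparound)."""
    d = (a - b) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


def classify(seqs):
    """Count arrivals that advance the highest sequence seen (possibly skipping a gap),
    late (reordered) arrivals, duplicates, and sequence numbers never received."""
    counts = {"in_order": 0, "late": 0, "duplicate": 0, "lost": 0}
    seen = set()
    highest = None  # highest extended (wrap-free) sequence number so far
    for s in seqs:
        ext = s if highest is None else highest + seq_delta(s, highest & 0xFFFF)
        if ext in seen:
            counts["duplicate"] += 1
            continue
        seen.add(ext)
        if highest is None or ext > highest:
            highest = ext
            counts["in_order"] += 1
        else:
            counts["late"] += 1
    if seen:
        counts["lost"] = (max(seen) - min(seen) + 1) - len(seen)
    return counts
```

For example, the arrival order 65533, 65534, 0, 65535, 1, 1, 3 contains one late packet (65535), one duplicate (1) and one gap (2), even though the sequence wraps in the middle.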
On Tuesday, January 28, 2020 9:00:22 AM CET Nataraj wrote:
> On 1/26/20 5:44 PM, hw wrote:
> > It is about VOIP calls via SRTP being interrupted at irregular intervals.
> > The intervals appear to depend on the time of day: Such phone calls can
> > last for a duration of about 5-25 minutes during the day to up to 1.5
> > hours at around 3am before being interrupted.
>
> My sense is you may be starting at too low of a level in trying to debug
> this.

One reason I look into it at this level is that it is usually good to know more, and to understand things better.

> I have seen the same kind of problems with my voip service when
> there is a problem with my Internet connection. When this happens I
> also see high retransmission rates for tcp connections and other signs
> of network problem.

How do you monitor such retransmissions to be able to see if and when they occur?

> If I check the modem for my Internet connection
> there are issues with the signal levels and high error rates reported by
> the modem. If you believe your Internet connection is reliable, then if
> you run managed switches, check your switch logs for any reported errors.
>
> You could try tools like iperf to check for problems on your internal
> network. You could run some of the basic tools for testing voip
> performance of your Internet connection and if necessary run iperf to a
> cloud hosted system.

Can you suggest useful tools to analyze VOIP performance, and how do you define VOIP performance? The performance is kind of acceptable as long as the calls are not interrupted. It's still worlds apart from what it used to be 25 years ago, before VOIP was used. Back then, you never had to worry that calls could be interrupted, that you couldn't hear someone, or that you couldn't have a conversation because the latency made it impossible. You could just talk to someone on the phone, as it should be. Nowadays, we pay 10 times as much or more, plus all the expensive hardware, and it still doesn't work right and doesn't even come close.

> I think it is highly unlikely that you are only having issues with srtp
> packets and I would look at the broader picture first to try to isolate
> some other problem in your network or Internet connection.

See it this way: it is highly likely that I don't have any issues with SRTP at all. Calls over the LAN work fine. The only issue is with the VOIP provider. What I have learned about SRTP so far tells me that, as does everything else. How would you explain that the same problem occurs at two entirely unrelated physical locations, each with its own Asterisk installation, entirely different hardware, and entirely different Internet connections from entirely different ISPs, the only thing in common being the VOIP provider? If only my Internet connection were affected, I'd be talking to my ISP (probably useless) instead of the VOIP provider (who will probably do something about it).
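As one concrete answer to "how do you define VOIP performance", RTP itself defines an interarrival jitter estimate that receivers report alongside packet loss (RFC 3550, section 6.4.1, with sample code in appendix A.8). For consecutive packets, D is the change in relative transit time, (R_i - R_{i-1}) - (S_i - S_{i-1}), where R is the arrival time and S the RTP timestamp, both in timestamp units, and the estimate is updated as J = J + (|D| - J)/16. The sketch below applies that formula; the 8 kHz clock rate, the 20 ms packet spacing (160 timestamp units) and the arrival times in the example are assumptions for illustration, not measurements.

```python
def interarrival_jitter(packets, clock_rate: int = 8000) -> float:
    """RFC 3550 interarrival jitter estimate.

    packets: (rtp_timestamp, arrival_seconds) pairs in arrival order.
    Returns the smoothed jitter in milliseconds.
    """
    jitter = 0.0
    last_transit = None
    for rtp_ts, arrival in packets:
        transit = arrival * clock_rate - rtp_ts  # relative transit time in timestamp units
        if last_transit is not None:
            d = abs(transit - last_transit)
            jitter += (d - jitter) / 16.0        # exponential smoothing with gain 1/16
        last_transit = transit
    return jitter / clock_rate * 1000.0


# Perfectly paced 20 ms packets give (almost exactly) zero jitter.
paced = [(i * 160, i * 0.02) for i in range(50)]
# A second packet arriving 10 ms late: D = 80 units, so J = 80/16 = 5 units = 0.625 ms.
late = [(0, 0.0), (160, 0.03)]
```

The gain of 1/16 smooths out single spikes, so one late packet moves the estimate only a little; sustained jitter, together with the loss count, is what usually shows up as choppy or dropped audio.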
On Tuesday, January 28, 2020 1:50:57 PM CET Stephen John Smoogen wrote:
> On Sun, 26 Jan 2020 at 20:45, hw <hw at gc-24.de> wrote:
> > > I'm not sure I understand what you are asking.
> >
> > It is about VOIP calls via SRTP being interrupted at irregular intervals.
> > The intervals appear to depend on the time of day: Such phone calls can
> > last for a duration of about 5-25 minutes during the day to up to 1.5
> > hours at around 3am before being interrupted.
>
> UDP is called Unreliable Datagram Protocol for a reason. It can be
> dropped at all kinds of places in between the two users depending on
> how busy the routers/firewalls between 2 users can be.

How would packets being dropped explain the replay errors and authentication failures?

> Packets can get
> out of order or a dozen other things which then relies on the
> application layer to put the things back in 'order'.

libsrtp seems to have provisions for dealing with packets that arrive out of order.

> For voice, that
> usually means a drop or other ugliness because it is assumed that if
> the quality is too bad, the people would just call each other again.

That's a funny idea. Phone calls worked fine and had good quality 25 years ago, and mostly long before that. In over 40 years I have never expected to have to call anyone back because of poor quality, and I'm not going to start expecting that now. It's unacceptable, and it's not feasible either. For example, try calling PayPal to solve some issue with your account. It can take an hour before they call you back because everyone is busy. Finally you talk to someone, and just after you have explained the problem, the call is interrupted. Good luck reaching the same person again. You won't get anywhere, because your next try will only result in another interrupted call.

> For the most part this works pretty well but all it takes is a
> firewall to get busy on something else and you have a bunch of UDP
> packets out of order and people's calls dropping.

VOIP calls are worlds away from what phone calls used to be. Dropping calls has never been an option and is not an option now.