thr3ads.net - tinc - tinc generating invalid packet checksums? [Sep 2015]

Nathan Stratton Treadway

2015-Sep-10 23:34 UTC

tinc generating invalid packet checksums?

We have a Zenoss server in our main office monitoring (among many other
things) an Apache server in a remote network, with a Tinc link between
the two networks.  The monitoring simply involves making an HTTP request
to a URL once every 5 minutes and confirming that a response page comes
back.

Most of the requests to this particular web server succeed (and similar
requests to other web servers don't have any problems), but several
times a day one of the requests will fail, causing Zenoss to generate an
alert... only to have the next request succeed and the alert clear.

Looking closely through various logs and running tcpdump on the two Tinc
servers, I discovered that Tinc seems to be munging the mss value on the
packets that the remote server is sending back, and in the process
(sometimes) generating an incorrect packet cksum, thus causing the
kernel on the local Tinc server to decide the packet is invalid (which
in turn causes it to get dropped by the iptables firewall rules there).

For example, here's the tcpdump output for a return packet as seen going
into the tun interface on the remote tinc server:

2015-09-09 11:42:13.076518 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF],
proto TCP (6), length 60)
    {webserver}.http > {zenoss-server}.42319: Flags [S.], cksum 0xe771
(correct), seq 1243683600, ack 2163080236, win 26847, options [mss
8961,sackOK,TS val 140685120 ecr 136010701,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 3f06 04e1 0a50 0070  E..<..@.?....P.p
        0x0010:  ac12 8009 0050 a54f 4a21 1b10 80ed fc2c  .....P.OJ!.....,
        0x0020:  a012 68df e771 0000 0204 2301 0402 080a  ..h..q....#.....
        0x0030:  0862 af40 081b 5bcd 0103 0307            .b.@..[.....

While here is that same packet coming out of the tun interface on the 
local Tinc server:

2015-09-09 11:42:15.094332 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF],
proto TCP (6), length 60)
    {webserver}.http > {zenoss-server}.42319: Flags [S.], cksum 0x0301
(incorrect -> 0x0302), seq 1243683600, ack 2163080236, win 26847, options
[mss 1405,sackOK,TS val 140685620 ecr 136010701,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 3f06 04e1 0a50 0070  E..<..@.?....P.p
        0x0010:  ac12 8009 0050 a54f 4a21 1b10 80ed fc2c  .....P.OJ!.....,
        0x0020:  a012 68df 0301 0000 0204 057d 0402 080a  ..h........}....
        0x0030:  0862 b134 081b 5bcd 0103 0307            .b.4..[.....


That packet then causes the following kern.log message on that Tinc server:
2015-09-09T11:42:13.097949-04:00 {tinc-server} kern.notice kernel:
[22743995.678025] nf_ct_tcp: bad TCP checksum IN= OUT= SRC=10.80.0.112
DST=172.18.128.9 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=TCP SPT=80
DPT=42319 SEQ=1243683600 ACK=2163080236 WINDOW=26847 RES=0x00 ACK SYN URGP=0 OPT
(0204057D0402080A0862AF40081B5BCD01030307)



In cases where Zenoss ends up complaining, it appears that all the
"Flag: [S.]" reply packets from the webserver end up with an incorrect
checksum, while in most cases Tinc at least eventually generates a valid
reply packet and the TCP session proceeds successfully.  (For example,
the packet above was followed by another incorrect packet, then one that
succeeded:

2015-09-09 11:42:15.098603 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF],
proto TCP (6), length 60)
    {webserver}.http > {zenoss-server}.42319: Flags [S.], cksum 0x0300
(incorrect -> 0x0301), seq 1243683600, ack 2163080236, win 26847, options
[mss 1405,sackOK,TS val 140685621 ecr 136010701,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 3f06 04e1 0a50 0070  E..<..@.?....P.p
        0x0010:  ac12 8009 0050 a54f 4a21 1b10 80ed fc2c  .....P.OJ!.....,
        0x0020:  a012 68df 0300 0000 0204 057d 0402 080a  ..h........}....
        0x0030:  0862 b135 081b 5bcd 0103 0307            .b.5..[.....
[...]
2015-09-09 11:42:19.106462 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF],
proto TCP (6), length 60)
    {webserver}.http > {zenoss-server}.42319: Flags [S.], cksum 0xff16
(correct), seq 1243683600, ack 2163080236, win 26847, options [mss
1405,sackOK,TS val 140686623 ecr 136010701,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 3f06 04e1 0a50 0070  E..<..@.?....P.p
        0x0010:  ac12 8009 0050 a54f 4a21 1b10 80ed fc2c  .....P.OJ!.....,
        0x0020:  a012 68df ff16 0000 0204 057d 0402 080a  ..h........}....
        0x0030:  0862 b51f 081b 5bcd 0103 0307            .b....[.....

)

In all cases (correct and incorrect cksum), the reply packet goes in
to the remote Tinc instance with an mss of 8961 and comes out of the
local instance with that changed to 1405.

It seems that when the cksum is incorrect, it is always off by 1.
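
As a cross-check on tcpdump's "incorrect -> 0x0302" report, the checksum
can be recomputed by hand from the hex dump. The following is only a
minimal Python sketch, not something run during the original capture
session: the helper name tcp_checksum is made up for illustration, and it
assumes an IPv4 header without options (20 bytes, as the leading 0x45
byte indicates). The bytes are the first packet shown coming out of the
tun interface on the local Tinc server.

```python
def tcp_checksum(src, dst, segment):
    """One's complement checksum over the IPv4 pseudo-header plus TCP segment.

    The checksum field inside `segment` must already be zeroed.
    """
    # Pseudo-header: source, destination, zero byte, protocol 6 (TCP), TCP length.
    pseudo = src + dst + bytes([0, 6]) + len(segment).to_bytes(2, "big")
    data = pseudo + segment
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total ^ 0xFFFF


# Hex dump of the packet with "cksum 0x0301 (incorrect -> 0x0302)".
packet = bytes.fromhex(
    "4500003c000040003f0604e10a500070"
    "ac1280090050a54f4a211b1080edfc2c"
    "a01268df030100000204057d0402080a"
    "0862b134081b5bcd01030307"
)

src, dst = packet[12:16], packet[16:20]
segment = bytearray(packet[20:])              # assumes a 20-byte IP header
stored = int.from_bytes(segment[16:18], "big")
segment[16:18] = b"\x00\x00"                  # zero the checksum field
expected = tcp_checksum(src, dst, bytes(segment))

print(f"stored   {stored:#06x}")    # stored   0x0301
print(f"expected {expected:#06x}")  # expected 0x0302
```

The recomputed value agrees with what tcpdump expects, one higher than
the value actually carried in the packet.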

(The related packets generated on the Zenoss side start out with an mss
of 1460, so presumably Tinc doesn't edit the packets going in that
direction; I have not found the "incorrect ->" message in the tcpdump
output on the remote Tinc server.)

Currently we are running Ubuntu Precise for both Tinc servers, so we
have tinc v1.0.16 installed.


Am I correct in concluding that this cksum problem is a bug in Tinc? 
If so, is it a known bug that has been corrected in some later Tinc
release?  

(I looked through release announcements, the git commit log, and list
archives but didn't immediately see anything that appeared to be
related....)

Thanks.

							Nathan



----------------------------------------------------------------------------
Nathan Stratton Treadway  -  nathanst at ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Nathan Stratton Treadway

2015-Sep-12 06:34 UTC

On Thu, Sep 10, 2015 at 19:34:21 -0400, Nathan Stratton Treadway wrote:
> It seems that when the cksum is incorrect, it is always off by 1.
> 
> [...]
> Am I correct in concluding that this cksum problem is a bug in Tinc? 

After investigating this further, I'm fairly certain that the problem
originates in the following lines of the clamp_mss() function in
route.c:

     [...]
     csum ^= 0xffff;
     csum -= oldmss;
     csum += newmss;
     csum ^= 0xffff;
     packet->data[50] = csum >> 8;
     packet->data[51] = csum & 0xff;

Since the TCP checksum value needs to be computed using one's complement
arithmetic, the above code generates new values that are off by one from
the correct checksum for the packet in cases where the calculations
wrap around zero in one direction but not the other....

According to Eqn 3 given in RFC 1624 (e.g. at
  https://tools.ietf.org/html/rfc1624#section-3
) and the UpdateTTL() example function given in RFC 1141,
it looks like the calculation should instead be something like
     uint32_t csum = packet->data[50] << 8 | packet->data[51];
     [...]
     csum ^= 0xffff;
     csum += oldmss ^ 0xffff;
     csum += newmss;
     csum = (csum & 0xFFFF) + (csum >> 16);
     csum += (csum >> 16);
     csum ^= 0xffff;
     packet->data[50] = csum >> 8;
     packet->data[51] = csum & 0xff;


(I did some quick testing in Python which seems to confirm that this
algorithm works correctly [as in "generates the value tcpdump is
expecting"] for the packets I mentioned in my original email; I can
send the sample code if anyone is interested.)
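
The test code itself was not posted to the list. As a rough stand-in,
the sketch below runs both versions of the update on the checksum/mss
values from the first capture in the original message (cksum 0xe771,
mss 8961 clamped to 1405). The function names are hypothetical, and the
16-bit masking in the first function is an assumption standing in for
the fact that only the low 16 bits end up stored in the packet.

```python
def update_csum_original(csum, oldmss, newmss):
    """Mirror of the existing clamp_mss() lines, using plain 16-bit wraparound."""
    csum ^= 0xFFFF
    csum = (csum - oldmss) & 0xFFFF
    csum = (csum + newmss) & 0xFFFF
    csum ^= 0xFFFF
    return csum


def update_csum_rfc1624(csum, oldmss, newmss):
    """Mirror of the proposed lines: HC' = ~(~HC + ~m + m') per RFC 1624 Eqn 3."""
    csum ^= 0xFFFF
    csum += oldmss ^ 0xFFFF
    csum += newmss
    csum = (csum & 0xFFFF) + (csum >> 16)  # fold the carries back in
    csum += csum >> 16
    csum ^= 0xFFFF
    return csum & 0xFFFF                   # only the low 16 bits are stored


old_result = update_csum_original(0xE771, 8961, 1405)
new_result = update_csum_rfc1624(0xE771, 8961, 1405)

print(format(old_result, "#06x"))  # 0x04f5
print(format(new_result, "#06x"))  # 0x04f6
```

Here the subtraction wraps below zero but the following addition does
not wrap back, so the plain-arithmetic result comes out one lower than
the one's complement result.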

Additional discussion of this topic can be found in the "Incremental
update of TCP Checksum" conversation at:
  https://lkml.org/lkml/2003/9/16/197


							Nathan


----------------------------------------------------------------------------
Nathan Stratton Treadway  -  nathanst at ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Guus Sliepen

2015-Sep-12 14:38 UTC

On Sat, Sep 12, 2015 at 02:34:33AM -0400, Nathan Stratton Treadway wrote:
> On Thu, Sep 10, 2015 at 19:34:21 -0400, Nathan Stratton Treadway wrote:
> > It seems that when the cksum is incorrect, it is always off by 1.
> > 
> [...]
> > Am I correct in concluding that this cksum problem is a bug in Tinc? 

Yes.

> After investigating this further, I'm fairly certain that problem
> originates in the following lines of the clamp_mss() function in
> route.c:
> 
>      [...]
>      csum ^= 0xffff;
>      csum -= oldmss;
>      csum += newmss;
>      csum ^= 0xffff;
>      packet->data[50] = csum >> 8;
>      packet->data[51] = csum & 0xff;
> 
> Since the TCP checksum value needs to be computed using one's complement
> arithmetic, the above code generates new values that are off by one from
> the correct checksum for the packet in cases where the calculations
> wrap around zero in one direction but not the other....

Indeed. I was wondering why I never found this bug myself. But a quick
calculation shows that, statistically, the checksum will be invalid with
a chance of |oldmss - newmss| / 65536. For people with an MTU of 1500,
that means only 0.08% of the SYN packets will end up with a wrong
checksum. However, with an MTU of 9000 it'll be wrong 11.5% of the time.
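
That estimate can be checked by brute force over all 65536 possible
starting checksums. This is only a sketch under some assumptions: the
function names are hypothetical, checksums are treated as uniformly
distributed, and the mss pairs (8961 -> 1405 for the MTU 9000 case,
1460 -> 1405 for the MTU 1500 case) are taken from the values mentioned
earlier in the thread.

```python
def update_plain(csum, oldmss, newmss):
    """Old update: ordinary subtraction and addition, wrapping at 16 bits."""
    csum ^= 0xFFFF
    csum = (csum - oldmss + newmss) & 0xFFFF
    return csum ^ 0xFFFF


def update_ones_complement(csum, oldmss, newmss):
    """Fixed update: one's complement sum with end-around carry."""
    csum ^= 0xFFFF
    csum += (oldmss ^ 0xFFFF) + newmss
    csum = (csum & 0xFFFF) + (csum >> 16)
    csum += csum >> 16
    return (csum ^ 0xFFFF) & 0xFFFF


def mismatch_rate(oldmss, newmss):
    """Fraction of starting checksums for which the two updates disagree."""
    bad = sum(
        1
        for c in range(0x10000)
        if update_plain(c, oldmss, newmss) != update_ones_complement(c, oldmss, newmss)
    )
    return bad / 0x10000


rate_9000 = mismatch_rate(8961, 1405)  # roughly 0.115
rate_1500 = mismatch_rate(1460, 1405)  # roughly 0.0008

print(f"MTU 9000: {rate_9000:.2%}")
print(f"MTU 1500: {rate_1500:.2%}")
```
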
> According to Eqn 3 given in RFC 1624 (e.g. at
>   https://tools.ietf.org/html/rfc1624#section-3
> ) and the UpdateTTL() example function given in RFC 1141,
> it looks like the calculation should instead be something like
>      uint32_t csum = packet->data[50] << 8 | packet->data[51];
>      [...]
>      csum ^= 0xffff;
>      csum += oldmss ^ 0xffff;
>      csum += newmss;
>      csum = (csum & 0xFFFF) + (csum >> 16);
>      csum += (csum >> 16);
>      csum ^= 0xffff;
>      packet->data[50] = csum >> 8;
>      packet->data[51] = csum & 0xff;

Thanks for digging into this and providing a fix. I've committed it to
git!

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>