Hi,

does anyone know when the TCP checksum is updated? I have a problem with FTP transfers. I tested with two Linux servers, each acting as FTP client or server. In all cases, large (circa 50 MB) file transfers end up corrupted somewhere inside the file.

I want to pin down the problem, so my question is: if a packet goes through a 2.4.18 router, does the router recompute the TCP checksum? The router has NAT enabled, but not for the packets I'm interested in.

If it does, then corruption of the packet's data by the router itself will not be caught, because the router simply computes a valid checksum over the corrupted data. On the other hand, if it simply passes the packet through (because nothing except the TTL is changed, and the TTL is not part of the TCP checksum), then the checksum should really ensure that nothing is changed between sender and receiver, and if the data are invalid, the error would have to be on the sender's or the receiver's side.

thanks,
-------------------------------
Martin Devera aka devik
Linux kernel QoS/HTB maintainer
http://luxik.cdi.cz/~devik/
_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
On Mon, Jan 20, 2003 at 10:23:16AM +0100, devik wrote:
> Hi,
>
> does anyone know when the TCP checksum is updated?

Normally, the TCP checksum is not supposed to be changed (or even read) at all in transit (see RFC 793 or TCP/IP Illustrated, Vol. 1).

If NAT is in use then it needs to be accounted for, of course, because the TCP checksum covers a pseudo-header which contains both the source and destination addresses. But that is not the case here, as you say.

> On the other hand, if it simply passes the packet through (because nothing
> except the TTL is changed, and the TTL is not part of the TCP checksum), then
> the checksum should really ensure that nothing is changed
> between sender and receiver, and if the data are invalid, the error
> would have to be on the sender's or the receiver's side.

Or, IMHO even more probably, the problem could be the line. I've had a couple of these during the last 3 years. Namely, is there a serial line (possibly wireless) involved?

pvl
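To illustrate the pseudo-header point, here is a minimal Python sketch. It is not taken from any kernel source, and the names `internet_checksum` and `tcp_checksum` are made up for this example. It computes the RFC 1071 Internet checksum over the IPv4 pseudo-header plus the TCP segment. The source and destination addresses are inputs, which is why NAT has to adjust the checksum, while the TTL is not an input at all.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment.

    `segment` is the TCP header plus payload. To compute a checksum for
    sending, its checksum field (bytes 16-17) must be zero; to verify a
    received segment, leave the field as is and expect a result of 0.
    """
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                   # reserved, always zero
        socket.IPPROTO_TCP,  # protocol number 6
        len(segment),        # TCP length: header + payload
    )
    return internet_checksum(pseudo_header + segment)
```

A receiver that runs `tcp_checksum` over a segment with the checksum field filled in gets 0 when nothing covered by the checksum has changed. Note that this only says the 16-bit sum still matches; it is a weak check compared to something like MD5.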
devik wrote:
> Hi,
>
> does anyone know when the TCP checksum is updated? I have a problem
> with FTP transfers. I tested with two Linux servers, each acting
> as FTP client or server. In all cases, large (circa 50 MB) file
> transfers end up corrupted somewhere inside the file.
> I want to pin down the problem, so my question is:
> if a packet goes through a 2.4.18 router, does the router recompute
> the TCP checksum? The router has NAT enabled, but not for the
> packets I'm interested in.

Hi devik,

NATted packets get incremental checksum updates; I think the function is called something like ip_nat_cheat_check. The TTL is decreased in include/net/ip.h, and that's also where the checksum is updated. If you are using iptables, some targets also do checksum recalculation; notably, ECN is broken in 2.4.20 (wrong checksums). I'm aware of no place where a complete recalculation of the checksum is done; I think everything is done as incremental updates these days.

bye
patrick
Patrick McHardy wrote:
> Hi devik,
> NATted packets get incremental checksum updates; I think the function
> is called something like ip_nat_cheat_check. The TTL is decreased in
> include/net/ip.h,

Sorry, I just got out of bed ;) The function is called ip_decrease_ttl, but it doesn't alter TCP checksums. I think that for a normally forwarded packet which doesn't hit any mangling iptables targets, the TCP checksum is untouched.

bye
patrick
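For readers unfamiliar with incremental checksum updates, the following Python sketch shows the idea from RFC 1624 applied to a TTL decrement in an IPv4 header. It is an illustration only, not the kernel's code: the helper names and the sample header values (addresses, ID, length) are invented for the example. The point is that only the IP header checksum needs adjusting when the TTL changes, and that the adjustment can be made from the old and new 16-bit word alone, without touching the payload or the TCP checksum.

```python
import struct


def fold(total: int) -> int:
    """Fold carries into the low 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(header: bytes) -> int:
    """Checksum computed from scratch over an even-length header."""
    words = struct.unpack(f"!{len(header) // 2}H", header)
    return ~fold(sum(words)) & 0xFFFF


def incremental_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') in ones' complement."""
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


def build_header(ttl: int, csum: int = 0) -> bytes:
    """Hypothetical 20-byte IPv4 header; all field values are made up."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40,         # version/IHL, TOS, total length
        0x1234, 0x4000,      # identification, flags/fragment offset
        ttl, 6, csum,        # TTL, protocol (TCP), header checksum
        bytes([10, 0, 0, 1]),
        bytes([10, 0, 0, 2]),
    )


# TTL and protocol share one 16-bit word: (ttl << 8) | protocol.
old_csum = full_checksum(build_header(64))
new_csum = incremental_update(old_csum, (64 << 8) | 6, (63 << 8) | 6)
```

In this sketch `new_csum` matches what `full_checksum(build_header(63))` returns, so the router never has to look at the data it forwards in order to keep the IP header checksum valid.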
> Normally, the TCP checksum is not supposed to be changed (or even read)
> at all in transit (see RFC 793 or TCP/IP Illustrated, Vol. 1).
>
> If NAT is in use then it needs to be accounted for, of course, because
> the TCP checksum covers a pseudo-header which contains both the source
> and destination addresses. But that is not the case here, as you say.

I thought so.

> Or, IMHO even more probably, the problem could be the line. I've had a
> couple of these during the last 3 years. Namely, is there a serial line
> (possibly wireless) involved?

Yes, there is. There is a wireless network in between. But interestingly, I have seen no other packet corruption except when communicating via FTP with one particular W2k user. He can FTP with others, and so can I, but the two of us can't transfer files between each other.

Also, how could a wireless link change data inside a TCP packet? The checksum should then become wrong and the packet be rejected ... ?

I changed proftpd to compute and log the MD5 of each chunk of data coming out of the read() syscall, so I will be able to compare them to the file on the HDD (catching errors in the FS, IDE (DMA) or HDD). At the same time I'll save tcpdump's raw packets, to be able to compare the stored data with the packet contents (with a 100 MB FTP file it will be a lot of fun :-( )

Thanks to all who replied; if someone has other ideas, I'm curious :)

devik
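The per-chunk MD5 approach can be reproduced offline on a stored file. The sketch below is not the proftpd change itself (which is not shown in this thread); it is a small standalone Python illustration, and the 8192-byte chunk size and the function names are assumptions made for the example. It yields one `(offset, md5)` pair per chunk so that two logs can be compared line by line to find the first chunk that differs.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

CHUNK_SIZE = 8192  # assumed chunk size, chosen for illustration only


def chunk_digests(stream: BinaryIO,
                  chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (offset, MD5 hex digest) for each chunk read from the stream."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        yield offset, hashlib.md5(chunk).hexdigest()
        offset += len(chunk)


def digests_of_bytes(data: bytes,
                     chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, str]]:
    """Convenience wrapper for in-memory data."""
    return list(chunk_digests(io.BytesIO(data), chunk_size))
```

For a file on disk, something like `with open(path, "rb") as f: log = list(chunk_digests(f))` produces the list to compare against the digests logged by the server. This assumes both sides split the data at the same offsets; a network read() may return short chunks, in which case the logged offsets and lengths would have to drive the comparison instead of a fixed chunk size.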
On Mon, Jan 20, 2003 at 03:09:57PM +0100, devik wrote:
> > Or, IMHO even more probably, the problem could be the line. I've had a
> > couple of these during the last 3 years. Namely, is there a serial line
> > (possibly wireless) involved?
>
> Yes, there is. There is a wireless network in between. But interestingly,
> I have seen no other packet corruption except when communicating
> via FTP with one particular W2k user.
> He can FTP with others, and so can I, but the two of us can't transfer
> files between each other.
>
> Also, how could a wireless link change data inside a TCP packet?
> The checksum should then become wrong and the packet be rejected ... ?

Yes, it should. What I had in mind was a data-dependent bug on a 2 Mbps wireless link that caused packets containing alternating zero and one bit patterns (0x5555 or 0xaaaa) to be dropped by the link and never seen by the receiver.

Now, I'm not sure what the exact symptoms of your problem are. If the TCP transmission completes successfully (i.e. everything is OK as far as TCP is concerned) but the data that came out of the pipe is different from the data that was sent, that would be a different problem.

> I changed proftpd to compute and log the MD5 of each chunk of data
> coming out of the read() syscall, so I will be able to compare them
> to the file on the HDD (catching errors in the FS, IDE (DMA) or HDD).
> At the same time I'll save tcpdump's raw packets, to be able
> to compare the stored data with the packet contents (with a 100 MB FTP
> file it will be a lot of fun :-( )

Can you produce a "binary diff" of the sent and received files to see how exactly they differ? Are the two files completely different? Or do they differ in just a couple of places? Is something missing? Is something changed? If so, how? That's how I managed to isolate some difficult data-dependent bugs; it might be useful here, too.

pvl
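A very simple form of such a "binary diff" can be sketched in a few lines of Python. This is only an illustration with a made-up function name, not the program used later in this thread: it compares the two inputs position by position and reports each run of differing bytes as an `(offset, count)` pair. It does not try to recognise blocks that were shifted or moved, which a more thorough tool would have to do.

```python
from typing import List, Tuple


def differing_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (offset, count) for each run of positions where a and b differ.

    If the inputs have different lengths, the surplus tail of the longer
    one is reported as a final run of its own.
    """
    ranges: List[Tuple[int, int]] = []
    start = None  # offset where the current differing run began, if any
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:  # a run that reaches the end of the common part
        ranges.append((start, common - start))
    if len(a) != len(b):
        ranges.append((common, abs(len(a) - len(b))))
    return ranges
```

For two files this could be called as `differing_ranges(open("f1", "rb").read(), open("f2", "rb").read())`, where `f1` and `f2` stand for the sent and received copies. The byte-by-byte loop is slow in pure Python for files of 100 MB or more, so it is meant to show the idea rather than to be fast. Runs whose start is a multiple of 512 or 4096 hint at disk sector or memory page granularity, while odd offsets point elsewhere.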
> > Also, how could a wireless link change data inside a TCP packet?
> > The checksum should then become wrong and the packet be rejected ... ?
>
> If the TCP transmission completes successfully (i.e. everything is OK
> as far as TCP is concerned) but the data that came out of the pipe is
> different from the data that was sent, that would be a different problem.

Exactly: successful completion but invalid data.

> > to compare the stored data with the packet contents (with a 100 MB FTP
> > file it will be a lot of fun :-( )
>
> Can you produce a "binary diff" of the sent and received files to see how
> exactly they differ? Are the two files completely different? Or do
> they differ in just a couple of places? Is something missing? Is
> something changed? If so, how? That's how I managed to isolate some
> difficult data-dependent bugs; it might be useful here, too.

I did (I wrote a program for it). 16MB are totally different (from offset 14MB, aligned on a page boundary) and another part (10kB) is shifted by 1 byte. That part is not even 512B aligned. This is the list of differing parts between the files f1 and f2, both 148005111 bytes long:

from 14557184, cnt 4481067
from 19038287, cnt 11393
  at 19038288 (by 1) in f1
  at 19038286 (by -1) in f2
from 19049716, cnt 1162
  at 19049717 (by 1) in f1
  at 19049715 (by -1) in f2
from 19050914, cnt 46210654
from 68514564, cnt 4855808
  at 68518659 (by 4095) in f2

An "at" line shows where the block was found in the other file.

devik
On Tue, Jan 21, 2003 at 02:59:07PM +0100, devik wrote:
> I did (I wrote a program for it). 16MB are totally different
> (from offset 14MB, aligned on a page boundary) and another
> part (10kB) is shifted by 1 byte. That part is not even 512B
> aligned. This is the list of differing parts between the files
> f1 and f2, both 148005111 bytes long:
> from 14557184, cnt 4481067
> from 19038287, cnt 11393
>   at 19038288 (by 1) in f1
>   at 19038286 (by -1) in f2
> from 19049716, cnt 1162
>   at 19049717 (by 1) in f1
>   at 19049715 (by -1) in f2
> from 19050914, cnt 46210654
> from 68514564, cnt 4855808
>   at 68518659 (by 4095) in f2
>
> An "at" line shows where the block was found in the other file.

Uff. ;) I give up; I don't remember seeing anything like this. I would do one tcpdump at the sender and another at the receiver and compare the captures. If they differ, something was changed along the path; if they don't, it should be possible to find out which version of the file the dumps' data corresponds to. But that's what you've probably done already ... ;)

pvl
> Uff. ;) I give up; I don't remember seeing anything like this. I would
> do one tcpdump at the sender and another at the receiver and compare the
> captures. If they differ, something was changed along the path; if they
> don't, it should be possible to find out which version of the file the
> dumps' data corresponds to. But that's what you've probably done already
> ... ;)

:) I can't! The other user is in Italy, uses Win2k and has no clue about networking. He is helping me a lot, but I can't ask too much of him.

Right now I have a complete tcpdump of the transfer on my side (170MB) and MD5 checksums of 8kB parts of the transfer, and when I compare the MD5s with the file's content they are OK. So my side stored into the file exactly what arrived from the network.

It seems as if W2k sometimes reads bad data from the HDD ... When I resolve it I'll let you know.

devik