I'm using the openssh that comes with Ubuntu 12.04 so thats 5.9p1 I'm trying to debug why i'm getting corrupt bzip2 files when they are transferred using sftp. The corruption doesnt happen on every file. I'm running debug mode on sftp-server, but when doing that i see output like the following for both valid and corrupt files, May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1470: sent status 0 May 3 18:50:55 ftp-new sftp-server[16955]: sent status Success May 3 18:50:55 ftp-new sftp-server[16955]: debug1: request 1471: write "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 48103424 len 32768 May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1471: sent status 0 May 3 18:50:55 ftp-new sftp-server[16955]: sent status Success May 3 18:50:55 ftp-new sftp-server[16955]: debug1: request 1472: write "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 48136192 len 32768 May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1472: sent status 0 May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3383: write "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110755840 len 32768 May 3 18:51:01 ftp-new sftp-server[16955]: debug3: request 3383: sent status 0 May 3 18:51:01 ftp-new sftp-server[16955]: sent status Success May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3384: write "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110788608 len 32768 May 3 18:51:01 ftp-new sftp-server[16955]: debug3: request 3384: sent status 0 May 3 18:51:01 ftp-new sftp-server[16955]: sent status Success May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3385: write "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110821376 len 32768 Is it normal for request numbers and offsets to suddenly jump massively ?>From looking at the code, it would seem the server is merely responding tothe request from the client, but I dont understand what would make the client 'skip' from offset 48136192 to offset 110755840 and from request 1472 to 3383 - as the client code looks like its just it plods through the file. Is this just a red herring ? -- Graeme Wallace CTO FareCompare.com O: 972 588 1414 M: 214 681 9018
Graeme, I have encountered a similar issue before i.e. https://speakerdeck.com/cmlh/ssh Is the SSH client different from the OpenSSH server i.e. Ubuntu 12.04? Are the bzip2 files created on a different host to that of the SSH client? Also, is the 32-bit CRC calculated by bzip2 different before and after it is transferred? I would recommend that you also calculate the hash of the bzip2 file using http://www.openssl.org/docs/apps/dgst.html or similar. On Sat, May 4, 2013 at 8:58 AM, Graeme Wallace < graeme.wallace at farecompare.com> wrote:> I'm using the openssh that comes with Ubuntu 12.04 so thats 5.9p1 > > I'm trying to debug why i'm getting corrupt bzip2 files when they are > transferred using sftp. The corruption doesnt happen on every file. I'm > running debug mode on sftp-server, but when doing that i see output like > the following for both valid and corrupt files, > > > May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1470: sent > status 0 > May 3 18:50:55 ftp-new sftp-server[16955]: sent status Success > May 3 18:50:55 ftp-new sftp-server[16955]: debug1: request 1471: write > "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 48103424 > len 32768 > May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1471: sent > status 0 > May 3 18:50:55 ftp-new sftp-server[16955]: sent status Success > May 3 18:50:55 ftp-new sftp-server[16955]: debug1: request 1472: write > "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 48136192 > len 32768 > May 3 18:50:55 ftp-new sftp-server[16955]: debug3: request 1472: sent > status 0 > May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3383: write > "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110755840 > len 32768 > May 3 18:51:01 ftp-new sftp-server[16955]: debug3: request 3383: sent > status 0 > May 3 18:51:01 ftp-new sftp-server[16955]: sent status Success > May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3384: write > "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110788608 > len 32768 > May 3 18:51:01 ftp-new sftp-server[16955]: debug3: request 3384: sent > status 0 > May 3 18:51:01 ftp-new sftp-server[16955]: sent status Success > May 3 18:51:01 ftp-new sftp-server[16955]: debug1: request 3385: write > "/home/travelport2/OTH_ALL_20130503_2247_V3.part" (handle 0) off 110821376 > len 32768 > > > Is it normal for request numbers and offsets to suddenly jump massively ? > From looking at the code, it would seem the server is merely responding to > the request from the client, but I dont understand what would make the > client 'skip' from offset 48136192 to offset 110755840 and from request > 1472 to 3383 - as the client code looks like its just it plods through the > file. > > Is this just a red herring ? > > -- > Graeme Wallace > CTO > FareCompare.com > O: 972 588 1414 > M: 214 681 9018 > _______________________________________________ > openssh-unix-dev mailing list > openssh-unix-dev at mindrot.org > https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev >-- Regards, Christian Heinrich http://cmlh.id.au/contact
On Fri, 3 May 2013, Graeme Wallace wrote:> I'm using the openssh that comes with Ubuntu 12.04 so thats 5.9p1 > > I'm trying to debug why i'm getting corrupt bzip2 files when they are > transferred using sftp. The corruption doesnt happen on every file. I'm > running debug mode on sftp-server, but when doing that i see output like > the following for both valid and corrupt files,It looks like the client is doing something wrong. Is the client OpenSSH? If so, can you capture a trace from it? -d
On Fri, May 03, 2013 at 05:58:59PM -0500, Graeme Wallace wrote:> Is it normal for request numbers and offsets to suddenly jump massively ?The client doesn't happen to be sshfs does it? If so the offset jump might be the client writing a sparse file. -- Darren Tucker (dtucker at zip.com.au) GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69 Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
On Sun, May 05, 2013 at 01:39:06PM -0500, Graeme Wallace wrote:> Corruption occurs early in the file. Filesystem is ext4 on Ubuntu 12.04. > I've attached the transfer log at DEBUG3.I notice the log does not show the file being closed. Is that really the case or is it just missing from the log? In theory it could be an error when the file is flushed to disk, and that wouldn't be detected because of the missing close, but that seems pretty unlikely. Looking at the server log I see huge chunks of the file missing. I ran the perl script below over the log and got: chunks written: 0-2097152 41910272-44105728 95027200-97222656 120356864-122552320 156434432-158629888 193921024-196116480 237043712-239239168 269942784-272138240 the non-contiguous portions are suspiciously regular, and they correspond to the jumps in the request numbers (the latter is not necessarily a problem but the former probably is). My guess is that the client is multithreaded, doing multiple concurrent writes and is mixing up the increments of both the request numbers and offsets. How big is the file supposed to be? Based on the number of packets I'd guess about 17M, but the client seems to be creating a sparse file about 260M in size (compare the sizes reported by ls -l and du). while (<>) { unless (/off (\d+) len (\d+)/) { next; } $off = $1; $len = $2; #print "off $off len $len\n"; if (!defined($start)) { $start = $off; $end = $len; } elsif ($off == $end) { $end = $off + $len; } else { print "non-contig write at $off $len\n"; $chunks = "$chunks $start-$end"; $start = $off; $end = $off + $len; } } $chunks = "$chunks $start-$end"; print "chunks written: $chunks\n" -- Darren Tucker (dtucker at zip.com.au) GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69 Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.