I am seeing a really odd problem with rsync 2.6.2 and 2.6.3. I am trying to rsync a directory containing some 2 GB files from a Solaris 9 NFS server to a Solaris 10b69 NFS client. Rsync is running on the Solaris 10 client. Here is the source directory as seen from the client:

Source directory:

-rwxr--r-- 1 root priv 1270666114 Jan 3 2004 FLY.GHO*
-rwxr--r-- 1 root priv 1595131867 Nov 11 2003 LB.GHO*
-rwxr--r-- 1 root priv 263738235 Jan 3 2004 PHANT001.GHS*
-rwxr--r-- 1 root priv 2147482739 Jan 3 2004 PHANTXP.GHO*
-rwxrw-r-- 1 root priv 1489966355 Dec 24 2003 ROVER.GHO*
-rwxr--r-- 1 root priv 2147482481 Jan 3 2004 TOSH.GHO*
-rwxr--r-- 1 root priv 2147480582 Jan 3 2004 TOSH0001.GHS*
-rwxr--r-- 1 root priv 2070930486 Jan 3 2004 TOSH0002.GHS*

Using 2.6.2 with -a -H -S -v:

drwxr-xr-x 2 root other 512 Nov 11 16:58 ./
drwxr-xr-x 16 root root 512 Nov 10 20:53 ../
-rwx------ 1 root root 6366602929 Nov 11 17:06 .TOSH.GHO.lRaGei*
-rwxr--r-- 1 root priv 1270666114 Jan 3 2004 FLY.GHO*
-rwxr--r-- 1 root priv 1595131867 Nov 11 2003 LB.GHO*
-rwxr--r-- 1 root priv 263738235 Jan 3 2004 PHANT001.GHS*
-rwxr--r-- 1 root priv 2147482739 Jan 3 2004 PHANTXP.GHO*
-rwxrw-r-- 1 root priv 1489966355 Dec 24 2003 ROVER.GHO*

Look at the size of the .TOSH.GHO.lRaGei file! It's a concatenation of the three TOSH files from the source tree! At this point, rsync hangs when it reaches the end of the TOSH0002.GHS file.

Using 2.6.3 with -a -H -S --inplace -v:

-rwxr--r-- 1 root priv 1270666114 Jan 3 2004 FLY.GHO*
-rwxr--r-- 1 root priv 1595131867 Nov 11 2003 LB.GHO*
-rwxr--r-- 1 root priv 263738235 Jan 3 2004 PHANT001.GHS*
-rwxr--r-- 1 root priv 2147482739 Jan 3 2004 PHANTXP.GHO*
-rwxrw-r-- 1 root priv 1489966355 Dec 24 2003 ROVER.GHO*
-rwxr--r-- 1 root priv 2147482481 Jan 3 2004 TOSH.GHO*
---------- 1 root root 4218739196 Nov 11 17:20 TOSH0001.GHS

Now I tried 2.6.3 with the --inplace option.
Interestingly enough, it copied the TOSH.GHO file over, then concatenated TOSH0001.GHS and TOSH0002.GHS into TOSH0001.GHS on the receiving file system. At this point, rsync hung again.

The problem seems to be somewhat random: it happened several times with the PHANT* files, but after several attempts they were eventually copied properly. I'm wondering if this is more a Solaris 10b69 bug than an rsync bug. Any thoughts on this?
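As a rough sanity check on the concatenation theory, the sizes from the listings can simply be added up. The Python sketch below uses only the byte counts quoted in this message; the variable and function names are illustrative. The oversized files turn out to be close to, but a few hundred kilobytes larger than, the sum of the corresponding source files, so the arithmetic is consistent with the files being run together but does not prove an exact concatenation.

```python
# Sizes in bytes, copied from the source-directory listing.
source_sizes = {
    "TOSH.GHO": 2147482481,
    "TOSH0001.GHS": 2147480582,
    "TOSH0002.GHS": 2070930486,
}


def excess(observed, names):
    """Return the observed size minus the summed source sizes of `names`."""
    return observed - sum(source_sizes[n] for n in names)


# 2.6.2 run: temporary file .TOSH.GHO.lRaGei on the receiving side.
excess_262 = excess(6366602929, ["TOSH.GHO", "TOSH0001.GHS", "TOSH0002.GHS"])

# 2.6.3 --inplace run: TOSH0001.GHS on the receiving side.
excess_263 = excess(4218739196, ["TOSH0001.GHS", "TOSH0002.GHS"])

# Bytes by which each receiving-side file exceeds the plain sum.
print(excess_262, excess_263)
```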
On Thu, Nov 11, 2004 at 05:38:56PM -0600, Werner wrote:
> I'm wondering perhaps if this is more a Solaris 10b69 bug than an
> rsync bug?

I would imagine so. I can only suggest (1) try a different transport (perhaps the remote shell you're using is not binary clean, and is thus eating the characters that would differentiate the files from one another in the data stream); (2) specify a much higher level of verbosity (so that rsync tells you what it thinks it is doing at every step); and/or (3) run rsync using a system-call-tracing program (which should help you to determine if the system calls are messing up on the sending or receiving side -- see http://rsync.samba.org/issues.html).

My bet is that the problem is that the transport is losing data.

..wayne..
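Suggestions (2) and (3) can be combined in a single invocation. The Python sketch below only builds the command line and does not run anything. It assumes Solaris truss as the tracing program, with -f to follow child processes and -o to write the trace to a file, and uses -vvv as one way of raising rsync's verbosity. The function name, the paths, and the trace file location are hypothetical placeholders, not values from this thread.

```python
def traced_rsync_argv(src, dest, trace_file="/tmp/rsync.truss"):
    """Build an argv list that runs a verbose rsync under truss.

    The rsync options -a -H -S are the ones used in the original report;
    -vvv is added for extra verbosity. Nothing is executed here.
    """
    rsync = ["rsync", "-a", "-H", "-S", "-vvv", src, dest]
    # truss -f follows forked children; -o sends the trace to a file
    # so it does not mix with rsync's own output.
    return ["truss", "-f", "-o", trace_file] + rsync


# Hypothetical source and destination directories.
argv = traced_rsync_argv("/mnt/ghost/", "/export/ghost/")
print(" ".join(argv))
```

The resulting list could be handed to subprocess.run on the client; the trace file then shows which read and write calls were issued around the point where the files run together.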
> I am seeing a really odd problem with Rsync 2.6.2 and 2.6.3. I am trying to
> rsync a directory containing some 2gb files from a Solaris 9 NFS server to
> a Solaris 10b69 NFS client. Rsync is running on the Solaris 10 client.
>
> I'm wondering perhaps if this is more a Solaris 10b69 bug than an rsync bug?
> Any thoughts on this?

I found this message in the archives when investigating a similar problem I was having with Solaris 10 B69. I have found out some more information that may be helpful.

I currently have four systems running Solaris 10 B69. Of these, only two have encountered strange behavior such as rsync dying, corrupting files, mixing files together, or hanging altogether. The other two systems can run rsync without any problems. Oddly enough, the problem occurs on both sparc and x86 systems.

The only common factor I could find was that the systems that did not have problems were single-CPU systems. The x86 box with the problem was also single CPU, but it is a hyperthreaded CPU, so it looked like two to the system.

With that in mind, I tried disabling all but one CPU by using the psradm command. Once the system had only one CPU active, rsync ran perfectly without error every time. When I turned the extra CPUs back on, the random behavior immediately returned.

On the x86 box I just disabled hyperthreading, which is no big deal, but the sparc box has four real CPUs, so I'd lose 3/4 of my CPU power going down to just one CPU.

Anyway, this seems to point to a bug in SMP on Solaris 10 B69, but on the other hand rsync is the only utility I've had problems with so far.

With this in mind, is there any additional advice as to what might be going wrong?

-- James Lick -- jlick@jameslick.com -- http://jameslick.com/
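For anyone wanting to repeat the single-CPU experiment, here is a minimal Python sketch of how the psradm command lines might be derived from psrinfo output. It assumes psrinfo prints one line per processor starting with the processor ID followed by its state, and that psradm -f takes processors offline while psradm -n brings them back online. The sample output and the function names are made up for illustration, and the sketch only builds argument lists without running them.

```python
def online_cpu_ids(psrinfo_output):
    """Return the IDs of processors that psrinfo reports as on-line."""
    ids = []
    for line in psrinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "on-line":
            ids.append(fields[0])
    return ids


def psradm_argv(psrinfo_output, offline=True):
    """Build a psradm command for every on-line CPU except the first.

    offline=True uses -f (take offline); offline=False uses -n (bring online).
    Returns an empty list when there is only one CPU to begin with.
    """
    extra = online_cpu_ids(psrinfo_output)[1:]
    if not extra:
        return []
    return ["psradm", "-f" if offline else "-n"] + extra


# Illustrative psrinfo output for a four-CPU box (not captured from a real system).
SAMPLE_PSRINFO = """\
0       on-line   since 11/11/2004 16:00:00
1       on-line   since 11/11/2004 16:00:00
2       on-line   since 11/11/2004 16:00:00
3       on-line   since 11/11/2004 16:00:00
"""

print(psradm_argv(SAMPLE_PSRINFO))
print(psradm_argv(SAMPLE_PSRINFO, offline=False))
```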