Hello all,
I'm experiencing a most unusual problem.. I am copying a directory
from one server to another (both Solaris 10/SPARC). No applications
are using the directory in question on either side while the copy is
in progress. After the copy successfully completes (no errors or
warnings), my user brings up the application (mix of binaries, java
files, and Apache Tomcat servlets) contained in this directory on the
new server. Or at least, they try to..
If I use rsync to perform this copy (tested with 2.5.5 and 3.0.0pre7),
the application logs an unusual Java error (reproduced near the
bottom) and dies. Surely an application issue, right?
Well, it turns out that if I use GNU tar (and I must use GNU tar
because the directory has very long file names) to copy the same set
of files, the application works without issue after the copy... Hmm..
I have had no luck at all tracking down WHY this works with tar but
does not work with rsync. The problem is very repeatable: I have tried
rsync at least 10 times with different options, versions, etc. and the
application has failed to start every time, while the tar copy has
worked every time I have tried it (at least 5 times). This consistency
is the only reason I still think something in the copies must differ
and affect the application, because I have been unable to find any
meaningful differences..
So this is why I am appealing for help.. can anyone think of
additional tests I can run to try to find any key differences between
the copy I've made using rsync and that which I've made using tar? Or
does anyone have any other ideas why some random Java based program
might not work when copied with rsync? I've run every comparison I can
think of between the copies and can't find anything that I could blame
the application error on.
Still with me? Great! Here are the details..
When I copy the directory with rsync, I use the following command
(local & remote rsync path adjusted depending on whether I was testing
2.5.5 or 3.0.0pre7):
/my/local/rsync -ave ssh --rsync-path=/my/local/rsync /directory/ root@serverB:/directory
Note, the remote directory is empty when I start both the rsync and
tar copies so there are no files being skipped due to matching
modtimes/sizes or the like.
My rsync 2.5.5 version info is:
rsync version 2.5.5 protocol version 26
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
no IPv6, 64-bit system inums, 64-bit internal inums
My rsync 3.0.0pre7 version info is:
rsync version 3.0.0pre7 protocol version 30.PR16
Capabilities:
64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, no IPv6, batchfiles, inplace,
append, ACLs, no xattrs, iconv
When I copy the directory with GNU tar, I use:
cd /directory && gtar -cf - . | ssh serverB 'cd /directory && gtar -xpf -'
In both cases, the files transfer without issue (no warnings, no
errors). I have run the following tests on the remote server to compare
the copy made with gtar against the copy made with rsync:
- gdiff -r /tar-copy /rsync-copy
<No differences reported>
- Ran sha1sum on each file in each copy and compared the outputs
<OK, this is ridiculous but I'm out of ideas. The files are 100%
identical according to sha1sum>
- ls -@aR /tar-copy /rsync-copy | grep @
<To check for extended attributes that rsync wouldn't copy..
there were none>
- compared number of total inodes in each copy.. they were identical
- compared number of unique inodes in each copy.. they were identical
<To check for hard link differences>
- Ran ls -slARt on both copies and original and compared the output
using diff.. the only differences found were:
o Block counts and sizes for directories were sometimes slightly
different (only for directories, never files).. for example:
< 4 drwxrwxr-x 6 user group 1536 Oct 22 12:59 ../
---
> 2 drwxrwxr-x 6 user group 1024 Oct 22 12:59 ../
o Symbolic links created had different modification times.
< 2 lrwxrwxrwx 1 user group 1 Dec 27 13:26 en_US -> C/
---
> 2 lrwxrwxrwx 1 user group 1 Dec 27 13:40 en_US -> C/
<Both of these differences seem normal enough: neither the tar copy
nor the rsync copy was 100% identical to the original in this test,
and both showed the two kinds of differences above. This test rules
out missing files, corrupted file names, permissions, ACL
differences, and a number of other possible problems I could think of>
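For anyone who wants to repeat the checksum and inode comparisons from the list above, they could be scripted roughly as follows. This is only a sketch and not necessarily the exact commands that were run: the function names (manifest, inode_counts, compare_copies) are made up for illustration, and it assumes sha1sum plus a POSIX find, awk, sort and wc are on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two copies of a tree (names are illustrative).

# Print "<sha1>  <relative path>" for every regular file under $1,
# sorted by path so that directory order does not affect the result.
# Symbolic links and directories are not checksummed.
manifest() {
    ( cd "$1" && find . -type f -exec sha1sum {} + | LC_ALL=C sort -k 2 )
}

# Print "<total entries> <unique inodes>" for the tree rooted at $1.
# Hard-linked files share an inode, so they lower the second number.
inode_counts() {
    total=$(find "$1" | wc -l | tr -d ' ')
    unique=$(find "$1" -exec ls -id {} + | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
    echo "$total $unique"
}

# Return 0 if both trees have identical manifests and inode counts,
# 1 if they differ, 2 if a manifest could not be built.
compare_copies() {
    a=$(manifest "$1") && b=$(manifest "$2") || return 2
    [ "$a" = "$b" ] && [ "$(inode_counts "$1")" = "$(inode_counts "$2")" ]
}
```

With the two copies from the tests above, the call would be something like: compare_copies /tar-copy /rsync-copy && echo "no differences found". Note that this deliberately sorts everything, so it cannot see differences in the order in which entries were created within a directory.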
The actual error the application gives on startup is logged as follows:
java.lang.NoSuchFieldError: _impl
at com.tibco.tibrv.TibrvListener.init(TibrvListener.java:72)
at com.tibco.tibrv.TibrvListener.<init>(TibrvListener.java:58)
at com.tibco.infra.repository.RemoteDiscoverer.<init>(RepoFactory.java:5904)
.. stack trace continues ..
I have not been able to attach truss to the process as it starts up on
the original and on the rsync copy in order to compare the two
startups. This is due to the way the failing process is launched
(forked from another process which is itself nested deep within the
application.) I'm still working on that angle as truss might be my
only hope left. The vendor of the application has also been unable to
offer any insight into this issue and simply suggested that we use
cpio to copy the files... *shudders*
Oi... Any ideas before all my hair has been ripped out?
Thanks for any suggestions,
Brian