Nithya -
Thanks for the reply. I'm replying at the top to keep the thread from
getting really ugly.
We did indeed copy from the individual bricks in an effort to speed up the copy.
We had one rsync running from each brick to the mount point for the new cluster.
As stated, we skipped all files with size 0 so that stub files wouldn't be
copied. Some files with permissions of 1000 (equivalent to ---------T) were
larger than 0 and were also copied.
I'm mostly trying to ascertain why such files would exist (failed rebalance?)
and what we can do about this problem.
Thanks,
Kevin
From: Nithya Balachandran [mailto:nbalacha at redhat.com]
Sent: Tuesday, November 15, 2016 10:21 AM
To: Kevin Leigeb <kevin.leigeb at wisc.edu>
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster File Abnormalities
Hi Kevin,
On 15 November 2016 at 20:56, Kevin Leigeb <kevin.leigeb at wisc.edu> wrote:
All -
We recently moved from an old cluster running 3.7.9 to a new one running 3.8.4.
To move the data we rsync'd all files from the old gluster nodes that were not
in the .glusterfs directory and had a size of greater-than zero (to avoid stub
files) through the front-end of the new cluster.
Did you rsync via the mount point or directly from the bricks?
However, it has recently come to our attention that some of the files copied
over were already "corrupted" on the old back-end. That is, these files had
permissions of 1000 (like a stub file) yet were the full size of the actual
file.
Does this correspond to a file permission of ---------T when viewed using ls? If yes,
these are dht linkto files. They were possibly created during a rebalance and
left behind because the file was skipped. They should be ignored when accessing
the gluster volume via the mount point.
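One way to confirm on a brick whether a given file is a DHT linkto file is to look for the trusted.glusterfs.dht.linkto extended attribute. The following is only a minimal sketch, assuming Linux, Python 3 and root access on the brick (trusted.* attributes are not visible to unprivileged users); the function name dht_linkto_target is hypothetical.

```python
import os

# Extended attribute that DHT sets on linkto files.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"

def dht_linkto_target(path, attr=LINKTO_XATTR):
    """Return the value of the linkto xattr as text, or None if it is absent.

    Any OSError (attribute missing, xattrs unsupported, no permission) is
    treated as "not a linkto file", so run this as root on the brick.
    """
    try:
        value = os.getxattr(path, attr, follow_symlinks=False)
    except OSError:
        return None
    # The stored value may carry a trailing NUL byte.
    return value.rstrip(b"\0").decode(errors="replace")
```

A file for which this returns None but which still has mode ---------T and a non-zero size is not an ordinary linkto stub and deserves a closer look.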
In some cases, these were the only copies of the file that existed at all on any
of the bricks, in others, another version of the file existed that was also full
size and had the proper permissions. In some cases, we believe, these correct
files were rsync'd but then overwritten by the 1000 permission version resulting
in a useless file on the new cluster.
This sounds like you were running rsync directly on the bricks. Can you please
confirm if that is the case?
These files are thought by the OS to be binaries when trying to open them using
vim, but they are actually text files (or at least were originally). We can cat
the file to see that it has a length of zero and so far that is our only
reliable test to find which files are indeed corrupted (find . -type f | xargs
wc -l). With nearly 50 million files on our cluster, this is really a
non-starter because of the speed.
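A cheaper test than reading every file is to look only at metadata: the suspect files described above have permission bits of exactly 1000 and a size greater than zero, and both come from a single lstat call. The following is a rough sketch, assuming Python 3 and that it is run against a brick directory; the function name find_sticky_only_files and the skip_dirs parameter are hypothetical.

```python
import os
import stat

def find_sticky_only_files(root, skip_dirs=(".glusterfs",)):
    """Yield (path, size) for regular files with mode exactly 01000 and size > 0.

    Only lstat() is used, so file contents are never read.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories we do not want to descend into.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            # S_IMODE keeps only the permission bits; S_ISVTX is the sticky bit (01000).
            if stat.S_IMODE(st.st_mode) == stat.S_ISVTX and st.st_size > 0:
                yield path, st.st_size
```

Usage would be something like `for path, size in find_sticky_only_files("/path/to/brick"): print(size, path)`, where the brick path is a placeholder. This only lists candidates; it does not say which copy of a file is the good one.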
Has anyone seen this issue previously? We're hoping to find a solution that
doesn't involve overthinking the problem and thought this might be a great place
to start.
Let me know if there's any info I may have omitted that could be of further use.
Thanks,
Kevin
Thanks,
Nithya
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users