Is the failure repeatable? With the same directories?

It's very strange that the directories appear on the volume once you do
an 'ls' on the bricks. Could it be that you only did a single 'ls' on
the fuse mount, which did not show the directory? Is it possible that
this 'ls' triggered a self-heal that repaired the problem, whatever it
was, and that when you did another 'ls' on the fuse mount after the
'ls' on the bricks, the directories were there?
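
If the problem shows up again, one way to check whether a self-heal was
pending would be to dump the replication metadata directly on the brick
copies before touching the fuse mount. A rough sketch (the trusted.afr.*
attribute names depend on your volume, and the directory is only an
example path on your first replica pair):

    # run as root on both servers of the replica pair, against the same directory
    getfattr -m . -d -e hex /data/brick01a/homegfs/<some_dir>
    getfattr -m . -d -e hex /data/brick01b/homegfs/<some_dir>

Non-zero trusted.afr.* changelog values on one of the copies would mean
that self-heal still had work to do for that entry.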
The first 'ls' could have healed the files, so the following 'ls' on
the bricks showed the files as if nothing had been damaged. If that's
the case, it's possible that there were some disconnections during the
copy.
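
To check for that, something along these lines might help (a sketch
that assumes the default log locations under /var/log/glusterfs; adjust
to your setup):

    # entries that still need healing, per brick
    gluster volume heal homegfs info

    # look for client disconnections in the brick logs on each server
    grep -i disconnect /var/log/glusterfs/bricks/*.log

    # see what the self-heal daemon did around the time of the copy
    grep -i heal /var/log/glusterfs/glustershd.log | tail -n 50

Disconnect messages in the brick logs during the rsync window would
support that theory.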
Added Pranith because he knows better replication and self-heal details.
Xavi
On 02/04/2015 07:23 PM, David F. Robinson wrote:
> Distributed/replicated
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 10
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
>
>
> ------ Original Message ------
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "David F. Robinson" <david.robinson at corvidtec.com>;
"Benjamin
> Turner" <bennyturns at gmail.com>
> Cc: "gluster-users at gluster.org" <gluster-users at
gluster.org>; "Gluster
> Devel" <gluster-devel at gluster.org>
> Sent: 2/4/2015 6:03:45 AM
> Subject: Re: [Gluster-devel] missing files
>
>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>> Sorry. Thought about this a little more. I should have been clearer.
>>> The files were on both bricks of the replica, not just one side. So,
>>> both bricks had to have been up... The files/directories just don't
>>> show up on the mount.
>>> I was reading and saw a related bug
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>> suggested to run:
>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> This command is specific to a dispersed volume. It won't do anything
>> (aside from the error you are seeing) on a replicated volume.
>>
>> I think you are using a replicated volume, right?
>>
>> In this case I'm not sure what can be happening. Is your volume a pure
>> replicated one or a distributed-replicated one? On a pure replicated
>> volume it doesn't make sense that some entries do not show up in an
>> 'ls' when the file is in both replicas (at least without any error
>> message in the logs). On a distributed-replicated volume it could be
>> caused by some problem while combining the contents of each replica set.
>>
>> What's the configuration of your volume?
>>
>> Xavi
>>
>>>
>>> I get a bunch of errors for operation not supported:
>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
>>> find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature.
>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>> ------ Original Message ------
>>> From: "Benjamin Turner" <bennyturns at gmail.com
>>> <mailto:bennyturns at gmail.com>>
>>> To: "David F. Robinson" <david.robinson at
corvidtec.com
>>> <mailto:david.robinson at corvidtec.com>>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org
>>> <mailto:gluster-devel at gluster.org>>;
"gluster-users at gluster.org"
>>> <gluster-users at gluster.org <mailto:gluster-users at
gluster.org>>
>>> Sent: 2/3/2015 7:12:34 PM
>>> Subject: Re: [Gluster-devel] missing files
>>>> It sounds to me like the files were only copied to one replica,
>>>> weren't there for the initial ls which triggered a self-heal, and
>>>> were there for the last ls because they were healed. Is there any
>>>> chance that one of the replicas was down during the rsync? It could
>>>> be that you lost a brick during the copy or something like that. To
>>>> confirm, I would look for disconnects in the brick logs as well as
>>>> check glustershd.log to verify that the missing files were actually
>>>> healed.
>>>>
>>>> -b
>>>>
>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>> <david.robinson at corvidtec.com> wrote:
>>>>
>>>> I rsync'd 20 TB over to my gluster system and noticed that I had
>>>> some directories missing even though the rsync completed normally.
>>>> The rsync logs showed that the missing files were transferred.
>>>> I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*',
>>>> and the files were on the bricks. After I did this 'ls', the files
>>>> then showed up on the FUSE mounts.
>>>> 1) Why are the files hidden on the fuse mount?
>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>> 3) How can I prevent this from happening again?
>>>> Note, I also mounted the gluster volume using NFS and saw the same
>>>> behavior. The files/directories were not shown until I did the
>>>> "ls" on the bricks.
>>>> David
>>>> ==============================
>>>> David F. Robinson, Ph.D.
>>>> President - Corvid Technologies
>>>> 704.799.6944 x101 [office]
>>>> 704.252.1310 [cell]
>>>> 704.799.7974 [fax]
>>>> David.Robinson at corvidtec.com
>>>> http://www.corvidtechnologies.com
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>>
>>>
>>>
>