Pranith Kumar Karampuri
2015-Feb-05 10:18 UTC
[Gluster-users] [Gluster-devel] missing files
I believe David already fixed this. I hope this is the same permissions
issue he mentioned.

Pranith

On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> Is the failure repeatable? With the same directories?
>
> It's very weird that the directories appear on the volume when you do
> an 'ls' on the bricks. Could it be that you only did a single 'ls' on
> the fuse mount, which did not show the directory? Is it possible that
> this 'ls' triggered a self-heal that repaired the problem, whatever it
> was, and when you did another 'ls' on the fuse mount after the 'ls' on
> the bricks, the directories were there?
>
> The first 'ls' could have healed the files, so that the following 'ls'
> on the bricks showed the files as if nothing were damaged. If that's
> the case, it's possible that there were some disconnections during
> the copy.
>
> Added Pranith because he knows the replication and self-heal details
> better.
>
> Xavi
>
> On 02/04/2015 07:23 PM, David F. Robinson wrote:
>> Distributed/replicated
>>
>> Volume Name: homegfs
>> Type: Distributed-Replicate
>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>> Options Reconfigured:
>> performance.io-thread-count: 32
>> performance.cache-size: 128MB
>> performance.write-behind-window-size: 128MB
>> server.allow-insecure: on
>> network.ping-timeout: 10
>> storage.owner-gid: 100
>> geo-replication.indexing: off
>> geo-replication.ignore-pid-check: on
>> changelog.changelog: on
>> changelog.fsync-interval: 3
>> changelog.rollover-time: 15
>> server.manage-gids: on
>>
>>
>> ------ Original Message ------
>> From: "Xavier Hernandez" <xhernandez at datalab.es>
>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
>> Turner" <bennyturns at gmail.com>
>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>;
>> "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: 2/4/2015 6:03:45 AM
>> Subject: Re: [Gluster-devel] missing files
>>
>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>>> Sorry. Thought about this a little more. I should have been
>>>> clearer. The files were on both bricks of the replica, not just
>>>> one side. So, both bricks had to have been up... The
>>>> files/directories just don't show up on the mount.
>>>> I was reading and saw a related bug
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>>> suggested to run:
>>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>>
>>> This command is specific to a dispersed volume. It won't do anything
>>> (aside from the error you are seeing) on a replicated volume.
>>>
>>> I think you are using a replicated volume, right?
>>>
>>> In this case I'm not sure what can be happening. Is your volume a
>>> pure replicated one or a distributed-replicated one? On a pure
>>> replicated volume it doesn't make sense that some entries do not
>>> show in an 'ls' when the file is in both replicas (at least without
>>> any error message in the logs). On a distributed-replicated volume
>>> it could be caused by some problem while combining the contents of
>>> each replica set.
>>>
>>> What's the configuration of your volume?
>>>
>>> Xavi
>>>
>>>> I get a bunch of "Operation not supported" errors:
>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
>>>> trusted.ec.heal {} \;
>>>> find: warning: the -d option is deprecated; please use -depth
>>>> instead, because the latter is a POSIX-compliant feature.
>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not
>>>> supported
>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not
>>>> supported
>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>>> ------ Original Message ------
>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>> Sent: 2/3/2015 7:12:34 PM
>>>> Subject: Re: [Gluster-devel] missing files
>>>>> It sounds to me like the files were only copied to one replica,
>>>>> weren't there for the initial ls which triggered a self-heal,
>>>>> and were there for the last ls because they were healed. Is there
>>>>> any chance that one of the replicas was down during the rsync? It
>>>>> could be that you lost a brick during the copy or something like
>>>>> that. To confirm, I would look for disconnects in the brick logs
>>>>> as well as check glustershd.log to verify the missing files were
>>>>> actually healed.
>>>>>
>>>>> -b
>>>>>
>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>
>>>>> I rsync'd 20 TB over to my gluster system and noticed that I had
>>>>> some directories missing even though the rsync completed
>>>>> normally. The rsync logs showed that the missing files were
>>>>> transferred.
>>>>> I went to the bricks and did an 'ls -al
>>>>> /data/brick*/homegfs/dir/*'; the files were on the bricks. After
>>>>> I did this 'ls', the files then showed up on the FUSE mounts.
>>>>> 1) Why are the files hidden on the fuse mount?
>>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>>> 3) How can I prevent this from happening again?
>>>>> Note, I also mounted the gluster volume using NFS and saw the
>>>>> same behavior. The files/directories were not shown until I did
>>>>> the "ls" on the bricks.
>>>>> David
>>>>> ==============================
>>>>> David F. Robinson, Ph.D.
>>>>> President - Corvid Technologies
>>>>> 704.799.6944 x101 [office]
>>>>> 704.252.1310 [cell]
>>>>> 704.799.7974 [fax]
>>>>> David.Robinson at corvidtec.com
>>>>> http://www.corvidtechnologies.com
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
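The deprecation warning in David's find output points at a direct POSIX replacement. A minimal self-contained sketch of the -depth behaviour on a throwaway tree (the directory names are placeholders echoing David's layout; as Xavi notes, trusted.ec.heal is only meaningful on a dispersed volume):

```shell
# Stand-in for the FUSE mount point; all path names here are placeholders.
mnt=$(mktemp -d)
mkdir -p "$mnt/wks_backup/homer_backup/logs"
touch "$mnt/wks_backup/homer_backup/logs/2014_05_20.log"

# -depth is the POSIX-compliant spelling of the deprecated -d: it visits
# entries before their parent directories. On a dispersed volume one would
# append:  -exec getfattr -h -n trusted.ec.heal {} \;
# (on a replicate volume that xattr returns "Operation not supported").
find "$mnt" -depth
```

The depth-first order matters for the heal query: children are examined before the directories that contain them.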
Pranith Kumar Karampuri
2015-Feb-05 10:30 UTC
[Gluster-users] [Gluster-devel] missing files
On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> I believe David already fixed this. I hope this is the same
> permissions issue he mentioned.
Oops, it is not. I will take a look.

Pranith

> On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
>> Is the failure repeatable? With the same directories?
>>
>> [... rest of the quoted thread identical to the previous message ...]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
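Benjamin's suggestion (look for disconnects in the brick logs, then confirm heals in glustershd.log) can be sketched as a grep. The log directory and the sample line below are fabricated for illustration; on a real cluster the brick logs usually live under /var/log/glusterfs/bricks/, and `gluster volume heal homegfs info` lists entries still pending heal.

```shell
# Self-contained sketch: write one simulated brick-log line so the grep has
# input. The path and message format are assumptions for illustration only;
# substitute your actual brick log directory.
logdir=$(mktemp -d)   # stand-in for /var/log/glusterfs/bricks
printf '%s\n' \
  '[2015-02-03 22:41:08.123456] I 0-homegfs-server: disconnecting connection from gfs02a-client-3' \
  > "$logdir/data-brick01a-homegfs.log"

# Benjamin's check: did any client/brick connection drop during the rsync?
grep -i 'disconnect' "$logdir"/*.log
```

A hit in the rsync time window would support the theory that one replica was briefly down and the later 'ls' triggered the heal.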