Is the failure repeatable? With the same directories?

It's very strange that the directories appear on the volume once you do
an 'ls' on the bricks. Could it be that you only did a single 'ls' on
the fuse mount, which did not show the directory? Is it possible that
this 'ls' triggered a self-heal that repaired the problem, whatever it
was, and that when you did another 'ls' on the fuse mount after the
'ls' on the bricks, the directories were there?
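
If the problem shows up again, one way to check whether a self-heal was
pending would be to dump the replication metadata directly on the brick
copies before touching the fuse mount. A rough sketch (the trusted.afr.*
attribute names depend on your volume, and the directory is only an
example path on your first replica pair):

    # run as root on both servers of the replica pair, against the same directory
    getfattr -m . -d -e hex /data/brick01a/homegfs/<some_dir>
    getfattr -m . -d -e hex /data/brick01b/homegfs/<some_dir>

Non-zero trusted.afr.* changelog values on one of the copies would mean
that self-heal still had work to do for that entry.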
The first 'ls' could have healed the files, so the following 'ls' on
the bricks showed the files as if nothing had been damaged. If that's
the case, it's possible that there were some disconnections during the
copy.
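
To check for that, something along these lines might help (a sketch
that assumes the default log locations under /var/log/glusterfs; adjust
to your setup):

    # entries that still need healing, per brick
    gluster volume heal homegfs info

    # look for client disconnections in the brick logs on each server
    grep -i disconnect /var/log/glusterfs/bricks/*.log

    # see what the self-heal daemon did around the time of the copy
    grep -i heal /var/log/glusterfs/glustershd.log | tail -n 50

Disconnect messages in the brick logs during the rsync window would
support that theory.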
Added Pranith because he knows better replication and self-heal details.
Xavi
On 02/04/2015 07:23 PM, David F. Robinson wrote:
> Distributed/replicated
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 10
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
>
>
> ------ Original Message ------
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "David F. Robinson" <david.robinson at corvidtec.com>;
"Benjamin
> Turner" <bennyturns at gmail.com>
> Cc: "gluster-users at gluster.org" <gluster-users at
gluster.org>; "Gluster
> Devel" <gluster-devel at gluster.org>
> Sent: 2/4/2015 6:03:45 AM
> Subject: Re: [Gluster-devel] missing files
>
>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>> Sorry. Thought about this a little more. I should have been clearer.
>>> The files were on both bricks of the replica, not just one side. So,
>>> both bricks had to have been up... The files/directories just don't
>>> show up on the mount.
>>> I was reading and saw a related bug
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>> suggested to run:
>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> This command is specific to a dispersed volume. It won't do anything
>> (aside from the error you are seeing) on a replicated volume.
>>
>> I think you are using a replicated volume, right?
>>
>> In this case I'm not sure what can be happening. Is your volume a pure
>> replicated one or a distributed-replicated one? On a pure replicated
>> volume it doesn't make sense that some entries do not show up in an
>> 'ls' when the file is in both replicas (at least without any error
>> message in the logs). On a distributed-replicated volume it could be
>> caused by some problem while combining the contents of each replica set.
>>
>> What's the configuration of your volume?
>>
>> Xavi
>>
>>>
>>> I get a bunch of errors for operation not supported:
>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
>>> find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature.
>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>> ------ Original Message ------
>>> From: "Benjamin Turner" <bennyturns at gmail.com
>>> <mailto:bennyturns at gmail.com>>
>>> To: "David F. Robinson" <david.robinson at
corvidtec.com
>>> <mailto:david.robinson at corvidtec.com>>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org
>>> <mailto:gluster-devel at gluster.org>>;
"gluster-users at gluster.org"
>>> <gluster-users at gluster.org <mailto:gluster-users at
gluster.org>>
>>> Sent: 2/3/2015 7:12:34 PM
>>> Subject: Re: [Gluster-devel] missing files
>>>> It sounds to me like the files were only copied to one replica,
>>>> weren't there for the initial ls which triggered a self-heal, and
>>>> were there for the last ls because they were healed. Is there any
>>>> chance that one of the replicas was down during the rsync? It could
>>>> be that you lost a brick during the copy or something like that. To
>>>> confirm, I would look for disconnects in the brick logs as well as
>>>> check glustershd.log to verify that the missing files were actually
>>>> healed.
>>>>
>>>> -b
>>>>
>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>> <david.robinson at corvidtec.com> wrote:
>>>>
>>>> I rsync'd 20 TB over to my gluster system and noticed that I had
>>>> some directories missing even though the rsync completed normally.
>>>> The rsync logs showed that the missing files were transferred.
>>>> I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*',
>>>> and the files were on the bricks. After I did this 'ls', the files
>>>> then showed up on the FUSE mounts.
>>>> 1) Why are the files hidden on the fuse mount?
>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>> 3) How can I prevent this from happening again?
>>>> Note, I also mounted the gluster volume using NFS and saw the same
>>>> behavior. The files/directories were not shown until I did the
>>>> "ls" on the bricks.
>>>> David
>>>> ==============================
>>>> David F. Robinson, Ph.D.
>>>> President - Corvid Technologies
>>>> 704.799.6944 x101 [office]
>>>> 704.252.1310 [cell]
>>>> 704.799.7974 [fax]
>>>> David.Robinson at corvidtec.com
>>>> http://www.corvidtechnologies.com
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>>
>>>
>>>
>