----- Original Message -----> From: "Justin Clift" <justin at gluster.org> > To: "Benjamin Turner" <bennyturns at gmail.com> > Cc: "David F. Robinson" <david.robinson at corvidtec.com>, gluster-users at gluster.org, "Gluster Devel" > <gluster-devel at gluster.org>, "Ben Turner" <bturner at redhat.com> > Sent: Friday, February 6, 2015 3:27:53 PM > Subject: Re: [Gluster-devel] [Gluster-users] missing files > > On 6 Feb 2015, at 02:05, Benjamin Turner <bennyturns at gmail.com> wrote: > > I think that the multi threaded epoll changes that _just_ landed in master > > will help resolve this, but they are so new I haven't been able to test > > this. I'll know more when I get a chance to test tomorrow. > > Which multi-threaded epoll code just landed in master? Are you thinking > of this one? > > http://review.gluster.org/#/c/3842/ > > If so, it's not in master yet. ;)Doh! I just saw - "Required patches are all upstream now" and assumed they were merged. I have been in class all week so I am not up2date with everything. I gave instructions on compiling it from the gerrit patches + master so if David wants to give it a go he can. Sorry for the confusion. -b> + Justin > > > > -b > > > > On Thu, Feb 5, 2015 at 6:04 PM, David F. Robinson > > <david.robinson at corvidtec.com> wrote: > > Isn't rsync what geo-rep uses? > > > > David (Sent from mobile) > > > > ==============================> > David F. Robinson, Ph.D. > > President - Corvid Technologies > > 704.799.6944 x101 [office] > > 704.252.1310 [cell] > > 704.799.7974 [fax] > > David.Robinson at corvidtec.com > > http://www.corvidtechnologies.com > > > > > On Feb 5, 2015, at 5:41 PM, Ben Turner <bturner at redhat.com> wrote: > > > > > > ----- Original Message ----- > > >> From: "Ben Turner" <bturner at redhat.com> > > >> To: "David F. Robinson" <david.robinson at corvidtec.com> > > >> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez" > > >> <xhernandez at datalab.es>, "Benjamin Turner" > > >> <bennyturns at gmail.com>, gluster-users at gluster.org, "Gluster Devel" > > >> <gluster-devel at gluster.org> > > >> Sent: Thursday, February 5, 2015 5:22:26 PM > > >> Subject: Re: [Gluster-users] [Gluster-devel] missing files > > >> > > >> ----- Original Message ----- > > >>> From: "David F. Robinson" <david.robinson at corvidtec.com> > > >>> To: "Ben Turner" <bturner at redhat.com> > > >>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez" > > >>> <xhernandez at datalab.es>, "Benjamin Turner" > > >>> <bennyturns at gmail.com>, gluster-users at gluster.org, "Gluster Devel" > > >>> <gluster-devel at gluster.org> > > >>> Sent: Thursday, February 5, 2015 5:01:13 PM > > >>> Subject: Re: [Gluster-users] [Gluster-devel] missing files > > >>> > > >>> I'll send you the emails I sent Pranith with the logs. What causes > > >>> these > > >>> disconnects? > > >> > > >> Thanks David! Disconnects happen when there are interruption in > > >> communication between peers, normally there is ping timeout that > > >> happens. > > >> It could be anything from a flaky NW to the system was to busy to > > >> respond > > >> to the pings. My initial take is more towards the ladder as rsync is > > >> absolutely the worst use case for gluster - IIRC it writes in 4kb > > >> blocks. I > > >> try to keep my writes at least 64KB as in my testing that is the > > >> smallest > > >> block size I can write with before perf starts to really drop off. I'll > > >> try > > >> something similar in the lab. 
> > >
> > > Ok, I do think that the file being self-healed is the RCA for what you
> > > were seeing.  Let's look at one of the disconnects:
> > >
> > > data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> > >
> > > And in the glustershd.log from the gfs01b_glustershd.log file:
> > >
> > > [2015-02-03 20:55:48.001797] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448
> > > [2015-02-03 20:55:49.341996] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448. source=1 sinks=0
> > > [2015-02-03 20:55:49.343093] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69
> > > [2015-02-03 20:55:50.463652] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69. source=1 sinks=0
> > > [2015-02-03 20:55:51.465289] I [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: performing metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
> > > [2015-02-03 20:55:51.466515] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0
> > > [2015-02-03 20:55:51.467098] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
> > > [2015-02-03 20:55:55.257808] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0
> > > [2015-02-03 20:55:55.258548] I [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: performing metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
> > > [2015-02-03 20:55:55.259367] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541. source=1 sinks=0
> > > [2015-02-03 20:55:55.259980] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
> > >
> > > As you can see, the self-heal logs are just spammed with files being
> > > healed, and when I looked at a couple of disconnects I saw self heals
> > > getting run shortly after on the bricks that were down.  Now we need to
> > > find the cause of the disconnects; I am thinking once the disconnects
> > > are resolved the files should be properly copied over without SH having
> > > to fix things.  Like I said, I'll give this a go on my lab systems and
> > > see if I can repro the disconnects; I'll have time to run through it
> > > tomorrow.  If in the meantime anyone else has a theory / anything to
> > > add here, it would be appreciated.
> > >
> > > -b
> > >
> > >> -b
> > >>
> > >>> David  (Sent from mobile)
> > >>>
> > >>> ==============================
> > >>> David F. Robinson, Ph.D.
> > >>> President - Corvid Technologies
> > >>> 704.799.6944 x101 [office]
> > >>> 704.252.1310 [cell]
> > >>> 704.799.7974 [fax]
> > >>> David.Robinson at corvidtec.com
> > >>> http://www.corvidtechnologies.com
> > >>>
> > >>>> On Feb 5, 2015, at 4:55 PM, Ben Turner <bturner at redhat.com> wrote:
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > >>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>, "David F. Robinson"
> > >>>>> <david.robinson at corvidtec.com>, "Benjamin Turner" <bennyturns at gmail.com>
> > >>>>> Cc: gluster-users at gluster.org, "Gluster Devel" <gluster-devel at gluster.org>
> > >>>>> Sent: Thursday, February 5, 2015 5:30:04 AM
> > >>>>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> > >>>>>
> > >>>>>> On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> > >>>>>> I believe David already fixed this. I hope this is the same
> > >>>>>> permissions issue he told us about.
> > >>>>> Oops, it is not. I will take a look.
> > >>>>
> > >>>> Yes David, exactly like these:
> > >>>>
> > >>>> data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
> > >>>> data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
> > >>>> data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
> > >>>> data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
> > >>>> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> > >>>>
> > >>>> You can 100% verify my theory if you can correlate the time of the
> > >>>> disconnects to the time that the missing files were healed.  Can you
> > >>>> have a look at /var/log/glusterfs/glustershd.log?  That has all of the
> > >>>> healed files + timestamps; if we can see a disconnect during the rsync
> > >>>> and a self heal of the missing file, I think we can safely assume that
> > >>>> the disconnects may have caused this.  I'll try this on my test
> > >>>> systems.  How much data did you rsync?  What size-ish of files / an
> > >>>> idea of the dir layout?
> > >>>>
> > >>>> @Pranith - Could bricks flapping up and down during the rsync cause
> > >>>> the files to be missing on the first ls (written to 1 subvol but not
> > >>>> the other because it was down), the ls triggered SH, and that's why
> > >>>> the files were there for the second ls - could that be a possible
> > >>>> cause here?
> > >>>>
> > >>>> -b
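
(A quick way to line those timestamps up is to pull both sets of events straight out of the logs Ben mentions.  This is only a sketch against a default install - brick logs living under /var/log/glusterfs/bricks/ is an assumption, and the grep patterns are simply taken from the log lines quoted above:

    # disconnect events, with timestamps, from the brick logs
    grep "disconnecting connection" /var/log/glusterfs/bricks/*.log
    # heal completions, with timestamps, from the self-heal daemon log
    grep -E "Completed (entry|metadata|data) selfheal" /var/log/glusterfs/glustershd.log

If every "Completed ... selfheal" burst sits shortly after a "disconnecting connection" entry, that is exactly the correlation Ben is asking David to check for.)
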
> > >>>>> Pranith
> > >>>>>>
> > >>>>>> Pranith
> > >>>>>>> On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> > >>>>>>> Is the failure repeatable?  With the same directories?
> > >>>>>>>
> > >>>>>>> It's very weird that the directories appear on the volume when you
> > >>>>>>> do an 'ls' on the bricks.  Could it be that you only did a single
> > >>>>>>> 'ls' on the fuse mount, which did not show the directory?  Is it
> > >>>>>>> possible that this 'ls' triggered a self-heal that repaired the
> > >>>>>>> problem, whatever it was, and when you did another 'ls' on the fuse
> > >>>>>>> mount after the 'ls' on the bricks, the directories were there?
> > >>>>>>>
> > >>>>>>> The first 'ls' could have healed the files, causing the following
> > >>>>>>> 'ls' on the bricks to show the files as if nothing were damaged.
> > >>>>>>> If that's the case, it's possible that there were some
> > >>>>>>> disconnections during the copy.
> > >>>>>>>
> > >>>>>>> Added Pranith because he knows the replication and self-heal
> > >>>>>>> details better.
> > >>>>>>>
> > >>>>>>> Xavi
> > >>>>>>>
> > >>>>>>>> On 02/04/2015 07:23 PM, David F. Robinson wrote:
> > >>>>>>>> Distributed/replicated
> > >>>>>>>>
> > >>>>>>>> Volume Name: homegfs
> > >>>>>>>> Type: Distributed-Replicate
> > >>>>>>>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> > >>>>>>>> Status: Started
> > >>>>>>>> Number of Bricks: 4 x 2 = 8
> > >>>>>>>> Transport-type: tcp
> > >>>>>>>> Bricks:
> > >>>>>>>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> > >>>>>>>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> > >>>>>>>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> > >>>>>>>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> > >>>>>>>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> > >>>>>>>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> > >>>>>>>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> > >>>>>>>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> > >>>>>>>> Options Reconfigured:
> > >>>>>>>> performance.io-thread-count: 32
> > >>>>>>>> performance.cache-size: 128MB
> > >>>>>>>> performance.write-behind-window-size: 128MB
> > >>>>>>>> server.allow-insecure: on
> > >>>>>>>> network.ping-timeout: 10
> > >>>>>>>> storage.owner-gid: 100
> > >>>>>>>> geo-replication.indexing: off
> > >>>>>>>> geo-replication.ignore-pid-check: on
> > >>>>>>>> changelog.changelog: on
> > >>>>>>>> changelog.fsync-interval: 3
> > >>>>>>>> changelog.rollover-time: 15
> > >>>>>>>> server.manage-gids: on
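
(That listing is the output of the standard volume queries, which are also the quickest things to re-run and paste whenever the configuration or brick state is in question - shown here for the homegfs volume from this thread:

    gluster volume info homegfs
    gluster volume status homegfs

The second command is worth capturing around the time of a failure, since it shows which brick processes were actually online.)
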
> > >>>>>>>>
> > >>>>>>>> ------ Original Message ------
> > >>>>>>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> > >>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin Turner" <bennyturns at gmail.com>
> > >>>>>>>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
> > >>>>>>>> Sent: 2/4/2015 6:03:45 AM
> > >>>>>>>> Subject: Re: [Gluster-devel] missing files
> > >>>>>>>>
> > >>>>>>>>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
> > >>>>>>>>>> Sorry.  Thought about this a little more.  I should have been
> > >>>>>>>>>> clearer.  The files were on both bricks of the replica, not just
> > >>>>>>>>>> one side.  So, both bricks had to have been up... The
> > >>>>>>>>>> files/directories just don't show up on the mount.
> > >>>>>>>>>> I was reading and saw a related bug
> > >>>>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484).  I saw it
> > >>>>>>>>>> suggested to run:
> > >>>>>>>>>>     find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
> > >>>>>>>>>
> > >>>>>>>>> This command is specific to a dispersed volume.  It won't do
> > >>>>>>>>> anything (aside from the error you are seeing) on a replicated
> > >>>>>>>>> volume.
> > >>>>>>>>>
> > >>>>>>>>> I think you are using a replicated volume, right?
> > >>>>>>>>>
> > >>>>>>>>> In this case I'm not sure what can be happening.  Is your volume
> > >>>>>>>>> a pure replicated one or a distributed-replicated one?  On a pure
> > >>>>>>>>> replicated volume it doesn't make sense that some entries do not
> > >>>>>>>>> show up in an 'ls' when the file is in both replicas (at least
> > >>>>>>>>> without any error message in the logs).  On a
> > >>>>>>>>> distributed-replicated volume it could be caused by some problem
> > >>>>>>>>> while combining the contents of each replica set.
> > >>>>>>>>>
> > >>>>>>>>> What's the configuration of your volume?
> > >>>>>>>>>
> > >>>>>>>>> Xavi
> > >>>>>>>>>
> > >>>>>>>>>> I get a bunch of errors for operation not supported:
> > >>>>>>>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
> > >>>>>>>>>> find: warning: the -d option is deprecated; please use -depth
> > >>>>>>>>>> instead, because the latter is a POSIX-compliant feature.
> > >>>>>>>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
> > >>>>>>>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
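
(On a replicated volume the equivalent health check is the AFR changelog xattrs on the bricks, rather than trusted.ec.heal.  A minimal sketch - the path below just combines a brick path and a directory from earlier in this thread as an example:

    # run against a brick path, not the FUSE mount; non-zero trusted.afr.* values indicate pending heals
    getfattr -d -m . -e hex /data/brick01a/homegfs/wks_backup/homer_backup/backup

Entries whose trusted.afr.homegfs-client-* counters are all zero on both bricks of a replica pair have nothing left to heal.)
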
> > >>>>>>>>>>
> > >>>>>>>>>> ------ Original Message ------
> > >>>>>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
> > >>>>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
> > >>>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>; "gluster-users at gluster.org" <gluster-users at gluster.org>
> > >>>>>>>>>> Sent: 2/3/2015 7:12:34 PM
> > >>>>>>>>>> Subject: Re: [Gluster-devel] missing files
> > >>>>>>>>>>
> > >>>>>>>>>>> It sounds to me like the files were only copied to one replica,
> > >>>>>>>>>>> weren't there for the initial ls which triggered a self heal,
> > >>>>>>>>>>> and were there for the last ls because they were healed.  Is
> > >>>>>>>>>>> there any chance that one of the replicas was down during the
> > >>>>>>>>>>> rsync?  It could be that you lost a brick during the copy or
> > >>>>>>>>>>> something like that.  To confirm, I would look for disconnects
> > >>>>>>>>>>> in the brick logs as well as check glustershd.log to verify the
> > >>>>>>>>>>> missing files were actually healed.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -b
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
> > >>>>>>>>>>> <david.robinson at corvidtec.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> I rsync'd 20 TB over to my gluster system and noticed that I
> > >>>>>>>>>>> had some directories missing even though the rsync completed
> > >>>>>>>>>>> normally.  The rsync logs showed that the missing files were
> > >>>>>>>>>>> transferred.
> > >>>>>>>>>>> I went to the bricks and did an 'ls -al
> > >>>>>>>>>>> /data/brick*/homegfs/dir/*' and the files were on the bricks.
> > >>>>>>>>>>> After I did this 'ls', the files then showed up on the FUSE
> > >>>>>>>>>>> mounts.
> > >>>>>>>>>>> 1) Why are the files hidden on the fuse mount?
> > >>>>>>>>>>> 2) Why does the ls make them show up on the FUSE mount?
> > >>>>>>>>>>> 3) How can I prevent this from happening again?
> > >>>>>>>>>>> Note, I also mounted the gluster volume using NFS and saw the
> > >>>>>>>>>>> same behavior.  The files/directories were not shown until I
> > >>>>>>>>>>> did the "ls" on the bricks.
> > >>>>>>>>>>> David
> > >>>>>>>>>>> ==============================
> > >>>>>>>>>>> David F. Robinson, Ph.D.
> > >>>>>>>>>>> President - Corvid Technologies
> > >>>>>>>>>>> 704.799.6944 x101 [office]
> > >>>>>>>>>>> 704.252.1310 [cell]
> > >>>>>>>>>>> 704.799.7974 [fax]
> > >>>>>>>>>>> David.Robinson at corvidtec.com
> > >>>>>>>>>>> http://www.corvidtechnologies.com
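
(On questions 2 and 3: the lookup an 'ls' performs is what schedules the heal, but the self-heal daemon can be driven directly instead of waiting for someone to stat the right directory.  A sketch, assuming a reasonably current 3.x CLI:

    # list files/gfids that still need healing, per brick
    gluster volume heal homegfs info
    # ask the self-heal daemon to crawl the volume and heal everything it finds
    gluster volume heal homegfs full

Running the "info" form after a large rsync is a cheap way to confirm nothing is still waiting on a heal before trusting the copy.)
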
I don't think I understood what you sent enough to give it a try.  I'll wait until it comes out in a beta or release version.

David

------ Original Message ------
From: "Ben Turner" <bturner at redhat.com>
To: "Justin Clift" <justin at gluster.org>; "David F. Robinson" <david.robinson at corvidtec.com>
Cc: "Benjamin Turner" <bennyturns at gmail.com>; gluster-users at gluster.org; "Gluster Devel" <gluster-devel at gluster.org>
Sent: 2/6/2015 3:33:42 PM
Subject: Re: [Gluster-devel] [Gluster-users] missing files

> ----- Original Message -----
>> From: "Justin Clift" <justin at gluster.org>
>> Sent: Friday, February 6, 2015 3:27:53 PM
>>
>> Which multi-threaded epoll code just landed in master?  Are you thinking
>> of this one?
>>
>> http://review.gluster.org/#/c/3842/
>>
>> If so, it's not in master yet. ;)
>
> Doh!  I just saw - "Required patches are all upstream now" and assumed
> they were merged.  I have been in class all week so I am not up2date
> with everything.  I gave instructions on compiling it from the gerrit
> patches + master so if David wants to give it a go he can.  Sorry for
> the confusion.
>
> -b
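
(For anyone who does want to try the epoll patches before packages exist, the usual Gerrit workflow is roughly the following.  This is only a sketch of the general pattern, not Ben's actual instructions: the anonymous fetch URL and the trailing patch-set number "/1" are assumptions, while change 3842 is the review Justin linked above:

    git clone https://github.com/gluster/glusterfs.git && cd glusterfs
    # refs/changes/<last two digits of change>/<change number>/<patch set>
    git fetch https://review.gluster.org/glusterfs refs/changes/42/3842/1 && git cherry-pick FETCH_HEAD
    ./autogen.sh && ./configure && make && sudo make install

Waiting for the beta/release packages, as David suggests, avoids all of this.)
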
On 6 Feb 2015, at 20:33, Ben Turner <bturner at redhat.com> wrote:
> Doh!  I just saw - "Required patches are all upstream now" and assumed
> they were merged.  I have been in class all week so I am not up2date
> with everything.  I gave instructions on compiling it from the gerrit
> patches + master so if David wants to give it a go he can.  Sorry for
> the confusion.

Vijay merged the code into master yesterday, so it shouldn't be too long
until we can get some RPMs created for people to test with (easily). :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
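
(Once a build or release with the multi-threaded epoll changes is in place, the knobs it exposes are the event-thread counts.  The option names below are the ones that later shipped in the 3.7-era releases, so treat them as an assumption against whatever build actually gets tested:

    gluster volume set homegfs client.event-threads 4
    gluster volume set homegfs server.event-threads 4

More epoll threads mainly help when a client or brick is CPU-bound handling many small operations - which is exactly the rsync pattern discussed earlier in this thread.)
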