Pranith Kumar Karampuri
2015-Feb-05 10:18 UTC
[Gluster-users] [Gluster-devel] missing files
I believe David already fixed this. I hope this is the same permissions
issue he mentioned.

Pranith

On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> Is the failure repeatable? With the same directories?
>
> It's very weird that the directories appear on the volume when you do
> an 'ls' on the bricks. Could it be that you only did a single 'ls' on
> the fuse mount, which did not show the directory? Is it possible that
> this 'ls' triggered a self-heal that repaired the problem, whatever it
> was, and when you did another 'ls' on the fuse mount after the 'ls' on
> the bricks, the directories were there?
>
> The first 'ls' could have healed the files, so that the following 'ls'
> on the bricks showed the files as if nothing were damaged. If that's
> the case, it's possible that there were some disconnections during
> the copy.
>
> Added Pranith because he knows the replication and self-heal details
> better.
>
> Xavi
>
> On 02/04/2015 07:23 PM, David F. Robinson wrote:
>> Distributed/replicated
>>
>> Volume Name: homegfs
>> Type: Distributed-Replicate
>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>> Options Reconfigured:
>> performance.io-thread-count: 32
>> performance.cache-size: 128MB
>> performance.write-behind-window-size: 128MB
>> server.allow-insecure: on
>> network.ping-timeout: 10
>> storage.owner-gid: 100
>> geo-replication.indexing: off
>> geo-replication.ignore-pid-check: on
>> changelog.changelog: on
>> changelog.fsync-interval: 3
>> changelog.rollover-time: 15
>> server.manage-gids: on
>>
>>
>> ------ Original Message ------
>> From: "Xavier Hernandez" <xhernandez at datalab.es>
>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
>> Turner" <bennyturns at gmail.com>
>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>;
>> "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: 2/4/2015 6:03:45 AM
>> Subject: Re: [Gluster-devel] missing files
>>
>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>>> Sorry. Thought about this a little more. I should have been
>>>> clearer. The files were on both bricks of the replica, not just
>>>> one side. So, both bricks had to have been up... The
>>>> files/directories just don't show up on the mount.
>>>> I was reading and saw a related bug
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>>> suggested to run:
>>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>>
>>> This command is specific to a dispersed volume. It won't do anything
>>> (aside from the error you are seeing) on a replicated volume.
>>>
>>> I think you are using a replicated volume, right?
>>>
>>> In this case I'm not sure what can be happening. Is your volume a
>>> pure replicated one or a distributed-replicated one? On a pure
>>> replicated volume it doesn't make sense that some entries do not
>>> show in an 'ls' when the file is in both replicas (at least without
>>> any error message in the logs). On a distributed-replicated volume
>>> it could be caused by some problem while combining the contents of
>>> each replica set.
>>>
>>> What's the configuration of your volume?
>>>
>>> Xavi
>>>
>>>> I get a bunch of "Operation not supported" errors:
>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
>>>> trusted.ec.heal {} \;
>>>> find: warning: the -d option is deprecated; please use -depth
>>>> instead, because the latter is a POSIX-compliant feature.
>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not
>>>> supported
>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal:
>>>> Operation not supported
>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not
>>>> supported
>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>>> ------ Original Message ------
>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>> Sent: 2/3/2015 7:12:34 PM
>>>> Subject: Re: [Gluster-devel] missing files
>>>>> It sounds to me like the files were only copied to one replica,
>>>>> weren't there for the initial ls which triggered a self-heal,
>>>>> and were there for the last ls because they were healed. Is there
>>>>> any chance that one of the replicas was down during the rsync? It
>>>>> could be that you lost a brick during the copy or something like
>>>>> that. To confirm, I would look for disconnects in the brick logs
>>>>> as well as check glustershd.log to verify the missing files were
>>>>> actually healed.
>>>>>
>>>>> -b
>>>>>
>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>
>>>>> I rsync'd 20 TB over to my gluster system and noticed that I had
>>>>> some directories missing even though the rsync completed
>>>>> normally. The rsync logs showed that the missing files were
>>>>> transferred.
>>>>> I went to the bricks and did an 'ls -al
>>>>> /data/brick*/homegfs/dir/*'; the files were on the bricks. After
>>>>> I did this 'ls', the files then showed up on the FUSE mounts.
>>>>> 1) Why are the files hidden on the fuse mount?
>>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>>> 3) How can I prevent this from happening again?
>>>>> Note, I also mounted the gluster volume using NFS and saw the
>>>>> same behavior. The files/directories were not shown until I did
>>>>> the "ls" on the bricks.
>>>>> David
>>>>> ==============================
>>>>> David F. Robinson, Ph.D.
>>>>> President - Corvid Technologies
>>>>> 704.799.6944 x101 [office]
>>>>> 704.252.1310 [cell]
>>>>> 704.799.7974 [fax]
>>>>> David.Robinson at corvidtec.com
>>>>> http://www.corvidtechnologies.com
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
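The deprecation warning in David's find output points at a direct POSIX replacement. A minimal self-contained sketch of the -depth behaviour on a throwaway tree (the directory names are placeholders echoing David's layout; as Xavi notes, trusted.ec.heal is only meaningful on a dispersed volume):

```shell
# Stand-in for the FUSE mount point; all path names here are placeholders.
mnt=$(mktemp -d)
mkdir -p "$mnt/wks_backup/homer_backup/logs"
touch "$mnt/wks_backup/homer_backup/logs/2014_05_20.log"

# -depth is the POSIX-compliant spelling of the deprecated -d: it visits
# entries before their parent directories. On a dispersed volume one would
# append:  -exec getfattr -h -n trusted.ec.heal {} \;
# (on a replicate volume that xattr returns "Operation not supported").
find "$mnt" -depth
```

The depth-first order matters for the heal query: children are examined before the directories that contain them.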
Pranith Kumar Karampuri
2015-Feb-05 10:30 UTC
[Gluster-users] [Gluster-devel] missing files
On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> I believe David already fixed this. I hope this is the same
> permissions issue he mentioned.
Oops, it is not. I will take a look.

Pranith

> On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
>> Is the failure repeatable? With the same directories?
>>
>> [... rest of the quoted thread identical to the previous message ...]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
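Benjamin's suggestion (look for disconnects in the brick logs, then confirm heals in glustershd.log) can be sketched as a grep. The log directory and the sample line below are fabricated for illustration; on a real cluster the brick logs usually live under /var/log/glusterfs/bricks/, and `gluster volume heal homegfs info` lists entries still pending heal.

```shell
# Self-contained sketch: write one simulated brick-log line so the grep has
# input. The path and message format are assumptions for illustration only;
# substitute your actual brick log directory.
logdir=$(mktemp -d)   # stand-in for /var/log/glusterfs/bricks
printf '%s\n' \
  '[2015-02-03 22:41:08.123456] I 0-homegfs-server: disconnecting connection from gfs02a-client-3' \
  > "$logdir/data-brick01a-homegfs.log"

# Benjamin's check: did any client/brick connection drop during the rsync?
grep -i 'disconnect' "$logdir"/*.log
```

A hit in the rsync time window would support the theory that one replica was briefly down and the later 'ls' triggered the heal.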