Is the failure repeatable? With the same directories?

It's very weird that the directories appear on the volume when you do an
'ls' on the bricks. Could it be that you only did a single 'ls' on the
fuse mount, which did not show the directory? Is it possible that this
'ls' triggered a self-heal that repaired the problem, whatever it was,
and when you did another 'ls' on the fuse mount after the 'ls' on the
bricks, the directories were there?

The first 'ls' could have healed the files, so that the following 'ls'
on the bricks showed the files as if nothing were damaged. If that's the
case, it's possible that there were some disconnections during the copy.

Added Pranith because he knows the replication and self-heal details
better.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:
> Distributed/replicated
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 10
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
>
>
> ------ Original Message ------
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
> Turner" <bennyturns at gmail.com>
> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster
> Devel" <gluster-devel at gluster.org>
> Sent: 2/4/2015 6:03:45 AM
> Subject: Re: [Gluster-devel] missing files
>
>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>> Sorry. Thought about this a little more. I should have been clearer.
>>> The files were on both bricks of the replica, not just one side. So,
>>> both bricks had to have been up... The files/directories just don't
>>> show up on the mount.
>>> I was reading and saw a related bug
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>> suggested to run:
>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> This command is specific to a dispersed volume. It won't do anything
>> (aside from the error you are seeing) on a replicated volume.
>>
>> I think you are using a replicated volume, right?
>>
>> In this case I'm not sure what can be happening. Is your volume a pure
>> replicated one or a distributed-replicated one? On a pure replicated
>> volume it doesn't make sense that some entries do not show up in an
>> 'ls' when the file is in both replicas (at least without any error
>> message in the logs). On a distributed-replicated volume it could be
>> caused by some problem while combining the contents of each replica
>> set.
>>
>> What's the configuration of your volume?
>>
>> Xavi
>>
>>>
>>> I get a bunch of errors for operation not supported:
>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
>>> trusted.ec.heal {} \;
>>> find: warning: the -d option is deprecated; please use -depth instead,
>>> because the latter is a POSIX-compliant feature.
>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>>
>>> ------ Original Message ------
>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>> Sent: 2/3/2015 7:12:34 PM
>>> Subject: Re: [Gluster-devel] missing files
>>>> It sounds to me like the files were only copied to one replica,
>>>> weren't there for the initial ls which triggered a self-heal, and
>>>> were there for the last ls because they were healed. Is there any
>>>> chance that one of the replicas was down during the rsync? It could
>>>> be that you lost a brick during the copy or something like that. To
>>>> confirm I would look for disconnects in the brick logs as well as
>>>> check glustershd.log to verify the missing files were actually
>>>> healed.
>>>>
>>>> -b
>>>>
>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>> <david.robinson at corvidtec.com> wrote:
>>>>
>>>>     I rsync'd 20-TB over to my gluster system and noticed that I had
>>>>     some directories missing even though the rsync completed normally.
>>>>     The rsync logs showed that the missing files were transferred.
>>>>     I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*';
>>>>     the files were on the bricks. After I did this 'ls', the files
>>>>     then showed up on the FUSE mounts.
>>>>     1) Why are the files hidden on the fuse mount?
>>>>     2) Why does the ls make them show up on the FUSE mount?
>>>>     3) How can I prevent this from happening again?
>>>>     Note, I also mounted the gluster volume using NFS and saw the same
>>>>     behavior. The files/directories were not shown until I did the
>>>>     "ls" on the bricks.
>>>>     David
>>>>     ==============================
>>>>     David F. Robinson, Ph.D.
>>>>     President - Corvid Technologies
>>>>     704.799.6944 x101 [office]
>>>>     704.252.1310 [cell]
>>>>     704.799.7974 [fax]
>>>>     David.Robinson at corvidtec.com
>>>>     http://www.corvidtechnologies.com
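A quick way to act on the suggestions above (check whether a brick dropped
out during the rsync and whether self-heal is still pending) is to query the
heal state and scan the server logs. This is an illustrative sketch rather
than something run in the thread; it assumes a GlusterFS 3.x installation
with the default log locations and uses the volume name homegfs from the
configuration quoted above.

    # Run on any server in the trusted pool: list entries still pending heal
    gluster volume heal homegfs info

    # Look for brick/client disconnects around the time of the rsync
    grep -i disconnect /var/log/glusterfs/bricks/*.log

    # Check the self-heal daemon log for files that were healed afterwards
    tail -n 100 /var/log/glusterfs/glustershd.log

If the disconnect timestamps line up with the rsync window, Benjamin's
explanation (a brick briefly down, entries healed by the later lookups)
would fit.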
Pranith Kumar Karampuri
2015-Feb-05 10:18 UTC
[Gluster-users] [Gluster-devel] missing files
I believe David already fixed this. I hope this is the same permissions
issue he told us about.

Pranith

On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> Is the failure repeatable? With the same directories?
Not repeatable. Once it shows up, it stays there.

I sent some other strange behavior I am seeing to Pranith earlier this
evening. Attached below...

David

Another issue I am having that might be related is that I cannot delete
some directories. It complains that the directories are not empty. But
when I list them out, there is nothing there. However, if I know the name
of the directory, I can cd into it and see the files.

[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# pwd
/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor
[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# ls -al
total 0
drwxrws--x 7 root root 449 Feb 4 18:12 .
drwxrwx--- 3 root root 200 Feb 4 18:19 ..
drwxrws--- 3 root root 41 Feb 4 18:12 References
drwxrws--x 4 root root 54 Feb 4 18:12 Testing
drwxrws--- 4 root root 51 Feb 4 18:12 Velodyne
drwxrws--x 4 root root 38 Feb 4 18:12 progress_reports
[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# rm -rf *
rm: cannot remove `References': Directory not empty
rm: cannot remove `Testing': Directory not empty
rm: cannot remove `Velodyne': Directory not empty
rm: cannot remove `progress_reports/pr2': Directory not empty
rm: cannot remove `progress_reports/pr3': Directory not empty
[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# ls -alR
total 0
drwxrws--x 6 root root 449 Feb 4 18:12 .
drwxrwx--- 3 root root 200 Feb 4 18:19 ..
drwxrws--- 3 root root 41 Feb 4 18:12 References     *** Note that there is nothing in this References directory.
drwxrws--x 4 root root 54 Feb 4 18:12 Testing
drwxrws--- 4 root root 51 Feb 4 18:12 Velodyne
drwxrws--x 4 root root 38 Feb 4 18:12 progress_reports

However, from the bricks (see listings below), there are other directories
that are not shown. For example, the References directory contains the
USSOCOM_OPAQUE_ARMOR directory on the brick, but it doesn't show up on the
volume.

[root at gfs01a USSOCOM_OPAQUE_ARMOR]# pwd
/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor
[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# cd References/
[root at gfs01a References]# ls -al     *** There is nothing shown in the References directory
total 0
drwxrws--- 3 root root 133 Feb 4 18:12 .
drwxrws--x 7 root root 449 Feb 4 18:12 ..
[root at gfs01a References]# cd USSOCOM_OPAQUE_ARMOR     *** From the brick listing, I knew the directory name. Even though it isn't shown, I can cd to it and see the files.
[root at gfs01a USSOCOM_OPAQUE_ARMOR]# ls -al
total 6787
drwxrws--- 2 streadway sbir 244 Feb 5 21:28 .
drwxrws--- 3 root root 164 Feb 5 21:28 ..
-rwxrw---- 1 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one
-rwxrw---- 1 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one
-rwxrw---- 1 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one
-rwxrw---- 1 sgilbert sbir 2974120 Jan 22 09:15 FEASABILITY STUDY.docx
-rwxrw---- 1 streadway sbir 3826704 Jan 21 14:57 FEASABILITY STUDY.one
-rwxrw---- 1 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one

The recursive listing (ls -alR) from each of the bricks shows that there
are files/directories that do not show up on the /homegfs volume.

[root at gfs01a Phase_1_SOCOM14-003_adv_armor]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 75 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 6648
drwxrws--- 2 streadway sbir 75 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 sgilbert sbir 2974120 Jan 22 09:15 FEASABILITY STUDY.docx
-rwxrw---- 2 streadway sbir 3826704 Jan 21 14:57 FEASABILITY STUDY.one

/data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 2 root root 10 Feb 4 18:12 .
drwxrws--x 6 root root 95 Feb 4 18:12 ..

[root at gfs01b ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 75 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 6648
drwxrws--- 2 streadway sbir 75 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 sgilbert sbir 2974120 Jan 22 09:15 FEASABILITY STUDY.docx
-rwxrw---- 2 streadway sbir 3826704 Jan 21 14:57 FEASABILITY STUDY.one

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 2 root root 10 Feb 4 18:12 .
drwxrws--x 6 root root 95 Feb 4 18:12 ..

[root at gfs02a ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 72
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one
-rwxrw---- 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one

/data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 84
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one
-rwxrw---- 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one

[root at gfs02b ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 72
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one
-rwxrw---- 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References:
total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR:
total 84
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw---- 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one
-rwxrw---- 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one

------ Original Message ------
From: "Xavier Hernandez" <xhernandez at datalab.es>
To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
Turner" <bennyturns at gmail.com>; "Pranith Kumar Karampuri"
<pkarampu at redhat.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster
Devel" <gluster-devel at gluster.org>
Sent: 2/5/2015 5:14:22 AM
Subject: Re: [Gluster-devel] missing files

> Is the failure repeatable? With the same directories?
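For the invisible References/USSOCOM_OPAQUE_ARMOR entries and the
"Directory not empty" failures above, a common first diagnostic step is to
compare the directory's GlusterFS extended attributes on every brick and
then force a named lookup from a client. The commands below are an
illustrative sketch and were not run in the thread; they assume root access
on the servers, reuse the brick and mount paths from the listings above,
and assume that a lookup from the FUSE mount is enough to trigger the entry
heal (which matches the behaviour seen earlier after the 'ls' on the
bricks).

    # On each server: dump the gluster xattrs (trusted.gfid, trusted.afr.*,
    # trusted.glusterfs.dht) for the directory on every brick. A gfid mismatch
    # between bricks, or a directory missing on one distribute subvolume, is a
    # typical reason entries fail to show up on the mount.
    getfattr -m . -d -e hex \
        /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References

    # From a FUSE client: name the hidden entry explicitly, then re-check heal state
    stat "/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR"
    gluster volume heal homegfs info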