On 02/20/2015 12:21 PM, Olav Peeters wrote:
> Let's take one file (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as an
> example...
> On the 3 nodes, where all bricks are formatted as XFS and mounted in
> /export, and 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the mount point
> of an NFS shared-storage connection from the XenServer machines:
Did I just read this correctly? Your bricks are NFS mounts? i.e.,
GlusterFS Client <-> GlusterFS Server <-> NFS <-> XFS?
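Before anything else, it would be worth confirming what actually sits
underneath those brick paths. A quick check on one node (brick path
taken from your listing below, adjust as needed):

    df -hT /export/brick13gfs01
    stat -f -c %T /export/brick13gfs01

If that reports xfs for the brick and nfs only for the XenServer-facing
mount, then I've misread you and the layering is fine.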
> [root@gluster01 ~]# find /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec ls -la {} \;
> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
Supposedly, this is the actual file.
> -rw-r--r--. 2 root root 0 Feb 18 00:51 /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
This is not a linkfile. Note it's mode 0644. How it got there with those
permissions would be a matter of history and would require information
that's probably lost.
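For reference, a genuine DHT linkfile is a zero-byte file with mode
1000 (it shows up as ---------T in ls) and carries a
trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the
real data. A quick way to test any suspect file (brick directory and
filename here are placeholders):

    ls -la /export/brickNN/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/somefile.vhd
    getfattr --absolute-names -n trusted.glusterfs.dht.linkto \
        /export/brickNN/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/somefile.vhd

If getfattr answers "No such attribute", it is not a linkfile.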
> [root@gluster02 ~]# find /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec ls -la {} \;
> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>
> [root@gluster03 ~]# find /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec ls -la {} \;
> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 2 root root 0 Feb 18 00:51 /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
Same analysis as above.
> 3 files with data, plus 2 zero-byte files with the same name
>
> Checking the zero-byte files:
> [root@gluster01 ~]# getfattr -m . -d -e hex /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> [root@gluster03 ~]# getfattr -m . -d -e hex /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> This is not a glusterfs link file since there is no
> "trusted.glusterfs.dht.linkto", am I correct?
You are correct.
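If you want an inventory of which zero-byte files on a brick really
are linkfiles, something along these lines should do it (read-only,
run per brick; consider it a sketch):

    find /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 \
        -type f -size 0 ! -path '*/.glusterfs/*' \
        -exec getfattr --absolute-names -n trusted.glusterfs.dht.linkto {} \; 2>/dev/null

Anything it prints is a real linkfile; zero-byte files without the
xattr (like the two above) produce no output.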
> And checking the "good" files:
>
> # file: export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
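One thing worth noting in the xattrs of this copy: the changelog for
sr_vol01-client-35 is non-zero (0x000000010000000100000000). Assuming
the usual AFR layout of three 32-bit counters (data / metadata / entry
pending operations), that decodes to one pending data and one pending
metadata operation against that client, i.e. this copy still considers
the other one in need of healing. A quick way to decode such a value:

    x=000000010000000100000000   # the value without the 0x prefix
    echo "data=$((16#${x:0:8})) metadata=$((16#${x:8:8})) entry=$((16#${x:16:8}))"
    # prints: data=1 metadata=1 entry=0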
> [root@gluster02 ~]# getfattr -m . -d -e hex /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> [root@gluster03 ~]# getfattr -m . -d -e hex /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-40=0x000000000000000000000000
> trusted.afr.sr_vol01-client-41=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
>
>
> Seen from a client via a glusterfs mount:
> [root@client ~]# ls -al /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>
>
>
> Via NFS (just after unmounting and mounting the volume again):
> [root@client ~]# ls -al /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>
> Doing the same list a couple of seconds later:
> [root@client ~]# ls -al /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> And again, and again, and again:
> [root@client ~]# ls -al /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> -rw-r--r--. 1 root root 0 Feb 18 00:51 /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>
> This really seems odd. Why do we see the real data file only once?
>
> It increasingly looks like this crazy file duplication (and writing
> of sticky-bit files) was actually triggered by rebooting one of the
> three nodes while there was still an active NFS connection (even with
> no data exchange at all), since all the zero-byte files (of the
> non-sticky-bit type) were created at either 00:51 or 00:41, the exact
> moments when one of the three nodes in the cluster was rebooted. This
> would mean that replication with GlusterFS currently provides hardly
> any redundancy. Quite the opposite: if one of the machines goes down,
> all of your data gets seriously disorganised. I am busy configuring a
> test installation to see how this can best be reproduced for a bug
> report...
>
> Does anyone have a suggestion on how best to get rid of the duplicates,
> or rather to get this mess organised the way it should be?
> This is a cluster with millions of files. A rebalance does not fix the
> issue, and neither does a rebalance fix-layout. Since this is a
> replicated volume, every file should be there 2x, not 3x. Can I safely
> just remove all the zero-byte files outside of the .glusterfs directory,
> including the sticky-bit files?
>
> The empty zero-byte files outside of .glusterfs on every brick I can
> probably remove safely like this:
> find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -exec rm {} \;
> no?
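Note that your find only matches files whose mode is exactly 1000, so
it will only ever touch the sticky-bit linkfiles, not 0644 zero-byte
copies like the two shown above. Before deleting anything I would do a
read-only pass and confirm each candidate really carries the linkto
xattr; a rough sketch:

    find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -print |
    while read -r f; do
        # report only genuine DHT linkfiles
        getfattr --absolute-names -n trusted.glusterfs.dht.linkto "$f" >/dev/null 2>&1 \
            && echo "linkfile: $f"
    done

Linkfiles that are removed get recreated on the next lookup, so losing
one is not a problem.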
>
> Thanks!
>
> Cheers,
> Olav
> On 18/02/15 22:10, Olav Peeters wrote:
>> Thanks Tom and Joe,
>> for the fast response!
>>
>> Before I started my upgrade I stopped all clients using the volume
>> and stopped all VMs with VHDs on the volume, but I guess (and this
>> may be the missing piece for reproducing this in a lab) I did not
>> detach the NFS shared-storage mount from the XenServer pool to this
>> volume, since that is an extremely risky business. I also did not
>> stop the volume. That was admittedly a bit stupid, but since I had
>> done upgrades this way in the past without any issues I skipped this
>> step (a really bad habit). I'll make amends and file a proper bug
>> report :-). I agree with you Joe, this should never happen, even when
>> someone ignores the advice of stopping the volume. If it were also
>> necessary to detach shared-storage NFS connections to a volume, then
>> frankly glusterfs would be unusable in a private cloud. No one can
>> afford downtime of the whole infrastructure just for a glusterfs
>> upgrade. Ideally a replicated gluster volume should even be able to
>> remain online and in use during (at least a minor-version) upgrade.
>>
>> I don't know whether a heal was perhaps busy when I started the
>> upgrade; I forgot to check. I did check the CPU activity on the
>> gluster nodes, which was very low (in the 0.0X range via top), so I
>> doubt it. I will add this to the bug report as a suggestion in case
>> they cannot reproduce it with an open NFS connection.
>>
>> By the way, is it sufficient to do:
>> service glusterd stop
>> service glusterfsd stop
>> and do a:
>> ps aux | grep gluster
>> to see if everything has stopped and kill any leftovers should this
>> be necessary?
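A note on that: "service glusterd stop" only stops the management
daemon, and "service glusterfsd stop" only the brick processes; the
gluster NFS server and the self-heal daemon run as separate glusterfs
processes and can stay up. So yes, check before assuming everything is
down (rough sketch; kill only when you are sure nothing should be
running on that node):

    service glusterd stop
    service glusterfsd stop
    ps aux | grep '[g]luster'    # anything listed here is a leftover gluster process
    # pkill glusterfs ; pkill glusterfsd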
>>
>> For the fix, do you agree that if I run e.g.:
>> find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>> on every node (where /export is the location of all my bricks), this
>> will be safe, also in a replicated set-up?
>> No needed zero-byte files will be deleted, e.g. in the .glusterfs
>> directory of every brick?
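The entries under .glusterfs for regular files are just hardlinks to
the data, and linkfiles get recreated on lookup, but I would still
keep the .glusterfs prune from your other command rather than sweep
the whole brick, and do a dry run first. Roughly:

    # dry run: list what would be removed
    find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -print
    # once the list looks sane:
    find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -exec /bin/rm -f {} \;

Whether the leftover gfid hardlinks of removed linkfiles under
.glusterfs need cleaning up as well is something I would verify on a
single file before doing it at scale.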
>>
>> Thanks for your support!
>>
>> Cheers,
>> Olav
>>
>>
>>
>>
>>
>> On 18/02/15 20:51, Joe Julian wrote:
>>>
>>> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>>>> Hi Olav,
>>>>
>>>> I have a hunch that our problem was caused by improper unmounting
>>>> of the gluster volume, and have since found that the proper order
>>>> should be: kill all jobs using the volume -> unmount the volume on
>>>> clients -> gluster volume stop -> stop the gluster service (if
>>>> necessary).
>>>> In my case, I wrote a Python script to find duplicate files on the
>>>> mounted volume, then delete the corresponding link files on the
>>>> bricks (making sure to also delete the files in the .glusterfs
>>>> directory).
>>>> However, your find command was also suggested to me and I think
>>>> it's a simpler solution. I believe removing all link files (even
>>>> ones that are not causing duplicates) is fine, since on the next
>>>> file access gluster will do a lookup on all bricks and recreate any
>>>> link files if necessary. Hopefully a gluster expert can chime in on
>>>> this point as I'm not completely sure.
>>>
>>> You are correct.
>>>
>>>> Keep in mind your setup is somewhat different from mine, as I have
>>>> only 5 bricks with no replication.
>>>> Regards,
>>>> Tom
>>>>
>>>> --------- Original Message ---------
>>>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>> From: "Olav Peeters" <opeeters at gmail.com>
>>>> Date: 2/18/15 10:52 am
>>>> To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>>>
>>>> Hi all,
>>>>     I'm having this problem after upgrading from 3.5.3 to 3.6.2.
>>>> At the moment I am still waiting for a heal to finish (on a
>>>> 31TB volume with 42 bricks, replicated over three nodes).
>>>>
>>>> Tom,
>>>> how did you remove the duplicates?
>>>> with 42 bricks I will not be able to do this manually..
>>>> Did a:
>>>>     find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>> work for you?
>>>>
>>>>     Should this type of thing ideally not be checked and mended
>>>>     by a heal?
>>>>
>>>> Does anyone have an idea yet how this happens in the first
>>>> place? Can it be connected to upgrading?
>>>>
>>>> Cheers,
>>>> Olav
>>>>
>>>>
>>>>
>>>> On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>>>
>>>>         No, the files can be read on a newly mounted client! I went
>>>> ahead and deleted all of the link files associated with
>>>> these duplicates, and then remounted the volume. The
>>>> problem is fixed!
>>>> Thanks again for the help, Joe and Vijay.
>>>> Tom
>>>>
>>>> --------- Original Message ---------
>>>>         Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>         From: "Vijay Bellur" <vbellur at redhat.com>
>>>>         Date: 12/28/14 3:23 am
>>>>         To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>
>>>>             On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>>>>             > Hi Vijay,
>>>>             > Yes the files are still readable from the .glusterfs path.
>>>>             > There is no explicit error. However, trying to read a
>>>>             > text file in python simply gives me null characters:
>>>>             >
>>>>             > >>> open('ott_mf_itab').readlines()
>>>> >
>>>>
['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>>>> >
>>>> > And reading binary files does the same
>>>> >
>>>>
>>>>             Is this behavior seen with a freshly mounted client too?
>>>>
>>>> -Vijay
>>>>
>>>>             > --------- Original Message ---------
>>>>             > Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>             > From: "Vijay Bellur" <vbellur at redhat.com>
>>>>             > Date: 12/27/14 9:57 pm
>>>>             > To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>             >
>>>>             > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>             > > Thanks Joe, I've read your blog post as well as your post
>>>>             > > regarding the .glusterfs directory.
>>>>             > > I found some unneeded duplicate files which were not being
>>>>             > > read properly. I then deleted the link file from the brick.
>>>>             > > This always removes the duplicate file from the listing, but
>>>>             > > the file does not always become readable. If I also delete
>>>>             > > the associated file in the .glusterfs directory on that
>>>>             > > brick, then some more files become readable. However this
>>>>             > > solution still doesn't work for all files.
>>>>             > > I know the file on the brick is not corrupt as it can be
>>>>             > > read directly from the brick directory.
>>>>             >
>>>>             > For files that are not readable from the client, can you
>>>>             > check if the file is readable from the .glusterfs/ path?
>>>>             >
>>>>             > What is the specific error that is seen while trying to read
>>>>             > one such file from the client?
>>>>             >
>>>>             > Thanks,
>>>>             > Vijay
>>>>             >
>>>>             > _______________________________________________
>>>>             > Gluster-users mailing list
>>>>             > Gluster-users at gluster.org
>>>>             > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>             >
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>