Thanks Tom and Joe for the fast response!
Before I started my upgrade I stopped all clients using the volume and
stopped all VMs with VHDs on the volume, but I guess (and this may be
the missing piece to reproduce this in a lab) I did not detach the NFS
shared storage mount from the XenServer pool to this volume, since this
is an extremely risky business. I also did not stop the volume. That was
probably a bit careless, but since I had done upgrades this way in the
past without any issues I skipped this step (a really bad habit). I'll
make amends and file a proper bug report :-). I agree with you, Joe,
this should never happen, even when someone ignores the advice to stop
the volume. If it were also necessary to detach shared storage NFS
connections to a volume, then frankly glusterfs would be unusable in a
private cloud. No one can afford downtime of the whole infrastructure
just for a glusterfs upgrade. Ideally a replicated gluster volume should
even be able to remain online and in use during (at least a minor
version) upgrade.
I don't know whether a heal was perhaps busy when I started the upgrade;
I forgot to check. I did check the CPU activity on the gluster nodes,
which was very low (in the 0.0X range via top), so I doubt it. I will
add this to the bug report as a suggestion, should they not be able to
reproduce it with an open NFS connection.
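Next time I'll check the heal state explicitly before starting rather
than guessing from CPU load, presumably with something like (just a
sketch, with VOLNAME standing in for my volume name):

  # show files still pending heal on a replicated volume
  gluster volume heal VOLNAME info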
By the way, is it sufficient to do:
service glusterd stop
service glusterfsd stop
and do a:
ps aux | grep gluster
to see whether everything has stopped, and then kill any leftovers
should this be necessary?
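In other words, something like this on every node once the clients are
unmounted and the volume is stopped (just a sketch of what I have in
mind; the grep pattern and the pkill fallback are my own guesses, not
taken from any docs):

  service glusterd stop
  service glusterfsd stop
  # check for leftover gluster processes (glusterd, glusterfsd, glusterfs)
  ps aux | grep '[g]luster'
  # only if something is still running after the services were stopped:
  # pkill glusterfs; pkill glusterfsd

Is that roughly the right sequence?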
For the fix, do you agree that if I run e.g.:
find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
on every node, with /export being the location of all my bricks, this
will be safe, also in a replicated set-up?
No necessary zero-byte files will be deleted in e.g. the .glusterfs
directory of any brick?
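To play it safe I'm planning to do a dry run first and review the list
before deleting anything, roughly like this (a sketch only; it assumes
the bricks sit directly under /export and it skips .glusterfs entirely):

  # list candidate link files (zero length, mode 1000) outside .glusterfs
  find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -print > /tmp/linkfiles.txt
  wc -l /tmp/linkfiles.txt

and only swap the -print for the -exec /bin/rm once the list looks sane.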
Thanks for your support!
Cheers,
Olav
On 18/02/15 20:51, Joe Julian wrote:
> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>> Hi Olav,
>>
>> I have a hunch that our problem was caused by improper unmounting of
>> the gluster volume, and have since found that the proper order should
>> be: kill all jobs using volume -> unmount volume on clients ->
>> gluster volume stop -> stop gluster service (if necessary)
>> In my case, I wrote a Python script to find duplicate files on the
>> mounted volume, then delete the corresponding link files on the
>> bricks (making sure to also delete files in the .glusterfs directory)
>> However, your find command was also suggested to me and I think it's
>> a simpler solution. I believe removing all link files (even ones that
>> are not causing duplicates) is fine since the next file access
>> gluster will do a lookup on all bricks and recreate any link files if
>> necessary. Hopefully a gluster expert can chime in on this point as
>> I'm not completely sure.
>
> You are correct.
>
>> Keep in mind your setup is somewhat different than mine as I have
>> only 5 bricks with no replication.
>> Regards,
>> Tom
>>
>> --------- Original Message ---------
>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>> From: "Olav Peeters" <opeeters at gmail.com>
>> Date: 2/18/15 10:52 am
>> To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>
>> Hi all,
>> I'm having this problem after upgrading from 3.5.3 to 3.6.2.
>> At the moment I am still waiting for a heal to finish (on a 31TB
>> volume with 42 bricks, replicated over three nodes).
>>
>> Tom,
>> how did you remove the duplicates?
>> with 42 bricks I will not be able to do this manually..
>> Did a:
>> find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>> work for you?
>>
>> Should this type of thing ideally not be checked and mended by a
>> heal?
>>
>> Does anyone have an idea yet how this happens in the first place?
>> Can it be connected to upgrading?
>>
>> Cheers,
>> Olav
>>
>>
>>
>> On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>
>> No, the files can be read on a newly mounted client! I went
>> ahead and deleted all of the link files associated with these
>> duplicates, and then remounted the volume. The problem is fixed!
>> Thanks again for the help, Joe and Vijay.
>> Tom
>>
>> --------- Original Message ---------
>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>> From: "Vijay Bellur" <vbellur at
redhat.com>
>> Date: 12/28/14 3:23 am
>> To: tbenzvi at 3vgeomatics.com, gluster-users at
gluster.org
>>
>> On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>> > Hi Vijay,
>> > Yes the files are still readable from the .glusterfs path.
>> > There is no explicit error. However, trying to read a text file in
>> > python simply gives me null characters:
>> >
>> > >>> open('ott_mf_itab').readlines()
>> >
>>
['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>> >
>> > And reading binary files does the same
>> >
>>
>> Is this behavior seen with a freshly mounted client too?
>>
>> -Vijay
>>
>> > --------- Original Message ---------
>> > Subject: Re: [Gluster-users] Hundreds of duplicate files
>> > From: "Vijay Bellur" <vbellur at redhat.com>
>> > Date: 12/27/14 9:57 pm
>> > To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>> >
>> > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com wrote:
>> > > Thanks Joe, I've read your blog post as well as your post
>> > > regarding the .glusterfs directory.
>> > > I found some unneeded duplicate files which were not being read
>> > > properly. I then deleted the link file from the brick. This always
>> > > removes the duplicate file from the listing, but the file does not
>> > > always become readable. If I also delete the associated file in the
>> > > .glusterfs directory on that brick, then some more files become
>> > > readable. However this solution still doesn't work for all files.
>> > > I know the file on the brick is not corrupt as it can be read
>> > > directly from the brick directory.
>> >
>> > For files that are not readable from the client, can you check if
>> > the file is readable from the .glusterfs/ path?
>> >
>> > What is the specific error that is seen while trying to read one
>> > such file from the client?
>> >
>> > Thanks,
>> > Vijay
>> >
>> >
>> >
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users