Hi there,

after reporting some trouble with group access permissions,
http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
still persists, btw.), things get worse and worse with each day.

Now we see a lot of duplicate files (again, only fuse clients here), and
access permissions are reset on a random and totally annoying basis.
Files are empty from time to time and become:

-rwxrws--x 1 user1 group2   594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user1 group2   594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user2 group2   531 2011-03-03 10:47 result_11.mat
------S--T 1 root  group2     0 2011-04-14 07:57 result_11.mat
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt

where group2 is a secondary group.

How come there are these empty and duplicate files? Again, this listing
is from the fuse mount.

Could it be that version 3.2.0 is totally borked?

Btw.: from time to time, both these permissions and which duplicate
files one sees change in a random manner.
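For what it's worth, this is the quick check I run against a saved copy
of such a listing. It is only a heuristic of my own (not a Gluster
tool), and I am only guessing that the zero-byte entries with a trailing
'T' in the mode are some kind of internal link file showing through:

```shell
# check_listing FILE
# Flag zero-byte entries whose mode ends in a sticky 'T' (suspected
# internal link files showing through) and names listed more than once,
# in a saved `ls -l` capture with whitespace-free file names.
check_listing() {
    # mode is field 1, size is field 5, name is the last field
    awk '$1 ~ /T$/ && $5 == 0 { print "zero-byte sticky:", $NF }' "$1"
    awk '{ count[$NF]++ }
         END { for (n in count) if (count[n] > 1) print "duplicate:", n }' "$1" | sort
}
```

On the listing above it reports result_11.mat as the zero-byte sticky
entry and all three names as duplicates.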
I followed various hints on configuring and deconfiguring options.
Went from:

root at store02:/var/log/glusterfs# gluster volume info store

Volume Name: store
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: store01-i:/srv/store01
Brick2: pvmserv01-i:/srv/store01
Brick3: pvmserv02-i:/srv/store01
Brick4: store02-i:/srv/store03
Brick5: store02-i:/srv/store01
Brick6: store01-i:/srv/store02
Brick7: store02-i:/srv/store04
Brick8: store02-i:/srv/store05
Brick9: store02-i:/srv/store06
Brick10: store02-i:/srv/store02
Options Reconfigured:
nfs.disable: on
auth.allow: 127.0.0.1,10.10.*
performance.cache-size: 1024Mb
performance.write-behind-window-size: 64Mb
performance.io-thread-count: 32
diagnostics.dump-fd-stats: off
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.stat-prefetch: off
diagnostics.latency-measurement: off
performance.flush-behind: off
performance.quick-read: disable

to:

Volume Name: store
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: store01-i:/srv/store01
Brick2: pvmserv01-i:/srv/store01
Brick3: pvmserv02-i:/srv/store01
Brick4: store02-i:/srv/store03
Brick5: store02-i:/srv/store01
Brick6: store01-i:/srv/store02
Brick7: store02-i:/srv/store04
Brick8: store02-i:/srv/store05
Brick9: store02-i:/srv/store06
Brick10: store02-i:/srv/store02
Options Reconfigured:
auth.allow: 127.0.0.1,10.10.*

Nothing helped.

Currently our only option seems to be to move away from glusterfs to
some other filesystem, which would be a bitter decision.

Thanks for any help,
udo.

--
Institute of Cognitive Science - System Administration Team
Albrechtstrasse 28 - 49076 Osnabrueck - Germany
Tel: +49-541-969-3362 - Fax: +49-541-969-3361
https://doc.ikw.uni-osnabrueck.de
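P.S. In case it is useful to anyone reproducing this, the rollback
amounts to something like the following sketch (from memory, not re-run
verbatim; volume name and option value as in the listings above):

```shell
# Clear all reconfigured options in one go, then re-apply the single
# option we want to keep. This is how we arrived at the second
# "gluster volume info" listing above.
gluster volume reset store
gluster volume set store auth.allow '127.0.0.1,10.10.*'
gluster volume info store   # verify: only auth.allow should remain
```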
Stephan von Krawczynski
2011-May-18 13:44 UTC
[Gluster-users] gluster 3.2.0 - totally broken?
On Wed, 18 May 2011 14:45:19 +0200
Udo Waechter <udo.waechter at uni-osnabrueck.de> wrote:

> Hi there,
> after reporting some trouble with group access permissions,
> http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
> still persists, btw.)
>
> things get worse and worse with each day.
> [...]
> Currently our only option seems to be to move away from glusterfs to
> some other filesystem, which would be a bitter decision.
>
> Thanks for any help,
> udo.

Hello Udo,

unfortunately I can only confirm your problems. The last known-to-work
version we have seen is 2.0.9; everything beyond is just bogus. 3.X did
not solve a single issue but brought quite a lot of new ones instead.
The project only gained featurism but did not solve the very basic
problems. To this day there is no way to see a list of not-yet-synced
files in a replication setup, which is ridiculous. Ever since 2.0.9 I
have hoped that someone would fork the project and really attack the
basics. IOW: good idea, pretty bad implementation, no will to listen or
learn.

Regards,
Stephan

> --
> Institute of Cognitive Science - System Administration Team
> Albrechtstrasse 28 - 49076 Osnabrueck - Germany
> Tel: +49-541-969-3362 - Fax: +49-541-969-3361
> https://doc.ikw.uni-osnabrueck.de
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
From reading this list, I wonder if this would be an accurate summary of
the current state of Gluster:

3.1.3 - most dependable current version
3.1.4 - gained a few bugs
3.2.0 - not stable

So 3.1.3 would be suitable for production systems, as long as the known
bug in mishandling POSIX group permissions is worked around (by
loosening permissions). There has been a suggestion that stat-prefetch
be turned off, and perhaps that other, non-default options are better
left unused.

I'm not personally knowledgeable on any of this aside from the POSIX
group problem. I'm just asking for confirmation (or not) of the basic
sense I'm getting from those with extensive experience: that 3.1.3 is
essentially dependable, 3.1.4 is problematic, and 3.2.0 should perhaps
only be used to gain familiarity with the new geo-replication feature,
but avoided for current production use.

Whit
Udo,

Do you know what kind of access was performed on those files? Were they
just copied in (via cp), or were they rsync'ed over an existing set of
data? Was it data carried over from 3.1 into a 3.2 system?

We hate to lose users (community users or paid customers equally) and
will do our best to keep you happy. Please file a bug report with as
much history as possible and we will have it assigned on priority.

Thanks,
Avati

On Wed, May 18, 2011 at 5:45 AM, Udo Waechter
<udo.waechter at uni-osnabrueck.de> wrote:

> Hi there,
> after reporting some trouble with group access permissions,
> http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
> still persists, btw.)
>
> things get worse and worse with each day.
> [...]
> Currently our only option seems to be to move away from glusterfs to
> some other filesystem, which would be a bitter decision.
>
> Thanks for any help,
> udo.
On 18.05.2011 21:34, Anthony J. Biacco wrote:

> When you say you removed the config before and added the nodes after,
> do you mean you deleted the volume and recreated it?

Yes. I think I tried to do this without removing the config first, but
3.1.4 was complaining upon startup. Also, I've read about config
problems with 3.1.4 -> 3.2.0 updates, so I figured it would be easiest
for me to totally remove 3.2.0 and its config and start from scratch
with 3.1.4.

--
Tomasz Chmielewski
http://wpkg.org
> Date: Wed, 18 May 2011 19:00:30 +0200
> From: Udo Waechter <udo.waechter at uni-osnabrueck.de>
> Subject: Re: [Gluster-users] gluster 3.2.0 - totally broken?
> To: Gluster Users <gluster-users at gluster.org>
>
> On 18.05.2011, at 18:56, Anthony J. Biacco wrote:
>> I'm actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if
>> I'd have any ill effects on the volume with a simple rpm downgrade
>> and daemon restart.
>
> I read somewhere in the docs that you need to reset the volume options
> beforehand:
>
> gluster volume reset <volname>
>
> Good luck. Would be nice to hear if it worked for you.
> --udo.
>
> --
> :: udo waechter - root at zoide.net :: N 52°16'30.5" E 8°3'10.1"
> :: genuine input for your ears: http://auriculabovinari.de
> :: your eyes: http://ezag.zoide.net :: your brain: http://zoide.net

Hello All-

A few words of warning about downgrading, after what happened to me when
I tried it. I downgraded from 3.2 to 3.1.4, but I am back on 3.2 again
now because the downgrade broke the rebalancing feature. I thought this
might have been due to version 3.2 having done something to the xattrs.
I tried downgrading to 3.1.3 and 3.1.2 as well, but rebalance was not
working in those versions either, despite having worked successfully in
the past.

I found that the downgrade didn't go as smoothly as the upgrades usually
do. After downgrading the RPMs on the servers and restarting glusterd, I
couldn't mount the volumes, and the client logs were flooded with errors
like these for each server:
[2011-05-03 18:05:26.563591] E [client-handshake.c:1101:client_query_portmap_cbk] 0-atmos-client-1: failed to get the port number for remote subvolume
[2011-05-03 18:05:26.564543] I [client.c:1601:client_rpc_notify] 0-atmos-client-1: disconnected

I didn't need to reset the volumes after downgrading, because none of
the volume files had been created or reset under version 3.2. Despite
that, I did try doing "gluster volume reset <volname>" for all the
volumes, but it didn't stop the client log errors or solve the mounting
problems.

In desperation I unmounted all the volumes from the clients and shut
down all the gluster-related processes on all the servers. After waiting
a few minutes for any locked ports to clear (in case locked ports had
been causing the problems after the RPM downgrades), I restarted
glusterd on the servers, and a few minutes later I was able to mount the
volumes again. A few days later I discovered that I could no longer
rebalance (fix-layout or migrate-data).

To answer an earlier question, I am using 3.2 in a production
environment, although in the light of recent discussions in this thread
I wish I weren't. Having said that, my users haven't reported any
problems nearly a week after the upgrade, so I am hoping we won't be
affected by any of the issues that have been causing problems at other
sites.

-Dan.
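P.S. For anyone who wants to scan a client log for the same symptom,
here is a rough sketch (my own, tested only against lines shaped like
the two quoted above; your client log path will differ):

```shell
# portmap_failures LOGFILE
# Print each client translator that logged an E-level error from a
# portmap-related code path. In glusterfs client logs the line format is:
#   [timestamp] LEVEL [file:line:function] translator: message
# so the level is field 3 and the translator (with trailing ':') field 5.
portmap_failures() {
    grep -E '^\[[^]]+\] E \[[^]]*portmap[^]]*\]' "$1" \
        | awk '{ sub(/:$/, "", $5); print $5 }' \
        | sort -u
}
```

Usage would be something like
`portmap_failures /var/log/glusterfs/<your-mountpoint>.log`.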