Hi there,

after reporting some trouble with group access permissions,
http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
still persists, btw.), things get worse and worse with each day.

Now we see a lot of duplicate files (again, only fuse clients here), and
access permissions are reset on a random and totally annoying basis.
Files are empty from time to time and become:

-rwxrws--x 1 user1 group2   594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user1 group2   594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user2 group2   531 2011-03-03 10:47 result_11.mat
------S--T 1 root  group2     0 2011-04-14 07:57 result_11.mat
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt

where group2 is a secondary group.

How come there are these empty and duplicate files? Again, this listing
is from the fuse mount.

Could it be that version 3.2.0 is totally borked?

Btw.: from time to time, both these permissions and which duplicate
files one sees change in a random manner.
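For what it's worth, this is the quick check I run against a saved copy
of such a listing. It is only a heuristic of my own (not a Gluster
tool), and I am only guessing that the zero-byte entries with a trailing
'T' in the mode are some kind of internal link file showing through:

```shell
# check_listing FILE
# Flag zero-byte entries whose mode ends in a sticky 'T' (suspected
# internal link files showing through) and names listed more than once,
# in a saved `ls -l` capture with whitespace-free file names.
check_listing() {
    # mode is field 1, size is field 5, name is the last field
    awk '$1 ~ /T$/ && $5 == 0 { print "zero-byte sticky:", $NF }' "$1"
    awk '{ count[$NF]++ }
         END { for (n in count) if (count[n] > 1) print "duplicate:", n }' "$1" | sort
}
```

On the listing above it reports result_11.mat as the zero-byte sticky
entry and all three names as duplicates.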
I followed various hints on configuring and deconfiguring options.
Went from:

root at store02:/var/log/glusterfs# gluster volume info store

Volume Name: store
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: store01-i:/srv/store01
Brick2: pvmserv01-i:/srv/store01
Brick3: pvmserv02-i:/srv/store01
Brick4: store02-i:/srv/store03
Brick5: store02-i:/srv/store01
Brick6: store01-i:/srv/store02
Brick7: store02-i:/srv/store04
Brick8: store02-i:/srv/store05
Brick9: store02-i:/srv/store06
Brick10: store02-i:/srv/store02
Options Reconfigured:
nfs.disable: on
auth.allow: 127.0.0.1,10.10.*
performance.cache-size: 1024Mb
performance.write-behind-window-size: 64Mb
performance.io-thread-count: 32
diagnostics.dump-fd-stats: off
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.stat-prefetch: off
diagnostics.latency-measurement: off
performance.flush-behind: off
performance.quick-read: disable

to:

Volume Name: store
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: store01-i:/srv/store01
Brick2: pvmserv01-i:/srv/store01
Brick3: pvmserv02-i:/srv/store01
Brick4: store02-i:/srv/store03
Brick5: store02-i:/srv/store01
Brick6: store01-i:/srv/store02
Brick7: store02-i:/srv/store04
Brick8: store02-i:/srv/store05
Brick9: store02-i:/srv/store06
Brick10: store02-i:/srv/store02
Options Reconfigured:
auth.allow: 127.0.0.1,10.10.*

Nothing helped.

Currently our only option seems to be to move away from glusterfs to
some other filesystem, which would be a bitter decision.

Thanks for any help,
udo.

--
Institute of Cognitive Science - System Administration Team
Albrechtstrasse 28 - 49076 Osnabrueck - Germany
Tel: +49-541-969-3362 - Fax: +49-541-969-3361
https://doc.ikw.uni-osnabrueck.de
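P.S. In case it is useful to anyone reproducing this, the rollback
amounts to something like the following sketch (from memory, not re-run
verbatim; volume name and option value as in the listings above):

```shell
# Clear all reconfigured options in one go, then re-apply the single
# option we want to keep. This is how we arrived at the second
# "gluster volume info" listing above.
gluster volume reset store
gluster volume set store auth.allow '127.0.0.1,10.10.*'
gluster volume info store   # verify: only auth.allow should remain
```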
Stephan von Krawczynski
2011-May-18 13:44 UTC
[Gluster-users] gluster 3.2.0 - totally broken?
On Wed, 18 May 2011 14:45:19 +0200
Udo Waechter <udo.waechter at uni-osnabrueck.de> wrote:

> Hi there,
> after reporting some trouble with group access permissions,
> http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
> still persists, btw.)
>
> things get worse and worse with each day.
> [...]
> Currently our only option seems to be to move away from glusterfs to
> some other filesystem, which would be a bitter decision.
>
> Thanks for any help,
> udo.

Hello Udo,

unfortunately I can only confirm your problems. The last known-to-work
version we have seen is 2.0.9; everything beyond is just bogus. 3.X did
not solve a single issue but brought quite a lot of new ones instead.
The project only gained featurism but did not solve the very basic
problems. To this day there is no way to see a list of not-yet-synced
files in a replication setup, which is ridiculous. Ever since 2.0.9 I
have hoped that someone would fork the project and really attack the
basics. IOW: good idea, pretty bad implementation, no will to listen or
learn.

Regards,
Stephan

> --
> Institute of Cognitive Science - System Administration Team
> Albrechtstrasse 28 - 49076 Osnabrueck - Germany
> Tel: +49-541-969-3362 - Fax: +49-541-969-3361
> https://doc.ikw.uni-osnabrueck.de
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
From reading this list, I wonder if this would be an accurate summary of
the current state of Gluster:

3.1.3 - most dependable current version
3.1.4 - gained a few bugs
3.2.0 - not stable

So 3.1.3 would be suitable for production systems, as long as the known
bug in mishandling POSIX group permissions is worked around (by
loosening permissions). There has been a suggestion that stat-prefetch
be turned off, and perhaps that other, non-default options are better
left unused.

I'm not personally knowledgeable on any of this aside from the POSIX
group problem. I'm just asking for confirmation (or not) of the basic
sense I'm getting from those with extensive experience: that 3.1.3 is
essentially dependable, 3.1.4 is problematic, and 3.2.0 should perhaps
only be used to gain familiarity with the new geo-replication feature,
but avoided for current production use.

Whit
Udo,

Do you know what kind of access was performed on those files? Were they
just copied in (via cp), or were they rsync'ed over an existing set of
data? Was it data carried over from 3.1 into a 3.2 system?

We hate to lose users (community users or paid customers equally) and
will do our best to keep you happy. Please file a bug report with as
much history as possible and we will have it assigned on priority.

Thanks,
Avati

On Wed, May 18, 2011 at 5:45 AM, Udo Waechter
<udo.waechter at uni-osnabrueck.de> wrote:

> Hi there,
> after reporting some trouble with group access permissions,
> http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
> still persists, btw.)
>
> things get worse and worse with each day.
> [...]
> Currently our only option seems to be to move away from glusterfs to
> some other filesystem, which would be a bitter decision.
>
> Thanks for any help,
> udo.
On 18.05.2011 21:34, Anthony J. Biacco wrote:

> When you say you removed the config before and added the nodes after,
> do you mean you deleted the volume and recreated it?

Yes. I think I tried to do this without removing the config first, but
3.1.4 was complaining upon startup. Also, I've read about config
problems with 3.1.4 -> 3.2.0 updates, so I figured it would be easiest
for me to totally remove 3.2.0 and its config and start from scratch
with 3.1.4.

--
Tomasz Chmielewski
http://wpkg.org
> Date: Wed, 18 May 2011 19:00:30 +0200
> From: Udo Waechter <udo.waechter at uni-osnabrueck.de>
> Subject: Re: [Gluster-users] gluster 3.2.0 - totally broken?
> To: Gluster Users <gluster-users at gluster.org>
>
> On 18.05.2011, at 18:56, Anthony J. Biacco wrote:
>> I'm actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if
>> I'd have any ill effects on the volume with a simple rpm downgrade
>> and daemon restart.
>
> I read somewhere in the docs that you need to reset the volume options
> beforehand:
>
> gluster volume reset <volname>
>
> Good luck. Would be nice to hear if it worked for you.
> --udo.
>
> --
> :: udo waechter - root at zoide.net :: N 52°16'30.5" E 8°3'10.1"
> :: genuine input for your ears: http://auriculabovinari.de
> :: your eyes: http://ezag.zoide.net :: your brain: http://zoide.net

Hello All-

A few words of warning about downgrading, after what happened to me when
I tried it. I downgraded from 3.2 to 3.1.4, but I am back on 3.2 again
now because the downgrade broke the rebalancing feature. I thought this
might have been due to version 3.2 having done something to the xattrs.
I tried downgrading to 3.1.3 and 3.1.2 as well, but rebalance was not
working in those versions either, despite having worked successfully in
the past.

I found that the downgrade didn't go as smoothly as the upgrades usually
do. After downgrading the RPMs on the servers and restarting glusterd, I
couldn't mount the volumes, and the client logs were flooded with errors
like these for each server:
[2011-05-03 18:05:26.563591] E [client-handshake.c:1101:client_query_portmap_cbk] 0-atmos-client-1: failed to get the port number for remote subvolume
[2011-05-03 18:05:26.564543] I [client.c:1601:client_rpc_notify] 0-atmos-client-1: disconnected

I didn't need to reset the volumes after downgrading, because none of
the volume files had been created or reset under version 3.2. Despite
that, I did try doing "gluster volume reset <volname>" for all the
volumes, but it didn't stop the client log errors or solve the mounting
problems.

In desperation I unmounted all the volumes from the clients and shut
down all the gluster-related processes on all the servers. After waiting
a few minutes for any locked ports to clear (in case locked ports had
been causing the problems after the RPM downgrades), I restarted
glusterd on the servers, and a few minutes later I was able to mount the
volumes again. A few days later I discovered that I could no longer
rebalance (fix-layout or migrate-data).

To answer an earlier question, I am using 3.2 in a production
environment, although in the light of recent discussions in this thread
I wish I weren't. Having said that, my users haven't reported any
problems nearly a week after the upgrade, so I am hoping we won't be
affected by any of the issues that have been causing problems at other
sites.

-Dan.
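P.S. For anyone who wants to scan a client log for the same symptom,
here is a rough sketch (my own, tested only against lines shaped like
the two quoted above; your client log path will differ):

```shell
# portmap_failures LOGFILE
# Print each client translator that logged an E-level error from a
# portmap-related code path. In glusterfs client logs the line format is:
#   [timestamp] LEVEL [file:line:function] translator: message
# so the level is field 3 and the translator (with trailing ':') field 5.
portmap_failures() {
    grep -E '^\[[^]]+\] E \[[^]]*portmap[^]]*\]' "$1" \
        | awk '{ sub(/:$/, "", $5); print $5 }' \
        | sort -u
}
```

Usage would be something like
`portmap_failures /var/log/glusterfs/<your-mountpoint>.log`.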