thr3ads.net - Gluster users - [Gluster-users] Is rebalance completely broken on 3.5.3 ? [Mar 2015]


Nithya Balachandran

2015-Mar-25 09:09 UTC

[Gluster-users] Is rebalance completely broken on 3.5.3 ?

Hi Alessandro,


I am sorry to hear that you are facing problems with rebalance.

Currently rebalance does not know how many files exist on the volume, so it
cannot calculate/estimate the time it will take to complete. Improving the
rebalance status output to provide that information is already on our to-do
list, and we will be working on it.

I have a few questions:

1. Which version of GlusterFS are you using?
2. How did you stop the rebalance? I assume you ran "gluster volume rebalance
<volume> stop", but I just wanted confirmation.
3. What file operations were being performed during the rebalance?
4. Can you send the "gluster volume info" output as well as the
gluster log files?

Regards,
Nithya

----- Original Message -----
From: "Alessandro Ipe" <Alessandro.Ipe at meteo.be>
To: gluster-users at gluster.org
Sent: Friday, March 20, 2015 4:52:35 PM
Subject: [Gluster-users] Is rebalance completely broken on 3.5.3 ?



Hi,

After launching a rebalance on an idle Gluster system one week ago, its status
told me it had scanned more than 23 million files on each of my 6 bricks.
However, without knowing at least the total number of files to be scanned,
this status is USELESS from an end-user perspective, because it does not let
you know WHEN the rebalance could eventually complete (one day, one week, one
year or never). From my point of view, the total number of files per brick
could be obtained and maintained when activating quota, since the whole
filesystem has to be crawled...
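A per-brick total of the kind suggested above can at least be approximated by walking a brick's backend directory read-only. A minimal Python sketch, under stated assumptions: it skips the internal .glusterfs directory (which holds gfid hard links to the same files, so counting it would double-count), and as a heuristic it leaves out zero-byte files whose permission bits are only the sticky bit, which is what DHT link files look like. The function name count_brick_files is hypothetical; this is not how rebalance or quota count files.

```python
import os
import stat


def count_brick_files(brick_root: str) -> int:
    """Hypothetical helper: roughly count the data files stored on one brick.

    Assumptions: the internal '.glusterfs' tree (gfid hard links) is skipped,
    and empty files whose permission bits are only the sticky bit
    ('---------T') are treated as DHT link files and not counted.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Prune GlusterFS's internal directory so hard links are not counted twice.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            if st.st_size == 0 and stat.S_IMODE(st.st_mode) == stat.S_ISVTX:
                continue  # looks like a DHT link file, not real data
            total += 1
    return total
```

On a brick holding tens of millions of files this walk is itself a long crawl, which is why maintaining the count incrementally (as proposed for quota) would be preferable.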



After one week offline, with still no clue when the rebalance would complete,
I decided to stop it...

Enormous mistake... It seems that rebalance cannot avoid screwing up some
files. For example, on the only client mounting the Gluster system,
"ls -la /home/seviri" returns

ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-????????? ? ? ? ? ? .forward
-????????? ? ? ? ? ? .forward

while this file was perfectly accessible before the rebalance and has not been
modified for at least 3 years.

Here are the listings and extended attributes from bricks 3, 4, 5 and 6 (3-4
are a replica pair, as are 5-6):

Brick 3:

ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001

Brick 4:

ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001



Brick 5:

ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400

Brick 6:

ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
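The zero-byte "---------T" entries on bricks 5 and 6 have the shape of DHT link files, and their trusted.glusterfs.dht.linkto value is simply a NUL-terminated string printed in hex. A small Python sketch that decodes such a value and applies that shape as a heuristic; both helper names are hypothetical, and the link-file checks are an assumption about what a link file looks like, not a Gluster API.

```python
import stat


def decode_xattr_string(hex_value: str) -> str:
    """Hypothetical helper: turn a 'getfattr -e hex' value into text,
    dropping the trailing NUL byte that C strings carry."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return bytes.fromhex(digits).rstrip(b"\x00").decode("utf-8")


def looks_like_dht_linkfile(mode: int, size: int, xattrs: dict) -> bool:
    """Hypothetical heuristic: an empty regular file whose permission bits are
    only the sticky bit ('---------T') and which carries a linkto xattr."""
    return (
        stat.S_ISREG(mode)
        and stat.S_IMODE(mode) == stat.S_ISVTX
        and size == 0
        and "trusted.glusterfs.dht.linkto" in xattrs
    )


LINKTO = "0x686f6d652d7265706c69636174652d3400"  # value seen on bricks 5 and 6
print(decode_xattr_string(LINKTO))  # home-replicate-4
```

Decoded, the value names the subvolume home-replicate-4, i.e. the link file tells DHT to look for the actual data on that replica set.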



The results from bricks 3 & 4 look weird: the file exists in 2 brick storage
directories (brick1 and brick2) on each server, while it should be found only
once per brick server. Or does the issue lie in the results of bricks 5 & 6?
How can I fix this, please? By the way, the split-brain tutorial only covers
BASIC split-brain conditions, not complex (real-life) cases like this one. It
would definitely benefit from being enriched with this one.
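One concrete way to check the suspicion above: on bricks 3 and 4 the two copies of seviri/.forward carry different trusted.gfid values (0xc1d268beb17443a39d914de917de123a under brick1, 0x14a1c10eb1474ef2bf72f4c6c64a90ce under brick2), so they are two distinct files sharing one name rather than one file seen twice. A minimal Python sketch that flags such names in a "getfattr -d -m . -e hex" dump; it assumes the brick directories are named brickN as on these servers, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: brick directories are named brickN, as in /data/glusterfs/home/brick1.
BRICK_PREFIX = re.compile(r"^.*?/brick\d+/")


def gfids_per_path(getfattr_output: str) -> dict:
    """Hypothetical helper: map each volume-relative path to the set of
    trusted.gfid values found for it across bricks in one getfattr dump."""
    gfids = defaultdict(set)
    current = None
    for line in getfattr_output.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            # Drop the brick-specific prefix so copies on different bricks line up.
            current = BRICK_PREFIX.sub("", line[len("# file: "):])
        elif current and line.startswith("trusted.gfid="):
            gfids[current].add(line.split("=", 1)[1])
    return dict(gfids)


def conflicting_paths(getfattr_output: str) -> list:
    """Paths with more than one gfid: the same name exists as different files."""
    return sorted(p for p, ids in gfids_per_path(getfattr_output).items() if len(ids) > 1)
```

Fed the Brick 3 dump above, conflicting_paths reports seviri/.forward.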



More generally, I think the concept of Gluster is promising, but if basic
commands from its own CLI (rebalance is absolutely needed after adding more
storage) can put the system into an unstable state, I am really starting to
question its ability to be used in a production environment. And from an
end-user perspective, I do not care about new features, no matter how
appealing they could be, if the basic ones are not almost totally reliable.
Finally, testing Gluster under high load on the brick servers (real-world
conditions) would certainly give the developers insight into what is failing
and what therefore needs to be fixed to mitigate these problems and improve
Gluster's reliability.

Forgive my harsh words/criticisms, but having to struggle with Gluster issues
for two weeks now is getting on my nerves, since my colleagues cannot use the
data stored on it and I cannot see when it will be back online.

Regards,

Alessandro.


_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Alessandro Ipe

2015-Mar-25 12:12 UTC

[Gluster-users] Is rebalance completely broken on 3.5.3 ?

Hi Nithya,

Thanks for your reply. I am glad that improving the rebalance status will be
addressed in the (near) future. From my perspective, if the status gave the
total number of files to be scanned together with the number of files already
scanned, that would be sufficient information. Indeed, the user could then see
when it would complete, by running "gluster volume rebalance <volume> status"
several times and relating the difference in scanned files to the elapsed time
between runs.
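The estimate described above is plain linear extrapolation once a total is known. A minimal Python sketch, assuming a total file count were available (the 3.5.3 status output does not report one, which is the point of this thread) and that the scan rate stays roughly constant; the function name and the numbers in the example are illustrative only.

```python
def estimate_remaining_seconds(total_files, scanned_first, t_first,
                               scanned_second, t_second):
    """Hypothetical helper: linear ETA from two rebalance status samples.

    Assumes total_files is known and that the scan rate between the two
    samples is representative of the rest. Times are in seconds.
    """
    elapsed = t_second - t_first
    progress = scanned_second - scanned_first
    if elapsed <= 0 or progress <= 0:
        raise ValueError("need two samples with increasing time and file count")
    rate = progress / elapsed  # files per second
    remaining = max(total_files - scanned_second, 0)
    return remaining / rate
```

For example, with a hypothetical total of 30,000,000 files, 23,000,000 scanned at one check and 23,360,000 one hour later, the rate is 100 files/s and the estimate is 66,400 s, roughly 18.4 hours.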

Please find below the answers to your questions:
1. Server and client are both version 3.5.3.
2. Indeed, I stopped the rebalance through the corresponding CLI command,
i.e. "gluster volume rebalance <volume> stop".
3. Very limited file operations were carried out through a single client mount
(the servers were almost idle).
4. gluster volume info:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: off
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: on
server.allow-insecure: on
cluster.ensure-durability: on
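The xattr names in the dumps (home-client-8 to home-client-11) and the linkto target can be related to this brick list. A minimal Python sketch, assuming the usual volfile naming convention: client translators are numbered from 0 in "gluster volume info" order, and each group of "replica" consecutive bricks forms one replicate subvolume. The helper is hypothetical; the generated volfiles are the authority if in doubt.

```python
def brick_roles(bricks: list, volume: str, replica: int) -> list:
    """Hypothetical helper: pair each brick with its assumed translator names.

    Assumed convention: '<volume>-client-N' numbered from 0 in the order of
    'gluster volume info', and '<volume>-replicate-K' for every group of
    `replica` consecutive bricks.
    """
    return [
        (brick, f"{volume}-client-{i}", f"{volume}-replicate-{i // replica}")
        for i, brick in enumerate(bricks)
    ]


# The 24 bricks of volume "home", in the order listed by 'gluster volume info'.
BRICKS = [
    f"tsunami{host}:/data/glusterfs/home/brick{n}"
    for pair in ((1, 2), (3, 4), (5, 6))
    for n in range(1, 5)
    for host in pair
]

for brick, client, replicate in brick_roles(BRICKS, "home", 2)[8:12]:
    print(brick, client, replicate)
```

Under that assumption, home-client-8/9 map to Brick9/Brick10 and home-client-10/11 to Brick11/Brick12 (tsunami3 and tsunami4), matching the trusted.afr attributes seen on those servers; home-replicate-4, which is what the linkto hex on bricks 5 and 6 spells in ASCII, is the Brick9/Brick10 pair.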

The logs will be more difficult, because this happened several days ago and
they have been rotated, but I can dig... By the way, do you need a specific
log file? Gluster produces a lot of them...

I read in a discussion on the gluster-users mailing list that rebalance in
version 3.5.x could leave the system with errors when stopped (or even when run
to completion?) and that rebalance underwent a complete rewrite in 3.6.x. The
issue is that I will put Gluster back online next week, so my colleagues will
definitely put it under high load, and I was planning to run the rebalance
again in the background. Is that advisable, or should I wait until after
upgrading to 3.6.3?

I also noticed (while a full heal is currently running on the volume) that
accessing some files on the client returned "Transport endpoint is not
connected" the first time, but any subsequent access was OK (probably due to
self-healing). Is it possible to set a client or volume parameter so that the
client just waits (and makes the calling process wait) for self-healing to
complete and delivers the file the first time without returning an error? That
would be extremely useful in batch/operational processing.
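Whether a client or volume option can make the first access wait for self-heal is the open question here. Independently of the answer, a batch job can work around the transient error on its own side. A minimal Python sketch, assuming the error surfaces to the application as ENOTCONN (the errno whose message is "Transport endpoint is not connected"); the function name, the number of attempts and the delay are arbitrary illustrative choices, not Gluster settings.

```python
import errno
import time


def open_with_retry(path, opener=open, attempts=5, delay=0.5):
    """Hypothetical helper: call opener(path), retrying only on ENOTCONN.

    attempts and delay are arbitrary example values, not Gluster options.
    """
    for attempt in range(1, attempts + 1):
        try:
            return opener(path)
        except OSError as exc:
            # Re-raise anything other than ENOTCONN, and give up after the last try.
            if exc.errno != errno.ENOTCONN or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # wait a little longer before each retry
```

Only ENOTCONN is retried, so genuine errors such as a missing file still fail immediately; a caller would use it as "with open_with_retry('/home/seviri/.forward') as f: data = f.read()".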

Regards,

Alessandro.

-- 

 Dr. Ir. Alessandro Ipe   
 Department of Observations             Tel. +32 2 373 06 31
 Remote Sensing from Space              Fax. +32 2 374 67 88  
 Royal Meteorological Institute  
 Avenue Circulaire 3                    Email:  
 B-1180 Brussels        Belgium         Alessandro.Ipe at meteo.be 
 Web: http://gerb.oma.be
