Hello all,

Can somebody please respond to this? As of now, if I run "gluster volume
heal gv1 info", it prints what looks like an endless list of gfid lines.
In a stable scenario this command used to finish with an entry count and
a status, but currently it never completes. Is that a bad sign? Is it
looping? Are there any actions required outside of gluster itself?

Appreciate any help...
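
In the meantime, would something like the following be a sane way to
tell whether the heal is progressing at all? (just a sketch on my side;
I believe "statistics heal-count" is available on 3.10 and prints a
"Number of entries" line per brick)

    # pending-heal entry count per brick, without listing every gfid
    gluster volume heal gv1 statistics heal-count

    # rough trend: sample the count every 10 minutes
    while true; do
        date
        gluster volume heal gv1 statistics heal-count | grep 'Number of entries'
        sleep 600
    done

If the numbers drop between samples, the heal is just slow; if they stay
flat or keep growing, I suppose something is really stuck.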
On 10/21/18 8:05 AM, hsafe wrote:
> Hello all gluster community,
>
> I am in a scenario unmatched in my past year of using glusterfs: a
> 2-replica set on glusterfs 3.10.12, where the two servers are the
> storage backend of my application, which saves small images onto them.
>
> The problem I now face, for the first time, is this: whenever the
> replicas went out of sync or one server went down, bringing it back up
> would start the self-heal and eventually the volume would be in sync
> again. But now, when I run the heal info command, the list of gfids
> does not finish even after a couple of hours. The heal log shows the
> process is ongoing, but at a very small scale and speed!
>
> My question is: how can I get the heal to finish, and how can I speed
> it up?
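>
> For instance, would raising the self-heal daemon's parallelism help,
> something like the below? (just my guess; I already run with
> cluster.shd-max-threads at 4, and I am not sure these values are sane
> for 3.10.12)
>
>     gluster volume set gv1 cluster.shd-max-threads 8
>     gluster volume set gv1 cluster.shd-wait-qlength 4096
>
> (As far as I understand, shd-max-threads is the number of parallel
> heals per self-heal daemon, and shd-wait-qlength is the queue of
> entries kept waiting for those threads to pick up.)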
>
> Here is a bit of info:
>
> Status of volume: gv1
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick IMG-01:/images/storage/brick1         49152     0          Y       4176
> Brick IMG-02:/images/storage/brick1         49152     0          Y       4095
> Self-heal Daemon on localhost               N/A       N/A        Y       4067
> Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146
>
> Task Status of Volume gv1
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: gv2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick IMG-01:/data/brick2                   49153     0          Y       4185
> Brick IMG-02:/data/brick2                   49153     0          Y       4104
> NFS Server on localhost                     N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        Y       4067
> NFS Server on IMG-01                        N/A       N/A        N       N/A
> Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146
>
> Task Status of Volume gv2
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
> gluster> peer status
> Number of Peers: 1
>
> Hostname: IMG-01
> Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
> State: Peer in Cluster (Connected)
> gluster> exit
> root at NAS02:/var/log/glusterfs# gluster volume gv1 info
> unrecognized word: gv1 (position 1)
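> (typo on my part; the correct order is "gluster volume info gv1".
> Plain "gluster volume info" below lists all volumes anyway.)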
> root at NAS02:/var/log/glusterfs# gluster volume info
>
> Volume Name: gv1
> Type: Replicate
> Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: IMG-01:/images/storage/brick1
> Brick2: IMG-02:/images/storage/brick1
> Options Reconfigured:
> server.event-threads: 4
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> cluster.lookup-optimize: on
> cluster.shd-max-threads: 4
> cluster.readdir-optimize: on
> performance.md-cache-timeout: 30
> cluster.background-self-heal-count: 32
> server.statedump-path: /tmp
> performance.readdir-ahead: on
> nfs.disable: true
> network.inode-lru-limit: 50000
> features.bitrot: off
> features.scrub: Inactive
> performance.cache-max-file-size: 16MB
> client.event-threads: 8
> cluster.eager-lock: on
> cluster.self-heal-daemon: enable
>
>
> Please do help me out... Thanks
>
>
>
--
Hamid Safe
www.devopt.net
+989361491768