Ravishankar N
2017-Jul-22 06:13 UTC
[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
On 07/21/2017 11:41 PM, yayo (j) wrote:
> Hi,
>
> Sorry to follow up again, but checking the oVirt interface I've found
> that oVirt reports the "engine" volume as an "arbiter" configuration and
> the "data" volume as a fully replicated volume. Check these screenshots:

This is probably a refresh bug in the UI; Sahina might be able to tell you more.

> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing
>
> But the "gluster volume info" command reports that both volumes are
> fully replicated:
>
> Volume Name: data
> Type: Replicate
> Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gdnode01:/gluster/data/brick
> Brick2: gdnode02:/gluster/data/brick
> Brick3: gdnode04:/gluster/data/brick
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> storage.owner-uid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-gid: 36
> features.shard-block-size: 512MB
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: on
> auth.allow: *
> server.allow-insecure: on
>
> Volume Name: engine
> Type: Replicate
> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gdnode01:/gluster/engine/brick
> Brick2: gdnode02:/gluster/engine/brick
> Brick3: gdnode04:/gluster/engine/brick
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> storage.owner-uid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-gid: 36
> features.shard-block-size: 512MB
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: on
> auth.allow: *
> server.allow-insecure: on
>
> 2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz at gmail.com>:
>
>> 2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar at redhat.com>:
>>
>>> But it does say something. All these gfids of completed heals in the
>>> log below are the ones you gave me the getfattr output for. So what is
>>> likely happening is that there is an intermittent connection problem
>>> between your mount and the brick process, leading to pending heals
>>> again after each heal completes, which is why the numbers vary every
>>> time. You would need to check why that is the case.
>>> Hope this helps,
>>> Ravi
>>>
>>> [2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
>>> [2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
>>> [2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1 sinks=2
>>
>> Hi,
>>
>> following your suggestion, I've checked the peer status and found that
>> the hosts have too many names. I don't know whether this could be the
>> problem, or part of it:
>>
>> gluster peer status on NODE01:
>> Number of Peers: 2
>>
>> Hostname: dnode02.localdomain.local
>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>> State: Peer in Cluster (Connected)
>> Other names:
>> 192.168.10.52
>> dnode02.localdomain.local
>> 10.10.20.90
>> 10.10.10.20
>>
>> gluster peer status on NODE02:
>> Number of Peers: 2
>>
>> Hostname: dnode01.localdomain.local
>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>> State: Peer in Cluster (Connected)
>> Other names:
>> gdnode01
>> 10.10.10.10
>>
>> Hostname: gdnode04
>> Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
>> State: Peer in Cluster (Connected)
>> Other names:
>> 192.168.10.54
>> 10.10.10.40
>>
>> gluster peer status on NODE04:
>> Number of Peers: 2
>>
>> Hostname: dnode02.neridom.dom
>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>> State: Peer in Cluster (Connected)
>> Other names:
>> 10.10.20.90
>> gdnode02
>> 192.168.10.52
>> 10.10.10.20
>>
>> Hostname: dnode01.localdomain.local
>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>> State: Peer in Cluster (Connected)
>> Other names:
>> gdnode01
>> 10.10.10.10
>>
>> All these IPs are pingable and the hostnames resolve on all 3 nodes, but
>> only the 10.10.10.0 network is the dedicated network for gluster
>> (resolved via the gdnode* hostnames)... Do you think removing the other
>> entries could fix the problem? And, sorry, but how can I remove them?

I don't think having extra entries could be a problem. Did you check the
fuse mount logs for the disconnect messages I referred to in the other email?

>> And what about SELinux?

Not sure about this. See if there are disconnect messages in the mount
logs first.
-Ravi

>> Thank you
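Ravi's suggestion to look for disconnect messages in the fuse mount log can be turned into a small filter. This is a minimal sketch, not a Gluster tool: `find_disconnects` and the `DISCONNECT_MARKERS` strings are hypothetical choices, and the exact wording of client disconnect messages varies between GlusterFS releases, so adjust the markers to what your own mount log prints. The only assumption about the log format is the bracketed timestamp at the start of each line, as in the excerpts quoted in this thread.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Hypothetical markers: substrings that tend to appear in GlusterFS client
# logs when a connection drops. Wording differs between releases, so extend
# this tuple to match what your own mount log actually prints.
DISCONNECT_MARKERS = ("disconnect", "transport endpoint is not connected")

# GlusterFS log lines start with a bracketed timestamp, e.g.
# [2017-07-24 07:34:00.799347] E [...] ...
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")


def find_disconnects(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, line) pairs for lines that look like disconnects."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in DISCONNECT_MARKERS):
            match = TIMESTAMP_RE.match(line)
            stamp = match.group(1) if match else "?"
            hits.append((stamp, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_disconnects.py <path-to-fuse-mount-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for stamp, line in find_disconnects(log):
            print(stamp, line)
```

Pointed at the engine mount log that yayo tails later in the thread (/var/log/glusterfs/rhev-data-center-mnt-glusterSD-dvirtgluster:engine.log), it would pick out the NODE01 "failed to connect with remote-host: gdnode03" lines, because they end in "Transport endpoint is not connected". A plain substring match was chosen over a full log parser so the marker list stays easy to adapt.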
Kasturi Narra
2017-Jul-24 06:12 UTC
Hi,

Regarding the UI showing incorrect information about the engine and data
volumes: can you please refresh the UI and check whether the issue persists,
and also look for any errors in the engine.log files?

Thanks
kasturi
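The `gluster peer status` output quoted in this thread lists each peer under a hostname plus several "Other names". A minimal sketch that groups those names by peer UUID, and picks out the literal IPs on the dedicated storage network, can make comparing the three nodes easier. It assumes the plain-text layout shown in the thread and a /24 prefix for "the 10.10.10.0 network", which the thread does not state; `names_by_uuid` and `on_network` are hypothetical helper names.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Set


def names_by_uuid(output: Iterable[str]) -> Dict[str, Set[str]]:
    """Group every name a peer is listed under by the peer's UUID.

    Assumes the layout of `gluster peer status` as quoted in this thread:
    "Hostname:", "Uuid:", "State:", then "Other names:" followed by one
    name per line until a blank line.
    """
    peers: Dict[str, Set[str]] = defaultdict(set)
    hostname = None
    uuid = None
    in_other_names = False
    for raw in output:
        line = raw.strip()
        if not line:
            in_other_names = False  # a blank line ends the "Other names" list
        elif line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
            uuid = None
            in_other_names = False
        elif line.startswith("Uuid:"):
            uuid = line.split(":", 1)[1].strip()
            if hostname:
                peers[uuid].add(hostname)
        elif line.startswith("Other names:"):
            in_other_names = True
        elif in_other_names and uuid:
            peers[uuid].add(line)
        # Outside "Other names", lines such as "State:" are ignored.
    return dict(peers)


def on_network(names: Iterable[str], cidr: str = "10.10.10.0/24") -> Set[str]:
    """Return the names that are literal IPs inside `cidr` (/24 is assumed)."""
    network = ipaddress.ip_network(cidr)
    hits = set()
    for name in names:
        try:
            if ipaddress.ip_address(name) in network:
                hits.add(name)
        except ValueError:
            continue  # a hostname such as gdnode02, not a literal IP
    return hits
```

Fed the NODE04 output from the thread, it would group dnode02.neridom.dom, gdnode02 and three IPs under UUID 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd, and `on_network` would keep only 10.10.10.20 from that set. This only makes the aliases visible; it says nothing about whether they cause the heal problem, which Ravi doubts.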
yayo (j)
2017-Jul-24 14:00 UTC
Hi,

UI refreshed, but the problem still remains... There are no specific errors. I only have these, and I've read that this kind of message is not a problem:

2017-07-24 15:53:59,823+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName = node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
2017-07-24 15:54:01,066+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
2017-07-24 15:54:01,076+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName = node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
2017-07-24 15:54:02,209+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,212+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,215+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,218+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,221+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3

Thank you
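The WARN lines in yayo's engine.log excerpt follow a fixed pattern, so a short parser can confirm which bricks and volumes they cover. This is a sketch: the regular expression is copied from the wording of those lines and may not match other oVirt releases, and `unassociated_bricks` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Pattern copied from the engine.log WARN lines quoted in this thread;
# other oVirt releases may word the message differently.
BRICK_WARN_RE = re.compile(
    r"Could not associate brick '(?P<brick>[^']+)' of volume "
    r"'(?P<volume>[^']+)' with correct network as no gluster network "
    r"found in cluster '(?P<cluster>[^']+)'"
)


def unassociated_bricks(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each volume ID to the bricks named in matching WARN lines."""
    result: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = BRICK_WARN_RE.search(line)
        if match:
            result[match.group("volume")].append(match.group("brick"))
    return dict(result)
```

Run over the excerpt above, it would report three bricks for each of the two volume IDs, that is, every brick of both the engine and data volumes; the INFO lines do not match and are skipped.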
yayo (j)
2017-Jul-24 14:16 UTC
>> All these IPs are pingable and the hostnames resolve on all 3 nodes, but
>> only the 10.10.10.0 network is the dedicated network for gluster
>> (resolved via the gdnode* hostnames)... Do you think removing the other
>> entries could fix the problem? And, sorry, but how can I remove them?
>
> I don't think having extra entries could be a problem. Did you check the
> fuse mount logs for the disconnect messages I referred to in the other
> email?

tail -f /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dvirtgluster\:engine.log

NODE01:

[2017-07-24 07:34:00.799347] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 07:44:46.687334] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 09:04:25.951350] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 09:15:11.839357] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 10:34:51.231353] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 10:45:36.991321] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 12:05:16.383323] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 12:16:02.271320] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-07-24 13:35:41.535308] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 13:46:27.423304] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Why gdnode03 again? It was removed from gluster! It was the arbiter node...

NODE02:

[2017-07-24 14:08:18.709209] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
[2017-07-24 14:08:38.746688] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:08:38.749379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2
[2017-07-24 14:08:46.068001] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81" repeated 3 times between [2017-07-24 14:08:38.746688] and [2017-07-24 14:10:09.088625]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2 " repeated 3 times between [2017-07-24 14:08:38.749379] and [2017-07-24 14:10:09.091377]
[2017-07-24 14:10:19.384379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=0 [1] sinks=2
[2017-07-24 14:10:39.433155] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:10:39.435847] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=0 [1] sinks=2

NODE04:

[2017-07-24 14:08:56.789598] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
[2017-07-24 14:09:17.231987] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=[0] 1 sinks=2
[2017-07-24 14:09:38.039541] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
[2017-07-24 14:09:48.875602] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de. sources=[0] 1 sinks=2
[2017-07-24 14:10:39.832068] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
The message "I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2 " repeated 3 times between [2017-07-24 14:10:39.832068] and [2017-07-24 14:12:22.686142]

I think the last message appeared because I re-ran a "heal" command.

N.B. dvirtgluster is the round-robin DNS name for all the gluster nodes.

>> And what about SELinux?
>
> Not sure about this. See if there are disconnect messages in the mount
> logs first.
> -Ravi
>
>> Thank you

No SELinux-related messages... Thank you!
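Ravi's explanation (heals that complete and then become pending again) would show up as the same gfids appearing in "Completed ... selfheal" lines over and over, and NODE01's log shows repeated failures to reach gdnode03 as a volfile server. A small counter makes both patterns easy to see at a glance. It is a sketch under stated assumptions: the regular expressions are based only on the log lines quoted in this thread, `summarize` is a hypothetical name, and the counts are a hint to follow up on, not proof of a disconnect.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Both patterns are based only on the mount-log lines quoted in this thread.
HEAL_RE = re.compile(
    r"Completed (?P<kind>\w+) selfheal on (?P<gfid>[0-9a-f-]{36})"
)
CONNECT_FAIL_RE = re.compile(r"failed to connect with remote-host: (?P<host>\S+)")


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count completed heals per (kind, gfid) and connect failures per host.

    A suppressed-repeat line ('The message "..." repeated N times between
    ...') is matched like any other line and counted once; N is not added.
    """
    heals: Counter = Counter()
    failures: Counter = Counter()
    for line in lines:
        heal = HEAL_RE.search(line)
        if heal:
            heals[(heal.group("kind"), heal.group("gfid"))] += 1
        fail = CONNECT_FAIL_RE.search(line)
        if fail:
            failures[fail.group("host")] += 1
    return heals, failures
```

On the NODE04 excerpt above it would report 4 data heals for e6dfd556-340b-4b76-b47b-7b6f5bd74327 (three logged lines plus the one "repeated 3 times" summary line) and 2 for db56ac00-fd5b-4326-a879-326ff56181de; on the NODE01 excerpt it would report 5 connection failures for gdnode03.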