yayo (j)
2017-Jul-24 14:00 UTC
[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
Hi,

UI refreshed but the problem still remains ...

No specific error; I only see the entries below, but I've read that this kind of message is harmless:

2017-07-24 15:53:59,823+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName = node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
2017-07-24 15:54:01,066+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
2017-07-24 15:54:01,076+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName = node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
2017-07-24 15:54:02,209+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,212+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,215+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,218+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,221+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3

Thank you

2017-07-24 8:12 GMT+02:00 Kasturi Narra <knarra at redhat.com>:

> Hi,
>
> Regarding the UI showing incorrect information about the engine and data
> volumes, can you please refresh the UI and see if the issue persists, plus
> check for any errors in the engine.log files?
>
> Thanks
> kasturi
>
> On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> On 07/21/2017 11:41 PM, yayo (j) wrote:
>>
>> Hi,
>>
>> Sorry to follow up again, but while checking the oVirt interface I found
>> that oVirt reports the "engine" volume as an "arbiter" configuration and
>> the "data" volume as a fully replicated volume. Check these screenshots:
>>
>> This is probably some refresh bug in the UI; Sahina might be able to tell
>> you.
>>
>> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing
>>
>> But the "gluster volume info" command reports that both volumes are fully
>> replicated:
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gdnode01:/gluster/data/brick
>> Brick2: gdnode02:/gluster/data/brick
>> Brick3: gdnode04:/gluster/data/brick
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> storage.owner-uid: 36
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> storage.owner-gid: 36
>> features.shard-block-size: 512MB
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>> cluster.granular-entry-heal: on
>> auth.allow: *
>> server.allow-insecure: on
>>
>> Volume Name: engine
>> Type: Replicate
>> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gdnode01:/gluster/engine/brick
>> Brick2: gdnode02:/gluster/engine/brick
>> Brick3: gdnode04:/gluster/engine/brick
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> storage.owner-uid: 36
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> storage.owner-gid: 36
>> features.shard-block-size: 512MB
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>> cluster.granular-entry-heal: on
>> auth.allow: *
>> server.allow-insecure: on
>>
>> 2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz at gmail.com>:
>>
>>> 2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar at redhat.com>:
>>>
>>>> But it does say something. All these gfids of completed heals in the
>>>> log below are for the ones that you have given the getfattr output of.
>>>> So what is likely happening is that there is an intermittent connection
>>>> problem between your mount and the brick process, leading to pending
>>>> heals again after the heal gets completed, which is why the numbers are
>>>> varying each time. You would need to check why that is the case.
>>>> Hope this helps,
>>>> Ravi
>>>>
>>>> [2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
>>>> [2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
>>>> [2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1 sinks=2
>>>
>>> Hi,
>>>
>>> Following your suggestion, I've checked the "peer" status and found that
>>> there are too many names for the hosts; I don't know if this can be the
>>> problem, or part of it:
>>>
>>> gluster peer status on NODE01:
>>> Number of Peers: 2
>>>
>>> Hostname: dnode02.localdomain.local
>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 192.168.10.52
>>> dnode02.localdomain.local
>>> 10.10.20.90
>>> 10.10.10.20
>>>
>>> gluster peer status on NODE02:
>>> Number of Peers: 2
>>>
>>> Hostname: dnode01.localdomain.local
>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> gdnode01
>>> 10.10.10.10
>>>
>>> Hostname: gdnode04
>>> Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 192.168.10.54
>>> 10.10.10.40
>>>
>>> gluster peer status on NODE04:
>>> Number of Peers: 2
>>>
>>> Hostname: dnode02.neridom.dom
>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 10.10.20.90
>>> gdnode02
>>> 192.168.10.52
>>> 10.10.10.20
>>>
>>> Hostname: dnode01.localdomain.local
>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> gdnode01
>>> 10.10.10.10
>>>
>>> All these IPs are pingable and the hosts are resolvable across all 3
>>> nodes, but only the 10.10.10.0 network is the dedicated network for
>>> gluster (resolved using the gdnode* host names) ... Do you think that
>>> removing the other entries could fix the problem? And if so, sorry, but
>>> how can I remove the other entries?
>>>
>> I don't think having extra entries could be a problem. Did you check the
>> fuse mount logs for the disconnect messages that I referred to in the
>> other email?
>>
>>> And, what about the selinux?
>>>
>> Not sure about this. See if there are disconnect messages in the mount
>> logs first.
>> -Ravi
>>
>>> Thank you

-- 
Linux User: 369739 http://counter.li.org
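A minimal sketch of the two checks Ravi suggests above, assuming the volume name "engine" from this thread; the glusterSD mount-log file name depends on the mount source, so adjust the path to whatever exists under /var/log/glusterfs/ on your hosts:

    # List entries still pending heal, to confirm whether the counts really vary:
    gluster volume heal engine info

    # Scan the fuse mount log for intermittent client disconnect/reconnect events
    # (path assumes oVirt's default glusterSD mount-point naming):
    grep -E "disconnected from|Connected to" \
        /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gdnode01:_engine.log | tail -n 20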
Kasturi Narra
2017-Jul-25 05:42 UTC
[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
These errors are because glusternw is not assigned to the correct interface. Once you attach that, these errors should go away. This has nothing to do with the problem you are seeing.

sahina, any idea about the engine not showing the correct volume info?

On Mon, Jul 24, 2017 at 7:30 PM, yayo (j) <jaganz at gmail.com> wrote:

> Hi,
>
> UI refreshed but the problem still remains ...
>
> [...]
Sahina Bose
2017-Jul-25 06:27 UTC
[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
On Tue, Jul 25, 2017 at 11:12 AM, Kasturi Narra <knarra at redhat.com> wrote:

> These errors are because glusternw is not assigned to the correct
> interface. Once you attach that, these errors should go away. This has
> nothing to do with the problem you are seeing.
>
> sahina, any idea about the engine not showing the correct volume info?

Please provide the vdsm.log (containing the gluster volume info) and the engine.log.

> On Mon, Jul 24, 2017 at 7:30 PM, yayo (j) <jaganz at gmail.com> wrote:
>
>> [...]
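For anyone gathering the files Sahina asks for: in a default oVirt 4.1 hosted-engine deployment the two logs live in the standard locations sketched below (these are the default paths, so adjust if your installation relocated them):

    # On the hosted-engine VM:
    less /var/log/ovirt-engine/engine.log

    # On each hypervisor / gluster node:
    less /var/log/vdsm/vdsm.log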
yayo (j)
2017-Jul-25 08:15 UTC
[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
2017-07-25 7:42 GMT+02:00 Kasturi Narra <knarra at redhat.com>:

> These errors are because glusternw is not assigned to the correct
> interface. Once you attach that, these errors should go away. This has
> nothing to do with the problem you are seeing.

Hi,

Are you talking about errors like these?

2017-07-24 15:54:02,209+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'

How do I assign "glusternw" (???) to the correct interface?

The other errors about unsynced gluster elements still remain ... This is a production environment, so is there any chance to subscribe to RH support?

Thank you
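For context, "glusternw" is simply whichever oVirt logical network carries the "Gluster" role for the cluster. The documented route is the UI: under Cluster -> Logical Networks -> Manage Networks, tick the gluster role for the dedicated 10.10.10.0 network, then attach that network to the storage NIC on each host via Setup Host Networks. A REST sketch of the same change follows; the endpoint and payload are assumptions to verify against your engine's API reference, and ENGINE_FQDN, PASSWORD, CLUSTER_ID and NETWORK_ID are placeholders, not values from this thread:

    # Hedged sketch: mark an existing cluster network with the "gluster" usage
    # so the engine can associate bricks with it. Verify the endpoint against
    # your engine version's API model before relying on it.
    curl -k -u 'admin@internal:PASSWORD' \
        -H 'Content-Type: application/xml' -X PUT \
        'https://ENGINE_FQDN/ovirt-engine/api/clusters/CLUSTER_ID/networks/NETWORK_ID' \
        -d '<network><usages><usage>gluster</usage></usages></network>'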