Sahina Bose
2016-Oct-05 07:10 UTC
[Gluster-users] [ovirt-users] 4.0 - 2nd node fails on deploy
[Adding gluster-users ML]

The brick logs are filled with errors:

[2016-10-05 19:30:28.659061] E [MSGID: 113077] [posix-handle.c:309:posix_handle_pump] 0-engine-posix: malformed internal link /var/run/vdsm/storage/0a021563-91b5-4f49-9c6b-fff45e85a025/d84f0551-0f2b-457c-808c-6369c6708d43/1b5a5e34-818c-4914-8192-2f05733b5583 for /xpool/engine/brick/.glusterfs/b9/8e/b98ed8d2-3bf9-4b11-92fd-ca5324e131a8
[2016-10-05 19:30:28.659069] E [MSGID: 113091] [posix.c:180:posix_lookup] 0-engine-posix: Failed to create inode handle for path <gfid:b98ed8d2-3bf9-4b11-92fd-ca5324e131a8>
The message "E [MSGID: 113018] [posix.c:198:posix_lookup] 0-engine-posix: lstat on null failed" repeated 3 times between [2016-10-05 19:30:28.656529] and [2016-10-05 19:30:28.659076]
[2016-10-05 19:30:28.659087] W [MSGID: 115005] [server-resolve.c:126:resolve_gfid_cbk] 0-engine-server: b98ed8d2-3bf9-4b11-92fd-ca5324e131a8: failed to resolve (Success)

Ravi, the above are from the data brick of the arbiter volume. Can you take a look?

Jason, could you also provide the mount logs from the first host (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) around the same time frame.
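If it helps to narrow things down before attaching, a time window can be cut out of both files. A minimal sketch, assuming GNU sed and the bracketed timestamp format gluster uses; the window boundaries here are only illustrative:

# print everything between two timestamps from the glusterd log
sed -n '/^\[2016-10-04 17:2/,/^\[2016-10-04 19:1/p' \
  /var/log/glusterfs/etc-glusterfs-glusterd.vol.log > glusterd-window.log

# same window from the engine volume mount log on the first host
sed -n '/^\[2016-10-04 17:2/,/^\[2016-10-04 19:1/p' \
  /var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log > mount-window.log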
On Wed, Oct 5, 2016 at 3:28 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:

> Hi,
>
> Servers are powered off when I'm not looking at the problem.
>
> There may have been instances where all three were not powered on during
> the same period.
>
> Glusterd log attached; the xpool-engine-brick log is over 1 GB in size,
> so I've taken a sample of the last couple of days. It looks to be highly
> repetitive.
>
> Cheers
>
> Jason
>
> From: Simone Tiraboschi [mailto:stirabos at redhat.com]
> Sent: 04 October 2016 16:50
> To: Jason Jeffrey <jason at sudo.co.uk>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
> DCASTORXX is a hosts entry for dedicated direct 10GB links (each a
> private /28) between the three servers (i.e. 1 => 2&3, 2 => 1&3, etc.),
> planned to be used solely for storage.
>
> I.e.
>
> 10.100.50.81  dcasrv01
> 10.100.101.1  dcastor01
> 10.100.50.82  dcasrv02
> 10.100.101.2  dcastor02
> 10.100.50.83  dcasrv03
> 10.100.103.3  dcastor03
>
> These were set up with the gluster commands:
>
> gluster volume create iso replica 3 arbiter 1 dcastor01:/xpool/iso/brick dcastor02:/xpool/iso/brick dcastor03:/xpool/iso/brick
> gluster volume create export replica 3 arbiter 1 dcastor02:/xpool/export/brick dcastor03:/xpool/export/brick dcastor01:/xpool/export/brick
> gluster volume create engine replica 3 arbiter 1 dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick dcastor03:/xpool/engine/brick
> gluster volume create data replica 3 arbiter 1 dcastor01:/xpool/data/brick dcastor03:/xpool/data/brick dcastor02:/xpool/data/bricky
>
> So yes, DCASRV01 is the server (pri) and has local brick access through
> the DCASTOR01 interface.
>
> Is the issue here not the incorrect soft link?
>
> No, this should be fine.
>
> The issue is that periodically your gluster volume loses its server
> quorum and becomes unavailable. It happened more than once in your logs.
>
> Can you please attach also the gluster logs for that volume?
>
> lrwxrwxrwx. 1 vdsm kvm 132 Oct  3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
> [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory
>
> But the data does exist:
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
> Thanks
>
> Jason
>
> From: Simone Tiraboschi [mailto:stirabos at redhat.com]
> Sent: 04 October 2016 14:40
> To: Jason Jeffrey <jason at sudo.co.uk>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stirabos at redhat.com> wrote:
>
> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
> Another problem has appeared: after rebooting the primary, the VM will
> not start.
>
> It appears the symlink is broken between the gluster mount ref and vdsm.
>
> The first host was correctly deployed, but it seems that you are facing
> some issue connecting to the storage.
> Can you please attach vdsm logs and /var/log/messages from the first host?
>
> Thanks Jason,
> I suspect that your issue is related to this:
>
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume data. Stopping local bricks.
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
>
> and for some time your gluster volume has been working.
>
> But then:
>
> Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
> Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
> Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
> Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 18:02:12.384611] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting local bricks.
> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 18:02:12.388981] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting local bricks.
>
> And at that point VDSM started complaining that the hosted-engine storage
> domain doesn't exist anymore:
>
> Oct  4 19:02:30 dcasrv01 journal: ovirt-ha-agent ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes list: Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
> Oct  4 19:02:30 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error fetching volumes list: Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
>
> I see from the logs that the ovirt-ha-agent is trying to mount the
> hosted-engine storage domain as:
>
> /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine
>
> pointing to dcastor01, dcastor02 and dcastor03, while your server is
> dcasrv01. But at the same time it seems that dcasrv01 also has local
> bricks for the same engine volume.
>
> So, is dcasrv01 just an alias for dcastor01? If not, you probably have
> some issue with the configuration of your gluster volume.
>
> From broker.log:
>
> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/glusterSD/dcastor01:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>
> [root at dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/
> total 9
> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7-4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
> [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory
>
> Though the file appears to be there.
>
> Gluster is set up as xpool/engine:
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd
> /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/images/fd44dbf9-473a-496a-9996-c8abe3278390
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
> total 2060
> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume info
>
> Volume Name: data
> Type: Replicate
> Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: dcastor01:/xpool/data/brick
> Brick2: dcastor03:/xpool/data/brick
> Brick3: dcastor02:/xpool/data/bricky (arbiter)
> Options Reconfigured:
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
>
> Volume Name: engine
> Type: Replicate
> Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: dcastor01:/xpool/engine/brick
> Brick2: dcastor02:/xpool/engine/brick
> Brick3: dcastor03:/xpool/engine/brick (arbiter)
> Options Reconfigured:
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
>
> Volume Name: export
> Type: Replicate
> Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: dcastor02:/xpool/export/brick
> Brick2: dcastor03:/xpool/export/brick
> Brick3: dcastor01:/xpool/export/brick (arbiter)
> Options Reconfigured:
> performance.readdir-ahead: on
> storage.owner-uid: 36
> storage.owner-gid: 36
>
> Volume Name: iso
> Type: Replicate
> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: dcastor01:/xpool/iso/brick
> Brick2: dcastor02:/xpool/iso/brick
> Brick3: dcastor03:/xpool/iso/brick (arbiter)
> Options Reconfigured:
> performance.readdir-ahead: on
> storage.owner-uid: 36
> storage.owner-gid: 36
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
>
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/data/brick           49153     0          Y       3076
> Brick dcastor03:/xpool/data/brick           49153     0          Y       3019
> Brick dcastor02:/xpool/data/bricky          49153     0          Y       3857
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/engine/brick         49152     0          Y       3131
> Brick dcastor02:/xpool/engine/brick         49152     0          Y       3852
> Brick dcastor03:/xpool/engine/brick         49152     0          Y       2992
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: export
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor02:/xpool/export/brick         49155     0          Y       3872
> Brick dcastor03:/xpool/export/brick         49155     0          Y       3147
> Brick dcastor01:/xpool/export/brick         49155     0          Y       3150
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume export
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: iso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/iso/brick            49154     0          Y       3152
> Brick dcastor02:/xpool/iso/brick            49154     0          Y       3881
> Brick dcastor03:/xpool/iso/brick            49154     0          Y       3146
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume iso
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Thanks
>
> Jason
>
> From: users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] On Behalf Of Jason Jeffrey
> Sent: 03 October 2016 18:40
> To: users at ovirt.org
> Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> Hi,
>
> Setup log attached for the primary.
>
> Regards
>
> Jason
>
> From: Simone Tiraboschi [mailto:stirabos at redhat.com]
> Sent: 03 October 2016 09:27
> To: Jason Jeffrey <jason at sudo.co.uk>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
> I am trying to build an x3 HC cluster, with a self-hosted engine, using
> gluster.
>
> I have successfully built the 1st node; however, when I attempt to run
> hosted-engine --deploy on node 2, I get the following error:
>
> [WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
> [ ERROR ] 'version' is not stored in the HE configuration image
> [ ERROR ] Unable to get the answer file from the shared storage
> [ ERROR ] Failed to execute stage 'Environment customization': Unable to get the answer file from the shared storage
> [ INFO  ] Stage: Clean up
> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed
>
> Looking at the failure in the log file..
>
> Can you please attach hosted-engine-setup logs from the first host?
>
> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
> Looking at the detected gluster path - /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>
> [root at dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
> total 1049609
> drwxr-xr-x. 2 vdsm kvm       4096 Oct  2 04:46 .
> drwxr-xr-x. 6 vdsm kvm       4096 Oct  2 04:46 ..
> -rw-rw----. 1 vdsm kvm 1073741824 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
> -rw-rw----. 1 vdsm kvm    1048576 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
> -rw-r--r--. 1 vdsm kvm        294 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta
>
> 78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file; is this the engine VM?
>
> Copying the answers file from the primary (/etc/ovirt-hosted-engine/answers.conf)
> to node 2 and rerunning produces the same error
> (hosted-engine --deploy --config-append=/root/answers.conf).
>
> Also tried on node 3, same issues.
>
> Happy to provide logs and other debugs.
>
> Thanks
>
> Jason
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
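For reference, the check that hosted-engine-setup runs in the deploy log quoted above can be reproduced by hand. A minimal sketch, reusing the same image path from that log; empty output from tar (matching the blank stdout:/stderr: lines) would mean the HE configuration image does not contain a readable tar archive:

# the same dd | tar pipe that heconflib._dd_pipe_tar logs as two commands
sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k | tar -tvf -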
Jason Jeffrey
2016-Oct-05 08:26 UTC
[Gluster-users] [ovirt-users] 4.0 - 2nd node fails on deploy
Hi,

Logs attached.

Thanks

From: Sahina Bose [mailto:sabose at redhat.com]
Sent: 05 October 2016 08:11
To: Jason Jeffrey <jason at sudo.co.uk>; gluster-users at gluster.org; Ravishankar Narayanankutty <ravishankar at redhat.com>
Cc: Simone Tiraboschi <stirabos at redhat.com>; users <users at ovirt.org>
Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy

> Jason, could you also provide the mount logs from the first host
> (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and the
> glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) around
> the same time frame.
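A minimal sketch of compressing the two requested logs without touching the originals; the glusterd log path is the default one, and the mount-log name is an assumption based on the dcastor01:engine mount point used earlier in the thread:

gzip -c /var/log/glusterfs/etc-glusterfs-glusterd.vol.log > /tmp/etc-glusterfs-glusterd.vol.log.gz
gzip -c /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dcastor01:engine.log > /tmp/rhev-data-center-mnt-glusterSD-dcastor01:engine.log.gz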
Sahina Bose
2016-Oct-05 10:31 UTC
[Gluster-users] [ovirt-users] 4.0 - 2nd node fails on deploy
On Wed, Oct 5, 2016 at 1:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:

> Hi,
>
> Logs attached.

Have you probed 2 interfaces for the same host, that is, dcasrv02 and dcastor02? Does "gluster peer status" understand both names as referring to the same host?

From the glusterd logs and the mount logs, the connection between the peers is lost, and quorum is lost, which reaffirms what Simone said earlier. The logs seem to indicate network issues - check the direct link setup. See below.

From the mount logs:

[2016-10-04 17:26:15.718300] E [socket.c:2292:socket_connect_finish] 0-engine-client-2: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:26:15.718345] W [MSGID: 108001] [afr-common.c:4379:afr_notify] 0-engine-replicate-0: Client-quorum is not met
[2016-10-04 17:26:16.428290] E [socket.c:2292:socket_connect_finish] 0-engine-client-1: connection to 10.100.101.2:24007 failed (No route to host)
[2016-10-04 17:26:16.428336] E [MSGID: 108006] [afr-common.c:4321:afr_notify] 0-engine-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up

And in the glusterd logs:

[2016-10-04 17:24:39.522402] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.50.82:24007 failed (No route to host)
[2016-10-04 17:24:39.522578] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <dcasrv02> (<1e788fc9-dfe9-4753-92c7-76a95c8d0891>), in state <Peer in Cluster>, has disconnected from glusterd.
[2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
[2016-10-04 17:24:39.523314] I [MSGID: 106132] [glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick already stopped
[2016-10-04 17:24:39.526188] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:24:39.526219] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <dcastor03> (<9a9c037e-96cd-4f73-9800-a1df5cdd2818>), in state <Peer in Cluster>, has disconnected from glusterd.
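To answer the peer-naming question and rule out the direct links, something like the following can be run on each node. A minimal sketch, using the host/interface names from earlier in the thread; a peer that was probed under both names should list the second one under "Other names" in the peer status output:

gluster peer status                  # one entry per peer; check "Other names"
getent hosts dcasrv02 dcastor02      # verify both /etc/hosts mappings resolve
ping -c 3 dcastor02                  # direct 10GB storage link to node 2
ping -c 3 dcastor03                  # 10.100.103.3 - the "No route to host" target above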
> glusterfs/b9/8e/b98ed8d2-3bf9-4b11-92fd-ca5324e131a8 > [2016-10-05 19:30:28.659069] E [MSGID: 113091] [posix.c:180:posix_lookup] > 0-engine-posix: Failed to create inode handle for path > <gfid:b98ed8d2-3bf9-4b11-92fd-ca5324e131a8> > The message "E [MSGID: 113018] [posix.c:198:posix_lookup] 0-engine-posix: > lstat on null failed" repeated 3 times between [2016-10-05 19:30:28.656529] > and [2016-10-05 19:30:28.659076] > [2016-10-05 19:30:28.659087] W [MSGID: 115005] > [server-resolve.c:126:resolve_gfid_cbk] 0-engine-server: > b98ed8d2-3bf9-4b11-92fd-ca5324e131a8: failed to resolve (Success) > > - Ravi, the above are from the data brick of the arbiter volume. Can you > take a look? > > > > Jason, > > Could you also provide the mount logs from the first host > (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and > glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) around > the same time frame. > > > > > > On Wed, Oct 5, 2016 at 3:28 AM, Jason Jeffrey <jason at sudo.co.uk> wrote: > > Hi, > > > > Servers are powered off when I?m not looking at the problem. > > > > There may have been instances where all three were not powered on, during > the same period. > > > > Glusterhd log attached, the xpool-engine-brick log is over 1 GB in size, > I?ve taken a sample of the last couple days, looks to be highly repative. > > > > Cheers > > > > Jason > > > > > > > > > > *From:* Simone Tiraboschi [mailto:stirabos at redhat.com] > *Sent:* 04 October 2016 16:50 > > > *To:* Jason Jeffrey <jason at sudo.co.uk> > *Cc:* users <users at ovirt.org> > *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy > > > > > > > > On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey <jason at sudo.co.uk> wrote: > > Hi, > > > > DCASTORXX is a hosts entry for dedicated direct 10GB links (each private > /28) between the x3 servers i.e 1=> 2&3, 2=> 1&3, etc) planned to be used > solely for storage. > > > > I,e > > > > 10.100.50.81 dcasrv01 > > 10.100.101.1 dcastor01 > > 10.100.50.82 dcasrv02 > > 10.100.101.2 dcastor02 > > 10.100.50.83 dcasrv03 > > 10.100.103.3 dcastor03 > > > > These were setup with the gluster commands > > > > ? gluster volume create iso replica 3 arbiter 1 > dcastor01:/xpool/iso/brick dcastor02:/xpool/iso/brick > dcastor03:/xpool/iso/brick > > ? gluster volume create export replica 3 arbiter 1 > dcastor02:/xpool/export/brick dcastor03:/xpool/export/brick > dcastor01:/xpool/export/brick > > ? gluster volume create engine replica 3 arbiter 1 > dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick > dcastor03:/xpool/engine/brick > > ? gluster volume create data replica 3 arbiter 1 > dcastor01:/xpool/data/brick dcastor03:/xpool/data/brick > dcastor02:/xpool/data/bricky > > > > > > So yes, DCASRV01 is the server (pri) and have local bricks access through > DCASTOR01 interface > > > > Is the issue here not the incorrect soft link ? > > > > No, this should be fine. > > > > The issue is that periodically your gluster volume losses its server > quorum and become unavailable. > > It happened more than once from your logs. > > > > Can you please attach also gluster logs for that volume? > > > > > > lrwxrwxrwx. 
1 vdsm kvm 132 Oct 3 17:27 hosted-engine.metadata -> > /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a- > 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93 > > [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164- > 76a4876ecaaf/ > > ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: > No such file or directory > > But the data does exist > > [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al > > drwxr-xr-x. 2 vdsm kvm 4096 Oct 3 17:17 . > > drwxr-xr-x. 6 vdsm kvm 4096 Oct 3 17:17 .. > > -rw-rw----. 2 vdsm kvm 1028096 Oct 3 20:48 cee9440c-4eb8-453b-bc04- > c47e6f9cbc93 > > -rw-rw----. 2 vdsm kvm 1048576 Oct 3 17:17 cee9440c-4eb8-453b-bc04- > c47e6f9cbc93.lease > > -rw-r--r--. 2 vdsm kvm 283 Oct 3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta > > > > > Thanks > > > > Jason > > > > > > > > *From:* Simone Tiraboschi [mailto:stirabos at redhat.com] > *Sent:* 04 October 2016 14:40 > > > *To:* Jason Jeffrey <jason at sudo.co.uk> > *Cc:* users <users at ovirt.org> > *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy > > > > > > > > On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stirabos at redhat.com> > wrote: > > > > > > On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote: > > Hi, > > > > Another problem has appeared, after rebooting the primary the VM will not > start. > > > > Appears the symlink is broken between gluster mount ref and vdsm > > > > The first host was correctly deployed but it seas that you are facing some > issue connecting the storage. > > Can you please attach vdsm logs and /var/log/messages from the first host? > > > > Thanks Jason, > > I suspect that your issue is related to this: > > Oct 4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 > 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351: > glusterd_do_volume_quorum_action] 0-management: Server quorum lost for > volume data. Stopping local bricks. > > Oct 4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 > 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351: > glusterd_do_volume_quorum_action] 0-management: Server quorum lost for > volume engine. Stopping local bricks. > > > > and for some time your gluster volume has been working. > > > > But then: > > Oct 4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o > backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine > /rhev/data-center/mnt/glusterSD/dcastor01:engine. > > Oct 4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o > backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine > /rhev/data-center/mnt/glusterSD/dcastor01:engine. > > Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site- > packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending > is deprecated. Use Dispatcher.socket.pending instead. > > Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, > 'pending', lambda: 0) > > Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site- > packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending > is deprecated. Use Dispatcher.socket.pending instead. 
> > Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, > 'pending', lambda: 0) > > Oct 4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error > during reading data: unexpected eof > > Oct 4 19:02:11 dcasrv01 journal: ovirt-ha-agent > ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to > storage server failed' - trying to restart agent > > Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: > 'Connection to storage server failed' - trying to restart agent > > Oct 4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 > 18:02:12.384611] C [MSGID: 106003] [glusterd-server-quorum.c:346: > glusterd_do_volume_quorum_action] 0-management: Server quorum regained > for volume data. Starting local bricks. > > Oct 4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 > 18:02:12.388981] C [MSGID: 106003] [glusterd-server-quorum.c:346: > glusterd_do_volume_quorum_action] 0-management: Server quorum regained > for volume engine. Starting local bricks. > > > > And at that point VDSM started complaining that the hosted-engine-storage > domain doesn't exist anymore: > > Oct 4 19:02:30 dcasrv01 journal: ovirt-ha-agent > ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes list: > Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',) > > Oct 4 19:02:30 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error > fetching volumes list: Storage domain does not exist: > (u'bbb70623-194a-46d2-a164-76a4876ecaaf',) > > > > I see from the logs that the ovirt-ha-agent is trying to mount the > hosted-engine storage domain as: > > /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 > dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine. > > > > Pointing to dcastor01, dcastor02 and dcastor03 while your server is > dcasrv01. > > But at the same time it seams that also dcasrv01 has local bricks for the > same engine volume. > > > > So, is dcasrv01 just an alias fro dcastor01? if not you probably have some > issue with the configuration of your gluster volume. > > > > > > > > From broker.log > > > > Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138:: > ovirt_hosted_engine_ha.broker.storage_broker. > StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata > from /rhev/data-center/mnt/glusterSD/dcastor01:engine/ > bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata > > > > [root at dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/ > glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/ > > total 9 > > drwxrwx---. 2 vdsm kvm 4096 Oct 3 17:27 . > > drwxr-xr-x. 5 vdsm kvm 4096 Oct 3 17:17 .. > > lrwxrwxrwx. 1 vdsm kvm 132 Oct 3 17:27 hosted-engine.lockspace -> > /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7- > 4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4 > > lrwxrwxrwx. 
1 vdsm kvm 132 Oct 3 17:27 hosted-engine.metadata -> > /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a- > 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93 > > > > [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164- > 76a4876ecaaf/ > > ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: > No such file or directory > > > > Though file appears to be there > > > > Gluster is setup as xpool/engine > > > > [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd > > /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/ > images/fd44dbf9-473a-496a-9996-c8abe3278390 > > [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al > > total 2060 > > drwxr-xr-x. 2 vdsm kvm 4096 Oct 3 17:17 . > > drwxr-xr-x. 6 vdsm kvm 4096 Oct 3 17:17 .. > > -rw-rw----. 2 vdsm kvm 1028096 Oct 3 20:48 cee9440c-4eb8-453b-bc04- > c47e6f9cbc93 > > -rw-rw----. 2 vdsm kvm 1048576 Oct 3 17:17 cee9440c-4eb8-453b-bc04- > c47e6f9cbc93.lease > > -rw-r--r--. 2 vdsm kvm 283 Oct 3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta > > > > > > > [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume info > > > > Volume Name: data > > Type: Replicate > > Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4 > > Status: Started > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: dcastor01:/xpool/data/brick > > Brick2: dcastor03:/xpool/data/brick > > Brick3: dcastor02:/xpool/data/bricky (arbiter) > > Options Reconfigured: > > performance.readdir-ahead: on > > performance.quick-read: off > > performance.read-ahead: off > > performance.io-cache: off > > performance.stat-prefetch: off > > cluster.eager-lock: enable > > network.remote-dio: enable > > cluster.quorum-type: auto > > cluster.server-quorum-type: server > > storage.owner-uid: 36 > > storage.owner-gid: 36 > > > > Volume Name: engine > > Type: Replicate > > Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96 > > Status: Started > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: dcastor01:/xpool/engine/brick > > Brick2: dcastor02:/xpool/engine/brick > > Brick3: dcastor03:/xpool/engine/brick (arbiter) > > Options Reconfigured: > > performance.readdir-ahead: on > > performance.quick-read: off > > performance.read-ahead: off > > performance.io-cache: off > > performance.stat-prefetch: off > > cluster.eager-lock: enable > > network.remote-dio: enable > > cluster.quorum-type: auto > > cluster.server-quorum-type: server > > storage.owner-uid: 36 > > storage.owner-gid: 36 > > > > Volume Name: export > > Type: Replicate > > Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3 > > Status: Started > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: dcastor02:/xpool/export/brick > > Brick2: dcastor03:/xpool/export/brick > > Brick3: dcastor01:/xpool/export/brick (arbiter) > > Options Reconfigured: > > performance.readdir-ahead: on > > storage.owner-uid: 36 > > storage.owner-gid: 36 > > > > Volume Name: iso > > Type: Replicate > > Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a > > Status: Started > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: dcastor01:/xpool/iso/brick > > Brick2: dcastor02:/xpool/iso/brick > > Brick3: dcastor03:/xpool/iso/brick (arbiter) > > Options Reconfigured: > > performance.readdir-ahead: on > > storage.owner-uid: 36 > > storage.owner-gid: 36 > > > > > > [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume > status 
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
>
> Status of volume: data
> Gluster process                              TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/data/brick            49153     0          Y       3076
> Brick dcastor03:/xpool/data/brick            49153     0          Y       3019
> Brick dcastor02:/xpool/data/bricky           49153     0          Y       3857
> NFS Server on localhost                      2049      0          Y       3097
> Self-heal Daemon on localhost                N/A       N/A        Y       3088
> NFS Server on dcastor03                      2049      0          Y       3039
> Self-heal Daemon on dcastor03                N/A       N/A        Y       3114
> NFS Server on dcasrv02                       2049      0          Y       3871
> Self-heal Daemon on dcasrv02                 N/A       N/A        Y       3864
>
> Task Status of Volume data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: engine
> Gluster process                              TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/engine/brick          49152     0          Y       3131
> Brick dcastor02:/xpool/engine/brick          49152     0          Y       3852
> Brick dcastor03:/xpool/engine/brick          49152     0          Y       2992
> NFS Server on localhost                      2049      0          Y       3097
> Self-heal Daemon on localhost                N/A       N/A        Y       3088
> NFS Server on dcastor03                      2049      0          Y       3039
> Self-heal Daemon on dcastor03                N/A       N/A        Y       3114
> NFS Server on dcasrv02                       2049      0          Y       3871
> Self-heal Daemon on dcasrv02                 N/A       N/A        Y       3864
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: export
> Gluster process                              TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor02:/xpool/export/brick          49155     0          Y       3872
> Brick dcastor03:/xpool/export/brick          49155     0          Y       3147
> Brick dcastor01:/xpool/export/brick          49155     0          Y       3150
> NFS Server on localhost                      2049      0          Y       3097
> Self-heal Daemon on localhost                N/A       N/A        Y       3088
> NFS Server on dcastor03                      2049      0          Y       3039
> Self-heal Daemon on dcastor03                N/A       N/A        Y       3114
> NFS Server on dcasrv02                       2049      0          Y       3871
> Self-heal Daemon on dcasrv02                 N/A       N/A        Y       3864
>
> Task Status of Volume export
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: iso
> Gluster process                              TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/iso/brick             49154     0          Y       3152
> Brick dcastor02:/xpool/iso/brick             49154     0          Y       3881
> Brick dcastor03:/xpool/iso/brick             49154     0          Y       3146
> NFS Server on localhost                      2049      0          Y       3097
> Self-heal Daemon on localhost                N/A       N/A        Y       3088
> NFS Server on dcastor03                      2049      0          Y       3039
> Self-heal Daemon on dcastor03                N/A       N/A        Y       3114
> NFS Server on dcasrv02                       2049      0          Y       3871
> Self-heal Daemon on dcasrv02                 N/A       N/A        Y       3864
>
> Task Status of Volume iso
> ------------------------------------------------------------------------------
> There are no active volume tasks
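>
> All bricks and self-heal daemons report online here. As a quick sanity
> check alongside the status output, a minimal sketch (runnable on any of
> the three hosts) for listing entries still pending heal on the engine
> volume:
>
> # entries awaiting self-heal, reported per brick
> gluster volume heal engine info
> # entries the cluster considers split-brain, if any
> gluster volume heal engine info split-brain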
> Thanks
>
> Jason
>
> *From:* users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] *On Behalf Of* Jason Jeffrey
> *Sent:* 03 October 2016 18:40
> *To:* users at ovirt.org
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> Hi,
>
> Setup log attached for primary.
>
> Regards
>
> Jason
>
> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com]
> *Sent:* 03 October 2016 09:27
> *To:* Jason Jeffrey <jason at sudo.co.uk>
> *Cc:* users <users at ovirt.org>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
> On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
> I am trying to build a x3 HC cluster with a self-hosted engine using gluster.
>
> I have successfully built the 1st node; however, when I attempt to run
> hosted-engine --deploy on node 2, I get the following error:
>
> [WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
> [ ERROR ] 'version' is not stored in the HE configuration image
> [ ERROR ] Unable to get the answer file from the shared storage
> [ ERROR ] Failed to execute stage 'Environment customization': Unable to get the answer file from the shared storage
> [ INFO  ] Stage: Clean up
> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed
>
> Looking at the failure in the log file...
>
> Can you please attach hosted-engine-setup logs from the first host?
>
> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
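>
> The setup code is doing a dd-into-tar listing of the HE configuration
> image here; as a sketch, the same check can be reproduced by hand with
> the exact commands from the log above. An empty listing would match the
> "'version' is not stored" error, pointing at an empty or unreadable
> configuration image rather than a missing file:
>
> sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k | tar -tvf -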
> Looking at the detected gluster path -
> /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>
> [root at dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
> total 1049609
> drwxr-xr-x. 2 vdsm kvm       4096 Oct 2 04:46 .
> drwxr-xr-x. 6 vdsm kvm       4096 Oct 2 04:46 ..
> -rw-rw----. 1 vdsm kvm 1073741824 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
> -rw-rw----. 1 vdsm kvm    1048576 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
> -rw-r--r--. 1 vdsm kvm        294 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta
>
> 78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file; is this the engine VM?
>
> Copying the answer file from the primary (/etc/ovirt-hosted-engine/answers.conf)
> to node 2 and rerunning produces the same error :(
> (hosted-engine --deploy --config-append=/root/answers.conf)
>
> Also tried on node 3, same issues.
>
> Happy to provide logs and other debugs.
>
> Thanks
>
> Jason
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users