Krutika Dhananjay
2016-Aug-15 23:24 UTC
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
No. The EEXIST errors are normal and can be ignored. This can happen when multiple threads try to create the same shard in parallel. Nothing wrong with that.

-Krutika

On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <dgossage at carouselchecks.com> wrote:

> On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>
>> Here is the reply again just in case. I got a quarantine message, so I'm not
>> sure whether the first went through or will any time soon. The brick logs
>> weren't large, so I'll just include them as text files this time.
>
> Did maintenance over the weekend updating oVirt from 3.6.6 -> 3.6.7, and after
> restarting the complaining oVirt node I was able to migrate the 2 VMs with
> issues. So I'm not sure why the mount went stale, but I imagine that one node
> couldn't see the new image files after that had occurred?
>
> Still getting a few sporadic errors, but they seem much fewer than before, and
> I never get any corresponding notices in any other log files:
>
> [2016-08-15 13:40:31.510798] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed [File exists]
> [2016-08-15 13:40:31.522067] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed [File exists]
> [2016-08-15 17:47:06.375708] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed [File exists]
> [2016-08-15 17:47:26.435198] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed [File exists]
> [2016-08-15 17:47:06.405481] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed [File exists]
> [2016-08-15 17:47:26.464542] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed [File exists]
> [2016-08-15 18:46:47.187967] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739 failed [File exists]
> [2016-08-15 18:47:41.414312] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed [File exists]
> [2016-08-15 18:47:41.450470] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed [File exists]
>
>> The attached file bricks.zip you sent to <kdhananj at redhat.com>;<Gluster-users at gluster.org>
>> on 8/13/2016 7:17:35 AM was quarantined. As a safety precaution, the University
>> of South Carolina quarantines .zip and .docm files sent via email. If this is a
>> legitimate attachment <kdhananj at redhat.com>;<Gluster-users at gluster.org> may
>> contact the Service Desk at 803-777-1800 (servicedesk at sc.edu) and the
>> attachment file will be released from quarantine and delivered.
>>
>> On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>>
>>> On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>
>>>> 1. Could you share the output of `gluster volume heal <VOL> info`?
>>>> >>> Results were same moments after issue occurred as well >>> Brick ccgl1.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> Brick ccgl2.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> Brick ccgl4.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> >>> >>>> 2. `gluster volume info` >>>> >>> Volume Name: GLUSTER1 >>> Type: Replicate >>> Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f >>> Status: Started >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>> Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>> Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>> Options Reconfigured: >>> cluster.locking-scheme: granular >>> nfs.enable-ino32: off >>> nfs.addr-namelookup: off >>> nfs.disable: on >>> performance.strict-write-ordering: off >>> cluster.background-self-heal-count: 16 >>> cluster.self-heal-window-size: 1024 >>> server.allow-insecure: on >>> cluster.server-quorum-type: server >>> cluster.quorum-type: auto >>> network.remote-dio: enable >>> cluster.eager-lock: enable >>> performance.stat-prefetch: on >>> performance.io-cache: off >>> performance.read-ahead: off >>> performance.quick-read: off >>> storage.owner-gid: 36 >>> storage.owner-uid: 36 >>> performance.readdir-ahead: on >>> features.shard: on >>> features.shard-block-size: 64MB >>> diagnostics.brick-log-level: WARNING >>> >>> >>> >>>> 3. fuse mount logs of the affected volume(s)? >>>> >>> [2016-08-12 21:34:19.518511] W [MSGID: 114031] >>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519115] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519203] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519226] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.520737] W [MSGID: 108008] >>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable >>> subvolume -1 found with event generation 3 for gfid >>> e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. 
(Possible split-brain) >>> [2016-08-12 21:34:19.521393] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.522269] E [MSGID: 109040] >>> [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: >>> (null): failed to lookup the file on GLUSTER1-dht [Stale file handle] >>> [2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk] >>> 0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845 >>> fd=0x7f00a80bde58 (Stale file handle) >>> [2016-08-12 21:34:19.521296] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.521357] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>> remote operation failed [No such file or directory] >>> [2016-08-12 22:15:08.337528] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:12.240026] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:11.105593] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:14.772713] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> >>> 4. glustershd logs >>>> >>> Nothing recent same on all 3 storage nodes >>> [2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] >>> 0-glusterfs: No change in volfile, continuing >>> [2016-08-11 08:14:03.683287] I [MSGID: 100011] >>> [glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file >>> from server... >>> [2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] >>> 0-glusterfs: No change in volfile, continuing >>> >>> >>> >>>> 5. Brick logs >>>> >>> Their have been some error in brick logs I hadn't noticed occurring. 
>>> I've zip'd and attached all 3 nodes logs, but from this snippet on one node >>> none of them seem to coincide with the time window when migration had >>> issues. f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an image >>> for a different vm than one I had issues with as well. Maybe gluster is >>> trying to do some sort of make shard test before writing out changes that >>> would go to that image and that shard file? >>> >>> [2016-08-12 18:48:22.463628] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697 >>> failed [File exists] >>> [2016-08-12 18:48:24.553455] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 >>> failed [File exists] >>> [2016-08-12 18:49:16.065502] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7 >>> f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5 >>> times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777] >>> [2016-08-12 18:48:24.581216] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7 >>> f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5 >>> times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746] >>> [2016-08-12 19:23:40.964678] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301 >>> failed [File exists] >>> [2016-08-12 20:00:33.498751] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 >>> failed [File exists] >>> [2016-08-12 20:00:33.530938] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 >>> failed [File exists] >>> [2016-08-13 01:47:23.338036] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884 >>> 3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated 16 >>> times between [2016-08-13 01:47:23.338036] and [2016-08-13 01:47:23.380980] >>> [2016-08-13 01:48:02.224494] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211 >>> failed [File exists] >>> [2016-08-13 01:48:42.266148] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177 >>> failed [File exists] >>> [2016-08-13 01:49:09.717434] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178 >>> failed [File exists] >>> >>> >>>> -Krutika >>>> 
>>>> >>>> On Sat, Aug 13, 2016 at 3:10 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> On Fri, Aug 12, 2016 at 4:25 PM, Dan Lavu <dan at redhat.com> wrote: >>>>> >>>>>> David, >>>>>> >>>>>> I'm seeing similar behavior in my lab, but it has been caused by >>>>>> healing files in the gluster cluster, though I attribute my problems to >>>>>> problems with the storage fabric. See if 'gluster volume heal $VOL info' >>>>>> indicates files that are being healed, and if those reduce in number, can >>>>>> the VM start? >>>>>> >>>>>> >>>>> I haven't had any files in a state of being healed according to either >>>>> of the 3 storage nodes. >>>>> >>>>> I shut down one VM that has been around awhile a moment ago then told >>>>> it to start on the one ovirt server that complained previously. It ran >>>>> fine, and I was able to migrate it off and on the host no issues. >>>>> >>>>> I told one of the new VM's to migrate to the one node and within >>>>> seconds it paused from unknown storage errors no shards showing heals >>>>> nothing with an error on storage node. Same stale file handle issues. >>>>> >>>>> I'll probably put this node in maintenance later and reboot it. Other >>>>> than that I may re-clone those 2 reccent VM's. maybe images just got >>>>> corrupted though why it would only fail on one node of 3 if image was bad >>>>> not sure. >>>>> >>>>> >>>>> Dan >>>>>> >>>>>> On Thu, Aug 11, 2016 at 7:52 AM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> Figure I would repost here as well. one client out of 3 complaining >>>>>>> of stale file handles on a few new VM's I migrated over. No errors on >>>>>>> storage nodes just client. Maybe just put that one in maintenance and >>>>>>> restart gluster mount? >>>>>>> >>>>>>> *David Gossage* >>>>>>> *Carousel Checks Inc. | System Administrator* >>>>>>> *Office* 708.613.2284 >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> From: David Gossage <dgossage at carouselchecks.com> >>>>>>> Date: Thu, Aug 11, 2016 at 12:17 AM >>>>>>> Subject: vm paused unknown storage error one node out of 3 only >>>>>>> To: users <users at ovirt.org> >>>>>>> >>>>>>> >>>>>>> Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a 3 >>>>>>> replicate gluster 3.7.14 starting a VM i just copied in on one node of the >>>>>>> 3 gets the following errors. The other 2 the vm starts fine. All ovirt >>>>>>> and gluster are centos 7 based. VM on start of the one node it tries to >>>>>>> default to on its own accord immediately puts into paused for unknown >>>>>>> reason. Telling it to start on different node starts ok. node with issue >>>>>>> already has 5 VMs running fine on it same gluster storage plus the hosted >>>>>>> engine on different volume. 
>>>>>>> >>>>>>> gluster nodes logs did not have any errors for volume >>>>>>> nodes own gluster logs had this in log >>>>>>> >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs >>>>>>> .shard or images/ >>>>>>> >>>>>>> 7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable >>>>>>> drive of the vm >>>>>>> >>>>>>> [2016-08-11 04:31:39.982952] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.983683] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.984182] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.984221] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.985941] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain) >>>>>>> [2016-08-11 04:31:39.986633] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.987644] E [MSGID: 109040] >>>>>>> [dht-helper.c:1190:dht_migration_complete_check_task] >>>>>>> 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale >>>>>>> file handle] >>>>>>> [2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk] >>>>>>> 0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 >>>>>>> fd=0x7f00a80bdb64 (Stale file handle) >>>>>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210145] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. 
(Possible split-brain) >>>>>>> [2016-08-11 04:35:21.210873] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210888] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210947] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.213270] E [MSGID: 109040] >>>>>>> [dht-helper.c:1190:dht_migration_complete_check_task] >>>>>>> 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale >>>>>>> file handle] >>>>>>> [2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk] >>>>>>> 0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 >>>>>>> fd=0x7f00a80bf6d0 (Stale file handle) >>>>>>> [2016-08-11 04:35:21.211516] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain) >>>>>>> [2016-08-11 04:35:21.212013] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.212081] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.212121] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> >>>>>>> I attached vdsm.log starting from when I spun up vm on offending node >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>> >>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160816/d0219406/attachment.html>
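Krutika's point above is that these EEXIST errors are just the losing side of a benign race when several writers create the same shard at once. For anyone hitting the same messages, a quick way to satisfy yourself that a given occurrence was harmless is to check the shard on each brick and confirm nothing is pending heal. A rough sketch only, reusing the volume name, brick path, shard name and hostnames quoted in this thread (substitute your own, and it assumes ssh access between nodes):

    # Sketch only: names below are taken from this thread, not defaults.
    SHARD=/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584

    # The shard should already exist, with the same size, on all three bricks:
    for host in ccgl1.gl.local ccgl2.gl.local ccgl4.gl.local; do
        echo "== $host =="
        ssh "$host" "ls -l $SHARD"
    done

    # ...and the volume should report nothing pending heal:
    gluster volume heal GLUSTER1 info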
David Gossage
2016-Aug-16 00:20 UTC
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
On Mon, Aug 15, 2016 at 6:24 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> No. The EEXIST errors are normal and can be ignored. This can happen when
> multiple threads try to create the same shard in parallel. Nothing wrong
> with that.

Other than that they show up as E (error) level entries, which makes a user worry. Is there a known bug filed against that, or should I open one to see whether those messages can be demoted to informational level?
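Until such a change lands, one low-effort way to keep reviewing the brick logs without being distracted by the known-benign shard mknod errors is to filter that message ID out. A rough sketch; the log path is an assumption based on the usual bricks/<brick-path-with-dashes>.log naming and may differ on your nodes:

    # Assumed brick log location for brick /gluster1/BRICK1/1 -- verify locally.
    BRICK_LOG=/var/log/glusterfs/bricks/gluster1-BRICK1-1.log

    # Error-level entries other than the harmless shard-mknod "File exists"
    # ones tagged MSGID: 113022:
    grep ' E \[MSGID' "$BRICK_LOG" | grep -v 'MSGID: 113022'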