Krutika Dhananjay
2016-Aug-15 23:24 UTC
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
No. The EEXIST errors are normal and can be ignored. This can happen when multiple threads try to create the same shard in parallel. Nothing wrong with that.

-Krutika

On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <dgossage at carouselchecks.com> wrote:

> On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>
>> Here is the reply again just in case. I got a quarantine message, so I'm not
>> sure whether the first went through or will any time soon. The brick logs
>> weren't large, so I'll just include them as text files this time.
>
> Did maintenance over the weekend updating oVirt from 3.6.6 -> 3.6.7, and after
> restarting the complaining oVirt node I was able to migrate the 2 VMs with
> issues. So I'm not sure why the mount went stale, but I imagine that one node
> couldn't see the new image files after that had occurred?
>
> Still getting a few sporadic errors, but they seem much fewer than before, and
> I never get any corresponding notices in any other log files:
>
> [2016-08-15 13:40:31.510798] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed [File exists]
> [2016-08-15 13:40:31.522067] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584 failed [File exists]
> [2016-08-15 17:47:06.375708] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed [File exists]
> [2016-08-15 17:47:26.435198] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed [File exists]
> [2016-08-15 17:47:06.405481] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722 failed [File exists]
> [2016-08-15 17:47:26.464542] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723 failed [File exists]
> [2016-08-15 18:46:47.187967] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.739 failed [File exists]
> [2016-08-15 18:47:41.414312] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed [File exists]
> [2016-08-15 18:47:41.450470] E [MSGID: 113022] [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.779 failed [File exists]
>
>> The attached file bricks.zip you sent to <kdhananj at redhat.com>;<Gluster-users at gluster.org>
>> on 8/13/2016 7:17:35 AM was quarantined. As a safety precaution, the University
>> of South Carolina quarantines .zip and .docm files sent via email. If this is a
>> legitimate attachment <kdhananj at redhat.com>;<Gluster-users at gluster.org> may
>> contact the Service Desk at 803-777-1800 (servicedesk at sc.edu) and the
>> attachment file will be released from quarantine and delivered.
>>
>> On Sat, Aug 13, 2016 at 6:15 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>>
>>> On Sat, Aug 13, 2016 at 12:26 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>
>>>> 1. Could you share the output of `gluster volume heal <VOL> info`?
>>>> >>> Results were same moments after issue occurred as well >>> Brick ccgl1.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> Brick ccgl2.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> Brick ccgl4.gl.local:/gluster1/BRICK1/1 >>> Status: Connected >>> Number of entries: 0 >>> >>> >>> >>>> 2. `gluster volume info` >>>> >>> Volume Name: GLUSTER1 >>> Type: Replicate >>> Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f >>> Status: Started >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>> Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>> Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>> Options Reconfigured: >>> cluster.locking-scheme: granular >>> nfs.enable-ino32: off >>> nfs.addr-namelookup: off >>> nfs.disable: on >>> performance.strict-write-ordering: off >>> cluster.background-self-heal-count: 16 >>> cluster.self-heal-window-size: 1024 >>> server.allow-insecure: on >>> cluster.server-quorum-type: server >>> cluster.quorum-type: auto >>> network.remote-dio: enable >>> cluster.eager-lock: enable >>> performance.stat-prefetch: on >>> performance.io-cache: off >>> performance.read-ahead: off >>> performance.quick-read: off >>> storage.owner-gid: 36 >>> storage.owner-uid: 36 >>> performance.readdir-ahead: on >>> features.shard: on >>> features.shard-block-size: 64MB >>> diagnostics.brick-log-level: WARNING >>> >>> >>> >>>> 3. fuse mount logs of the affected volume(s)? >>>> >>> [2016-08-12 21:34:19.518511] W [MSGID: 114031] >>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519115] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519203] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.519226] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.520737] W [MSGID: 108008] >>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable >>> subvolume -1 found with event generation 3 for gfid >>> e18650c4-02c0-4a5a-bd4c-bbdf5fbd9c88. 
(Possible split-brain) >>> [2016-08-12 21:34:19.521393] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.522269] E [MSGID: 109040] >>> [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: >>> (null): failed to lookup the file on GLUSTER1-dht [Stale file handle] >>> [2016-08-12 21:34:19.522341] W [fuse-bridge.c:2227:fuse_readv_cbk] >>> 0-glusterfs-fuse: 18479997: READ => -1 gfid=31d7c904-775e-4b9f-8ef7-888218679845 >>> fd=0x7f00a80bde58 (Stale file handle) >>> [2016-08-12 21:34:19.521296] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>> remote operation failed [No such file or directory] >>> [2016-08-12 21:34:19.521357] W [MSGID: 114031] >>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>> remote operation failed [No such file or directory] >>> [2016-08-12 22:15:08.337528] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:12.240026] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:11.105593] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/ec4f5b10-02b1-4 >>> 35c-a7e1-97e399532597/0e6ed1c3-ffe0-43b0-9863-439ccc3193c9.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> [2016-08-12 22:15:14.772713] I [MSGID: 109066] >>> [dht-rename.c:1568:dht_rename] 0-GLUSTER1-dht: renaming >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta.new >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) => >>> /7c73a8dd-a72e-4556-ac88-7f6813131e64/images/78636a1b-86dd-4 >>> aaf-8b4f-4ab9c3509e88/4707d651-06c6-446b-b9c8-408004a55ada.meta >>> (hash=GLUSTER1-replicate-0/cache=GLUSTER1-replicate-0) >>> >>> 4. glustershd logs >>>> >>> Nothing recent same on all 3 storage nodes >>> [2016-08-07 08:48:03.593401] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] >>> 0-glusterfs: No change in volfile, continuing >>> [2016-08-11 08:14:03.683287] I [MSGID: 100011] >>> [glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file >>> from server... >>> [2016-08-11 08:14:03.684492] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] >>> 0-glusterfs: No change in volfile, continuing >>> >>> >>> >>>> 5. Brick logs >>>> >>> Their have been some error in brick logs I hadn't noticed occurring. 
>>> I've zip'd and attached all 3 nodes logs, but from this snippet on one node >>> none of them seem to coincide with the time window when migration had >>> issues. f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42 shard refers to an image >>> for a different vm than one I had issues with as well. Maybe gluster is >>> trying to do some sort of make shard test before writing out changes that >>> would go to that image and that shard file? >>> >>> [2016-08-12 18:48:22.463628] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.697 >>> failed [File exists] >>> [2016-08-12 18:48:24.553455] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 >>> failed [File exists] >>> [2016-08-12 18:49:16.065502] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.738 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7 >>> f3c5-4c13-4020-b560-1f4f7b1e3c42.697 failed [File exists]" repeated 5 >>> times between [2016-08-12 18:48:22.463628] and [2016-08-12 18:48:22.514777] >>> [2016-08-12 18:48:24.581216] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/f9a7f3c5-4c13-4020-b560-1f4f7b1e3c42.698 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/f9a7 >>> f3c5-4c13-4020-b560-1f4f7b1e3c42.738 failed [File exists]" repeated 5 >>> times between [2016-08-12 18:49:16.065502] and [2016-08-12 18:49:16.107746] >>> [2016-08-12 19:23:40.964678] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/83794e5d-2225-4560-8df6-7c903c8a648a.1301 >>> failed [File exists] >>> [2016-08-12 20:00:33.498751] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 >>> failed [File exists] >>> [2016-08-12 20:00:33.530938] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.580 >>> failed [File exists] >>> [2016-08-13 01:47:23.338036] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.211 >>> failed [File exists] >>> The message "E [MSGID: 113022] [posix.c:1245:posix_mknod] >>> 0-GLUSTER1-posix: mknod on /gluster1/BRICK1/1/.shard/1884 >>> 3fb4-e31c-4fc3-b519-cc6e5e947813.211 failed [File exists]" repeated 16 >>> times between [2016-08-13 01:47:23.338036] and [2016-08-13 01:47:23.380980] >>> [2016-08-13 01:48:02.224494] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/ffbbcce0-3c4a-4fdf-b79f-a96ca3215657.211 >>> failed [File exists] >>> [2016-08-13 01:48:42.266148] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.177 >>> failed [File exists] >>> [2016-08-13 01:49:09.717434] E [MSGID: 113022] >>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on >>> /gluster1/BRICK1/1/.shard/18843fb4-e31c-4fc3-b519-cc6e5e947813.178 >>> failed [File exists] >>> >>> >>>> -Krutika >>>> 
>>>> >>>> On Sat, Aug 13, 2016 at 3:10 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> On Fri, Aug 12, 2016 at 4:25 PM, Dan Lavu <dan at redhat.com> wrote: >>>>> >>>>>> David, >>>>>> >>>>>> I'm seeing similar behavior in my lab, but it has been caused by >>>>>> healing files in the gluster cluster, though I attribute my problems to >>>>>> problems with the storage fabric. See if 'gluster volume heal $VOL info' >>>>>> indicates files that are being healed, and if those reduce in number, can >>>>>> the VM start? >>>>>> >>>>>> >>>>> I haven't had any files in a state of being healed according to either >>>>> of the 3 storage nodes. >>>>> >>>>> I shut down one VM that has been around awhile a moment ago then told >>>>> it to start on the one ovirt server that complained previously. It ran >>>>> fine, and I was able to migrate it off and on the host no issues. >>>>> >>>>> I told one of the new VM's to migrate to the one node and within >>>>> seconds it paused from unknown storage errors no shards showing heals >>>>> nothing with an error on storage node. Same stale file handle issues. >>>>> >>>>> I'll probably put this node in maintenance later and reboot it. Other >>>>> than that I may re-clone those 2 reccent VM's. maybe images just got >>>>> corrupted though why it would only fail on one node of 3 if image was bad >>>>> not sure. >>>>> >>>>> >>>>> Dan >>>>>> >>>>>> On Thu, Aug 11, 2016 at 7:52 AM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> Figure I would repost here as well. one client out of 3 complaining >>>>>>> of stale file handles on a few new VM's I migrated over. No errors on >>>>>>> storage nodes just client. Maybe just put that one in maintenance and >>>>>>> restart gluster mount? >>>>>>> >>>>>>> *David Gossage* >>>>>>> *Carousel Checks Inc. | System Administrator* >>>>>>> *Office* 708.613.2284 >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> From: David Gossage <dgossage at carouselchecks.com> >>>>>>> Date: Thu, Aug 11, 2016 at 12:17 AM >>>>>>> Subject: vm paused unknown storage error one node out of 3 only >>>>>>> To: users <users at ovirt.org> >>>>>>> >>>>>>> >>>>>>> Out of a 3 node cluster running oVirt 3.6.6.2-1.el7.centos with a 3 >>>>>>> replicate gluster 3.7.14 starting a VM i just copied in on one node of the >>>>>>> 3 gets the following errors. The other 2 the vm starts fine. All ovirt >>>>>>> and gluster are centos 7 based. VM on start of the one node it tries to >>>>>>> default to on its own accord immediately puts into paused for unknown >>>>>>> reason. Telling it to start on different node starts ok. node with issue >>>>>>> already has 5 VMs running fine on it same gluster storage plus the hosted >>>>>>> engine on different volume. 
>>>>>>> >>>>>>> gluster nodes logs did not have any errors for volume >>>>>>> nodes own gluster logs had this in log >>>>>>> >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldnt find in .glusterfs >>>>>>> .shard or images/ >>>>>>> >>>>>>> 7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable >>>>>>> drive of the vm >>>>>>> >>>>>>> [2016-08-11 04:31:39.982952] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.983683] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.984182] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.984221] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.985941] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain) >>>>>>> [2016-08-11 04:31:39.986633] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.987644] E [MSGID: 109040] >>>>>>> [dht-helper.c:1190:dht_migration_complete_check_task] >>>>>>> 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale >>>>>>> file handle] >>>>>>> [2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk] >>>>>>> 0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 >>>>>>> fd=0x7f00a80bdb64 (Stale file handle) >>>>>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210145] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. 
(Possible split-brain) >>>>>>> [2016-08-11 04:35:21.210873] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210888] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.210947] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.213270] E [MSGID: 109040] >>>>>>> [dht-helper.c:1190:dht_migration_complete_check_task] >>>>>>> 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale >>>>>>> file handle] >>>>>>> [2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk] >>>>>>> 0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 >>>>>>> fd=0x7f00a80bf6d0 (Stale file handle) >>>>>>> [2016-08-11 04:35:21.211516] W [MSGID: 108008] >>>>>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: >>>>>>> Unreadable subvolume -1 found with event generation 3 for gfid >>>>>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain) >>>>>>> [2016-08-11 04:35:21.212013] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.212081] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: >>>>>>> remote operation failed [No such file or directory] >>>>>>> [2016-08-11 04:35:21.212121] W [MSGID: 114031] >>>>>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: >>>>>>> remote operation failed [No such file or directory] >>>>>>> >>>>>>> I attached vdsm.log starting from when I spun up vm on offending node >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>> >>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160816/d0219406/attachment.html>
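Krutika's point above is that these EEXIST errors are just the losing side of a benign race when several writers create the same shard at once. For anyone hitting the same messages, a quick way to satisfy yourself that a given occurrence was harmless is to check the shard on each brick and confirm nothing is pending heal. A rough sketch only, reusing the volume name, brick path, shard name and hostnames quoted in this thread (substitute your own, and it assumes ssh access between nodes):

    # Sketch only: names below are taken from this thread, not defaults.
    SHARD=/gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584

    # The shard should already exist, with the same size, on all three bricks:
    for host in ccgl1.gl.local ccgl2.gl.local ccgl4.gl.local; do
        echo "== $host =="
        ssh "$host" "ls -l $SHARD"
    done

    # ...and the volume should report nothing pending heal:
    gluster volume heal GLUSTER1 info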
David Gossage
2016-Aug-16 00:20 UTC
[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only
On Mon, Aug 15, 2016 at 6:24 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> No. The EEXIST errors are normal and can be ignored. This can happen when
> multiple threads try to create the same shard in parallel. Nothing wrong
> with that.

Other than that they show up as E (error) level entries, which makes a user worry. Is there a known bug filed against that, or should I open one to see whether those messages can be demoted to informational level?
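Until such a change lands, one low-effort way to keep reviewing the brick logs without being distracted by the known-benign shard mknod errors is to filter that message ID out. A rough sketch; the log path is an assumption based on the usual bricks/<brick-path-with-dashes>.log naming and may differ on your nodes:

    # Assumed brick log location for brick /gluster1/BRICK1/1 -- verify locally.
    BRICK_LOG=/var/log/glusterfs/bricks/gluster1-BRICK1-1.log

    # Error-level entries other than the harmless shard-mknod "File exists"
    # ones tagged MSGID: 113022:
    grep ' E \[MSGID' "$BRICK_LOG" | grep -v 'MSGID: 113022'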