Ian Geiser
2018-Jul-13 20:48 UTC
[Gluster-users] Insanely long times to run qemu libgfapi operations
Greetings,

I am having problems diagnosing an issue with qemu connecting to gluster with the gfapi. In my current setup, I am using a 6 node "Distributed-Disperse" configuration on glusterfs 4.0.2 on Ubuntu 18.04. Below is my current configuration:

root@hio-5:~# gluster volume info shared

Volume Name: shared
Type: Distributed-Disperse
Volume ID: 8b8432c4-3b7a-4549-ace7-341deff3fe3d
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: ff422cf72ea6.hio.internal:/zdata/shared/brick
Brick2: 958c9787c9f8.hio.internal:/zdata/shared/brick
Brick3: 7f8901a86f13.hio.internal:/zdata/shared/brick
Brick4: a204b6b51172.hio.internal:/zdata/shared/brick
Brick5: fceb09117433.hio.internal:/zdata/shared/brick
Brick6: ac78682c85d8.hio.internal:/zdata/shared/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.owner-uid: 64055
storage.owner-gid: 116
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32

What is strange is ANY operation I perform via the libgfapi in qemu-img and qemu takes an insanely long time. Somehow qemu-nbd seems to be the exception.
On a completely idle system I see the following:

root@hio-5:~# time qemu-img create -f qcow2 gluster://localhost/shared/test-gfapi.qcow2 40G
Formatting 'gluster://localhost/shared/test-gfapi.qcow2', fmt=qcow2 size=42949672960 cluster_size=65536 lazy_refcounts=off refcount_bits=16

real    0m28.025s
user    0m0.099s
sys     0m0.109s

root@hio-5:~# time qemu-img info gluster://localhost/shared/test-gfapi.qcow2
image: gluster://localhost/shared/test-gfapi.qcow2
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 5.0K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

real    0m7.005s
user    0m0.024s
sys     0m0.002s

But with FUSE I see this:

root@hio-5:~# time qemu-img create -f qcow2 /mnt/1c96607d-3db4-40ef-a097-31780f45748b/test-fuse.qcow2 40G
Formatting '/mnt/1c96607d-3db4-40ef-a097-31780f45748b/test-fuse.qcow2', fmt=qcow2 size=42949672960 cluster_size=65536 lazy_refcounts=off refcount_bits=16

real    0m0.485s
user    0m0.011s
sys     0m0.021s

root@hio-5:~# time qemu-img info /mnt/1c96607d-3db4-40ef-a097-31780f45748b/test-fuse.qcow2
image: /mnt/1c96607d-3db4-40ef-a097-31780f45748b/test-fuse.qcow2
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 5.0K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

real    0m0.533s
user    0m0.000s
sys     0m0.015s

Now, this strikes me as so wrong that, if it were normal, I would expect it to be the #1 Google hit. Where would I start debugging this? FUSE looks just fine, so it must be something else I am doing wrong.

Thanks!
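
To put a number on the gap, here is a minimal sketch that turns the "real" times from the runs above into slowdown factors. The variable names and the slowdown helper are hypothetical, made up for this illustration; only the four timings come from the actual output.

```shell
#!/bin/sh
# Wall-clock ("real") times in seconds, copied from the runs above.
# Variable names are illustrative only.
GFAPI_CREATE=28.025
FUSE_CREATE=0.485
GFAPI_INFO=7.005
FUSE_INFO=0.533

# slowdown SLOW FAST: print SLOW / FAST rounded to one decimal place.
# Hypothetical helper; awk does the floating-point division.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

echo "create: gfapi is $(slowdown "$GFAPI_CREATE" "$FUSE_CREATE")x slower than fuse"
echo "info:   gfapi is $(slowdown "$GFAPI_INFO" "$FUSE_INFO")x slower than fuse"
```

That works out to roughly 58x for create and 13x for info, while user and sys time stay near zero in every run, so almost all of the extra wall time is spent waiting rather than computing.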