Hi,
There is one bug that was uncovered recently wherein the same file could
get healed twice before being marked as no longer needing heal.
Pranith sent a patch @ http://review.gluster.org/#/c/13766/ to fix this,
although IIUC this bug existed in versions < 3.7.9 as well.
Also because of this bug, files that need heal may appear in heal-info
output for longer than they ought to.
Did you see this issue in versions < 3.7.9 as well?
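For reference, the pending-heal state can be inspected from any node with the gluster CLI (volume name taken from the thread below; these commands need a live cluster, so this is illustrative only):

```shell
# List files/shards still marked as needing heal on volume "datastore2":
gluster volume heal datastore2 info

# Per-brick count of pending entries (quicker than the full listing):
gluster volume heal datastore2 statistics heal-count
```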
-Krutika
On Fri, Mar 25, 2016 at 1:04 PM, Lindsay Mathieson <
lindsay.mathieson at gmail.com> wrote:
> Have resumed testing with 3.7.9 - this time I have proper hardware behind
> it:
>
> - 3 nodes
> - each node with 4 WD Reds in ZFS raid 10
> - SSD for slog and cache.
>
> Using a sharded VM setup (4MB shards) and performance has been excellent,
> better than ceph on the same hardware. I have some interesting notes on
> that I will detail later.
>
> However, unlike with 3.7.7, heal performance has been abysmal - deal
> breaking, in fact. Maybe it's my setup?
>
> Have been testing healing by killing the glusterfsd and glusterd
> processes on another node and letting a VM run. Everything is fine at this
> point: despite a node being down, reads and writes continue normally.
>
> However, heal info shows what appears to be an excessive number of shards
> being marked as needing heal. A simple reboot of a Windows VM results in
> 360 4MB shards - 1.5GB of data. A compile resulted in 7GB of shards being
> touched. Could there be some write amplification at work?
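The arithmetic above checks out; a rough sanity check (not gluster output):

```shell
# 360 shards x 4 MB each - the data volume touched by a single VM reboot
shards=360
shard_mb=4
echo "$((shards * shard_mb)) MB"   # prints "1440 MB", i.e. ~1.4-1.5 GB
```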
>
> However, once I restart the glusterd process (which starts glusterfsd),
> performance becomes atrocious. Disk IO nearly stops, and any running VMs
> hang or slow down a *lot* until the heal is complete. The "heal info"
> command appears to hang as well, not completing at all. A build process
> that was taking 4 mins took over an hour.
>
> Once the heal finishes, I/O returns to normal.
>
>
> Here's a fragment of the glfsheal log:
>
> [2016-03-25 07:12:51.041590] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-2: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-03-25 07:12:51.041637] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 0-datastore2-client-1: changing port to 49153 (from 0)
> [2016-03-25 07:12:51.041808] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-2:
> Connected to datastore2-client-2, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.041826] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-2:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.041901] I [MSGID: 108005]
> [afr-common.c:4010:afr_notify] 0-datastore2-replicate-0: Subvolume
> 'datastore2-client-2' came back up; going online.
> [2016-03-25 07:12:51.041929] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-0: *Using Program GlusterFS 3.3, Num (1298437),
> Version (330)*
> [2016-03-25 07:12:51.041955] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-2:
> Server lk version = 1
> [2016-03-25 07:12:51.042319] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-0:
> Connected to datastore2-client-0, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.042333] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.042455] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-1: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-03-25 07:12:51.042520] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-0:
> Server lk version = 1
> [2016-03-25 07:12:51.042846] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-1:
> Connected to datastore2-client-1, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.042867] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.058131] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-1:
> Server lk version = 1
> [2016-03-25 07:12:51.059075] I [MSGID: 108031]
> [afr-common.c:1913:afr_local_discovery_cbk] 0-datastore2-replicate-0:
> selecting local read_child datastore2-client-2
> [2016-03-25 07:12:51.059619] I [MSGID: 104041]
> [glfs-resolve.c:869:__glfs_active_subvol] 0-datastore2: switched to graph
> 766e612d-3739-3437-352d-323031362d30 (0)
>
>
> I have no idea why client version 3.3 is being used! Everything should
> be 3.7.9.
>
>
> Environment:
>
> - Proxmox (Debian Jessie, 8.2)
> - KVM VMs using gfapi, running on the same nodes as the gluster bricks
> - bricks are hosted on 3 ZFS Pools (one per node)
> * compression =lz4
> * xattr=sa
> * sync=standard
> * acltype=posixacl
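For anyone reproducing this setup, the ZFS properties above would be applied roughly like this (a sketch; the pool/dataset name `tank` is inferred from the brick paths below and may differ):

```shell
# Set the dataset properties listed above (dataset name assumed)
zfs set compression=lz4 tank
zfs set xattr=sa tank
zfs set sync=standard tank
zfs set acltype=posixacl tank
```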
>
> Volume info:
> Volume Name: datastore2
> Type: Replicate
> Volume ID: 7d93a1c6-ac39-4d94-b136-e8379643bddd
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore2
> Brick2: vng.proxmox.softlog:/tank/vmdata/datastore2
> Brick3: vna.proxmox.softlog:/tank/vmdata/datastore2
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> features.shard: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.write-behind: off
> performance.strict-write-ordering: on
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> network.remote-dio: enable
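The non-default options in the list above would have been applied with `gluster volume set`; for example (illustrative only, requires a live cluster):

```shell
# Examples of how the reconfigured options above are set:
gluster volume set datastore2 features.shard on
gluster volume set datastore2 performance.write-behind off
gluster volume set datastore2 cluster.eager-lock enable
```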
>
>
>
> I can do any testing required, bring back logs, etc. I can't build
> gluster, though.
>
>
> thanks,
>
>
> --
> Lindsay Mathieson
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>