Hu Bert
2018-Jul-20 07:41 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
Hmm... does no one have any idea?

Additional question: the hdd on server gluster12 was changed; so far
~220 GB have been copied. On the other 2 servers I see a lot of entries
in glustershd.log, about 312,000 and 336,000 entries there yesterday,
most of them (current log output) looking like this:

[2018-07-20 07:30:49.757595] I [MSGID: 108026]
[afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
Completed data selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
sources=0 [2] sinks=1
[2018-07-20 07:30:49.992398] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-shared-replicate-3: performing metadata selfheal on
0d863a62-0dd8-401c-b699-2b642d9fd2b6
[2018-07-20 07:30:50.243551] I [MSGID: 108026]
[afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
Completed metadata selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
sources=0 [2] sinks=1

or like this:

[2018-07-20 07:38:41.726943] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-shared-replicate-3: performing metadata selfheal on
9276097a-cdac-4d12-9dc6-04b1ea4458ba
[2018-07-20 07:38:41.855737] I [MSGID: 108026]
[afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
Completed metadata selfheal on 9276097a-cdac-4d12-9dc6-04b1ea4458ba.
sources=[0] 2 sinks=1
[2018-07-20 07:38:44.755800] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-shared-replicate-3: performing entry selfheal on
9276097a-cdac-4d12-9dc6-04b1ea4458ba

Is this behaviour normal? I'd expect these messages on the server with
the failed brick, not on the other ones.

2018-07-19 8:31 GMT+02:00 Hu Bert <revirii at googlemail.com>:
> Hi there,
>
> I sent this mail yesterday, but somehow it didn't work? It wasn't
> archived, so please be indulgent if you receive this mail again :-)
>
> We are currently running a replicate setup and are experiencing quite
> poor performance. It got even worse when, within a couple of weeks,
> 2 bricks (disks) crashed. Some general information on our setup:
>
> 3 Dell PowerEdge R530 (Xeon E5-1650 v3 Hexa-Core, 64 GB DDR4, OS on
> separate disks); each server has 4 10TB disks -> each is a brick;
> replica 3 setup (see gluster volume status below). Debian stretch,
> kernel 4.9.0, gluster version 3.12.12. Servers and clients are
> connected via 10 GBit ethernet.
>
> About a month ago and 2 days ago a disk died (on different servers);
> the disks were replaced, brought back into the volume, and a full self
> heal was started. But the speed for this is quite... disappointing.
> Each brick has ~1.6TB of data on it (mostly the infamous small files).
> The full heal I started yesterday copied only ~50GB within 24 hours
> (48 hours: about 100GB) - at this rate it would take weeks until the
> self heal finishes.
>
> After the first heal (started on gluster13 about a month ago, took
> about 3 weeks) finished we had terrible performance; CPU on one or
> two of the nodes (gluster11, gluster12) was up to 1200%, consumed by
> the brick process of the formerly crashed brick (bricksdd1),
> interestingly not on the server with the failed disk, but on the other
> 2 ones...
>
> Well... am I doing something wrong? Some options wrongly configured?
> Terrible setup? Anyone got an idea? Any additional information needed?
>
> Thx in advance :-)
>
> gluster volume status
>
> Volume Name: shared
> Type: Distributed-Replicate
> Volume ID: e879d208-1d8c-4089-85f3-ef1b3aa45d36
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 4 x 3 = 12
> Transport-type: tcp
> Bricks:
> Brick1: gluster11:/gluster/bricksda1/shared
> Brick2: gluster12:/gluster/bricksda1/shared
> Brick3: gluster13:/gluster/bricksda1/shared
> Brick4: gluster11:/gluster/bricksdb1/shared
> Brick5: gluster12:/gluster/bricksdb1/shared
> Brick6: gluster13:/gluster/bricksdb1/shared
> Brick7: gluster11:/gluster/bricksdc1/shared
> Brick8: gluster12:/gluster/bricksdc1/shared
> Brick9: gluster13:/gluster/bricksdc1/shared
> Brick10: gluster11:/gluster/bricksdd1/shared
> Brick11: gluster12:/gluster/bricksdd1_new/shared
> Brick12: gluster13:/gluster/bricksdd1_new/shared
> Options Reconfigured:
> cluster.shd-max-threads: 4
> performance.md-cache-timeout: 60
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> performance.cache-refresh-timeout: 4
> performance.parallel-readdir: on
> server.event-threads: 8
> client.event-threads: 8
> performance.cache-max-file-size: 128MB
> performance.write-behind-window-size: 16MB
> performance.io-thread-count: 64
> cluster.min-free-disk: 1%
> performance.cache-size: 24GB
> nfs.disable: on
> transport.address-family: inet
> performance.high-prio-threads: 32
> performance.normal-prio-threads: 32
> performance.low-prio-threads: 32
> performance.least-prio-threads: 8
> performance.io-cache: on
> server.allow-insecure: on
> performance.strict-o-direct: off
> transport.listen-backlog: 100
> server.outstanding-rpc-limit: 128
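For context, a minimal sketch of how the pending heal backlog can be
inspected and the self-heal daemon tuned on a 3.12 volume like this one.
The commands and option names exist in the gluster CLI; the values shown
are illustrative assumptions, not settings taken from this thread:

  # Inspect how much work the self-heal daemon still has queued, per brick.
  gluster volume heal shared statistics heal-count
  gluster volume heal shared info

  # Check the current self-heal daemon settings.
  gluster volume get shared cluster.shd-max-threads
  gluster volume get shared cluster.data-self-heal-algorithm

  # Example tuning (assumed values, test before relying on them):
  # more parallel heal threads per replica set and a deeper wait queue.
  gluster volume set shared cluster.shd-max-threads 8
  gluster volume set shared cluster.shd-wait-qlength 4096

  # "full" copies files outright instead of computing rolling checksums,
  # which can help when an entire replaced brick has to be rebuilt.
  gluster volume set shared cluster.data-self-heal-algorithm full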
Hu Bert
2018-Jul-23 10:46 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
Well, over the weekend about 200GB were copied, so now ~400GB have been
copied to the brick. That's far below a speed of 10GB per hour. If I
copied the 1.6 TB directly, that would be done within 2 days at most,
but with the self heal this will take at least 20 days. Why is the
performance that bad? Is there no chance of speeding this up?

2018-07-20 9:41 GMT+02:00 Hu Bert <revirii at googlemail.com>:
> Hmm... does no one have any idea?
>
> Additional question: the hdd on server gluster12 was changed; so far
> ~220 GB have been copied. On the other 2 servers I see a lot of entries
> in glustershd.log, about 312,000 and 336,000 entries there yesterday,
> most of them (current log output) looking like this:
>
> [2018-07-20 07:30:49.757595] I [MSGID: 108026]
> [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
> Completed data selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
> sources=0 [2] sinks=1
> [2018-07-20 07:30:49.992398] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-shared-replicate-3: performing metadata selfheal on
> 0d863a62-0dd8-401c-b699-2b642d9fd2b6
> [2018-07-20 07:30:50.243551] I [MSGID: 108026]
> [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
> Completed metadata selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
> sources=0 [2] sinks=1
>
> or like this:
>
> [2018-07-20 07:38:41.726943] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-shared-replicate-3: performing metadata selfheal on
> 9276097a-cdac-4d12-9dc6-04b1ea4458ba
> [2018-07-20 07:38:41.855737] I [MSGID: 108026]
> [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
> Completed metadata selfheal on 9276097a-cdac-4d12-9dc6-04b1ea4458ba.
> sources=[0] 2 sinks=1
> [2018-07-20 07:38:44.755800] I [MSGID: 108026]
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
> 0-shared-replicate-3: performing entry selfheal on
> 9276097a-cdac-4d12-9dc6-04b1ea4458ba
>
> Is this behaviour normal? I'd expect these messages on the server with
> the failed brick, not on the other ones.
>
> 2018-07-19 8:31 GMT+02:00 Hu Bert <revirii at googlemail.com>:
>> Hi there,
>>
>> I sent this mail yesterday, but somehow it didn't work? It wasn't
>> archived, so please be indulgent if you receive this mail again :-)
>>
>> We are currently running a replicate setup and are experiencing quite
>> poor performance. It got even worse when, within a couple of weeks,
>> 2 bricks (disks) crashed. Some general information on our setup:
>>
>> 3 Dell PowerEdge R530 (Xeon E5-1650 v3 Hexa-Core, 64 GB DDR4, OS on
>> separate disks); each server has 4 10TB disks -> each is a brick;
>> replica 3 setup (see gluster volume status below). Debian stretch,
>> kernel 4.9.0, gluster version 3.12.12. Servers and clients are
>> connected via 10 GBit ethernet.
>>
>> About a month ago and 2 days ago a disk died (on different servers);
>> the disks were replaced, brought back into the volume, and a full self
>> heal was started. But the speed for this is quite... disappointing.
>> Each brick has ~1.6TB of data on it (mostly the infamous small files).
>> The full heal I started yesterday copied only ~50GB within 24 hours
>> (48 hours: about 100GB) - at this rate it would take weeks until the
>> self heal finishes.
>>
>> After the first heal (started on gluster13 about a month ago, took
>> about 3 weeks) finished we had terrible performance; CPU on one or
>> two of the nodes (gluster11, gluster12) was up to 1200%, consumed by
>> the brick process of the formerly crashed brick (bricksdd1),
>> interestingly not on the server with the failed disk, but on the other
>> 2 ones...
>>
>> Well... am I doing something wrong? Some options wrongly configured?
>> Terrible setup? Anyone got an idea? Any additional information needed?
>>
>> Thx in advance :-)
>>
>> gluster volume status
>>
>> Volume Name: shared
>> Type: Distributed-Replicate
>> Volume ID: e879d208-1d8c-4089-85f3-ef1b3aa45d36
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 4 x 3 = 12
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster11:/gluster/bricksda1/shared
>> Brick2: gluster12:/gluster/bricksda1/shared
>> Brick3: gluster13:/gluster/bricksda1/shared
>> Brick4: gluster11:/gluster/bricksdb1/shared
>> Brick5: gluster12:/gluster/bricksdb1/shared
>> Brick6: gluster13:/gluster/bricksdb1/shared
>> Brick7: gluster11:/gluster/bricksdc1/shared
>> Brick8: gluster12:/gluster/bricksdc1/shared
>> Brick9: gluster13:/gluster/bricksdc1/shared
>> Brick10: gluster11:/gluster/bricksdd1/shared
>> Brick11: gluster12:/gluster/bricksdd1_new/shared
>> Brick12: gluster13:/gluster/bricksdd1_new/shared
>> Options Reconfigured:
>> cluster.shd-max-threads: 4
>> performance.md-cache-timeout: 60
>> cluster.lookup-optimize: on
>> cluster.readdir-optimize: on
>> performance.cache-refresh-timeout: 4
>> performance.parallel-readdir: on
>> server.event-threads: 8
>> client.event-threads: 8
>> performance.cache-max-file-size: 128MB
>> performance.write-behind-window-size: 16MB
>> performance.io-thread-count: 64
>> cluster.min-free-disk: 1%
>> performance.cache-size: 24GB
>> nfs.disable: on
>> transport.address-family: inet
>> performance.high-prio-threads: 32
>> performance.normal-prio-threads: 32
>> performance.low-prio-threads: 32
>> performance.least-prio-threads: 8
>> performance.io-cache: on
>> server.allow-insecure: on
>> performance.strict-o-direct: off
>> transport.listen-backlog: 100
>> server.outstanding-rpc-limit: 128
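To put an actual GB-per-hour figure on the heal, a rough sampler like the
one below could be left running on gluster12. It is a sketch that assumes
the new brick lives under /gluster/bricksdd1_new (as in the volume layout
above) and a GNU coreutils df; both the path and the one-hour interval are
illustrative:

  #!/bin/bash
  # Rough heal-rate sampler (assumed brick path from the volume layout above).
  BRICK=/gluster/bricksdd1_new
  while true; do
      date
      # Used space on the receiving brick; its growth per hour roughly
      # equals the heal throughput.
      df -BG --output=used "$BRICK" | tail -n 1
      # Remaining entries the self-heal daemon still has to process
      # for the replaced bricks.
      gluster volume heal shared statistics heal-count | grep -A1 'bricksdd1'
      sleep 3600      # sample once per hour
  done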