Hi Rusty,
Sorry for the delay getting back to you. I had a quick look at the
rebalance logs - it looks like the estimates are based on the time taken to
rebalance the smaller files.
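As a rough illustration of why the estimate comes out so large: the figures in the rebalance log quoted below are consistent with a simple size-based calculation, where the observed rate is bytes processed divided by elapsed time, and the estimate is total bytes divided by that rate. The sketch below assumes that reading of the total_processed, tmp_cnt and elapsed fields; the function name is made up for this example and is not gluster code.

```python
def estimate_rebalance_time(total_processed, total_size, elapsed):
    """Size-based estimate: returns (rate in bytes/s, total seconds, seconds left)."""
    rate = total_processed / elapsed      # bytes handled per second so far
    total_time = total_size / rate        # seconds needed at that rate
    return rate, total_time, total_time - elapsed


# Values copied from the 17:38:00 log entries in the thread below.
rate, total_time, left = estimate_rebalance_time(
    total_processed=43108305980,
    total_size=55419279917056,   # assumed to be what the log calls tmp_cnt
    elapsed=96526.0,
)
print(f"rate: {rate:.0f} bytes/s")
print(f"estimated total: {total_time / 86400:.0f} days")
print(f"seconds left: {left:.0f}")
```

At under 450 KB/s, presumably because per-file overhead dominates while many small files are being moved, the projection runs to years. It should come down once larger files start migrating.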
We do have a scripting option where we can use virtual xattrs to trigger
file migration from a mount point. That would speed things up.
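A script along these lines might look like the sketch below: walk the mount point, pick out the larger files, and set the migration xattr on each, largest first. The xattr name trusted.distribute.migrate-data with value "force" is an assumption here (it is not stated in this thread), as are the size threshold and the function names; please verify the key against your gluster version before running anything. Setting trusted.* attributes requires root.

```python
import os

# Assumed key/value for triggering DHT file migration; verify for your release.
MIGRATE_KEY = "trusted.distribute.migrate-data"
MIGRATE_VALUE = b"force"


def large_files(root, min_size):
    """Yield (size, path) for regular files under root of at least min_size bytes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only plain files are candidates
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_size:
                yield size, path


def trigger_migration(root, min_size, setter=None):
    """Set the migration xattr on the largest files first; return paths that failed."""
    if setter is None:
        setter = os.setxattr  # Linux only; must run as root on the FUSE mount
    failed = []
    for _size, path in sorted(large_files(root, min_size), reverse=True):
        try:
            setter(path, MIGRATE_KEY, MIGRATE_VALUE)
        except OSError:
            failed.append(path)
    return failed
```

For example, trigger_migration("/mnt/glusterfs/data", 1 << 30) would request migration for files of 1 GiB and up, where /mnt/glusterfs/data is a hypothetical client mount of the volume (not the brick path). The setter argument exists only so the walk can be dry-run without touching any xattrs.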
Regards,
Nithya
On 28 July 2018 at 07:11, Rusty Bower <rusty at rustybower.com> wrote:
> Just wanted to ping this to see if you guys had any thoughts, or other
> scripts I can run for this stuff. It's still predicting another 90 days to
> rebalance this, and performance is basically garbage while it rebalances.
>
> Rusty
>
> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com>
> wrote:
>
>> datanode03 is the newest brick
>>
>> the bricks had gotten pretty full, which I think might be part of the
>> issue:
>> - datanode01 /dev/sda1 51T 48T 3.3T 94% /mnt/data
>> - datanode02 /dev/sda1 51T 48T 3.4T 94% /mnt/data
>> - datanode03 /dev/md0 128T 4.6T 123T 4% /mnt/data
>>
>> each of the bricks is on a completely separate disk from the OS
>>
>> I'll shoot you the log files offline :)
>>
>> Thanks!
>> Rusty
>>
>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <nbalacha at redhat.com>
>> wrote:
>>
>>> Hi Rusty,
>>>
>>> Sorry I took so long to get back to you.
>>>
>>> Which is the newly added brick? I see datanode02 has not picked up any
>>> files for migration, which is odd.
>>> How full are the individual bricks (df -h output)?
>>> Is each of your bricks on a separate partition?
>>> Can you send me the rebalance logs from all 3 nodes (offline if you
>>> prefer)?
>>>
>>> We can try using scripts to speed up the rebalance if you prefer.
>>>
>>> Regards,
>>> Nithya
>>>
>>>
>>>
>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com> wrote:
>>>
>>>> Thanks for the reply Nithya.
>>>>
>>>> 1. glusterfs 4.1.1
>>>>
>>>> 2. Volume Name: data
>>>> Type: Distribute
>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: datanode01:/mnt/data/bricks/data
>>>> Brick2: datanode02:/mnt/data/bricks/data
>>>> Brick3: datanode03:/mnt/data/bricks/data
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>>
>>>> 3.
>>>> Node        Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
>>>> ----------  ----------------  -------  -------  --------  -------  -----------  -----------------
>>>> localhost   36822             11.3GB   50715    0         0        in progress  26:46:17
>>>> datanode02  0                 0Bytes   2852     0         0        in progress  26:46:16
>>>> datanode03  3128              513.7MB  11442    0         3128     in progress  26:46:17
>>>> Estimated time left for rebalance to complete : > 2 months. Please try
>>>> again later.
>>>> volume rebalance: data: success
>>>>
>>>> 4. Directory structure is basically an rsync backup of some old systems
>>>> as well as all of my personal media. I can elaborate more, but it's a
>>>> pretty standard filesystem.
>>>>
>>>> 5. In some folders there might be up to like 12-15 levels of
>>>> directories (especially the backups)
>>>>
>>>> 6. I'm honestly not sure, I can try to scrounge this number up
>>>>
>>>> 7. My guess would be > 100k
>>>>
>>>> 8. Most files are pretty large (media files), but there's a lot of
>>>> small files (metadata and configuration files) as well
>>>>
>>>> I've also appended a (moderately sanitized) snippet of the rebalance
>>>> log (let me know if you need more)
>>>>
>>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination for file - /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2
>>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to data-client-2
>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt 55419279917056,rate_processed=446597.869797, elapsed = 96526.000000
>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601
>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs
>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt 55419279917056,rate_processed=446588.616567, elapsed = 96528.000000
>>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 seconds, seconds left = 123998170
>>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96528.00 secs
>>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml: attempting to move from data-client-0 to data-client-2
>>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to data-client-2
>>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml: attempting to move from data-client-0 to data-client-2
>>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to data-client-2
>>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt 55419279917056,rate_processed=446579.386035, elapsed = 96530.000000
>>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 seconds, seconds left = 124000733
>>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96530.00 secs
>>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
>>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml: attempting to move from data-client-0 to data-client-2
>>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to data-client-2
>>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt 55419279917056,rate_processed=446570.244043, elapsed = 96532.000000
>>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 seconds, seconds left = 124003272
>>>> [2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
>>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
>>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
>>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt 55419279917056,rate_processed=446560.991961, elapsed = 96534.000000
>>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
>>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
>>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>
>>>>
>>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <
>>>> nbalacha at redhat.com> wrote:
>>>>
>>>>> If possible, please send the rebalance logs as well.
>>>>>
>>>>>
>>>>> On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Rusty,
>>>>>>
>>>>>> We need the following information:
>>>>>>
>>>>>> 1. The exact gluster version you are running
>>>>>> 2. gluster volume info <volname>
>>>>>> 3. gluster rebalance status
>>>>>> 4. Information on the directory structure and file locations on
>>>>>> your volume.
>>>>>> 5. How many levels of directories
>>>>>> 6. How many files and directories in each level
>>>>>> 7. How many directories and files in total (a rough estimate)
>>>>>> 8. Average file size
>>>>>>
>>>>>> Please note that having a rebalance running in the background should
>>>>>> not affect your volume access in any way. However I would like to know why
>>>>>> only 6000 files have been scanned in 6 hours.
>>>>>>
>>>>>> Regards,
>>>>>> Nithya
>>>>>>
>>>>>>
>>>>>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>
>>>>>>> Hey folks,
>>>>>>>
>>>>>>> I just added a new brick to my existing gluster volume, but *gluster
>>>>>>> volume rebalance data status* is telling me the
>>>>>>> following: Estimated time left for rebalance to complete : > 2 months.
>>>>>>> Please try again later.
>>>>>>>
>>>>>>> I already did a fix-mapping, but this thing is absolutely crawling
>>>>>>> trying to rebalance everything (last estimate was ~40 years)
>>>>>>>
>>>>>>> Any thoughts on if this is a bug, or ways to speed this up? It's
>>>>>>> taking ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Rusty
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>