I have not documented this yet - I will send you the steps tomorrow.
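In the meantime, here is a rough sketch of the idea so you can see its shape (untested, and from memory - I am assuming the "distribute.migrate-data" virtual xattr with the value "force" here, so please wait for the verified steps before running this against real data). It must be run on a FUSE mount of the volume, never directly on a brick:

    # Hypothetical sketch: ask DHT to migrate each file under a directory
    # (example path) to its correct subvolume via a virtual xattr.
    # The xattr name and value are assumptions - verify for your version.
    find /mnt/glustermount/backups -type f | while IFS= read -r f; do
        setfattr -n "distribute.migrate-data" -v "force" "$f"
    done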
Regards,
Nithya
On 30 July 2018 at 20:23, Rusty Bower <rusty at rustybower.com> wrote:
> That would be awesome. Where can I find these?
>
> Rusty
>
> Sent from my iPhone
>
> On Jul 30, 2018, at 03:40, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
> Hi Rusty,
>
> Sorry for the delay getting back to you. I had a quick look at the
> rebalance logs - it looks like the estimates are based on the time taken to
> rebalance the smaller files.
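>
> To make that concrete (numbers taken from the log snippet you sent, quoted
> below; this is my reading of the estimator output, so treat the exact formula
> as an assumption): the log reports rate_processed = total_processed / elapsed
> = 43108305980 / 96526 ~= 446598 bytes/sec, and the projection is then
> tmp_cnt / rate_processed = 55419279917056 / 446598 ~= 124092127 seconds,
> i.e. close to 4 years. That rate was measured almost entirely on the small
> .img.xml files, which is why the estimate looks so bleak.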
>
> We do have a scripting option where we can use virtual xattrs to trigger
> file migration from a mount point. That would speed things up.
>
>
> Regards,
> Nithya
>
> On 28 July 2018 at 07:11, Rusty Bower <rusty at rustybower.com> wrote:
>
>> Just wanted to ping this to see if you guys had any thoughts, or other
>> scripts I can run for this stuff. It's still predicting another 90 days to
>> rebalance this, and performance is basically garbage while it rebalances.
>>
>> Rusty
>>
>> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com> wrote:
>>
>>> datanode03 is the newest brick
>>>
>>> the bricks had gotten pretty full, which I think might be part of the issue:
>>> - datanode01  /dev/sda1   51T   48T  3.3T  94%  /mnt/data
>>> - datanode02  /dev/sda1   51T   48T  3.4T  94%  /mnt/data
>>> - datanode03  /dev/md0   128T  4.6T  123T   4%  /mnt/data
>>>
>>> each of the bricks is on a completely separate disk from the OS
>>>
>>> I'll shoot you the log files offline :)
>>>
>>> Thanks!
>>> Rusty
>>>
>>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>
>>>> Hi Rusty,
>>>>
>>>> Sorry I took so long to get back to you.
>>>>
>>>> Which is the newly added brick? I see datanode02 has not picked up any
>>>> files for migration, which is odd.
>>>> How full are the individual bricks? Please send the df -h output.
>>>> Is each of your bricks on a separate partition?
>>>> Can you send me the rebalance logs from all 3 nodes (offline if you prefer)?
>>>>
>>>> We can try using scripts to speed up the rebalance if you would like.
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>>
>>>>
>>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com> wrote:
>>>>
>>>>> Thanks for the reply Nithya.
>>>>>
>>>>> 1. glusterfs 4.1.1
>>>>>
>>>>> 2. Volume Name: data
>>>>> Type: Distribute
>>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: datanode01:/mnt/data/bricks/data
>>>>> Brick2: datanode02:/mnt/data/bricks/data
>>>>> Brick3: datanode03:/mnt/data/bricks/data
>>>>> Options Reconfigured:
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> 3.
>>>>>     Node        Rebalanced-files     size  scanned  failures  skipped       status  run time in h:m:s
>>>>>     ----------       -----------  -------  -------  --------  -------  -----------  -----------------
>>>>>     localhost              36822   11.3GB    50715         0        0  in progress           26:46:17
>>>>>     datanode02                 0   0Bytes     2852         0        0  in progress           26:46:16
>>>>>     datanode03              3128  513.7MB    11442         0     3128  in progress           26:46:17
>>>>> Estimated time left for rebalance to complete : > 2 months. Please try again later.
>>>>> volume rebalance: data: success
>>>>>
>>>>> 4. Directory structure is basically an rsync backup of some old
>>>>> systems as well as all of my personal media. I can elaborate more, but
>>>>> it's a pretty standard filesystem.
>>>>>
>>>>> 5. In some folders there might be up to like 12-15 levels of
>>>>> directories (especially the backups)
>>>>>
>>>>> 6. I'm honestly not sure, I can try to scrounge this number up
>>>>>
>>>>> 7. My guess would be > 100k
>>>>>
>>>>> 8. Most files are pretty large (media files), but there's a lot of
>>>>> small files (metadata and configuration files) as well
>>>>>
>>>>> I've also appended a (moderately sanitized) snippet of the rebalance
>>>>> log (let me know if you need more)
>>>>>
>>>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination for file - /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2
>>>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056, rate_processed=446597.869797, elapsed = 96526.000000
>>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601
>>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs
>>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056, rate_processed=446588.616567, elapsed = 96528.000000
>>>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 seconds, seconds left = 123998170
>>>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96528.00 secs
>>>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml: attempting to move from data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml: attempting to move from data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt = 55419279917056, rate_processed=446579.386035, elapsed = 96530.000000
>>>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 seconds, seconds left = 124000733
>>>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96530.00 secs
>>>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
>>>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml: attempting to move from data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056, rate_processed=446570.244043, elapsed = 96532.000000
>>>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 seconds, seconds left = 124003272
>>>>> [2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
>>>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
>>>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056, rate_processed=446560.991961, elapsed = 96534.000000
>>>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
>>>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
>>>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>>
>>>>>
>>>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>
>>>>>> If possible, please send the rebalance logs as well.
>>>>>>
>>>>>>
>>>>>> On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>>
>>>>>>> Hi Rusty,
>>>>>>>
>>>>>>> We need the following information:
>>>>>>>
>>>>>>> 1. The exact gluster version you are running
>>>>>>> 2. gluster volume info <volname>
>>>>>>> 3. gluster volume rebalance <volname> status
>>>>>>> 4. Information on the directory structure and file locations on your volume.
>>>>>>> 5. How many levels of directories
>>>>>>> 6. How many files and directories in each level
>>>>>>> 7. How many directories and files in total (a rough estimate)
>>>>>>> 8. Average file size
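>>>>>>>
>>>>>>> Something like the following (standard gluster CLI, run on any one of
>>>>>>> the nodes; "data" is your volume name) should capture items 1-3 in one go:
>>>>>>>
>>>>>>>     gluster --version
>>>>>>>     gluster volume info data
>>>>>>>     gluster volume rebalance data status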
>>>>>>>
>>>>>>> Please note that having a rebalance running in the background should
>>>>>>> not affect your volume access in any way. However, I would like to know
>>>>>>> why only 6000 files have been scanned in 6 hours.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nithya
>>>>>>>
>>>>>>>
>>>>>>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>>
>>>>>>>> Hey folks,
>>>>>>>>
>>>>>>>> I just added a new brick to my existing gluster volume, but *gluster
>>>>>>>> volume rebalance data status* is telling me the following: Estimated
>>>>>>>> time left for rebalance to complete : > 2 months. Please try again later.
>>>>>>>>
>>>>>>>> I already did a fix-layout, but this thing is absolutely crawling
>>>>>>>> trying to rebalance everything (last estimate was ~40 years)
>>>>>>>>
>>>>>>>> Any thoughts on whether this is a bug, or ways to speed this up? It's
>>>>>>>> taking ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Rusty
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>