On 31 July 2018 at 19:44, Rusty Bower <rusty at rustybower.com> wrote:> I'll figure out what hasn't been rebalanced yet and run the script. > > There's only a single client accessing this gluster volume, and while the > rebalance is taking place, I am only able to read/write to the volume > at around 3MB/s. If I log onto one of the bricks, I can read/write to the > physical volumes at speeds greater than 100MB/s (which is what I would > expect). >What are the numbers when accessing the volume when rebalance is not running? Regards, Nithya> > Thanks! > Rusty > > On Tue, Jul 31, 2018 at 3:28 AM, Nithya Balachandran <nbalacha at redhat.com> > wrote: > >> Hi Rusty, >> >> A rebalance involves 2 steps: >> >> 1. Setting a new layout on a directory >> 2. Migrating any files inside that directory that hash to a different >> subvol based on the new layout set in step 1. >> >> >> A few things to keep in mind : >> >> - Any new content created on this volume will currently go to the >> newly added brick. >> - Having a more equitable file distribution is beneficial but you >> might not need to do a complete rebalance to do this. You can run the >> script on just enough directories to free up space on your older bricks. >> This should be done on bricks which contain large files to speed this up. >> >> Do the following on one of the server nodes: >> >> - Create a tmp mount point and mount the volume using the rebalance >> volfile >> - mkdir /mnt/rebal >> - glusterfs -s localhost --volfile-id rebalance/data /mnt/rebal >> - Select a directory in the volume which contains a lot of large >> files and which has not been processed by the rebalance yet - the lower >> down in the tree the better. Check the rebalance logs to figure out which >> dirs have not been processed yet. >> - cd /mnt/rebal/<chosen_dir> >> - find . 
-type d -print0 | xargs -0 -n1 -P10 >> bash process_dir.sh >> - You can run this for different values of <chosen_dir> and on >> multiple server nodes in parallel as long as the directory trees for the >> different <chosen_dirs> don't overlap. >> - Do this for multiple directories until the disk space used reduces >> on the older bricks. >> >> This is a very simple script. Let me know how it works - we can always >> tweak it for your particular data set. >> >> >> >and performance is basically garbage while it rebalances >> Can you provide more detail on this? What kind of effects are you seeing? >> How many clients access this volume? >> >> >> Regards, >> Nithya >> >> On 30 July 2018 at 22:18, Nithya Balachandran <nbalacha at redhat.com> >> wrote: >> >>> I have not documented this yet - I will send you the steps tomorrow. >>> >>> Regards, >>> Nithya >>> >>> On 30 July 2018 at 20:23, Rusty Bower <rusty at rustybower.com> wrote: >>> >>>> That would be awesome. Where can I find these? >>>> >>>> Rusty >>>> >>>> Sent from my iPhone >>>> >>>> On Jul 30, 2018, at 03:40, Nithya Balachandran <nbalacha at redhat.com> >>>> wrote: >>>> >>>> Hi Rusty, >>>> >>>> Sorry for the delay getting back to you. I had a quick look at the >>>> rebalance logs - it looks like the estimates are based on the time taken to >>>> rebalance the smaller files. >>>> >>>> We do have a scripting option where we can use virtual xattrs to >>>> trigger file migration from a mount point. That would speed things up. >>>> >>>> >>>> Regards, >>>> Nithya >>>> >>>> On 28 July 2018 at 07:11, Rusty Bower <rusty at rustybower.com> wrote: >>>> >>>>> Just wanted to ping this to see if you guys had any thoughts, or other >>>>> scripts I can run for this stuff. It's still predicting another 90 days to >>>>> rebalance this, and performance is basically garbage while it rebalances. 
>>>>> >>>>> Rusty >>>>> >>>>> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com> >>>>> wrote: >>>>> >>>>>> datanode03 is the newest brick >>>>>> >>>>>> the bricks had gotten pretty full, which I think might be part of the >>>>>> issue: >>>>>> - datanode01 /dev/sda1 51T 48T 3.3T 94% /mnt/data >>>>>> - datanode02 /dev/sda1 51T 48T 3.4T 94% /mnt/data >>>>>> - datanode03 /dev/md0 128T 4.6T 123T 4% /mnt/data >>>>>> >>>>>> each of the bricks is on a completely separate disk from the OS >>>>>> >>>>>> I'll shoot you the log files offline :) >>>>>> >>>>>> Thanks! >>>>>> Rusty >>>>>> >>>>>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran < >>>>>> nbalacha at redhat.com> wrote: >>>>>> >>>>>>> Hi Rusty, >>>>>>> >>>>>>> Sorry I took so long to get back to you. >>>>>>> >>>>>>> Which is the newly added brick? I see datanode02 has not picked up >>>>>>> any files for migration, which is odd. >>>>>>> How full are the individual bricks (df -h output)? >>>>>>> Is each of your bricks in a separate partition? >>>>>>> Can you send me the rebalance logs from all 3 nodes (offline if you >>>>>>> prefer)? >>>>>>> >>>>>>> We can try using scripts to speed up the rebalance if you prefer. >>>>>>> >>>>>>> Regards, >>>>>>> Nithya >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com> wrote: >>>>>>> >>>>>>>> Thanks for the reply Nithya. >>>>>>>> >>>>>>>> 1. glusterfs 4.1.1 >>>>>>>> >>>>>>>> 2. Volume Name: data >>>>>>>> Type: Distribute >>>>>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 3 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: datanode01:/mnt/data/bricks/data >>>>>>>> Brick2: datanode02:/mnt/data/bricks/data >>>>>>>> Brick3: datanode03:/mnt/data/bricks/data >>>>>>>> Options Reconfigured: >>>>>>>> performance.readdir-ahead: on >>>>>>>> >>>>>>>> 3. 
>>>>>>>> Node         Rebalanced-files     size   scanned  failures  skipped       status  run time in h:m:s
>>>>>>>> ---------    ----------------  -------  --------  --------  -------  -----------  -----------------
>>>>>>>> localhost               36822   11.3GB     50715         0        0  in progress           26:46:17
>>>>>>>> datanode02                  0   0Bytes      2852         0        0  in progress           26:46:16
>>>>>>>> datanode03               3128  513.7MB     11442         0     3128  in progress           26:46:17
>>>>>>>> Estimated time left for rebalance to complete : > 2 months. Please >>>>>>>> try again later. >>>>>>>> volume rebalance: data: success >>>>>>>> >>>>>>>> 4. Directory structure is basically an rsync backup of some old >>>>>>>> systems as well as all of my personal media. I can elaborate more, but it's >>>>>>>> a pretty standard filesystem. >>>>>>>> >>>>>>>> 5. In some folders there might be up to like 12-15 levels of >>>>>>>> directories (especially the backups) >>>>>>>> >>>>>>>> 6. I'm honestly not sure, I can try to scrounge this number up >>>>>>>> >>>>>>>> 7. My guess would be > 100k >>>>>>>> >>>>>>>> 8. 
Most files are pretty large (media files), but there's a lot of >>>>>>>> small files (metadata and configuration files) as well >>>>>>>> >>>>>>>> I've also appended a (moderately sanitized) snippet of the rebalance >>>>>>>> log (let me know if you need more) >>>>>>>> >>>>>>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] >>>>>>>> [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination >>>>>>>> for file - /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2 >>>>>>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] >>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to >>>>>>>> data-client-2 >>>>>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt >>>>>>>> 55419279917056,rate_processed=446597.869797, elapsed = 96526.000000 >>>>>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 >>>>>>>> seconds, seconds left = 123995601 >>>>>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance >>>>>>>> is in progress. 
Time taken is 96526.00 secs >>>>>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>> migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0 >>>>>>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt >>>>>>>> 55419279917056,rate_processed=446588.616567, elapsed = 96528.000000 >>>>>>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 >>>>>>>> seconds, seconds left = 123998170 >>>>>>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance >>>>>>>> is in progress. Time taken is 96528.00 secs >>>>>>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>> migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0 >>>>>>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/9201002.img.xml: attempting to move from >>>>>>>> data-client-0 to data-client-2 >>>>>>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] >>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to >>>>>>>> data-client-2 >>>>>>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/9110012.img.xml: attempting to move from >>>>>>>> data-client-0 to data-client-2 >>>>>>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] >>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>> migration of 
/this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to >>>>>>>> data-client-2 >>>>>>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt >>>>>>>> 55419279917056,rate_processed=446579.386035, elapsed = 96530.000000 >>>>>>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 >>>>>>>> seconds, seconds left = 124000733 >>>>>>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance >>>>>>>> is in progress. Time taken is 96530.00 secs >>>>>>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>> migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0 >>>>>>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/2050000.img.xml: attempting to move from >>>>>>>> data-client-0 to data-client-2 >>>>>>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] >>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to >>>>>>>> data-client-2 >>>>>>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt >>>>>>>> 55419279917056,rate_processed=446570.244043, elapsed = 96532.000000 >>>>>>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 >>>>>>>> seconds, seconds left = 124003272 >>>>>>>> [2018-07-16 17:38:06.855770] I [MSGID: 
109028] >>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance >>>>>>>> is in progress. Time taken is 96532.00 secs >>>>>>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>> migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0 >>>>>>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/9201055.img.xml: attempting to move from >>>>>>>> data-client-0 to data-client-2 >>>>>>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] >>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>> ts/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to >>>>>>>> data-client-2 >>>>>>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt >>>>>>>> 55419279917056,rate_processed=446560.991961, elapsed = 96534.000000 >>>>>>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 >>>>>>>> seconds, seconds left = 124005841 >>>>>>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance >>>>>>>> is in progress. Time taken is 96534.00 secs >>>>>>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] >>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>> migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0 >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> If possible, please send the rebalance logs as well. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha at redhat.com >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> Hi Rusty, >>>>>>>>>> >>>>>>>>>> We need the following information: >>>>>>>>>> >>>>>>>>>> 1. The exact gluster version you are running >>>>>>>>>> 2. gluster volume info <volname> >>>>>>>>>> 3. gluster rebalance status >>>>>>>>>> 4. Information on the directory structure and file locations >>>>>>>>>> on your volume. >>>>>>>>>> 5. How many levels of directories >>>>>>>>>> 6. How many files and directories in each level >>>>>>>>>> 7. How many directories and files in total (a rough estimate) >>>>>>>>>> 8. Average file size >>>>>>>>>> >>>>>>>>>> Please note that having a rebalance running in the background >>>>>>>>>> should not affect your volume access in any way. However I would like to >>>>>>>>>> know why only 6000 files have been scanned in 6 hours. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nithya >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hey folks, >>>>>>>>>>> >>>>>>>>>>> I just added a new brick to my existing gluster volume, but *gluster >>>>>>>>>>> volume rebalance data status* is telling me the >>>>>>>>>>> following: Estimated time left for rebalance to complete : > 2 months. >>>>>>>>>>> Please try again later. >>>>>>>>>>> >>>>>>>>>>> I already did a fix-mapping, but this thing is absolutely >>>>>>>>>>> crawling trying to rebalance everything (last estimate was ~40 years) >>>>>>>>>>> >>>>>>>>>>> Any thoughts on if this is a bug, or ways to speed this up? It's >>>>>>>>>>> taking ~6 hours to scan 6000 files, which seems unreasonably slow. 
>>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Rusty >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180731/3c6ca3e6/attachment.html>
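The process_dir.sh script referenced in the steps above is not included in the archive. Below is a minimal sketch of what such a per-directory script could look like, based on the virtual-xattr mechanism mentioned earlier in the thread; the xattr name trusted.distribute.migrate-data, the "force" value, and the DRY_RUN switch are assumptions for illustration, not the script Nithya actually sent:

```shell
# Hypothetical sketch of process_dir.sh: for each regular file directly
# inside the directory passed as the first argument, set the DHT virtual
# xattr trusted.distribute.migrate-data, asking the rebalance-mode mount
# (e.g. /mnt/rebal above) to migrate the file to its new hashed subvolume.
# With DRY_RUN=1 the commands are printed instead of executed.
process_dir() {
    local dir=$1 f
    # -print0 / read -d '' keeps file names with spaces intact
    find "$dir" -maxdepth 1 -type f -print0 |
    while IFS= read -r -d '' f; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'setfattr -n trusted.distribute.migrate-data -v force %s\n' "$f"
        else
            setfattr -n trusted.distribute.migrate-data -v force "$f"
        fi
    done
}
```

Wrapped in a script file, this is what the find | xargs driver in the steps above would invoke once per directory.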
Is it possible to pause the rebalance to get those number? I'm hesitant to stop the rebalance and have to redo the entire thing again. On Tue, Jul 31, 2018 at 11:40 AM, Nithya Balachandran <nbalacha at redhat.com> wrote:> > > On 31 July 2018 at 19:44, Rusty Bower <rusty at rustybower.com> wrote: > >> I'll figure out what hasn't been rebalanced yet and run the script. >> >> There's only a single client accessing this gluster volume, and while the >> rebalance is taking place, the I am only able to read/write to the volume >> at around 3MB/s. If I log onto one of the bricks, I can read/write to the >> physical volumes at speed greater than 100MB/s (which is what I would >> expect). >> > > What are the numbers when accessing the volume when rebalance is not > running? > Regards, > Nithya > >> >> Thanks! >> Rusty >> >> On Tue, Jul 31, 2018 at 3:28 AM, Nithya Balachandran <nbalacha at redhat.com >> > wrote: >> >>> Hi Rusty, >>> >>> A rebalance involves 2 steps: >>> >>> 1. Setting a new layout on a directory >>> 2. Migrating any files inside that directory that hash to a >>> different subvol based on the new layout set in step 1. >>> >>> >>> A few things to keep in mind : >>> >>> - Any new content created on this volume will currently go to the >>> newly added brick. >>> - Having a more equitable file distribution is beneficial but you >>> might not need to do a complete rebalance to do this. You can run the >>> script on just enough directories to free up space on your older bricks. >>> This should be done on bricks which contains large files to speed this up. >>> >>> Do the following on one of the server nodes: >>> >>> - Create a tmp mount point and mount the volume using the rebalance >>> volfile >>> - mkdir /mnt/rebal >>> - glusterfs -s localhost --volfile-id rebalance/data /mnt/rebal >>> - Select a directory in the volume which contains a lot of large >>> files and which has not been processed by the rebalance yet - the lower >>> down in the tree the better. 
Check the rebalance logs to figure out which >>> dirs have not been processed yet. >>> - cd /mnt/rebal/<chosen_dir> >>> - for dir in `find . -type d`; do echo $dir |xargs -0 -n1 -P10 >>> bash process_dir.sh;done >>> - You can run this for different values of <chosen_dir> and on >>> multiple server nodes in parallel as long as the directory trees for the >>> different <chosen_dirs> don't overlap. >>> - Do this for multiple directories until the disk space used reduces >>> on the older bricks. >>> >>> This is a very simple script. Let me know how it works - we can always >>> tweak it for your particular data set. >>> >>> >>> >and performance is basically garbage while it rebalances >>> Can you provide more detail on this? What kind of effects are you seeing? >>> How many clients access this volume? >>> >>> >>> Regards, >>> Nithya >>> >>> On 30 July 2018 at 22:18, Nithya Balachandran <nbalacha at redhat.com> >>> wrote: >>> >>>> I have not documented this yet - I will send you the steps tomorrow. >>>> >>>> Regards, >>>> Nithya >>>> >>>> On 30 July 2018 at 20:23, Rusty Bower <rusty at rustybower.com> wrote: >>>> >>>>> That would be awesome. Where can I find these? >>>>> >>>>> Rusty >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On Jul 30, 2018, at 03:40, Nithya Balachandran <nbalacha at redhat.com> >>>>> wrote: >>>>> >>>>> Hi Rusty, >>>>> >>>>> Sorry for the delay getting back to you. I had a quick look at the >>>>> rebalance logs - it looks like the estimates are based on the time taken to >>>>> rebalance the smaller files. >>>>> >>>>> We do have a scripting option where we can use virtual xattrs to >>>>> trigger file migration from a mount point. That would speed things up. >>>>> >>>>> >>>>> Regards, >>>>> Nithya >>>>> >>>>> On 28 July 2018 at 07:11, Rusty Bower <rusty at rustybower.com> wrote: >>>>> >>>>>> Just wanted to ping this to see if you guys had any thoughts, or >>>>>> other scripts I can run for this stuff. 
It's still predicting another 90 >>>>>> days to rebalance this, and performance is basically garbage while it >>>>>> rebalances. >>>>>> >>>>>> Rusty >>>>>> >>>>>> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com> >>>>>> wrote: >>>>>> >>>>>>> datanode03 is the newest brick >>>>>>> >>>>>>> the bricks had gotten pretty full, which I think might be part of >>>>>>> the issue: >>>>>>> - datanode01 /dev/sda1 51T 48T 3.3T 94% /mnt/data >>>>>>> - datanode02 /dev/sda1 51T 48T 3.4T 94% /mnt/data >>>>>>> - datanode03 /dev/md0 128T 4.6T 123T 4% /mnt/data >>>>>>> >>>>>>> each of the bricks are on a completely separate disk from the OS >>>>>>> >>>>>>> I'll shoot you the log files offline :) >>>>>>> >>>>>>> Thanks! >>>>>>> Rusty >>>>>>> >>>>>>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Rusty, >>>>>>>> >>>>>>>> Sorry I took so long to get back to you. >>>>>>>> >>>>>>>> Which is the newly added brick? I see datanode02 has not picked up >>>>>>>> any files for migration which is odd. >>>>>>>> How full are the individual bricks (df -h ) output. >>>>>>>> Is each of your bricks in a separate partition? >>>>>>>> Can you send me the rebalance logs from all 3 nodes (offline if you >>>>>>>> prefer)? >>>>>>>> >>>>>>>> We can try using scripts to speed up the rebalance if you prefer. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com> wrote: >>>>>>>> >>>>>>>>> Thanks for the reply Nithya. >>>>>>>>> >>>>>>>>> 1. glusterfs 4.1.1 >>>>>>>>> >>>>>>>>> 2. 
Volume Name: data >>>>>>>>> Type: Distribute >>>>>>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba >>>>>>>>> Status: Started >>>>>>>>> Snapshot Count: 0 >>>>>>>>> Number of Bricks: 3 >>>>>>>>> Transport-type: tcp >>>>>>>>> Bricks: >>>>>>>>> Brick1: datanode01:/mnt/data/bricks/data >>>>>>>>> Brick2: datanode02:/mnt/data/bricks/data >>>>>>>>> Brick3: datanode03:/mnt/data/bricks/data >>>>>>>>> Options Reconfigured: >>>>>>>>> performance.readdir-ahead: on >>>>>>>>> >>>>>>>>> 3. >>>>>>>>> Node Rebalanced-files >>>>>>>>> size scanned failures skipped status run >>>>>>>>> time in h:m:s >>>>>>>>> --------- ----------- >>>>>>>>> ----------- ----------- ----------- ----------- >>>>>>>>> ------------ -------------- >>>>>>>>> localhost 36822 >>>>>>>>> 11.3GB 50715 0 0 in progress >>>>>>>>> 26:46:17 >>>>>>>>> datanode02 0 >>>>>>>>> 0Bytes 2852 0 0 in progress >>>>>>>>> 26:46:16 >>>>>>>>> datanode03 3128 >>>>>>>>> 513.7MB 11442 0 3128 in progress >>>>>>>>> 26:46:17 >>>>>>>>> Estimated time left for rebalance to complete : > 2 months. Please >>>>>>>>> try again later. >>>>>>>>> volume rebalance: data: success >>>>>>>>> >>>>>>>>> 4. Directory structure is basically an rsync backup of some old >>>>>>>>> systems as well as all of my personal media. I can elaborate more, but it's >>>>>>>>> a pretty standard filesystem. >>>>>>>>> >>>>>>>>> 5. In some folders there might be up to like 12-15 levels of >>>>>>>>> directories (especially the backups) >>>>>>>>> >>>>>>>>> 6. I'm honestly not sure, I can try to scrounge this number up >>>>>>>>> >>>>>>>>> 7. My guess would be > 100k >>>>>>>>> >>>>>>>>> 8. 
Most files are pretty large (media files), but there's a lot of >>>>>>>>> small files (metadata and configuration files) as well >>>>>>>>> >>>>>>>>> I've also appended a (moderately sanitized) snippet of the rebalance >>>>>>>>> log (let me know if you need more) >>>>>>>>> >>>>>>>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] >>>>>>>>> [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination >>>>>>>>> for file - /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2 >>>>>>>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] >>>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to >>>>>>>>> data-client-2 >>>>>>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt >>>>>>>>> 55419279917056,rate_processed=446597.869797, elapsed >>>>>>>>> 96526.000000 >>>>>>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 >>>>>>>>> seconds, seconds left = 123995601 >>>>>>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: >>>>>>>>> Rebalance is in progress. 
Time taken is 96526.00 secs >>>>>>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>>> migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0 >>>>>>>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt >>>>>>>>> 55419279917056,rate_processed=446588.616567, elapsed >>>>>>>>> 96528.000000 >>>>>>>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 >>>>>>>>> seconds, seconds left = 123998170 >>>>>>>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: >>>>>>>>> Rebalance is in progress. Time taken is 96528.00 secs >>>>>>>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>>> migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0 >>>>>>>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/9201002.img.xml: attempting to move from >>>>>>>>> data-client-0 to data-client-2 >>>>>>>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] >>>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to >>>>>>>>> data-client-2 >>>>>>>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/9110012.img.xml: attempting to move from >>>>>>>>> data-client-0 to data-client-2 >>>>>>>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] >>>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 
0-data-dht: completed >>>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to >>>>>>>>> data-client-2 >>>>>>>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt >>>>>>>>> 55419279917056,rate_processed=446579.386035, elapsed >>>>>>>>> 96530.000000 >>>>>>>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 >>>>>>>>> seconds, seconds left = 124000733 >>>>>>>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: >>>>>>>>> Rebalance is in progress. Time taken is 96530.00 secs >>>>>>>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] >>>>>>>>> [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files >>>>>>>>> migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0 >>>>>>>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] >>>>>>>>> 0-data-dht: /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/2050000.img.xml: attempting to move from >>>>>>>>> data-client-0 to data-client-2 >>>>>>>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] >>>>>>>>> [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed >>>>>>>>> migration of /this/is/a/file/path/that/exis >>>>>>>>> ts/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to >>>>>>>>> data-client-2 >>>>>>>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] >>>>>>>>> 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt >>>>>>>>> 55419279917056,rate_processed=446570.244043, elapsed >>>>>>>>> 96532.000000 >>>>>>>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] >>>>>>>>> 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 
>>>>>>>>> seconds, seconds left = 124003272
>>>>>>>>> [2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
>>>>>>>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
>>>>>>>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt 55419279917056, rate_processed=446560.991961, elapsed 96534.000000
>>>>>>>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
>>>>>>>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
>>>>>>>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>
>>>>>>>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> If possible, please send the rebalance logs as well.
>>>>>>>>>>
>>>>>>>>>> On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Rusty,
>>>>>>>>>>>
>>>>>>>>>>> We need the following information:
>>>>>>>>>>>
>>>>>>>>>>> 1. The exact gluster version you are running
>>>>>>>>>>> 2. gluster volume info <volname>
>>>>>>>>>>> 3. gluster rebalance status
>>>>>>>>>>> 4. Information on the directory structure and file locations on your volume.
>>>>>>>>>>> 5. How many levels of directories
>>>>>>>>>>> 6. How many files and directories in each level
>>>>>>>>>>> 7. How many directories and files in total (a rough estimate)
>>>>>>>>>>> 8. Average file size
>>>>>>>>>>>
>>>>>>>>>>> Please note that having a rebalance running in the background should not affect your volume access in any way. However I would like to know why only 6000 files have been scanned in 6 hours.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Nithya
>>>>>>>>>>>
>>>>>>>>>>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I just added a new brick to my existing gluster volume, but *gluster volume rebalance data status* is telling me the following: Estimated time left for rebalance to complete : > 2 months. Please try again later.
>>>>>>>>>>>>
>>>>>>>>>>>> I already did a fix-mapping, but this thing is absolutely crawling trying to rebalance everything (last estimate was ~40 years)
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts on if this is a bug, or ways to speed this up? It's taking ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Rusty
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
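The size-based ETA in the log lines above is plain rate arithmetic: bytes migrated so far divided by elapsed time gives the rate, and the expected total (tmp_cnt) divided by that rate gives the projection. As a sketch (values copied from the 17:38:08 log entry; the actual computation lives in gf_defrag_get_estimates_based_on_size), the reported figures can be reproduced with awk:

```shell
# Values taken from the 17:38:08 rebalance log entry above.
# rate = total_processed / elapsed  (bytes per second, averaged over the run)
rate=$(awk 'BEGIN { printf "%.6f", 43108318798 / 96534 }')
# total projected run time = tmp_cnt / rate; time left = total - elapsed
total=$(awk -v r="$rate" 'BEGIN { printf "%d", 55419279917056 / r }')
left=$(awk -v r="$rate" 'BEGIN { printf "%d", 55419279917056 / r - 96534 }')
echo "rate_processed=$rate bytes/sec"   # ~446560.99, matching the log
echo "estimated total=${total}s, seconds left=${left}s"
```

The projected total of ~124 million seconds (roughly 4 years) is dominated by the low observed byte rate, which the many small files migrated so far drag down - consistent with Nithya's reading that the estimate is skewed by the small files.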
On 31 July 2018 at 22:17, Rusty Bower <rusty at rustybower.com> wrote:

> Is it possible to pause the rebalance to get those numbers? I'm hesitant
> to stop the rebalance and have to redo the entire thing again.

I'm afraid not. Rebalance will start from the beginning if you do so.

>>>>>>> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>>
>>>>>>>> datanode03 is the newest brick
>>>>>>>>
>>>>>>>> the bricks had gotten pretty full, which I think might be part of the issue:
>>>>>>>> - datanode01 /dev/sda1 51T 48T 3.3T 94% /mnt/data
>>>>>>>> - datanode02 /dev/sda1 51T 48T 3.4T 94% /mnt/data
>>>>>>>> - datanode03 /dev/md0 128T 4.6T 123T 4% /mnt/data
>>>>>>>>
>>>>>>>> each of the bricks is on a completely separate disk from the OS
>>>>>>>>
>>>>>>>> I'll shoot you the log files offline :)
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Rusty
>>>>>>>>
>>>>>>>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Rusty,
>>>>>>>>>
>>>>>>>>> Sorry I took so long to get back to you.
>>>>>>>>>
>>>>>>>>> Which is the newly added brick? I see datanode02 has not picked up any files for migration, which is odd.
>>>>>>>>> How full are the individual bricks (df -h output)?
>>>>>>>>> Is each of your bricks in a separate partition?
>>>>>>>>> Can you send me the rebalance logs from all 3 nodes (offline if you prefer)?
>>>>>>>>>
>>>>>>>>> We can try using scripts to speed up the rebalance if you prefer.
>>>>>>>>> Regards,
>>>>>>>>> Nithya
>>>>>>>>>
>>>>>>>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the reply Nithya.
>>>>>>>>>>
>>>>>>>>>> 1. glusterfs 4.1.1
>>>>>>>>>>
>>>>>>>>>> 2. Volume Name: data
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
>>>>>>>>>> Status: Started
>>>>>>>>>> Snapshot Count: 0
>>>>>>>>>> Number of Bricks: 3
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: datanode01:/mnt/data/bricks/data
>>>>>>>>>> Brick2: datanode02:/mnt/data/bricks/data
>>>>>>>>>> Brick3: datanode03:/mnt/data/bricks/data
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>
>>>>>>>>>> 3.
>>>>>>>>>> Node         Rebalanced-files   size      scanned   failures   skipped   status        run time in h:m:s
>>>>>>>>>> localhost    36822              11.3GB    50715     0          0         in progress   26:46:17
>>>>>>>>>> datanode02   0                  0Bytes    2852      0          0         in progress   26:46:16
>>>>>>>>>> datanode03   3128               513.7MB   11442     0          3128      in progress   26:46:17
>>>>>>>>>> Estimated time left for rebalance to complete : > 2 months. Please try again later.
>>>>>>>>>> volume rebalance: data: success
>>>>>>>>>>
>>>>>>>>>> 4. Directory structure is basically an rsync backup of some old systems as well as all of my personal media. I can elaborate more, but it's a pretty standard filesystem.
>>>>>>>>>>
>>>>>>>>>> 5. In some folders there might be up to 12-15 levels of directories (especially the backups)
>>>>>>>>>>
>>>>>>>>>> 6. I'm honestly not sure, I can try to scrounge this number up
>>>>>>>>>>
>>>>>>>>>> 7. My guess would be > 100k
>>>>>>>>>>
>>>>>>>>>> 8. Most files are pretty large (media files), but there's a lot of small files (metadata and configuration files) as well
>>>>>>>>>>
>>>>>>>>>> I've also appended a (moderately sanitized) snippet of the rebalance log (let me know if you need more):
>>>>>>>>>>
>>>>>>>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt 55419279917056, rate_processed=446597.869797, elapsed 96526.000000
>>>>>>>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601
>>>>>>>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs
>>>>>>>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>> [...]
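One detail worth flagging in the per-directory loop suggested earlier in the thread (`for dir in \`find . -type d\`; do echo $dir |xargs -0 -n1 -P10 bash process_dir.sh;done`): `echo` does not NUL-terminate its output, so `xargs -0` receives the path with its trailing newline embedded in the argument, the backtick loop splits paths on whitespace, and `-P10` gains nothing because each `xargs` invocation only ever sees one path. A safer equivalent is to let `find -print0` feed `xargs -0` directly. The sketch below demonstrates this on a throwaway tree, with a one-line stand-in for process_dir.sh (the real helper was shared off-list):

```shell
# Whitespace-safe variant of the per-directory loop: find -print0 pairs
# with xargs -0, so "tv shows" survives as one path and -P4 parallelizes.
# process_dir.sh is a stand-in that just records the directory it was given.
tmp=$(mktemp -d)
mkdir -p "$tmp/media/tv shows" "$tmp/media/movies" "$tmp/backups"
printf 'echo "processed: $1"\n' > "$tmp/process_dir.sh"

cd "$tmp"
out=$(find . -type d -print0 | xargs -0 -n1 -P4 sh "$tmp/process_dir.sh" | sort)
echo "$out"
count=$(printf '%s\n' "$out" | wc -l | tr -d ' ')
```

In the thread's setting the same pipeline would be run from `/mnt/rebal/<chosen_dir>` with Nithya's actual process_dir.sh in place of the stand-in.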