On Mon, 27 Sep 2021 22:03:01 +0200
Johannes Altmanninger <aclopte at gmail.com> wrote:
> > On Sun, Sep 26, 2021 at 01:54:13PM +0200, Jind?ich Makovi?ka via
> > rsync wrote:
> >
> > Applying the attached patch, which reduces the default allocation
> > to 32 pointers, and preallocates 32K pointers only for the main
> > file lists in send_file_list and recv_file_list, reduces the peak
> > memory usage in my case from 142MB to 12MB.
>
> The patch looks very reasonable from what I can tell.
> Out of curiosity, how did you measure peak memory?
> I used "/bin/time -v rsync", and already get (very small)
> improvements in maximum RSS when copying the rsync source tree :)
Thanks for feedback. I profiled the memory usage using valgrind, simply
by running
valgrind --tool=massif ./rsync -anx /usr /tmp/
This issue affects only large trees with lots of small subdirectories,
you will not see much improvement with just a couple of directories the
rsync tree has. The outputs below are generated from the sender process
output by ms_print .
Before the fix:
--------------------------------------------------------------------------------
Command: ./rsync -anx /usr /tmp/
Massif arguments: (none)
ms_print arguments: massif.out.199013
--------------------------------------------------------------------------------
MB
127.6^ #
| #
| #
| #
| @ #
| @ #
| @ #
| @ #
| @ # :
| @ # :
| @ # ::
| : ::: @ @ # ::
| : :: @ @ # ::
| : : :: @ @ : # : :: :
| : : : :: : : @ :@::: # : : :: :
| ::: :: :: :: : : @ :@: : # :: : ::::
| : ::: :::::: :: : : @ :@: : # : ::: : ::::
| : :::::::::: :: ::: @ :@: : # ::@ ::: : : ::::::
| : ::::::::::::::: ::: : @:: :@: : # ::@ :::@: : ::::@:
| : ::::::::::::::: :::::: @:::::@: @::::#::::@:::::@:::::@:::::@:::::@:
0
+----------------------------------------------------------------------->Gi
0 4.574
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
47 2,853,846,986 133,806,904 133,270,491 536,413 0
99.60% (133,270,491B) (heap allocation functions) malloc/new/new[], --alloc-fns,
etc.
->99.12% (132,629,814B) 0x1329CC: my_alloc (in /home/henry/build/rsync/rsync)
| ->95.80% (128,188,416B) 0x11383E: flist_expand (in
/home/henry/build/rsync/rsync)
| | ->95.80% (128,188,416B) 0x117AAB: send_file_name (in
/home/henry/build/rsync/rsync)
| | | ->95.80% (128,188,416B) 0x1184B4: send_directory (in
/home/henry/build/rsync/rsync)
| | | | ->95.80% (128,188,416B) 0x118CD1: send1extra (in
/home/henry/build/rsync/rsync)
| | | | ->95.80% (128,188,416B) 0x1190F4: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | | | ->95.80% (128,188,416B) 0x12A17A: send_files (in
/home/henry/build/rsync/rsync)
| | | | | ->95.80% (128,188,416B) 0x1362C5: client_run (in
/home/henry/build/rsync/rsync)
| | | | | ->95.80% (128,188,416B) 0x136F2D: start_client (in
/home/henry/build/rsync/rsync)
| | | | | ->95.80% (128,188,416B) 0x137604: main (in
/home/henry/build/rsync/rsync)
| | | | |
| | | | ->00.00% (0B) in 1+ places, all below ms_print's threshold
(01.00%)
| | | |
| | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->02.16% (2,883,584B) 0x16E977: pool_alloc (in
/home/henry/build/rsync/rsync)
| | ->02.16% (2,883,584B) 0x116E12: make_file (in
/home/henry/build/rsync/rsync)
| | ->02.16% (2,883,584B) 0x11747F: send_file_name (in
/home/henry/build/rsync/rsync)
| | ->02.06% (2,752,512B) 0x1184B4: send_directory (in
/home/henry/build/rsync/rsync)
| | | ->02.06% (2,752,512B) 0x118CD1: send1extra (in
/home/henry/build/rsync/rsync)
| | | ->02.06% (2,752,512B) 0x1190F4: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | | ->02.06% (2,752,512B) 0x12A17A: send_files (in
/home/henry/build/rsync/rsync)
| | | | ->02.06% (2,752,512B) 0x1362C5: client_run (in
/home/henry/build/rsync/rsync)
| | | | ->02.06% (2,752,512B) 0x136F2D: start_client (in
/home/henry/build/rsync/rsync)
| | | | ->02.06% (2,752,512B) 0x137604: main (in
/home/henry/build/rsync/rsync)
| | | |
| | | ->00.00% (0B) in 1+ places, all below ms_print's threshold
(01.00%)
| | |
| | ->00.10% (131,072B) in 1+ places, all below ms_print's threshold
(01.00%)
| |
| ->01.07% (1,425,928B) 0x116C90: make_file (in
/home/henry/build/rsync/rsync)
| | ->01.07% (1,425,928B) 0x11747F: send_file_name (in
/home/henry/build/rsync/rsync)
| | ->01.07% (1,425,928B) 0x1184B4: send_directory (in
/home/henry/build/rsync/rsync)
| | ->01.07% (1,425,928B) 0x118CD1: send1extra (in
/home/henry/build/rsync/rsync)
| | ->01.07% (1,425,928B) 0x1190F4: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | ->01.06% (1,422,960B) 0x12A17A: send_files (in
/home/henry/build/rsync/rsync)
| | | ->01.06% (1,422,960B) 0x1362C5: client_run (in
/home/henry/build/rsync/rsync)
| | | ->01.06% (1,422,960B) 0x136F2D: start_client (in
/home/henry/build/rsync/rsync)
| | | ->01.06% (1,422,960B) 0x137604: main (in
/home/henry/build/rsync/rsync)
| | |
| | ->00.00% (2,968B) in 1+ places, all below ms_print's
threshold (01.00%)
| |
| ->00.10% (131,886B) in 1+ places, all below ms_print's threshold
(01.00%)
|
->00.48% (640,677B) in 1+ places, all below ms_print's threshold
(01.00%)
After the fix, the flist_expand allocs drop from 95% to 11%.
--------------------------------------------------------------------------------
Command: ./rsync -anx /usr /tmp/
Massif arguments: (none)
ms_print arguments: massif.out.198082
--------------------------------------------------------------------------------
MB
7.345^ #
| #
| # :::
| #: :: ::: :
| #: : @@::: : :
| @::#: ::::: ::::::@ ::: : :
| ::@: #:::: ::: : :: :@ ::: : :
| : : @: #:: : ::: : :: :@ ::: : :
| :::::@::: @: #:: : ::: : :: :@ ::: : :
| :@: :: @: : @: #:: : ::: : :: :@ ::: : :
| :: :: :@: :: @: : @: #:: : ::: : :: :@ ::: : :
| @@ : @@: ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| @@:::@ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| :::@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| @@::: :@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| ::@ ::: :@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
| @@@: @ ::: :@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
|:: :@@@: @ ::: :@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
|: :::@@@: @ ::: :@ :: @ :: @ : ::@: :: @: : @: #:: : ::: : :: :@ ::: : :
0
+----------------------------------------------------------------------->Gi
0 4.582
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
33 3,246,882,750 7,702,128 7,138,545 563,583 0
92.68% (7,138,545B) (heap allocation functions) malloc/new/new[], --alloc-fns,
etc.
->80.61% (6,208,772B) 0x1329EB: my_alloc (in /home/henry/build/rsync/rsync)
| ->59.56% (4,587,520B) 0x16E997: pool_alloc (in
/home/henry/build/rsync/rsync)
| | ->59.56% (4,587,520B) 0x116E0F: make_file (in
/home/henry/build/rsync/rsync)
| | ->59.56% (4,587,520B) 0x11747C: send_file_name (in
/home/henry/build/rsync/rsync)
| | ->57.86% (4,456,448B) 0x1184B1: send_directory (in
/home/henry/build/rsync/rsync)
| | | ->57.86% (4,456,448B) 0x118CCE: send1extra (in
/home/henry/build/rsync/rsync)
| | | ->57.86% (4,456,448B) 0x1190F1: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | | ->57.86% (4,456,448B) 0x12A199: send_files (in
/home/henry/build/rsync/rsync)
| | | | ->57.86% (4,456,448B) 0x1362E4: client_run (in
/home/henry/build/rsync/rsync)
| | | | ->57.86% (4,456,448B) 0x136F4C: start_client (in
/home/henry/build/rsync/rsync)
| | | | ->57.86% (4,456,448B) 0x137623: main (in
/home/henry/build/rsync/rsync)
| | | |
| | | ->00.00% (0B) in 1+ places, all below ms_print's threshold
(01.00%)
| | |
| | ->01.70% (131,072B) 0x11A220: send_file_list (in
/home/henry/build/rsync/rsync)
| | ->01.70% (131,072B) 0x136275: client_run (in
/home/henry/build/rsync/rsync)
| | ->01.70% (131,072B) 0x136F4C: start_client (in
/home/henry/build/rsync/rsync)
| | ->01.70% (131,072B) 0x137623: main (in
/home/henry/build/rsync/rsync)
| |
| ->19.33% (1,489,110B) 0x116C8D: make_file (in
/home/henry/build/rsync/rsync)
| | ->19.33% (1,489,110B) 0x11747C: send_file_name (in
/home/henry/build/rsync/rsync)
| | ->19.33% (1,489,110B) 0x1184B1: send_directory (in
/home/henry/build/rsync/rsync)
| | ->19.33% (1,489,110B) 0x118CCE: send1extra (in
/home/henry/build/rsync/rsync)
| | ->19.33% (1,489,110B) 0x1190F1: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | ->19.30% (1,486,142B) 0x12A199: send_files (in
/home/henry/build/rsync/rsync)
| | | ->19.30% (1,486,142B) 0x1362E4: client_run (in
/home/henry/build/rsync/rsync)
| | | ->19.30% (1,486,142B) 0x136F4C: start_client (in
/home/henry/build/rsync/rsync)
| | | ->19.30% (1,486,142B) 0x137623: main (in
/home/henry/build/rsync/rsync)
| | |
| | ->00.04% (2,968B) in 1+ places, all below ms_print's
threshold (01.00%)
| |
| ->01.70% (131,072B) 0x145EEC: alloc_xbuf (in /home/henry/build/rsync/rsync)
| | ->01.70% (131,072B) in 3 places, all below massif's threshold (1.00%)
| |
| ->00.01% (1,070B) in 1+ places, all below ms_print's threshold (01.00%)
|
->11.11% (856,064B) 0x132A32: my_alloc (in /home/henry/build/rsync/rsync)
| ->11.11% (856,064B) 0x11383C: flist_expand (in
/home/henry/build/rsync/rsync)
| ->06.81% (524,288B) 0x117F7A: add_dirs_to_tree (in
/home/henry/build/rsync/rsync)
| | ->06.81% (524,288B) 0x119370: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| | ->06.81% (524,288B) 0x12A199: send_files (in
/home/henry/build/rsync/rsync)
| | | ->06.81% (524,288B) 0x1362E4: client_run (in
/home/henry/build/rsync/rsync)
| | | ->06.81% (524,288B) 0x136F4C: start_client (in
/home/henry/build/rsync/rsync)
| | | ->06.81% (524,288B) 0x137623: main (in
/home/henry/build/rsync/rsync)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold
(01.00%)
| |
| ->04.31% (331,776B) 0x117AA8: send_file_name (in
/home/henry/build/rsync/rsync)
| ->04.31% (331,776B) 0x1184B1: send_directory (in
/home/henry/build/rsync/rsync)
| ->04.31% (331,776B) 0x118CCE: send1extra (in
/home/henry/build/rsync/rsync)
| ->04.31% (331,776B) 0x1190F1: send_extra_file_list (in
/home/henry/build/rsync/rsync)
| ->04.31% (331,776B) 0x12A199: send_files (in
/home/henry/build/rsync/rsync)
| | ->04.31% (331,776B) 0x1362E4: client_run (in
/home/henry/build/rsync/rsync)
| | ->04.31% (331,776B) 0x136F4C: start_client (in
/home/henry/build/rsync/rsync)
| | ->04.31% (331,776B) 0x137623: main (in
/home/henry/build/rsync/rsync)
| |
| ->00.00% (0B) in 1+ places, all below ms_print's threshold
(01.00%)
|
->00.96% (73,709B) in 1+ places, all below ms_print's threshold (01.00%)
> On Mon, Sep 27, 2021 at 04:42:25PM +0200, Jind?ich Makovi?ka via
> rsync wrote:
>
> > Reduce memory usage
> >
> > Start only with 32 entries for the partial file lists, instead of
> > 32k.
>
> The log message could be a bit more detailed. You already mentioned
> that the big 32k allocation predates the recursive version, that's
> very useful information to add here.
> Maybe even explain why we change the first check from
> "flist->malloced != FLIST_START" to
"flist->files".
> (Also I'd use "git send-email" to send patches inline but
I'm not
> sure what's the convention here)
>
> > diff --git a/flist.c b/flist.c
> > index 3442d868..0f7a64e6 100644
> > --- a/flist.c
> > +++ b/flist.c
> > @@ -303,11 +303,11 @@ static void flist_expand(struct file_list
> > *flist, int extra) if (flist->malloced < flist->used + extra)
> > flist->malloced = flist->used + extra;
> >
> > new_ptr = realloc_array(flist->files, struct file_struct
> > *, flist->malloced);
> > - if (DEBUG_GTE(FLIST, 1) && flist->malloced !=
FLIST_START)
> > {
> > + if (DEBUG_GTE(FLIST, 1) && flist->files) {
>
> Yep, the new check makes more sense because now it's more obvious
> that the debug message is only printed when flist->files is
> realloc'd, and not when it's allocated for the first time.
>
> > rprintf(FCLIENT, "[%s] expand file_list pointer
> > array to %s bytes, did%s move\n", who_am_i(),
> > big_num(sizeof flist->files[0] *
> > flist->malloced), (new_ptr == flist->files) ? " not" :
"");
> > }
> > @@ -2184,10 +2184,11 @@ struct file_list *send_file_list(int f, int
> > argc, char *argv[]) if (preserve_hard_links &&
protocol_version >> > 30 && !cur_flist) init_hard_links();
> > #endif
> >
> > flist = cur_flist = flist_new(0, "send_file_list");
> > + flist_expand(flist, FLIST_START_LARGE);
> > if (inc_recurse) {
> > dir_flist = flist_new(FLIST_TEMP,
> > "send_file_list");
>
> This probably wants an flist_expand(dir_flist, FLIST_START_LARGE),
> because dir_flist is a global. I think the idea is that all
> flist_new() inside loops/recursive calls should start small, but
> lists that are only ever allocated once should stay at 32k.
>
> > flags |= FLAG_DIVERT_DIRS;
> > } else
> > dir_flist = cur_flist;
> > @@ -2539,10 +2540,11 @@ struct file_list *recv_file_list(int f, int
> > dir_ndx) if (preserve_hard_links && !first_flist)
> > init_hard_links();
> > #endif
> >
> > flist = flist_new(0, "recv_file_list");
> > + flist_expand(flist, FLIST_START_LARGE);
> >
> > if (inc_recurse) {
> > if (flist->ndx_start == 1)
> > dir_flist = flist_new(FLIST_TEMP,
> > "recv_file_list");
>
> Same here I guess. Maybe we should add an "initial size"
parameter to
> flist_new(), so it can call flist_expand() automatically?
Thanks for the hints. I will add an explicit flist_expand for the time
being.
>
> > dstart = dir_flist->used;
> > diff --git a/rsync.h b/rsync.h
> > index 88319732..17f8700e 100644
> > --- a/rsync.h
> > +++ b/rsync.h
> > @@ -916,12 +916,13 @@ extern int xattrs_ndx;
> >
> > /*
> > * Start the flist array at FLIST_START entries and grow it
> > * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
> > */
> > -#define FLIST_START (32 * 1024)
> > -#define FLIST_LINEAR (FLIST_START * 512)
> > +#define FLIST_START (32)
> > +#define FLIST_START_LARGE (32 * 1024)
> > +#define FLIST_LINEAR (FLIST_START_LARGE * 512)
>
> Probably these should remain aligned (I'm assuming tab has width 8)
Point taken.
Regards,
--
Jindrich Makovicka