Hi folks, I've been doing work on memory reduction in Unladen Swallow, and during testing, LiveRanges seemed to be consuming one of the largest chunks of memory. I wrote a replacement allocator for use by BumpPtrAllocator which uses mmap()/munmap() in place of malloc()/free(). It has worked flawlessly in testing, and reduces memory usage quite nicely in Unladen Swallow. The code is available for review here. I'd appreciate feedback if there's an interest in integrating this into LLVM trunk: http://codereview.uplinklabs.net/1905049 Here are the results of our memory utilization tests. The 'Mem max' numbers aren't particularly revealing though, so take a look at the graphs. I think the spambayes benchmark was one of the most interesting. ### 2to3 ### Mem max: 39008.000 -> 38904.000: 1.0027x smaller Usage over time: http://tinyurl.com/3axczjc ### bzr_startup ### Mem max: 11996.000 -> 11984.000: 1.0010x smaller Usage over time: http://tinyurl.com/2ucnbhb ### call_method ### Mem max: 11632.000 -> 11544.000: 1.0076x smaller Usage over time: http://tinyurl.com/22r6y9r ### call_method_slots ### Mem max: 10908.000 -> 10820.000: 1.0081x smaller Usage over time: http://tinyurl.com/3yqb2r4 ### call_method_unknown ### Mem max: 11216.000 -> 11152.000: 1.0057x smaller Usage over time: http://tinyurl.com/3ypudfj ### call_simple ### Mem max: 10692.000 -> 10540.000: 1.0144x smaller Usage over time: http://tinyurl.com/2a62cbv ### django ### Mem max: 21600.000 -> 20672.000: 1.0449x smaller Usage over time: http://tinyurl.com/35rclpd ### float ### Mem max: 15904.000 -> 15852.000: 1.0033x smaller Usage over time: http://tinyurl.com/2vokmep ### hg_startup ### Mem max: 7000.000 -> 7012.000: 1.0017x larger Usage over time: http://tinyurl.com/3x4wneu ### iterative_count ### Mem max: 9992.000 -> 9908.000: 1.0085x smaller Usage over time: http://tinyurl.com/24dy7ql ### nbody ### Mem max: 13552.000 -> 13240.000: 1.0236x smaller Usage over time: http://tinyurl.com/23dstyu ### normal_startup ### Mem max: 5380.000 -> 5396.000: 1.0030x larger Usage over time: http://tinyurl.com/2fh7cmv ### nqueens ### Mem max: 12832.000 -> 12756.000: 1.0060x smaller Usage over time: http://tinyurl.com/29whema ### pickle ### Mem max: 6856.000 -> 6844.000: 1.0018x smaller Usage over time: http://tinyurl.com/3az5v6y ### pickle_dict ### Mem max: 6848.000 -> 6836.000: 1.0018x smaller Usage over time: http://tinyurl.com/2bkjdoh ### pickle_list ### Mem max: 6836.000 -> 6824.000: 1.0018x smaller Usage over time: http://tinyurl.com/23llzct ### regex_compile ### Mem max: 39176.000 -> 38536.000: 1.0166x smaller Usage over time: http://tinyurl.com/33wylgu ### regex_effbot ### Mem max: 12340.000 -> 12084.000: 1.0212x smaller Usage over time: http://tinyurl.com/37u84z4 ### regex_v8 ### Mem max: 33596.000 -> 33828.000: 1.0069x larger Usage over time: http://tinyurl.com/397hyfm ### richards ### Mem max: 12760.000 -> 12680.000: 1.0063x smaller Usage over time: http://tinyurl.com/25n3wkl ### rietveld ### Mem max: 29008.000 -> 28636.000: 1.0130x smaller Usage over time: http://tinyurl.com/25uu4x3 ### slowpickle ### Mem max: 14096.000 -> 13804.000: 1.0212x smaller Usage over time: http://tinyurl.com/2wzxmu2 ### slowspitfire ### Mem max: 94292.000 -> 93992.000: 1.0032x smaller Usage over time: http://tinyurl.com/2wo4lrs ### slowunpickle ### Mem max: 11620.000 -> 11516.000: 1.0090x smaller Usage over time: http://tinyurl.com/26pw2cr ### spambayes ### Mem max: 35896.000 -> 35860.000: 1.0010x smaller Usage over time: http://tinyurl.com/2dhtkeb ### startup_nosite ### Mem max: 4876.000 -> 4868.000: 1.0016x smaller Usage over time: http://tinyurl.com/3agcts8 ### threaded_count ### Mem max: 10048.000 -> 9972.000: 1.0076x smaller Usage over time: http://tinyurl.com/322hltw ### unpack_sequence ### Mem max: 11252.000 -> 11244.000: 1.0007x smaller Usage over time: http://tinyurl.com/34lsbqd ### unpickle ### Mem max: 6872.000 -> 6860.000: 1.0017x smaller Usage over time: http://tinyurl.com/2aeqeua ### unpickle_list ### Mem max: 6860.000 -> 6844.000: 1.0023x smaller Usage over time: http://tinyurl.com/36q766k And to gauge the performance impact, I also ran the speed tests. It seems using mmap()/munmap() has very little performance impact in either direction, so that's good: ### 2to3 ### 35.590589 -> 35.824554: 1.0066x slower ### bzr_startup ### Min: 0.157976 -> 0.155976: 1.0128x faster Avg: 0.167575 -> 0.168924: 1.0081x slower Not significant Stddev: 0.00334 -> 0.00716: 2.1463x larger Timeline: http://tinyurl.com/39thymp ### call_method ### Min: 0.878663 -> 0.884666: 1.0068x slower Avg: 0.887148 -> 0.888667: 1.0017x slower Not significant Stddev: 0.02062 -> 0.02074: 1.0058x larger Timeline: http://tinyurl.com/2fm39l2 ### call_method_slots ### Min: 0.872706 -> 0.867387: 1.0061x faster Avg: 0.877261 -> 0.872754: 1.0052x faster Significant (t=2.510615) Stddev: 0.01523 -> 0.01586: 1.0410x larger Timeline: http://tinyurl.com/3x84s9m ### call_method_unknown ### Min: 1.031445 -> 1.028433: 1.0029x faster Avg: 1.039063 -> 1.034296: 1.0046x faster Not significant Stddev: 0.03708 -> 0.03702: 1.0016x smaller Timeline: http://tinyurl.com/395gevs ### call_simple ### Min: 0.594110 -> 0.589934: 1.0071x faster Avg: 0.606276 -> 0.594366: 1.0200x faster Significant (t=5.874137) Stddev: 0.01760 -> 0.01752: 1.0049x smaller Timeline: http://tinyurl.com/2a2zv56 ### django ### Min: 0.997650 -> 0.993266: 1.0044x faster Avg: 0.999423 -> 0.995495: 1.0039x faster Significant (t=18.075408) Stddev: 0.00093 -> 0.00122: 1.3050x larger Timeline: http://tinyurl.com/28oa6wo ### float ### Min: 0.102826 -> 0.102910: 1.0008x slower Avg: 0.110088 -> 0.110280: 1.0017x slower Not significant Stddev: 0.02758 -> 0.02762: 1.0015x larger Timeline: http://tinyurl.com/2w6ol8d ### hg_startup ### Min: 0.045993 -> 0.044993: 1.0222x faster Avg: 0.053388 -> 0.053510: 1.0023x slower Not significant Stddev: 0.00250 -> 0.00258: 1.0322x larger Timeline: http://tinyurl.com/2ec392w ### iterative_count ### Min: 0.157216 -> 0.156526: 1.0044x faster Avg: 0.166971 -> 0.166897: 1.0004x faster Not significant Stddev: 0.06835 -> 0.07249: 1.0604x larger Timeline: http://tinyurl.com/2g9agwl ### nbody ### Min: 0.443087 -> 0.464941: 1.0493x slower Avg: 0.456435 -> 0.475809: 1.0424x slower Not significant Stddev: 0.05609 -> 0.05523: 1.0156x smaller Timeline: http://tinyurl.com/2wd6z8r ### normal_startup ### Min: 0.438015 -> 0.437763: 1.0006x faster Avg: 0.438425 -> 0.438810: 1.0009x slower Not significant Stddev: 0.00024 -> 0.00274: 11.6231x larger Timeline: http://tinyurl.com/34nunk3 ### nqueens ### Min: 0.693033 -> 0.698259: 1.0075x slower Avg: 0.698948 -> 0.704770: 1.0083x slower Not significant Stddev: 0.02644 -> 0.02590: 1.0208x smaller Timeline: http://tinyurl.com/39ydyjs ### pickle ### Min: 1.654750 -> 1.669246: 1.0088x slower Avg: 1.660298 -> 1.673813: 1.0081x slower Significant (t=-17.007317) Stddev: 0.00391 -> 0.00403: 1.0298x larger Timeline: http://tinyurl.com/36zz8yk ### pickle_dict ### Min: 1.859310 -> 1.862217: 1.0016x slower Avg: 1.864953 -> 1.863408: 1.0008x faster Significant (t=2.269051) Stddev: 0.00300 -> 0.00377: 1.2590x larger Timeline: http://tinyurl.com/32kz4l6 ### pickle_list ### Min: 1.059003 -> 1.045209: 1.0132x faster Avg: 1.065780 -> 1.048728: 1.0163x faster Significant (t=21.791102) Stddev: 0.00413 -> 0.00368: 1.1223x smaller Timeline: http://tinyurl.com/27rpxol ### regex_compile ### Min: 0.828427 -> 0.832179: 1.0045x slower Avg: 0.890980 -> 0.894830: 1.0043x slower Not significant Stddev: 0.26185 -> 0.26180: 1.0002x smaller Timeline: http://tinyurl.com/38c3z8m ### regex_effbot ### Min: 0.162540 -> 0.162873: 1.0020x slower Avg: 0.167092 -> 0.167389: 1.0018x slower Not significant Stddev: 0.02830 -> 0.02830: 1.0001x smaller Timeline: http://tinyurl.com/33r7s5y ### regex_v8 ### Min: 0.164368 -> 0.163174: 1.0073x faster Avg: 0.417027 -> 0.416113: 1.0022x faster Not significant Stddev: 0.86580 -> 0.86190: 1.0045x smaller Timeline: http://tinyurl.com/3yabd5v ### richards ### Min: 0.352872 -> 0.353289: 1.0012x slower Avg: 0.354989 -> 0.355434: 1.0013x slower Not significant Stddev: 0.00543 -> 0.00549: 1.0108x larger Timeline: http://tinyurl.com/36weagp ### rietveld ### Min: 0.693800 -> 0.695649: 1.0027x slower Avg: 0.984689 -> 0.984075: 1.0006x faster Not significant Stddev: 0.36204 -> 0.36497: 1.0081x larger Timeline: http://tinyurl.com/2wfw2z5 ### slowpickle ### Min: 0.772514 -> 0.757389: 1.0200x faster Avg: 0.821586 -> 0.805534: 1.0199x faster Not significant Stddev: 0.17998 -> 0.18494: 1.0275x larger Timeline: http://tinyurl.com/37rb8d5 ### slowspitfire ### Min: 1.022256 -> 1.023268: 1.0010x slower Avg: 1.023244 -> 1.024326: 1.0011x slower Significant (t=-6.305389) Stddev: 0.00022 -> 0.00119: 5.3064x larger Timeline: http://tinyurl.com/29h96r4 ### slowunpickle ### Min: 0.384167 -> 0.380310: 1.0101x faster Avg: 0.410666 -> 0.409001: 1.0041x faster Not significant Stddev: 0.08844 -> 0.09114: 1.0305x larger Timeline: http://tinyurl.com/286o5wt ### spambayes ### Min: 0.417142 -> 0.398649: 1.0464x faster Avg: 0.598665 -> 0.574443: 1.0422x faster Not significant Stddev: 0.58158 -> 0.57470: 1.0120x smaller Timeline: http://tinyurl.com/23ozclq ### startup_nosite ### Min: 0.333111 -> 0.332463: 1.0019x faster Avg: 0.338470 -> 0.335088: 1.0101x faster Significant (t=3.325272) Stddev: 0.00886 -> 0.00500: 1.7726x smaller Timeline: http://tinyurl.com/23ob2vy ### threaded_count ### Min: 0.181111 -> 0.182459: 1.0074x slower Avg: 0.214138 -> 0.211476: 1.0126x faster Not significant Stddev: 0.14987 -> 0.14531: 1.0314x smaller Timeline: http://tinyurl.com/2dkupks ### unpack_sequence ### Min: 0.000217 -> 0.000218: 1.0033x slower Avg: 0.000223 -> 0.000222: 1.0048x faster Significant (t=11.155116) Stddev: 0.00002 -> 0.00001: 3.0782x smaller Timeline: http://tinyurl.com/26tzfco ### unpickle ### Min: 1.187517 -> 1.173118: 1.0123x faster Avg: 1.212250 -> 1.178640: 1.0285x faster Significant (t=9.457625) Stddev: 0.02461 -> 0.00508: 4.8419x smaller Timeline: http://tinyurl.com/2djz64c ### unpickle_list ### Min: 1.106733 -> 1.086683: 1.0185x faster Avg: 1.147736 -> 1.129444: 1.0162x faster Significant (t=4.530626) Stddev: 0.01237 -> 0.02573: 2.0800x larger Timeline: http://tinyurl.com/2fctpe9 Any thoughts? - Steven
Hi Steven- Nice, but will this not break Windows? From an initial glance over your patch, it seems to assume the existence of mmap() in some form or other. Alistair On 8 Aug 2010, at 03:05, Steven Noonan wrote:> Hi folks, > > I've been doing work on memory reduction in Unladen Swallow, and > during testing, LiveRanges seemed to be consuming one of the largest > chunks of memory. I wrote a replacement allocator for use by > BumpPtrAllocator which uses mmap()/munmap() in place of > malloc()/free(). It has worked flawlessly in testing, and reduces > memory usage quite nicely in Unladen Swallow. > > The code is available for review here. I'd appreciate feedback if > there's an interest in integrating this into LLVM trunk: > http://codereview.uplinklabs.net/1905049 > > Here are the results of our memory utilization tests. The 'Mem max' > numbers aren't particularly revealing though, so take a look at the > graphs. I think the spambayes benchmark was one of the most > interesting. > > ### 2to3 ### > Mem max: 39008.000 -> 38904.000: 1.0027x smaller > Usage over time: http://tinyurl.com/3axczjc > > ### bzr_startup ### > Mem max: 11996.000 -> 11984.000: 1.0010x smaller > Usage over time: http://tinyurl.com/2ucnbhb > > ### call_method ### > Mem max: 11632.000 -> 11544.000: 1.0076x smaller > Usage over time: http://tinyurl.com/22r6y9r > > ### call_method_slots ### > Mem max: 10908.000 -> 10820.000: 1.0081x smaller > Usage over time: http://tinyurl.com/3yqb2r4 > > ### call_method_unknown ### > Mem max: 11216.000 -> 11152.000: 1.0057x smaller > Usage over time: http://tinyurl.com/3ypudfj > > ### call_simple ### > Mem max: 10692.000 -> 10540.000: 1.0144x smaller > Usage over time: http://tinyurl.com/2a62cbv > > ### django ### > Mem max: 21600.000 -> 20672.000: 1.0449x smaller > Usage over time: http://tinyurl.com/35rclpd > > ### float ### > Mem max: 15904.000 -> 15852.000: 1.0033x smaller > Usage over time: http://tinyurl.com/2vokmep > > ### hg_startup ### > Mem max: 7000.000 -> 7012.000: 1.0017x larger > Usage over time: http://tinyurl.com/3x4wneu > > ### iterative_count ### > Mem max: 9992.000 -> 9908.000: 1.0085x smaller > Usage over time: http://tinyurl.com/24dy7ql > > ### nbody ### > Mem max: 13552.000 -> 13240.000: 1.0236x smaller > Usage over time: http://tinyurl.com/23dstyu > > ### normal_startup ### > Mem max: 5380.000 -> 5396.000: 1.0030x larger > Usage over time: http://tinyurl.com/2fh7cmv > > ### nqueens ### > Mem max: 12832.000 -> 12756.000: 1.0060x smaller > Usage over time: http://tinyurl.com/29whema > > ### pickle ### > Mem max: 6856.000 -> 6844.000: 1.0018x smaller > Usage over time: http://tinyurl.com/3az5v6y > > ### pickle_dict ### > Mem max: 6848.000 -> 6836.000: 1.0018x smaller > Usage over time: http://tinyurl.com/2bkjdoh > > ### pickle_list ### > Mem max: 6836.000 -> 6824.000: 1.0018x smaller > Usage over time: http://tinyurl.com/23llzct > > ### regex_compile ### > Mem max: 39176.000 -> 38536.000: 1.0166x smaller > Usage over time: http://tinyurl.com/33wylgu > > ### regex_effbot ### > Mem max: 12340.000 -> 12084.000: 1.0212x smaller > Usage over time: http://tinyurl.com/37u84z4 > > ### regex_v8 ### > Mem max: 33596.000 -> 33828.000: 1.0069x larger > Usage over time: http://tinyurl.com/397hyfm > > ### richards ### > Mem max: 12760.000 -> 12680.000: 1.0063x smaller > Usage over time: http://tinyurl.com/25n3wkl > > ### rietveld ### > Mem max: 29008.000 -> 28636.000: 1.0130x smaller > Usage over time: http://tinyurl.com/25uu4x3 > > ### slowpickle ### > Mem max: 14096.000 -> 13804.000: 1.0212x smaller > Usage over time: http://tinyurl.com/2wzxmu2 > > ### slowspitfire ### > Mem max: 94292.000 -> 93992.000: 1.0032x smaller > Usage over time: http://tinyurl.com/2wo4lrs > > ### slowunpickle ### > Mem max: 11620.000 -> 11516.000: 1.0090x smaller > Usage over time: http://tinyurl.com/26pw2cr > > ### spambayes ### > Mem max: 35896.000 -> 35860.000: 1.0010x smaller > Usage over time: http://tinyurl.com/2dhtkeb > > ### startup_nosite ### > Mem max: 4876.000 -> 4868.000: 1.0016x smaller > Usage over time: http://tinyurl.com/3agcts8 > > ### threaded_count ### > Mem max: 10048.000 -> 9972.000: 1.0076x smaller > Usage over time: http://tinyurl.com/322hltw > > ### unpack_sequence ### > Mem max: 11252.000 -> 11244.000: 1.0007x smaller > Usage over time: http://tinyurl.com/34lsbqd > > ### unpickle ### > Mem max: 6872.000 -> 6860.000: 1.0017x smaller > Usage over time: http://tinyurl.com/2aeqeua > > ### unpickle_list ### > Mem max: 6860.000 -> 6844.000: 1.0023x smaller > Usage over time: http://tinyurl.com/36q766k > > > > > And to gauge the performance impact, I also ran the speed tests. It > seems using mmap()/munmap() has very little performance impact in > either direction, so that's good: > > ### 2to3 ### > 35.590589 -> 35.824554: 1.0066x slower > > ### bzr_startup ### > Min: 0.157976 -> 0.155976: 1.0128x faster > Avg: 0.167575 -> 0.168924: 1.0081x slower > Not significant > Stddev: 0.00334 -> 0.00716: 2.1463x larger > Timeline: http://tinyurl.com/39thymp > > ### call_method ### > Min: 0.878663 -> 0.884666: 1.0068x slower > Avg: 0.887148 -> 0.888667: 1.0017x slower > Not significant > Stddev: 0.02062 -> 0.02074: 1.0058x larger > Timeline: http://tinyurl.com/2fm39l2 > > ### call_method_slots ### > Min: 0.872706 -> 0.867387: 1.0061x faster > Avg: 0.877261 -> 0.872754: 1.0052x faster > Significant (t=2.510615) > Stddev: 0.01523 -> 0.01586: 1.0410x larger > Timeline: http://tinyurl.com/3x84s9m > > ### call_method_unknown ### > Min: 1.031445 -> 1.028433: 1.0029x faster > Avg: 1.039063 -> 1.034296: 1.0046x faster > Not significant > Stddev: 0.03708 -> 0.03702: 1.0016x smaller > Timeline: http://tinyurl.com/395gevs > > ### call_simple ### > Min: 0.594110 -> 0.589934: 1.0071x faster > Avg: 0.606276 -> 0.594366: 1.0200x faster > Significant (t=5.874137) > Stddev: 0.01760 -> 0.01752: 1.0049x smaller > Timeline: http://tinyurl.com/2a2zv56 > > ### django ### > Min: 0.997650 -> 0.993266: 1.0044x faster > Avg: 0.999423 -> 0.995495: 1.0039x faster > Significant (t=18.075408) > Stddev: 0.00093 -> 0.00122: 1.3050x larger > Timeline: http://tinyurl.com/28oa6wo > > ### float ### > Min: 0.102826 -> 0.102910: 1.0008x slower > Avg: 0.110088 -> 0.110280: 1.0017x slower > Not significant > Stddev: 0.02758 -> 0.02762: 1.0015x larger > Timeline: http://tinyurl.com/2w6ol8d > > ### hg_startup ### > Min: 0.045993 -> 0.044993: 1.0222x faster > Avg: 0.053388 -> 0.053510: 1.0023x slower > Not significant > Stddev: 0.00250 -> 0.00258: 1.0322x larger > Timeline: http://tinyurl.com/2ec392w > > ### iterative_count ### > Min: 0.157216 -> 0.156526: 1.0044x faster > Avg: 0.166971 -> 0.166897: 1.0004x faster > Not significant > Stddev: 0.06835 -> 0.07249: 1.0604x larger > Timeline: http://tinyurl.com/2g9agwl > > ### nbody ### > Min: 0.443087 -> 0.464941: 1.0493x slower > Avg: 0.456435 -> 0.475809: 1.0424x slower > Not significant > Stddev: 0.05609 -> 0.05523: 1.0156x smaller > Timeline: http://tinyurl.com/2wd6z8r > > ### normal_startup ### > Min: 0.438015 -> 0.437763: 1.0006x faster > Avg: 0.438425 -> 0.438810: 1.0009x slower > Not significant > Stddev: 0.00024 -> 0.00274: 11.6231x larger > Timeline: http://tinyurl.com/34nunk3 > > ### nqueens ### > Min: 0.693033 -> 0.698259: 1.0075x slower > Avg: 0.698948 -> 0.704770: 1.0083x slower > Not significant > Stddev: 0.02644 -> 0.02590: 1.0208x smaller > Timeline: http://tinyurl.com/39ydyjs > > ### pickle ### > Min: 1.654750 -> 1.669246: 1.0088x slower > Avg: 1.660298 -> 1.673813: 1.0081x slower > Significant (t=-17.007317) > Stddev: 0.00391 -> 0.00403: 1.0298x larger > Timeline: http://tinyurl.com/36zz8yk > > ### pickle_dict ### > Min: 1.859310 -> 1.862217: 1.0016x slower > Avg: 1.864953 -> 1.863408: 1.0008x faster > Significant (t=2.269051) > Stddev: 0.00300 -> 0.00377: 1.2590x larger > Timeline: http://tinyurl.com/32kz4l6 > > ### pickle_list ### > Min: 1.059003 -> 1.045209: 1.0132x faster > Avg: 1.065780 -> 1.048728: 1.0163x faster > Significant (t=21.791102) > Stddev: 0.00413 -> 0.00368: 1.1223x smaller > Timeline: http://tinyurl.com/27rpxol > > ### regex_compile ### > Min: 0.828427 -> 0.832179: 1.0045x slower > Avg: 0.890980 -> 0.894830: 1.0043x slower > Not significant > Stddev: 0.26185 -> 0.26180: 1.0002x smaller > Timeline: http://tinyurl.com/38c3z8m > > ### regex_effbot ### > Min: 0.162540 -> 0.162873: 1.0020x slower > Avg: 0.167092 -> 0.167389: 1.0018x slower > Not significant > Stddev: 0.02830 -> 0.02830: 1.0001x smaller > Timeline: http://tinyurl.com/33r7s5y > > ### regex_v8 ### > Min: 0.164368 -> 0.163174: 1.0073x faster > Avg: 0.417027 -> 0.416113: 1.0022x faster > Not significant > Stddev: 0.86580 -> 0.86190: 1.0045x smaller > Timeline: http://tinyurl.com/3yabd5v > > ### richards ### > Min: 0.352872 -> 0.353289: 1.0012x slower > Avg: 0.354989 -> 0.355434: 1.0013x slower > Not significant > Stddev: 0.00543 -> 0.00549: 1.0108x larger > Timeline: http://tinyurl.com/36weagp > > ### rietveld ### > Min: 0.693800 -> 0.695649: 1.0027x slower > Avg: 0.984689 -> 0.984075: 1.0006x faster > Not significant > Stddev: 0.36204 -> 0.36497: 1.0081x larger > Timeline: http://tinyurl.com/2wfw2z5 > > ### slowpickle ### > Min: 0.772514 -> 0.757389: 1.0200x faster > Avg: 0.821586 -> 0.805534: 1.0199x faster > Not significant > Stddev: 0.17998 -> 0.18494: 1.0275x larger > Timeline: http://tinyurl.com/37rb8d5 > > ### slowspitfire ### > Min: 1.022256 -> 1.023268: 1.0010x slower > Avg: 1.023244 -> 1.024326: 1.0011x slower > Significant (t=-6.305389) > Stddev: 0.00022 -> 0.00119: 5.3064x larger > Timeline: http://tinyurl.com/29h96r4 > > ### slowunpickle ### > Min: 0.384167 -> 0.380310: 1.0101x faster > Avg: 0.410666 -> 0.409001: 1.0041x faster > Not significant > Stddev: 0.08844 -> 0.09114: 1.0305x larger > Timeline: http://tinyurl.com/286o5wt > > ### spambayes ### > Min: 0.417142 -> 0.398649: 1.0464x faster > Avg: 0.598665 -> 0.574443: 1.0422x faster > Not significant > Stddev: 0.58158 -> 0.57470: 1.0120x smaller > Timeline: http://tinyurl.com/23ozclq > > ### startup_nosite ### > Min: 0.333111 -> 0.332463: 1.0019x faster > Avg: 0.338470 -> 0.335088: 1.0101x faster > Significant (t=3.325272) > Stddev: 0.00886 -> 0.00500: 1.7726x smaller > Timeline: http://tinyurl.com/23ob2vy > > ### threaded_count ### > Min: 0.181111 -> 0.182459: 1.0074x slower > Avg: 0.214138 -> 0.211476: 1.0126x faster > Not significant > Stddev: 0.14987 -> 0.14531: 1.0314x smaller > Timeline: http://tinyurl.com/2dkupks > > ### unpack_sequence ### > Min: 0.000217 -> 0.000218: 1.0033x slower > Avg: 0.000223 -> 0.000222: 1.0048x faster > Significant (t=11.155116) > Stddev: 0.00002 -> 0.00001: 3.0782x smaller > Timeline: http://tinyurl.com/26tzfco > > ### unpickle ### > Min: 1.187517 -> 1.173118: 1.0123x faster > Avg: 1.212250 -> 1.178640: 1.0285x faster > Significant (t=9.457625) > Stddev: 0.02461 -> 0.00508: 4.8419x smaller > Timeline: http://tinyurl.com/2djz64c > > ### unpickle_list ### > Min: 1.106733 -> 1.086683: 1.0185x faster > Avg: 1.147736 -> 1.129444: 1.0162x faster > Significant (t=4.530626) > Stddev: 0.01237 -> 0.02573: 2.0800x larger > Timeline: http://tinyurl.com/2fctpe9 > > Any thoughts? > > - Steven > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Correct, it does at the moment. I noted this on the code review. - Steven On Sun, Aug 8, 2010 at 2:50 PM, Alistair Lynn <arplynn at gmail.com> wrote:> Hi Steven- > > Nice, but will this not break Windows? From an initial glance over your patch, it seems to assume the existence of mmap() in some form or other. > > Alistair > > On 8 Aug 2010, at 03:05, Steven Noonan wrote: > >> Hi folks, >> >> I've been doing work on memory reduction in Unladen Swallow, and >> during testing, LiveRanges seemed to be consuming one of the largest >> chunks of memory. I wrote a replacement allocator for use by >> BumpPtrAllocator which uses mmap()/munmap() in place of >> malloc()/free(). It has worked flawlessly in testing, and reduces >> memory usage quite nicely in Unladen Swallow. >> >> The code is available for review here. I'd appreciate feedback if >> there's an interest in integrating this into LLVM trunk: >> http://codereview.uplinklabs.net/1905049 >> >> Here are the results of our memory utilization tests. The 'Mem max' >> numbers aren't particularly revealing though, so take a look at the >> graphs. I think the spambayes benchmark was one of the most >> interesting. >> >> ### 2to3 ### >> Mem max: 39008.000 -> 38904.000: 1.0027x smaller >> Usage over time: http://tinyurl.com/3axczjc >> >> ### bzr_startup ### >> Mem max: 11996.000 -> 11984.000: 1.0010x smaller >> Usage over time: http://tinyurl.com/2ucnbhb >> >> ### call_method ### >> Mem max: 11632.000 -> 11544.000: 1.0076x smaller >> Usage over time: http://tinyurl.com/22r6y9r >> >> ### call_method_slots ### >> Mem max: 10908.000 -> 10820.000: 1.0081x smaller >> Usage over time: http://tinyurl.com/3yqb2r4 >> >> ### call_method_unknown ### >> Mem max: 11216.000 -> 11152.000: 1.0057x smaller >> Usage over time: http://tinyurl.com/3ypudfj >> >> ### call_simple ### >> Mem max: 10692.000 -> 10540.000: 1.0144x smaller >> Usage over time: http://tinyurl.com/2a62cbv >> >> ### django ### >> Mem max: 21600.000 -> 20672.000: 1.0449x smaller >> Usage over time: http://tinyurl.com/35rclpd >> >> ### float ### >> Mem max: 15904.000 -> 15852.000: 1.0033x smaller >> Usage over time: http://tinyurl.com/2vokmep >> >> ### hg_startup ### >> Mem max: 7000.000 -> 7012.000: 1.0017x larger >> Usage over time: http://tinyurl.com/3x4wneu >> >> ### iterative_count ### >> Mem max: 9992.000 -> 9908.000: 1.0085x smaller >> Usage over time: http://tinyurl.com/24dy7ql >> >> ### nbody ### >> Mem max: 13552.000 -> 13240.000: 1.0236x smaller >> Usage over time: http://tinyurl.com/23dstyu >> >> ### normal_startup ### >> Mem max: 5380.000 -> 5396.000: 1.0030x larger >> Usage over time: http://tinyurl.com/2fh7cmv >> >> ### nqueens ### >> Mem max: 12832.000 -> 12756.000: 1.0060x smaller >> Usage over time: http://tinyurl.com/29whema >> >> ### pickle ### >> Mem max: 6856.000 -> 6844.000: 1.0018x smaller >> Usage over time: http://tinyurl.com/3az5v6y >> >> ### pickle_dict ### >> Mem max: 6848.000 -> 6836.000: 1.0018x smaller >> Usage over time: http://tinyurl.com/2bkjdoh >> >> ### pickle_list ### >> Mem max: 6836.000 -> 6824.000: 1.0018x smaller >> Usage over time: http://tinyurl.com/23llzct >> >> ### regex_compile ### >> Mem max: 39176.000 -> 38536.000: 1.0166x smaller >> Usage over time: http://tinyurl.com/33wylgu >> >> ### regex_effbot ### >> Mem max: 12340.000 -> 12084.000: 1.0212x smaller >> Usage over time: http://tinyurl.com/37u84z4 >> >> ### regex_v8 ### >> Mem max: 33596.000 -> 33828.000: 1.0069x larger >> Usage over time: http://tinyurl.com/397hyfm >> >> ### richards ### >> Mem max: 12760.000 -> 12680.000: 1.0063x smaller >> Usage over time: http://tinyurl.com/25n3wkl >> >> ### rietveld ### >> Mem max: 29008.000 -> 28636.000: 1.0130x smaller >> Usage over time: http://tinyurl.com/25uu4x3 >> >> ### slowpickle ### >> Mem max: 14096.000 -> 13804.000: 1.0212x smaller >> Usage over time: http://tinyurl.com/2wzxmu2 >> >> ### slowspitfire ### >> Mem max: 94292.000 -> 93992.000: 1.0032x smaller >> Usage over time: http://tinyurl.com/2wo4lrs >> >> ### slowunpickle ### >> Mem max: 11620.000 -> 11516.000: 1.0090x smaller >> Usage over time: http://tinyurl.com/26pw2cr >> >> ### spambayes ### >> Mem max: 35896.000 -> 35860.000: 1.0010x smaller >> Usage over time: http://tinyurl.com/2dhtkeb >> >> ### startup_nosite ### >> Mem max: 4876.000 -> 4868.000: 1.0016x smaller >> Usage over time: http://tinyurl.com/3agcts8 >> >> ### threaded_count ### >> Mem max: 10048.000 -> 9972.000: 1.0076x smaller >> Usage over time: http://tinyurl.com/322hltw >> >> ### unpack_sequence ### >> Mem max: 11252.000 -> 11244.000: 1.0007x smaller >> Usage over time: http://tinyurl.com/34lsbqd >> >> ### unpickle ### >> Mem max: 6872.000 -> 6860.000: 1.0017x smaller >> Usage over time: http://tinyurl.com/2aeqeua >> >> ### unpickle_list ### >> Mem max: 6860.000 -> 6844.000: 1.0023x smaller >> Usage over time: http://tinyurl.com/36q766k >> >> >> >> >> And to gauge the performance impact, I also ran the speed tests. It >> seems using mmap()/munmap() has very little performance impact in >> either direction, so that's good: >> >> ### 2to3 ### >> 35.590589 -> 35.824554: 1.0066x slower >> >> ### bzr_startup ### >> Min: 0.157976 -> 0.155976: 1.0128x faster >> Avg: 0.167575 -> 0.168924: 1.0081x slower >> Not significant >> Stddev: 0.00334 -> 0.00716: 2.1463x larger >> Timeline: http://tinyurl.com/39thymp >> >> ### call_method ### >> Min: 0.878663 -> 0.884666: 1.0068x slower >> Avg: 0.887148 -> 0.888667: 1.0017x slower >> Not significant >> Stddev: 0.02062 -> 0.02074: 1.0058x larger >> Timeline: http://tinyurl.com/2fm39l2 >> >> ### call_method_slots ### >> Min: 0.872706 -> 0.867387: 1.0061x faster >> Avg: 0.877261 -> 0.872754: 1.0052x faster >> Significant (t=2.510615) >> Stddev: 0.01523 -> 0.01586: 1.0410x larger >> Timeline: http://tinyurl.com/3x84s9m >> >> ### call_method_unknown ### >> Min: 1.031445 -> 1.028433: 1.0029x faster >> Avg: 1.039063 -> 1.034296: 1.0046x faster >> Not significant >> Stddev: 0.03708 -> 0.03702: 1.0016x smaller >> Timeline: http://tinyurl.com/395gevs >> >> ### call_simple ### >> Min: 0.594110 -> 0.589934: 1.0071x faster >> Avg: 0.606276 -> 0.594366: 1.0200x faster >> Significant (t=5.874137) >> Stddev: 0.01760 -> 0.01752: 1.0049x smaller >> Timeline: http://tinyurl.com/2a2zv56 >> >> ### django ### >> Min: 0.997650 -> 0.993266: 1.0044x faster >> Avg: 0.999423 -> 0.995495: 1.0039x faster >> Significant (t=18.075408) >> Stddev: 0.00093 -> 0.00122: 1.3050x larger >> Timeline: http://tinyurl.com/28oa6wo >> >> ### float ### >> Min: 0.102826 -> 0.102910: 1.0008x slower >> Avg: 0.110088 -> 0.110280: 1.0017x slower >> Not significant >> Stddev: 0.02758 -> 0.02762: 1.0015x larger >> Timeline: http://tinyurl.com/2w6ol8d >> >> ### hg_startup ### >> Min: 0.045993 -> 0.044993: 1.0222x faster >> Avg: 0.053388 -> 0.053510: 1.0023x slower >> Not significant >> Stddev: 0.00250 -> 0.00258: 1.0322x larger >> Timeline: http://tinyurl.com/2ec392w >> >> ### iterative_count ### >> Min: 0.157216 -> 0.156526: 1.0044x faster >> Avg: 0.166971 -> 0.166897: 1.0004x faster >> Not significant >> Stddev: 0.06835 -> 0.07249: 1.0604x larger >> Timeline: http://tinyurl.com/2g9agwl >> >> ### nbody ### >> Min: 0.443087 -> 0.464941: 1.0493x slower >> Avg: 0.456435 -> 0.475809: 1.0424x slower >> Not significant >> Stddev: 0.05609 -> 0.05523: 1.0156x smaller >> Timeline: http://tinyurl.com/2wd6z8r >> >> ### normal_startup ### >> Min: 0.438015 -> 0.437763: 1.0006x faster >> Avg: 0.438425 -> 0.438810: 1.0009x slower >> Not significant >> Stddev: 0.00024 -> 0.00274: 11.6231x larger >> Timeline: http://tinyurl.com/34nunk3 >> >> ### nqueens ### >> Min: 0.693033 -> 0.698259: 1.0075x slower >> Avg: 0.698948 -> 0.704770: 1.0083x slower >> Not significant >> Stddev: 0.02644 -> 0.02590: 1.0208x smaller >> Timeline: http://tinyurl.com/39ydyjs >> >> ### pickle ### >> Min: 1.654750 -> 1.669246: 1.0088x slower >> Avg: 1.660298 -> 1.673813: 1.0081x slower >> Significant (t=-17.007317) >> Stddev: 0.00391 -> 0.00403: 1.0298x larger >> Timeline: http://tinyurl.com/36zz8yk >> >> ### pickle_dict ### >> Min: 1.859310 -> 1.862217: 1.0016x slower >> Avg: 1.864953 -> 1.863408: 1.0008x faster >> Significant (t=2.269051) >> Stddev: 0.00300 -> 0.00377: 1.2590x larger >> Timeline: http://tinyurl.com/32kz4l6 >> >> ### pickle_list ### >> Min: 1.059003 -> 1.045209: 1.0132x faster >> Avg: 1.065780 -> 1.048728: 1.0163x faster >> Significant (t=21.791102) >> Stddev: 0.00413 -> 0.00368: 1.1223x smaller >> Timeline: http://tinyurl.com/27rpxol >> >> ### regex_compile ### >> Min: 0.828427 -> 0.832179: 1.0045x slower >> Avg: 0.890980 -> 0.894830: 1.0043x slower >> Not significant >> Stddev: 0.26185 -> 0.26180: 1.0002x smaller >> Timeline: http://tinyurl.com/38c3z8m >> >> ### regex_effbot ### >> Min: 0.162540 -> 0.162873: 1.0020x slower >> Avg: 0.167092 -> 0.167389: 1.0018x slower >> Not significant >> Stddev: 0.02830 -> 0.02830: 1.0001x smaller >> Timeline: http://tinyurl.com/33r7s5y >> >> ### regex_v8 ### >> Min: 0.164368 -> 0.163174: 1.0073x faster >> Avg: 0.417027 -> 0.416113: 1.0022x faster >> Not significant >> Stddev: 0.86580 -> 0.86190: 1.0045x smaller >> Timeline: http://tinyurl.com/3yabd5v >> >> ### richards ### >> Min: 0.352872 -> 0.353289: 1.0012x slower >> Avg: 0.354989 -> 0.355434: 1.0013x slower >> Not significant >> Stddev: 0.00543 -> 0.00549: 1.0108x larger >> Timeline: http://tinyurl.com/36weagp >> >> ### rietveld ### >> Min: 0.693800 -> 0.695649: 1.0027x slower >> Avg: 0.984689 -> 0.984075: 1.0006x faster >> Not significant >> Stddev: 0.36204 -> 0.36497: 1.0081x larger >> Timeline: http://tinyurl.com/2wfw2z5 >> >> ### slowpickle ### >> Min: 0.772514 -> 0.757389: 1.0200x faster >> Avg: 0.821586 -> 0.805534: 1.0199x faster >> Not significant >> Stddev: 0.17998 -> 0.18494: 1.0275x larger >> Timeline: http://tinyurl.com/37rb8d5 >> >> ### slowspitfire ### >> Min: 1.022256 -> 1.023268: 1.0010x slower >> Avg: 1.023244 -> 1.024326: 1.0011x slower >> Significant (t=-6.305389) >> Stddev: 0.00022 -> 0.00119: 5.3064x larger >> Timeline: http://tinyurl.com/29h96r4 >> >> ### slowunpickle ### >> Min: 0.384167 -> 0.380310: 1.0101x faster >> Avg: 0.410666 -> 0.409001: 1.0041x faster >> Not significant >> Stddev: 0.08844 -> 0.09114: 1.0305x larger >> Timeline: http://tinyurl.com/286o5wt >> >> ### spambayes ### >> Min: 0.417142 -> 0.398649: 1.0464x faster >> Avg: 0.598665 -> 0.574443: 1.0422x faster >> Not significant >> Stddev: 0.58158 -> 0.57470: 1.0120x smaller >> Timeline: http://tinyurl.com/23ozclq >> >> ### startup_nosite ### >> Min: 0.333111 -> 0.332463: 1.0019x faster >> Avg: 0.338470 -> 0.335088: 1.0101x faster >> Significant (t=3.325272) >> Stddev: 0.00886 -> 0.00500: 1.7726x smaller >> Timeline: http://tinyurl.com/23ob2vy >> >> ### threaded_count ### >> Min: 0.181111 -> 0.182459: 1.0074x slower >> Avg: 0.214138 -> 0.211476: 1.0126x faster >> Not significant >> Stddev: 0.14987 -> 0.14531: 1.0314x smaller >> Timeline: http://tinyurl.com/2dkupks >> >> ### unpack_sequence ### >> Min: 0.000217 -> 0.000218: 1.0033x slower >> Avg: 0.000223 -> 0.000222: 1.0048x faster >> Significant (t=11.155116) >> Stddev: 0.00002 -> 0.00001: 3.0782x smaller >> Timeline: http://tinyurl.com/26tzfco >> >> ### unpickle ### >> Min: 1.187517 -> 1.173118: 1.0123x faster >> Avg: 1.212250 -> 1.178640: 1.0285x faster >> Significant (t=9.457625) >> Stddev: 0.02461 -> 0.00508: 4.8419x smaller >> Timeline: http://tinyurl.com/2djz64c >> >> ### unpickle_list ### >> Min: 1.106733 -> 1.086683: 1.0185x faster >> Avg: 1.147736 -> 1.129444: 1.0162x faster >> Significant (t=4.530626) >> Stddev: 0.01237 -> 0.02573: 2.0800x larger >> Timeline: http://tinyurl.com/2fctpe9 >> >> Any thoughts? >> >> - Steven >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
On Aug 7, 2010, at 7:05 PM, Steven Noonan wrote:> I've been doing work on memory reduction in Unladen Swallow, and > during testing, LiveRanges seemed to be consuming one of the largest > chunks of memory.That's interesting. How did you measure this? I'd love to see your data. Note that the LiveRange struct is allocated by a plain std::vector, and your patch doesn't change that. I assume you are talking about the VNInfo structs?> I wrote a replacement allocator for use by > BumpPtrAllocator which uses mmap()/munmap() in place of > malloc()/free().It's a bit more complicated than that. Modern malloc's use a whole bag of tricks to avoid lock contention in multiprocessor systems, and they know which allocation size the kernel likes, and which system calls to use. By calling mmap directly, you are throwing all that system specific knowledge away. It's great that you provide measurements, but it's not clear what you are measuring. Does 'mem max' include the overhead of asking the kernel for tiny 4K allocations, if any? Also, what is your operating system and architecture? That could make a big difference. Have you looked at the effect of twiddling the default 4K slab size in BumpPtrAllocator? I suspect you could get more dramatic results that way. Thanks for working on this! /jakob
On Sun, Aug 8, 2010 at 8:20 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:> > On Aug 7, 2010, at 7:05 PM, Steven Noonan wrote: >> I've been doing work on memory reduction in Unladen Swallow, and >> during testing, LiveRanges seemed to be consuming one of the largest >> chunks of memory. > > That's interesting. How did you measure this? I'd love to see your data. > > Note that the LiveRange struct is allocated by a plain std::vector, and your patch doesn't change that. I assume you are talking about the VNInfo structs?Steven has been using Instruments, and sending us screenshots. Does anyone else know a better way of exporting that data? I thought I dug into the register allocation code, and found the VNInfo::Allocator typedef. I assumed that was getting the traffic we saw in Instruments, but I don't have the data to back that up.>> I wrote a replacement allocator for use by >> BumpPtrAllocator which uses mmap()/munmap() in place of >> malloc()/free(). > > It's a bit more complicated than that. Modern malloc's use a whole bag of tricks to avoid lock contention in multiprocessor systems, and they know which allocation size the kernel likes, and which system calls to use. > > By calling mmap directly, you are throwing all that system specific knowledge away.So the goal of this particular modification was to find ways to return large, one-time allocations that happen during compilation back the OS. For unladen-swallow, we have a long-lived Python process where we JIT code every so often. We happen to generate an ungodly amount of code, which we're trying to reduce. However, this means that LLVM allocates a lot of memory for us, and it grows our heap by several MB over what it would normally be. The breakdown was roughly 8 MB gets allocated for this one compilation in the spam_bayes benchmark, with 2 MB coming form register allocation and 2 MB from SDNodes. We are looking at using mmap/munmap to avoid permanently growing the heap. This patch switches all allocators over to mmap, so you can see a lot of "stitches" in the graphs, where an allocator is created and thrown away quickly. Those allocations are probably better served by malloc.> It's great that you provide measurements, but it's not clear what you are measuring. Does 'mem max' include the overhead of asking the kernel for tiny 4K allocations, if any? Also, what is your operating system and architecture? That could make a big difference.The memory size measurements in this data are all taken using /proc/smaps data in Linux to find the number of dirty pages.> Have you looked at the effect of twiddling the default 4K slab size in BumpPtrAllocator? I suspect you could get more dramatic results that way.If we did that, one thing that might happen is malloc might start forwarding to mmap, but I think you have to allocate ~128K at a time to hit that threshold. Reid