>>>>> William Dunlap <wdunlap at tibco.com>
>>>>>     on Mon, 16 Nov 2015 16:01:42 -0800 writes:

 > If a quick running time is important and your models involve only
 > numeric data with no missing values and you are willing to spend more
 > programming time setting things up, the lsfit() function may work
 > better for you.

 > Bill Dunlap
 > TIBCO Software
 > wdunlap tibco.com

Or even faster is the extra-simple but fast .lm.fit() function
(in R >= 3.1.0).

I've written a small demo about it and published it here:
http://rpubs.com/maechler/fast_lm

Martin Maechler, ETH Zurich (and R Core)

 > On Mon, Nov 16, 2015 at 3:25 PM, Sasikumar Kandhasamy <ckmsasi at gmail.com> wrote:
 >> Thanks a lot, Bill & Bert.
 >>
 >> Hi Bill,
 >>
 >> Sorry, I was wrong about the number of records; I am actually using
 >> two-dimensional data of 250K records each. And regarding CPU usage, it
 >> was the elapsed time. In fact, I have pinned one core to run R.
 >>
 >> Thanks & Regards
 >> Sasi
 >>
 >> On Mon, Nov 16, 2015 at 2:04 PM, William Dunlap <wdunlap at tibco.com> wrote:
 >>>
 >>> You cannot do a linear regression with one column of data - there must
 >>> be at least one response column and one predictor. By default, lm()
 >>> throws in a constant term, which gives you a second predictor. If your
 >>> predictor is categorical, you get a new column for all but the first
 >>> unique value in it.
 >>>
 >>> lm() deals only with double-precision data, at 8 bytes/number. Thus
 >>> 250K numbers occupy 2 million bytes. Your three columns (in the
 >>> non-categorical-predictor case) take up 6 million bytes.
 >>>
 >>> lm()'s output contains several columns the size of the response
 >>> variable: residuals, effects, and fitted.values. It also contains the
 >>> QR decomposition of the design matrix (the size of all the predictor
 >>> columns together).
 >>>
 >>> There are also some temporary variables generated in the course of the
 >>> computation.
 >>>
 >>> So your observed 40 MB memory usage seems reasonable.
 >>>
 >>> Use the object.size() function to see how big objects are and str() to
 >>> look at their structure.
 >>>
 >>> My laptop with a 2.5 GHz Intel i7 processor takes a quarter of a second
 >>> to fit a simple linear model with one numeric predictor and a constant
 >>> term. 6 seconds sounds slow. Is that CPU or elapsed time (use
 >>> system.time() to see)?
 >>>
 >>> Bill Dunlap
 >>> TIBCO Software
 >>> wdunlap tibco.com
 >>>
 >>> On Mon, Nov 16, 2015 at 12:25 PM, Sasikumar Kandhasamy
 >>> <ckmsasi at gmail.com> wrote:
 >>> > Hi All,
 >>> >
 >>> > I have a couple of clarifications on R run-time performance. I have
 >>> > R-3.2.2 compiled for MIPS64 and am running it on my Linux machine
 >>> > with a mips64 processor (core speed 1.5 GHz), and I am observing the
 >>> > following behaviors:
 >>> >
 >>> > 1. Applying a linear regression model (lm) to 1 MB of data (1 column
 >>> > of 250K records) takes ~6 seconds to complete. Any idea whether this
 >>> > is expected behavior? If not, can you please share suggestions or
 >>> > options to improve it, if there are any?
 >>> >
 >>> > 2. Also, the R process's runtime virtual memory increases by 40 MB
 >>> > after applying the linear model to the 1 MB of data. Is that also
 >>> > expected behavior? If it is, can you please share some insight into
 >>> > the memory usage?
 >>> >
 >>> > Thanks in advance.
 >>> >
 >>> > Regards
 >>> > Sasi
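A minimal sketch of the comparison Bill and Martin describe, on simulated
data (the seed, sizes, and variable names here are illustrative assumptions,
not from the thread):

    ## simulated response and one numeric predictor, ~250K rows
    set.seed(1)
    n <- 250000
    x <- rnorm(n)
    y <- 2 + 3 * x + rnorm(n)
    X <- cbind(1, x)                   # explicit design matrix with intercept column

    system.time(f1 <- lm(y ~ x))       # full formula interface: model frame + QR
    system.time(f2 <- lsfit(x, y))     # numeric-only least squares (adds intercept itself)
    system.time(f3 <- .lm.fit(X, y))   # bare QR fit, available in R >= 3.1.0

    coef(f1); f2$coef; f3$coef         # same estimates, up to component naming

    object.size(f1)                    # carries model frame, fitted values, qr, ...
    object.size(f3)                    # small list: coefficients, residuals, qr parts

The object.size() comparison also shows where the 40 MB goes: lm() keeps the
model frame, fitted values, effects, and QR decomposition, while .lm.fit()
returns only a small list.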
Thanks a lot, Martin and William. It looks like we can't apply
prediction to lsfit() and .lm.fit() objects, and I am trying to use the
lm object to predict values for a new data frame.

Thanks & Regards
Sasi

On Tue, Nov 17, 2015 at 9:49 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>
> Or even faster is the extra-simple but fast .lm.fit() function
> (in R >= 3.1.0).
>
> I've written a small demo about it and published it here:
> http://rpubs.com/maechler/fast_lm
>
> Martin Maechler, ETH Zurich (and R Core)
> [...]
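The reason predict() fails on those fits: predict() is an S3 generic that
dispatches on the class of the fitted object, and lsfit() and .lm.fit()
return plain lists with no class attribute, so no method applies. A quick
check (assuming X and y from the sketch above):

    f <- .lm.fit(X, y)
    class(f)              # "list" -- no "lm" class attribute
    inherits(f, "lm")     # FALSE, so predict(f) finds no applicable method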
That is what I meant about saving compute time and increasing
programming time: you can do the prediction by doing the matrix
multiplication explicitly.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Nov 17, 2015 at 9:01 PM, Sasikumar Kandhasamy <ckmsasi at gmail.com> wrote:
> Thanks a lot, Martin and William. It looks like we can't apply
> prediction to lsfit() and .lm.fit() objects, and I am trying to use the
> lm object to predict values for a new data frame.
>
> Thanks & Regards
> Sasi
> [...]
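A minimal sketch of the explicit matrix multiplication Bill describes; the
data and names are made up for illustration:

    ## fit with the bare QR fitter (simulated data)
    set.seed(1)
    x   <- rnorm(250000)
    y   <- 2 + 3 * x + rnorm(250000)
    fit <- .lm.fit(cbind(1, x), y)

    ## predict for new data: rebuild the design matrix exactly as for the
    ## training data, then multiply by the fitted coefficients
    xnew <- rnorm(1000)
    Xnew <- cbind(1, xnew)
    pred <- drop(Xnew %*% fit$coefficients)   # plays the role of predict(fit, newdata)

The one rule is that Xnew must be constructed with exactly the same columns,
in the same order, as the matrix passed to .lm.fit().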