I have found more general questions, but I have a specific one. I have a few million (independent) short regressions that I would like to run (each regression has about 60 observations, though some observations can be missing [yikes]). So, I would like to be running as many `lm` and `coef(lm)` calls in parallel as possible. My hardware is a Mac, with nice GPUs and integrated memory --- and so far completely useless to me. `mclapply` is obviously very useful, but I want more, more, more cores.

Is there a recommended plug-in library to speed up just `lm` by also using the GPU cores?
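For concreteness, this is roughly the kind of loop I am running now (the data layout below is invented purely for illustration):

library(parallel)

## invented example data: one small data frame per regression,
## ~60 rows each, with a few NAs sprinkled in
make_one <- function(i) {
  n <- 60
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  y[sample(n, 3)] <- NA          # simulate missing observations
  data.frame(y = y, x = x)
}
datasets <- lapply(seq_len(1000), make_one)   # a few million in reality

## current approach: one lm() per regression, spread across CPU cores
coefs <- mclapply(datasets,
                  function(d) coef(lm(y ~ x, data = d)),
                  mc.cores = detectCores())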
Not a direct answer, but you may find `lm.fit` worth experimenting with (quick sketch below my signature). Also try the High-Performance Computing task view on CRAN.

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA, and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
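P.S. Something along these lines, perhaps (an untested sketch, reusing the invented list-of-data-frames layout from your post). Note that `lm.fit` takes a model matrix rather than a formula and does not accept NAs, so complete cases must be pulled out by hand:

library(parallel)

## lm.fit() skips formula parsing and model-frame construction,
## which dominate the cost of fits this small
fast_coef <- function(d) {
  ok <- complete.cases(d$y, d$x)
  X  <- cbind(1, d$x[ok])        # intercept + single predictor
  lm.fit(X, d$y[ok])$coefficients
}
coefs <- mclapply(datasets, fast_coef, mc.cores = detectCores())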
Just curious...

1) Are these just simple linear regression fits?
2) How many different patterns of missingness are there?

What I am thinking is that parallel processing works best on similar tasks, so data shuffling could easily drown out the actual regression computations. What you might do is group the data according to the pattern of missing values, and then for each group work out the regressions in matrix form - essentially B = (X'X)^{-1}X', though one could do smarter - and then do a million regressions at once as BY, where Y has a million columns (see the sketch after my signature).

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
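P.S. An untested sketch of the grouping idea, assuming (for illustration only) a single predictor x shared by all regressions and the response series stored as columns of a matrix Y:

n <- 60
x <- rnorm(n)
Y <- matrix(rnorm(n * 1e4), n)     # 1e4 columns here; millions in reality
Y[sample(length(Y), 500)] <- NA    # sprinkle in some missing values

## group columns by their pattern of missingness
patterns <- apply(is.na(Y), 2, paste, collapse = "")
coefs <- matrix(NA_real_, 2, ncol(Y))

for (p in unique(patterns)) {
  cols <- which(patterns == p)
  keep <- !is.na(Y[, cols[1]])     # rows present in this group
  ## assumes each group retains enough complete rows for a fit
  X <- cbind(1, x[keep])
  B <- solve(crossprod(X), t(X))   # (X'X)^{-1} X', computed once per group
  coefs[, cols] <- B %*% Y[keep, cols, drop = FALSE]
}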