Paul Bernal
2020-May-06 21:20 UTC
[R] Working with very large datasets and generating an executable file
Dear R friends,

Hope you are doing well. I have two questions.

The first: can I work with very large datasets in R? Say I need to test several machine learning algorithms (random forest, multiple linear regression, etc.) on datasets with between 50 and 100 columns and 20 million observations. Is there any way that R can handle data that large?

The second: is there a way I can develop an R model and turn it into an executable program that can work on any OS?

Any help and/or guidance will be greatly appreciated,

Best regards,

Paul
Jeff Newmiller
2020-May-06 23:22 UTC
[R] Working with very large datasets and generating an executable file
Large data... yes, though how this can be done may vary. I have used machines with 128G of RAM before with no special big data packages.

Making an executable... theoretically, yes, though there are some significant technical (and possibly legal) challenges that will most likely make you question whether it was worth it if you try, particularly if your intent is to obscure your code from the recipient. I (as a random user and programmer on the Internet) would strongly discourage such efforts... it will almost certainly be more practical to deliver code in script/package form.
Bert Gunter
2020-May-06 23:30 UTC
[R] Working with very large datasets and generating an executable file
To supplement Jeff's comments:

Big Data: https://CRAN.R-project.org/view=HighPerformanceComputing

To deploy models: https://cran.r-project.org/web/views/ModelDeployment.html

Opinion: Executables are a security risk. I wouldn't touch one unless from a trusted source. I think I understand what you want to do, but I would second Jeff's comment about using R packages. Don't bother to disagree with me -- just dismiss if you do -- as this is wandering O/T anyway.

Bert Gunter
Abby Spurdle
2020-May-07 03:34 UTC
[R] Working with very large datasets and generating an executable file
> The second question is, is there a way I can develop an R model and turn it
> into an executable program that can work on any OS?

------myrscript.c--------
#include <stdlib.h>   /* declares system() */

int main (int argc, char* argv [])
{   /* hand the script to Rscript, which must be installed and on the PATH */
    system ("Rscript myrscript.r");
    return 0;
}
-------------------------

command line > gcc -o myrscript.exe myrscript.c
command line > myrscript.exe
Paul Bernal
2020-May-07 05:46 UTC
[R] Working with very large datasets and generating an executable file
Thank you Abby! Cheers!
Paul Bernal
2020-May-07 05:53 UTC
[R] Working with very large datasets and generating an executable file
Dear Jeff,

Thank you for the feedback. So, after reading your comments, it seems that, in order to develop an executable model that could be run on any OS, Python might be the way to go then?

I appreciate all of your valuable responses.

Best regards,

Paul
Paul Bernal
2020-May-07 06:40 UTC
[R] Working with very large datasets and generating an executable file
That could be the answer, yes.

On Thu, May 7, 2020 at 1:22 AM, <cpolwart at chemo.org.uk> wrote:
> Or maybe a Shiny Application?