Dear all,

I would like to ask your advice about a suitable computer for the following usage. I am starting to work with moderately big data in R:

- approx. 2 to 20 million rows by 100 to 1000 columns (market basket data)
- mainly clustering, classification trees, and association analysis (e.g. packages rpart, cba, proxy, party)

Can you recommend a computer sufficient for this volume? I routinely work in Windows but suspect that a Mac or a Linux machine might be needed.

Please respond directly to my email.
Many thanks!

Zdenek Skala
zdenek.skala@gfk.com
Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Skála, Zdeněk (INCOMA GfK)
> Sent: Friday, October 05, 2012 3:38 PM
> To: r-help at r-project.org
> Subject: [R] R: machine for moderately large data
>
> I (am starting to) work with moderately big data in R:
> - approx. 2 - 20 million rows * 100 - 1000 columns (market basket data)
> - mainly clustering, classification trees, association analysis (e.g. packages rpart, cba, proxy, party)

If I compute correctly, such a big matrix (20e6 * 1000) needs about 160 GB just to be held in memory. Are you prepared for this?

Maybe a suitable database interface would be preferable.

Regards
Petr

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
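The 160 GB figure above can be checked with a few lines of R. This is only a rough sketch: it assumes a dense matrix with every cell stored as a double (8 bytes), and the helper name matrix_gb is made up for illustration. Any working copies R makes during an analysis would come on top of this.

```r
# Estimate the memory needed to hold a dense matrix in R.
# Assumption: each cell is a double (8 bytes). Integer or logical
# storage uses 4 bytes per cell instead.
# matrix_gb is a hypothetical helper, not part of any package.
matrix_gb <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1e9  # decimal gigabytes
}

matrix_gb(2e6, 100)                        # smallest case in the question: 1.6
matrix_gb(20e6, 1000)                      # largest case: 160
matrix_gb(20e6, 1000, bytes_per_cell = 4)  # same size with integer storage: 80
```

So the lower end of the stated range is small, and it is only the upper end that leads to the 160 GB estimate.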
On Fri, Oct 5, 2012 at 12:09 PM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> If I compute correctly, such a big matrix (20e6*1000) needs about 160 GB just to be in memory. Are you prepared for this?

This is not as outrageous as one might think: you can get a Mac Pro with 32 GB of memory for around $3,500.

Best,
Ista