Has anyone got any advice about what hardware to buy to run lots of R analysis? Links to studies or other documents would be great, as would personal opinion.

We are not currently certain what analysis we shall be running, but our first implementation uses the functions lme and gls from the library nlme. To do one data point currently takes 1.5 seconds on our 3-year-old Sun Fire box, and the data points are completely independent, so the analysis is fully parallelisable without implementing multi-threading within each data point. We have a reasonable amount of sys admin support in house. We are an academic institution. We are looking at spending a few thousand to a small number of tens of thousands of dollars.

Any help greatly appreciated.
How many data points do you have?
R. Michael Weylandt
2012-May-08 13:01 UTC
[R] What is the most cost effective hardware for R?
I think the general experience is that R is going to be more memory-hungry than other resources, so you'll get the best bang for your buck on that end. R also has good parallelization support; that and other high-performance concerns are addressed here: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Performance (as it is for most computationally expensive tasks) will likely be better under Linux, and you'll get good free help from R-SIG-Fedora and R-SIG-Debian if you pick one of those (in addition to whatever your sys admin can give).

Michael

On Tue, May 8, 2012 at 6:49 AM, Hugh Morgan <h.morgan at har.mrc.ac.uk> wrote:
> Has anyone got any advice about what hardware to buy to run lots of R
> analysis? [...] the data points are completely independent so the analysis
> is fully parallelisable without implementing multi-threading within each
> data point. [...]
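Since the data points are independent, the embarrassingly parallel route Michael alludes to needs nothing beyond the base parallel package (shipped with R since 2.14). A minimal sketch, with toy data and a purely illustrative model formula; the real fit would use the poster's own lme/gls specification:

## Hedged sketch: parallelising independent lme() fits with the base
## 'parallel' package. Data and model formula are illustrative only.
library(parallel)
library(nlme)

## toy stand-in: one small data frame per independent "data point"
points <- lapply(1:100, function(i) {
  data.frame(y = rnorm(40), x = rnorm(40), g = gl(4, 10))
})

fit_one <- function(d) lme(y ~ x, random = ~ 1 | g, data = d)

## mclapply() forks, so it parallelises on Linux/Unix;
## on Windows use parLapply() with a PSOCK cluster instead
fits <- mclapply(points, fit_one, mc.cores = detectCores())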
Barry Rowlingson
2012-May-08 13:04 UTC
[R] What is the most cost effective hardware for R?
On Tue, May 8, 2012 at 11:49 AM, Hugh Morgan <h.morgan at har.mrc.ac.uk> wrote:
> Has anyone got any advice about what hardware to buy to run lots of R
> analysis? [...] We are looking at spending a few thousand to a small
> number of tens of thousands of dollars. [...]

Why buy when you can rent? Unless your hardware is going to be running 24/7 doing these analyses, you are paying for it to sit idle. You might be better off purchasing computing time from Amazon or another cloud computing provider. If you need to run more analyses quickly, just buy some more virtual hosts.

It also saves on needing to run a data center, hardware warranty costs, disposing of 10U of obsolete rack-mounted hardware after five years, etc. Obviously the rental cost includes all these things as expenses of the cloud computing provider, but they have massive economies of scale.

I've not gone this way yet for any projects I've been involved with, but it's becoming more of a possibility with every grant award we get...

Barry
On 05/08/2012 12:14 PM, Zhou Fang wrote:
> How many data points do you have?

Currently 200,000. We are likely to have 10 times that in 5 years.

> Why buy when you can rent? Unless your hardware is going to be
> running 24/7 doing these analyses, you are paying for it to sit
> idle. [...]

Because of the nature of the funding we are likely to be better off buying. We are likely to be running most of the time: most of the analysis must be rerun as more data becomes available, and that is likely to happen a few times every week.

Thank you for all the pointers; we shall consider them all.
You should think about the cloud as a serious alternative. I completely agree with Barry. Unless you will utilize your machines (and by utilize, I mean 100% CPU usage) all the time (including weekends), you will probably make better use of your funds by purchasing blocks of machines when you need to run your sim, and turning them off afterwards.

There are some new packages that make it very easy to access the cloud from a local R session (in an lapply-like way). Happy to point those out to you if you are interested...

-Whit

On Tue, May 8, 2012 at 11:50 AM, Hugh Morgan <h.morgan at har.mrc.ac.uk> wrote:
> Currently 200,000. We are likely to have 10 times that in 5 years.
> [...]
> Because of the nature of the funding we are likely to be better off
> buying. [...]
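Whit doesn't name the packages, but the lapply-over-remote-machines pattern is already possible in base R: a PSOCK cluster can start workers over ssh on remote hosts such as EC2 instances. A minimal sketch under stated assumptions: the host names are placeholders, and it presumes passwordless ssh and R plus nlme installed on each worker.

## Hedged sketch: driving remote workers (e.g. EC2 instances) from a
## local R session with a PSOCK cluster. Host names are placeholders.
library(parallel)

hosts <- c("ec2-198-51-100-1.compute-1.amazonaws.com",
           "ec2-198-51-100-2.compute-1.amazonaws.com")
cl <- makePSOCKcluster(hosts)            # starts R workers over ssh

clusterEvalQ(cl, library(nlme))          # load nlme on every worker

## 'points' and 'fit_one' as in the earlier mclapply sketch;
## parLapply() ships the function and its data to the workers itself
fits <- parLapply(cl, points, fit_one)

stopCluster(cl)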
Hi Jim and Michael,

Thank you very much for replying.

Here is the information about my data. I have a data frame including more than 800 variables (columns) and 30,000 cases (rows). 400 of those variables are categorical variables. I used to use Rcmdr to convert variables; however, as the number of variables I need to convert grows, I have to keep left-clicking the mouse to confirm each conversion (400 times), because as.factor() can only convert one variable at a time. So I am wondering if there is any faster way. Thank you very much.

Best regards,

ya

From: R. Michael Weylandt
Date: 2012-05-08 19:35
To: xinxi813
CC: r-help
Subject: Re: [R] convert 400 numeric variables to categorical together

How are they arranged currently? And should they all be one set of levels or different factor sets?

Michael

On Tue, May 8, 2012 at 12:32 PM, ya <xinxi813@163.com> wrote:
> Hi everyone,
>
> Is there any way I can convert more than 400 numeric variables to
> categorical variables simultaneously?
>
> as.factor() is really slow, and only one at a time.
>
> Thank you very much.
>
> ya
Here is an example that may help. I found the idea somewhere in the R-help archives but don't have a reference any more.

mydata <- data.frame(a1 = 1:5, a2 = 2:6, a3 = 3:7)
str(mydata)
mydata[, 1:2] <- lapply(mydata[, 1:2], factor)
str(mydata)

So basically all you need to do is specify which columns you want to convert to factors and use lapply.

Good luck, and BTW Rcmdr is a nice GUI, but it is much more efficient in the long run to use a command-line interface combined with a good text editor (Tinn-R is nice for Windows).

John Kane
Kingston ON Canada

> -----Original Message-----
> From: xinxi813 at 163.com
> Subject: [R] convert 400 numeric variables to categorical together
>
> I have a data frame including more than 800 variables (columns) and
> 30,000 cases (rows). 400 of those variables are categorical variables. [...]
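If the 400 categorical columns are not a contiguous block, the same lapply idiom works with columns picked out by name. A hedged sketch; the data and column names below are made up for illustration:

## Hedged sketch: factor columns scattered through the frame, selected
## by name. Data and column names are illustrative only.
mydata <- data.frame(sex    = c(1, 2, 1, 2, 1),
                     income = rnorm(5),
                     region = c(3, 1, 2, 3, 2))

cat_cols <- c("sex", "region")    # ... the full list of 400 names
mydata[cat_cols] <- lapply(mydata[cat_cols], factor)
str(mydata)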
David L Carlson
2012-May-08 17:59 UTC
[R] convert 400 numeric variables to categorical together
Assuming the 400 numeric variables are integers, this will be simpler if you can identify the columns to be converted to factors as a block of column numbers (e.g. 1:400, or 401:800).

# Create some data
X <- data.frame(matrix(nrow=20, ncol=20))
for (i in 1:10) X[,i] <- round(runif(20, .5, 5.5), 0)
for (i in 11:20) X[,i] <- rnorm(20)
str(X) # structure of X

# Convert numbers to factors for the first 10 columns
X2 <- X
for (i in 1:10) X2[,i] <- factor(X[,i])
str(X2)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: ya
> Subject: [R] convert 400 numeric variables to categorical together
> [...]
Barry Rowlingson
2012-May-09 13:36 UTC
[R] What is the most cost effective hardware for R?
On Wed, May 9, 2012 at 2:22 PM, John Laing <john.laing at gmail.com> wrote:
> For 200,000 analyses at 1.5 seconds each, you're looking at ~83 hours
> of computing time. You can buy time from Amazon at roughly $0.08 /
> core / hour, so it would cost about $7 to run your analyses in the
> cloud. Assuming complete parallelization you could fire up as many
> machines as you need to get the work done in as little time as you
> want, with the same fixed cost. I think that's a pretty compelling
> argument, compared to the hassles of buying and maintaining hardware,
> power supply, air conditioning, etc.

Noticing Hugh's .ac.uk email address, you do have to factor in the hassle of getting something as nebulous as cloud computing past the red tape.

"How much will it cost?" says the bureaucrat.

"Depends how much CPU time I need," says the academic.

"So potentially, what's the most?" says the bureaucrat.

"Millions," says the academic, honestly, adding, "but that would only be if my job scheduling went a bit mad and grabbed a few thousand Amazon cores and thrashed them for weeks without me noticing."

"Okay," says the bureaucrat, "now, can we send Amazon a purchase order so that Amazon sends us an invoice for this unknown and potentially unpredictable cost first?"

"Oh no," says the academic, "we need a credit card..."

Maybe there are other ways of paying for Amazon cloud CPUs; I've not investigated. Anyone in academia happily crunching on EC2?

Barry
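John's arithmetic checks out; a quick sanity check in R, using only the figures from his quoted message:

## Back-of-envelope check of John Laing's figures
n_points  <- 200000     # independent analyses
secs_each <- 1.5        # seconds per lme()/gls() fit
rate      <- 0.08       # dollars per core-hour (John's figure)

core_hours <- n_points * secs_each / 3600   # ~83.3 core-hours
core_hours * rate                           # ~$6.67, i.e. about $7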