Hi all,

I have R installed on a 16-core machine running Red Hat Linux. I am handling a huge dataset (about 5 GB). Let's assume that my data is in the form of multiple structured logs. I access the data using list.files(). Since the base version of R uses a single core by default, my analysis code is taking too much time. I have learned that mclapply() can be used to spread the work across all cores (processors), which can make R much faster on a multicore machine. Can anyone help me understand how to use the mclapply() function in this situation?

Thanks in advance

Regards,
Madana

--
View this message in context: http://r.789695.n4.nabble.com/R-on-Multicore-for-Linux-tp3682318p3682318.html
Sent from the R help mailing list archive at Nabble.com.
Make this reproducible.

On Wed, Jul 20, 2011 at 6:44 PM, Madana_Babu <madana_babu at infosys.com> wrote:
> [...]
> Can anyone help me in understanding how to use mclapply() function in the above situation.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Stephen Sefick
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama 36849
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.
    -K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."
    -Robert Gentleman
I have had good experiences with the foreach package, available on CRAN. It includes some tutorials which might help you.

cheers,
Paul

On 07/20/2011 11:44 PM, Madana_Babu wrote:
> [...]
> Can anyone help me in understanding how to use mclapply() function in the above situation.

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
On Thu, Jul 21, 2011 at 1:44 AM, Madana_Babu <madana_babu@infosys.com> wrote:
> [...]
> Can anyone help me in understanding how to use mclapply() function in the above situation.

mclapply() works in the same way as lapply(): if you use lapply(), simply replace it with mclapply(); if you are using a loop, translate it into an lapply() / mclapply() structure.

But be aware that the bottleneck might be disk access! So: Rprof() is your friend.

Cheers,
Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax (F): +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: Rainer@krugs.de
Skype: RMkrug
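To illustrate the loop -> lapply() -> mclapply() translation described above, here is a minimal sketch. It assumes a current R, where mclapply() ships in the bundled 'parallel' package (in 2011 it came from the 'multicore' package). slow_square() and the choice of 2 cores are made up for illustration. The timings use system.time(); Rprof() and summaryRprof() would give a finer breakdown on real code.

```r
# mclapply() is in the 'parallel' package in current R; at the time of
# this thread it would have been loaded from the 'multicore' package.
library(parallel)

# Hypothetical stand-in for one unit of work (e.g. processing one file).
slow_square <- function(x) {
  Sys.sleep(0.05)  # pretend to wait on something slow
  x^2
}
inputs <- 1:8

# 1. The loop form: fill a pre-allocated list one element at a time.
res_loop <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  res_loop[[i]] <- slow_square(inputs[i])
}

# 2. The same work as lapply(): one function call per element.
t_serial <- system.time(res_lapply <- lapply(inputs, slow_square))

# 3. Swap lapply() for mclapply(). Forking is unavailable on Windows,
#    where mc.cores must be 1; 2 cores is an arbitrary choice here.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
t_parallel <- system.time(
  res_mc <- mclapply(inputs, slow_square, mc.cores = n_cores)
)

# Elapsed wall-clock seconds for the serial and parallel runs.
print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```

If the elapsed time of the real job is much larger than its user plus system CPU time, the process is mostly waiting (for example on disk), and adding cores may help less than expected.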
Hi all,

Currently I am trying to do this in R, which is running on a multicore processor. I am not sure how to use the mclapply() function for this task. Can anyone help me?

# Setting up directory
setwd("/XXX/XXXXXXXX/XXXX/XXXX/2011/07/20")
library(sqldf)

# Data is available in the form of multiple structured log files (nearly 10K log files).
# I am using the following syntax to get the required fields and aggregations from the logs,
# creating a data frame called DF (with 3 columns: V2, V14 and MIN(V16)).
a <- list.files(path = ".", pattern = "2011-07-20", all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)
DF <- NULL
for (f in a) {
  dat <- read.csv(f, header = FALSE, sep = "\t", na.strings = "", dec = ".", strip.white = TRUE, fill = TRUE)
  data_1 <- sqldf("SELECT V2, V14, MIN(V16) FROM dat WHERE V6=104 GROUP BY V2, V14")
  DF <- rbind(DF, data_1)
}

# Currently this process is taking almost 3 hours for me.

Can anyone help me use mclapply() for this operation so that it completes as soon as possible? Please provide the syntax.

Thanks in advance

Regards,
Madana

--
View this message in context: http://r.789695.n4.nabble.com/R-on-Multicore-for-Linux-tp3682318p3684736.html
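One way the loop above might be restructured for mclapply() is sketched below. This is not a tested drop-in replacement. To keep it runnable without extra packages, the sqldf query is swapped for base R aggregate(), which differs in two ways: the result column is named V16 rather than MIN(V16), and rows with missing V2, V14 or V16 are dropped. summarise_log(), the synthetic log files and the core count are all made up for illustration. mclapply() is taken from the 'parallel' package bundled with current R.

```r
library(parallel)

# Hypothetical per-file helper: read one log and return the minimum V16
# per (V2, V14) among rows with V6 == 104. Base R aggregate() stands in
# for the sqldf query from the post.
summarise_log <- function(f) {
  dat <- read.csv(f, header = FALSE, sep = "\t", na.strings = "",
                  dec = ".", strip.white = TRUE, fill = TRUE)
  dat <- dat[which(dat$V6 == 104), c("V2", "V14", "V16")]
  dat <- dat[complete.cases(dat), ]
  if (nrow(dat) == 0) return(NULL)  # rbind() skips NULL pieces
  aggregate(V16 ~ V2 + V14, data = dat, FUN = min)
}

# Synthetic stand-ins for the real logs (made-up data, 16 columns each),
# written to a temporary directory so the sketch is self-contained.
log_dir <- file.path(tempdir(), "mc_demo_logs")
dir.create(log_dir, showWarnings = FALSE)
set.seed(1)
for (i in 1:4) {
  m <- as.data.frame(matrix(0L, nrow = 20, ncol = 16))  # columns V1..V16
  m$V2  <- sample(c("a", "b"), 20, replace = TRUE)
  m$V6  <- sample(c(104L, 200L), 20, replace = TRUE)
  m$V14 <- sample(c("x", "y"), 20, replace = TRUE)
  m$V16 <- sample(1:100, 20, replace = TRUE)
  write.table(m, file.path(log_dir, sprintf("2011-07-20-%02d.log", i)),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

files <- list.files(log_dir, pattern = "2011-07-20", full.names = TRUE)

# One task per file; forking is unavailable on Windows, so fall back to 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parts <- mclapply(files, summarise_log, mc.cores = n_cores)

# A failed worker returns a "try-error" object instead of stopping the run.
failed <- vapply(parts, inherits, logical(1), what = "try-error")
stopifnot(!any(failed))

# Combine once at the end rather than growing DF inside the loop.
DF <- do.call(rbind, parts)
```

As in the original loop, the minimum is taken per file, so the same (V2, V14) pair can appear once per file in DF. On the 16-core machine, mc.cores could be raised, though if disk access is the bottleneck the gain may be limited.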