I have recently been using R - more specifically the GUI packages Rattle and Rcmdr.

I like these products a lot and want to use them for some projects. The problem I run into is when I start trying to run large datasets through them. The data sets are 10-15 million records and usually have 15-30 fields (both numerical and categorical).

I saw that there are some packages that can deal with large datasets in R - bigmemory, ff, ffdf, biganalytics. My problem is that I am not much of a coder (which is the reason I use the above-mentioned GUIs). These GUIs do show the executable R code in the background, so my thought was to run a small sample through the GUI, copy the code, and then incorporate some of the large-data packages mentioned above. Has anyone ever tried to do this, and would you have working examples?

In terms of what I am trying to do to the data, it is really simple stuff: descriptive statistics, k-means clustering, and possibly some decision trees. Any help would be greatly appreciated.

Thank you - John

John Filben
Cell Phone - 773.401.2822
Email - johnfilben@yahoo.com
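The "run a small sample through the GUI first" workflow described above can be prototyped in plain R before any big-data package is involved; a minimal sketch (the CSV here is generated just so the example is self-contained - in practice it would be the real multi-million-row file):

```r
# Generate a small stand-in CSV so the example runs on its own;
# in practice this would be the real 10-15 million row file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(1000), grp = sample(letters[1:3], 1000, TRUE)),
          csv_path, row.names = FALSE)

# Read only the first rows to prototype in Rattle/Rcmdr;
# nrows limits how much is loaded into memory.
sample_df <- read.csv(csv_path, nrows = 100)
summary(sample_df)
```

Once the GUI has produced working code against `sample_df`, the `read.csv()` line is the natural place to swap in a large-data reader.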
I have not ever tried any GUI package, so I cannot give you good help there. Instead, I would like to report my experience of using the 'ff' package to get access to large datasets.

To achieve your goal, I think you will need to write functions that handle ff objects. In my experience, when I created a function that handled ff objects, it could not recognize those ff objects correctly inside the function. If you encounter such problems, you can refer to this article:

http://wonsangyou.blogspot.com/2011/01/fast-access-to-large-database-in-r.html

2011/2/11 John Filben <johnfilben@yahoo.com>:
> [original question snipped]
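A minimal sketch of the ff approach mentioned above, assuming the ff package is installed (the file and column names are invented for illustration; a real file would be read the same way):

```r
library(ff)

# Self-contained demo file standing in for the real dataset.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(5000), y = runif(5000)),
          csv_path, row.names = FALSE)

# read.csv.ffdf reads the file in chunks and keeps the data on disk
# rather than in RAM; next.rows controls the chunk size.
big <- read.csv.ffdf(file = csv_path, header = TRUE, next.rows = 1000)

nrow(big)      # row count, without loading the data into memory
mean(big$x[])  # [] materializes one column in RAM; fine for a single column
```

Only one column at a time is pulled into memory here, which is what makes the approach workable for files far larger than RAM.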
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Feb 11, 2011, at 7:51 AM, John Filben wrote:

> The data sets are 10-15 million in record quantity and usually have
> 15-30 fields (both numerical and categorical).

You could instead just buy memory. 32 GB ought to be sufficient for descriptives and regression. You might even get away with 24.

> In terms of what I am trying to do to the data - really simple stuff -
> descriptive statistics,

Should be fine here.

> k-means clustering, and possibly some decision trees.

Not sure how well those scale to tasks as large as what you propose, especially since you don't mention packages or functions. Not sure they don't, either.

> Any help would be greatly appreciated.
>
> Thank you - John
> John Filben

--
David Winsemius, MD
West Hartford, CT
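For the k-means part specifically, the biganalytics package mentioned in the original question does provide a bigkmeans() that runs on bigmemory matrices; a hedged sketch, with both packages assumed installed and the data simulated for illustration:

```r
library(bigmemory)
library(biganalytics)

# Simulated numeric data standing in for the real fields;
# bigkmeans needs a numeric big.matrix (or ordinary matrix).
set.seed(1)
m <- as.big.matrix(matrix(rnorm(10000 * 5), ncol = 5))

# bigkmeans avoids the extra in-memory copies that stats::kmeans
# makes, which is what lets it scale to much larger data.
cl <- bigkmeans(m, centers = 3, nstart = 2)
cl$size  # rows assigned to each cluster
```

Categorical fields would still need to be encoded numerically (or clustered by other means) before this step, since k-means operates on numeric distances.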