Hi,

I am new to R, but a fairly `old' user of Stata. I read the posts about survey methods and large datasets in the archive, so I will not ask those questions again. But some still remain:

- R seems to consume more memory than Stata given the same set of data, say if I have only a data frame defined. Am I right in thinking that this is because of the object-oriented nature of R and cannot be overcome (i.e., a sort of tradeoff between efficiency and complexity, as with assembly <-> C <-> C++/Java)?
- If not, is it a design goal of the developers to do speed/memory optimization (apart from dynamic memory allocation, which, as I understand it, is orthogonal to this problem)?
- Since I sometimes need to use modestly really large datasets (a 60000*300 matrix), I wonder if I can do that in R at all. More precisely: is R scalable without limits by brute force (adding more CPU/RAM)?
- I noted that R can use SQL data sources. Since it is really the case that one has to use both a huge number of records _and_ variables, an SQL+R combination might be the one for me. Is that right? How fast would this be?
- Browsing the package lists, I have not seen a library for hypothesis testing. Does everybody build it from primitives, or do serious people not do this at all?
- Generally: how do you think somebody in survey/econometrics could use R? (Answer: It depends on.... I know. But this is not the motivation. Stata is the last residuum in my GNU system. I would like to get rid of it...)

BTW, I compiled R on the Hurd (http://hurd.gnu.org). It compiled flawlessly, but I was not able to test it because of an X failure I have not been able to track down yet (X is somewhat immature on the Hurd).

Thank you in advance,
Zsombor Cseres-Gergely

PS: please (also) cc. me the reply, because I am not on the list (yet)

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

As much as I'm a fan of R, I _really_ like Stata for numerous reasons (great user support, wonderful breadth of analysis tools, etc.).

I don't see R as replacing Stata (at least not at one of the groups where I work, which is mostly consulting/collab), but I do see it augmenting Stata, especially with respect to graphics (which are horrid under Stata).

best,
-tony

--
A.J. Rossini                            Rsrch. Asst. Prof. of Biostatistics
BlindGlobe Networks (home/default)      rossini at blindglobe.net
UW Biostat/Center for AIDS Research     rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net     rossini at scharp.org
FHCRC: M/Tu: 206-667-7025 (fax=4812) | Voicemail is pretty sketchy
CFAR: W/F: 206-731-3647 (fax=3694)   | Email is far better than phone
UW: Th/F: 206-543-1044 (fax=3286)    | Change last 4 digits of phone for fax

On Mon, 30 Oct 2000, Zsombor Cseres-Gergely wrote:

> Hi,
>
> I am new to R, but a fairly `old' user of Stata. I read posts asking about
> survey methods and large datasets in the archive, so I will not ask those
> questions again. But some still remain:
> - R seems to consume more memory given the same set of data, say if I have
> only a data frame defined, than Stata. Am I right if I think that this is
> because the object oriented nature of R and can not be overcome (ie., sort
> of tradeoff between efficiency and complexity as with assembly <-> C <->
> C++/Java)?

Yes, it is unavoidable, though it isn't because of object-orientedness. Stata saves memory by allowing only one rectangular dataset at a time, which simplifies things a lot. Stata also has a simpler programming language.

> - If not, is it a design goal of the developers to do speed/memory
> optimization (apart from dynamic memory allocation, which, as I understand
> orthogonal to this problem)

There is ongoing speed/memory optimisation, but it's not going to make a huge difference to most problems.

> - Since sometimes I need to use modestly really large datasets (60000*300
> matrix), I wonder if I can do that in R at all? More adequately: is R
> scalable without limits by brute force (adding more CPU/RAM)?

R is scalable up to at least 2 GB of memory (perhaps more now on 64-bit machines). It does not scale with added CPUs. Faster CPUs help, of course.

> - Browsing the package lists, I have not seen a library for hypothesis
> testing. Everybody builds it from primitives or serious people do not do
> this at all?

library(ctest) is in the main R distribution. It has a lot of classical hypothesis tests.

> BTW, I compiled R on the Hurd (http://hurd.gnu.org). It compiled flawlessly,
> but I was not able to test it because an X failure I have not been able to
> track down yet (X is somewhat immature in the Hurd).

It's good to hear that R works with the Hurd.

	-thomas

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle
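
As an illustration of the kind of classical test mentioned above, here is a minimal sketch of a two-sample t test. It assumes a current R installation, where `t.test()` is available from the stats package without any `library()` call; on the R versions discussed in this thread, `library(ctest)` would presumably have been needed first. The simulated data and the names `x`, `y` and `res` are made up for the example.

```r
# Two-sample t test on simulated data (illustrative values only).
set.seed(1)
x <- rnorm(50, mean = 0)     # hypothetical group 1
y <- rnorm(50, mean = 0.5)   # hypothetical group 2

# Welch two-sample t test (unequal variances is the default).
res <- t.test(x, y)

# The result is an object of class "htest"; its components can be
# extracted individually rather than read off the printed summary.
print(res$statistic)
print(res$p.value)
print(res$conf.int)
```

Other tests in the same family (for example `wilcox.test()` or `chisq.test()`) return the same kind of object, so the extraction pattern carries over.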

On Mon, 30 Oct 2000, Zsombor Cseres-Gergely wrote:

To chip in a few points not already answered:

> I am new to R, but a fairly `old' user of Stata. I read posts asking about
> survey methods and large datasets in the archive, so I will not ask those
> questions again. But some still remain:

> - If not, is it a design goal of the developers to do speed/memory
> optimization (apart from dynamic memory allocation, which, as I understand
> orthogonal to this problem)

I think dynamic memory allocation is very pertinent. At present you need to allocate to R at start-up the maximum memory needed, and if that is large it can hit performance badly on some systems. With the system under test for 1.2.0 you only get large memory usage if you need it, and (hopefully, when the tuning is finished) not when you don't.

> - Since sometimes I need to use modestly really large datasets (60000*300
> matrix), I wonder if I can do that in R at all? More adequately: is R
> scalable without limits by brute force (adding more CPU/RAM)?

`Really large' is relative. That's a 144 MB dataset and it should run happily in 512 MB or so (at least on Linux). We are starting to get datasets 10x that. As I understand it Stata is on Windows, and there seem to be some problems with scaling on Windows (which was not designed with very large processes in mind).

> - I noted, that R can use SQL datasources. Since it is really the case that
> one have to use both huge amount of records _and_ variables, an SQL+R
> combination might be one for me. Is it right? How fast would this be?

That's certainly what we are looking at, as well as auxiliary awk scripts (I would have used Perl, but the student knows awk) to extract things from the dataset before reading it into R.

> - Browsing the package lists, I have not seen a library for hypothesis
> testing. Everybody builds it from primitives or serious people do not do
> this at all?

There is package ctest shipping with R, but there is also quite a lot in the last point: we do find we hardly ever use it. With large problems, the multiple-testing problems get to be quite serious. In a recent paper, we are adjusting for 50,000 simultaneous tests.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
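
Two points in the reply above can be checked with a few lines of R. This is a rough sketch, not something from the original message. The first part reproduces the 144 MB figure, assuming every cell is stored as an 8-byte double. The second shows one way to adjust 50,000 simultaneous p-values with `p.adjust()` from the stats package. The p-values are simulated, and the Bonferroni and Benjamini-Hochberg methods are chosen only as examples; the thread does not say which adjustment the paper used.

```r
# Part 1: memory needed for a 60000 x 300 matrix of doubles.
n_rows <- 60000
n_cols <- 300
bytes_per_double <- 8                      # assumption: all cells are doubles
total_bytes <- n_rows * n_cols * bytes_per_double
total_mb <- total_bytes / 1e6              # decimal megabytes
print(total_mb)

# Part 2: adjusting many simultaneous tests.
set.seed(1)
p <- runif(50000)                          # simulated p-values, illustration only
p_bonf <- p.adjust(p, method = "bonferroni")  # controls family-wise error rate
p_bh   <- p.adjust(p, method = "BH")          # controls false discovery rate

# Count how many tests would still be declared significant at 0.05.
print(sum(p < 0.05))
print(sum(p_bonf < 0.05))
print(sum(p_bh < 0.05))
```

On a real object, `object.size()` gives the actual footprint. Intermediate copies made during an analysis are one reason the working memory needs to be several times the size of the data itself.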

On Mon, Oct 30, 2000 at 03:00:14PM -0800, A.J. Rossini wrote:

No replace: Stata is commercial, R is GNU. There is a purity issue here.

> where I work at, which is mostly consulting/collab), but I do see it
> augmenting Stata, especially with respect to graphics (which are
> horrid under Stata).

Yes, Stata graphics are terrible (although they say they will improve). But how do you exchange data files? Dump to ASCII and infile?

Thanks,
Zsombor

On Tue, 31 Oct 2000, Zsombor Cseres-Gergely wrote:

> On Mon, Oct 30, 2000 at 05:27:29PM -0600, Douglas Bates wrote:
>
> > You may want to try "make check" after configuring and compiling R.
> > That should use the postscript graphics device driver rather than
> > anything related to X. We would be very interested in hearing if that
> > succeeds.
>
> OK, one good and a bad news. Good news is that R runs with option --gui=none.
> It does demo()-s that does not involve graphics. It does this at a fairly good
> speed, and can allocate all my RAM (I have not tried more, though)
>
> Bad news is that make check fails saying:
>
> Fatal error: The X11 shared library could not be loaded.
> The error was /home/zs/R-1.1.1/bin/R_X11.so: undefined symbol: R_GlobalEnv
>

If you configure R using the --without-x option it should not use the X libraries at all. This may work better in your case.

	-thomas

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle