useR's,

I am writing a program in which the input can be multidimensional. So far, to hold the input, I have created an n by m matrix, where n is the number of observations and m is the number of variables. The data I could potentially use can contain well over 20,000 observations.

Can a simple matrix be used for this, or would it be better and more efficient to create an external database to hold the data? If so, should the database be created using C, and how would I do this (seeing as I have never programmed in C)?

Any help would be greatly appreciated. Thank you.

Derek

--
View this message in context: http://www.nabble.com/creating-a-database-tp14375875p14375875.html
Sent from the R help mailing list archive at Nabble.com.
What are you intending to do with the data? How big is 'm'? How do you want to access the data? You can always put it in a SQL database that R can access and then pull out only the rows you are interested in. If 'm' is 100 and you are keeping only numeric data, the 20,000 x 100 matrix requires about 16 MB of memory (8 bytes per value), so you can just keep it in memory. More information about the characteristics of the data and what you want to do with it is required to determine the appropriate method for storing and accessing it.

On Dec 17, 2007 10:10 PM, dxc13 <dxc13 at health.state.ny.us> wrote:
> Can a simple matrix be used for this or would it be better and more
> efficient to create an external database to hold the data. If so, should
> the database be created using C and how would I do this (seeing as that I
> have never programmed in C)?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
> Can a simple matrix be used for this or would it be better and more
> efficient to create an external database to hold the data. If so,
> should the database be created using C and how would I do this (seeing
> as that I have never programmed in C)?

You most likely don't want to be down at the C level: it would be much more straightforward and programmer-efficient to use one of the available bindings to one of the available open-source databases. R has usable bindings to PostgreSQL, SQLite, and MySQL, among many others.

These are, however, more generally useful once you reach the point where you simply can't manage the volume of your data in R objects or in data frames. [And, well, you can go a LONG way with intelligently named R objects. :-)]

--elijah
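For what it's worth, the database route described above could look roughly like the following. This is only a sketch, assuming the DBI and RSQLite packages are installed; the table name `obs`, the column names, and the simulated data are made up for illustration and are not from the original post.

```r
library(DBI)

# In-memory SQLite database; passing a file path instead would persist it.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical data: 20,000 observations of two variables plus an id.
set.seed(1)
obs <- data.frame(id = 1:20000, x1 = rnorm(20000), x2 = runif(20000))

# Store the data frame as a table named "obs" (name is an assumption).
dbWriteTable(con, "obs", obs)

# Pull back only the rows of interest instead of the whole table.
subset_rows <- dbGetQuery(
  con,
  "SELECT id, x1, x2 FROM obs WHERE id <= 10 ORDER BY id"
)

dbDisconnect(con)
```

No C is involved on the user's side; the query returns an ordinary data frame that can be converted with `as.matrix()` if a matrix is needed.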
If all your entries are double precision, then each entry uses 8 bytes, so 20,000*m entries take just 160,000*m bytes, i.e. less than 160*m KB (where m is the number of variables). If your m is 100 you get 16 MB, which is not that much (especially if you pre-allocate the matrix only once). So just use the matrix and don't worry!
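The arithmetic above can be checked directly in base R. This is a small sketch, assuming m = 100 variables (the original post does not give m); the variable names are illustrative only.

```r
n <- 20000  # number of observations, from the original post
m <- 100    # number of variables (assumed for illustration)

# Pre-allocate the full double-precision matrix once.
x <- matrix(0, nrow = n, ncol = m)

# Payload size: 8 bytes per double-precision entry.
expected_bytes <- n * m * 8

# Actual size reported by R, which adds a small header to the payload.
actual_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "MB")
```

Filling `x` in place afterwards (for example `x[i, ] <- new_row`) avoids growing the matrix row by row, which is the expensive pattern the pre-allocation remark warns against.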