Hi all,

I am faced with the situation where I want to store/analyze relatively large, organized sets of numerical data, which depend on a number of conditions (biological properties, exposure times, concentrations, etc.). Imagine about a hundred dataframes of a few thousand numerical values each, with some annotation in text for some entries.

Intuitively, I would like to be able to slice the data in a 'data-cube' kind of way to query, analyze, cluster, fit etc., which resembles the data-cube way of thinking common in the database world these days. ( http://en.wikipedia.org/wiki/Data_cube )

I have no knowledge of a package that supports such things in an elegant way within R. If this exists, please point me to it.

I am also considering implementing a similar setup myself, and started wondering about the possibility of using references (or "pointers", aargh) to dataframes and storing them in a list etc. Separate lists can then represent different 'views' on the shared dataframe instances. I do not know whether that is even possible in R, or whether it is the smart way to do it. If someone could provide some help, that would be great.

The other option is of course to link to MySQL and do all data handling that way. I am also considering that.

Any thoughts/hints would be appreciated!

thanks,

Piet

--
Dr. P. van Remortel
Intelligent Systems Lab
Dept. of Mathematics and Computer Science
University of Antwerp
Belgium
http://www.islab.ua.ac.be
+32 3 265 33 57 (secr.)
> Intuitively, I would like to be able to slice the data in a 'data-cube' kind of way to query, analyze, cluster, fit etc., which resembles the data-cube way of thinking common in the database world these days. ( http://en.wikipedia.org/wiki/Data_cube )
>
> I have no knowledge of a package that supports such things in an elegant way within R. If this exists, please point me to it.

Have a look at the reshape package (http://had.co.nz/reshape), which supports some of these operations.

Hadley
What kind of operations do you need to be able to do?

I frequently use 3- and higher-dimensional arrays for storing data, and then I use indexing operations to extract slices of data, or sometimes apply() and friends to process the data.

The abind() function (in the 'abind' package) will bind together vectors and arrays into higher-dimensional arrays -- it might come in handy for you.

-- Tony Plate
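To make the array approach concrete, here is a minimal base-R sketch. The dimension names (gene, time, conc) and the simulated values are assumptions invented for illustration and are not taken from the original data. It builds a 3-d array, extracts slices by index, and aggregates over one dimension with apply(). abind() is deliberately left out so the sketch has no package dependencies.

```r
## Hypothetical cube: 4 genes x 3 exposure times x 2 concentrations.
## All names and values are invented for illustration.
set.seed(1)
cube <- array(rnorm(4 * 3 * 2),
              dim = c(4, 3, 2),
              dimnames = list(gene = paste0("g", 1:4),
                              time = c("1h", "6h", "24h"),
                              conc = c("low", "high")))

## Slice: all genes and times at one concentration (a 4 x 3 matrix)
lowSlice <- cube[, , "low"]

## Dice: a sub-cube for two genes at the later time points;
## drop = FALSE keeps the result 3-dimensional
subCube <- cube[c("g1", "g3"), c("6h", "24h"), , drop = FALSE]

## Roll-up: mean over genes (dimension 1) for each time x conc cell
timeByConc <- apply(cube, c(2, 3), mean)
```

Text annotations do not fit into a numeric array, so under this layout they would probably have to live in a separate structure keyed by the same dimnames.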
In case the other replies aren't to your liking, and you want to write something yourself...

Piet van Remortel <piet.vanremortel at gmail.com> writes:

[snip]

> Also considering implementing a similar setup myself, I started
> wondering about the possibility of using references (or "pointers"
> aargh) to dataframes and store them in a list etc. Separate lists

My own experimentation with this is to create an S4 'View' class that indexes / subsets / accesses small parts of the 'big' data, with the actual data treated essentially as 'read-only' or otherwise abstracted out of memory. Something along the lines of

    setClass("ViewSet",
             representation=representation(
               data="environment", # environments are reference-like
               idx="list" # 1 element per dimension, or something more clever
             ))

    setMethod("initialize",
              signature(.Object="ViewSet"),
              function(.Object, ...) {
                  env <- new.env()
                  ## get the big data: arguments to "new" / SQL query / ???
                  ## assign big data to env (e.g., see below) then
                  .Object@data <- env
                  ## set up idx
                  ## ...
                  .Object
              })

    setMethod("[",
              signature(x="ViewSet"),
              function (x, i, j, ..., drop = TRUE) {
                  ## adjust x@idx, maybe querying x@data for help
                  ## return x, i.e., a ViewSet with the narrowed index
                  x
              })

    setReplaceMethod("[",
                     signature(x="ViewSet"),
                     function (x, i, j, ..., value) {
                         ## adjust x@idx[i, j, ...
                         ## return x, i.e., a ViewSet -- bigData not changed / copied
                         x
                     })

> can then represent different 'views' on the shared instance
> dataframes etc. I have no knowledge if that is even possible in R,
> and if that is even the smart way to do it. If someone could provide
> some help, that would be great.
>
> Other option is of course to link to MySQL and do all data handling
> in that way.
> Also considering that.

or do both, i.e., write ViewSqlSet to 'contain' ViewSet, etc.

> Any thoughts/hints would be appreciated !

Probably you could implement the same ideas in the less intimidating S3 way, using e.g., a list with

    makeView <- function(data) {
        ## e.g., 'data' a named list of commonly-sized elements, in or out
        ## of memory -- details depend on needs
        env <- new.env()
        for (elt in names(data)) env[[elt]] <- data[[elt]]
        ## initialize index
        idx <- list(rows=1:nrow(data[[1]]), cols=1:ncol(data[[1]]))
        lst <- list(env=env, idx=idx)
        class(lst) <- "View"
        lst
    }

    "[.View" <- function (x, i, j, ..., drop = TRUE) {
        ## x will be like lst from above, use i, j, etc to subset
        ## adjust the index only (the data is not touched), then return x
        if (!missing(i)) x$idx$rows <- x$idx$rows[i]
        if (!missing(j)) x$idx$cols <- x$idx$cols[j]
        x
    }

    getData <- function(x, ...) UseMethod("getData")

    getData.View <- function(x, ...) {
        ## return list of subsetted elements
        res <- with(x, lapply(ls(env), function(elt) env[[elt]][idx$rows, idx$cols]))
        names(res) <- ls(x$env)
        res
    }

and then...

    > bigView <- makeView(list(df=data.frame(x=1:100, y=100:1),
    +                          m=matrix(1:200, ncol=2)))
    > smallView <- bigView[5:1,]
    > getData(smallView) ## copies, but only the 'small' data
    $df
      x   y
    5 5  96
    4 4  97
    3 3  98
    2 2  99
    1 1 100

    $m
         [,1] [,2]
    [1,]    5  105
    [2,]    4  104
    [3,]    3  103
    [4,]    2  102
    [5,]    1  101

Obviously a hack, but perhaps it gets you going...

--
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org
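On the question of whether references to shared data are possible in R at all: environments are not copied on assignment, so several lists can point at the same data. The sketch below illustrates only that property, in base R. The names `store`, `viewA`, and `viewB` and the example values are hypothetical, chosen for this illustration.

```r
## One shared store; environments have reference semantics in R.
store <- new.env()
store$expr <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                         value = c(0.5, 1.2, 3.4, 2.1))

## Two 'views': each holds the same environment plus its own row index.
viewA <- list(env = store, rows = 1:2)
viewB <- list(env = store, rows = 3:4)

## Both views refer to the very same environment object, not to copies.
sameStore <- identical(viewA$env, viewB$env)

## A change made through one view is visible through the other.
viewA$env$expr$value[3] <- 99
seenByB <- viewB$env$expr[viewB$rows, "value"]
```

Only the environment is shared here. Modifying the data frame stored inside it may still copy that data frame, which is one reason to treat the big data as read-only, as suggested above.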
Piet van Remortel <piet.vanremortel at gmail.com> writes:

> Hi all,
>
[...]
> Intuitively, I would like to be able to slice the data in a 'data-cube' kind of way to query, analyze, cluster, fit etc., which resembles the data-cube way of thinking common in the database world these days. ( http://en.wikipedia.org/wiki/Data_cube )
>
> I have no knowledge of a package that supports such things in an elegant way within R. If this exists, please point me to it.
[...]

If non-R systems are an option for you, please have a look at PALO

    http://www.palo.net/

or Mondrian

    http://sourceforge.net/projects/mondrian

Writing an interface to one of these systems may be easier than implementing a data cube yourself...

HTH, Jens