thr3ads.net - R help - [R] data storage/cubes and pointers in R [Nov 2006]

Piet van Remortel

2006-Nov-09 16:53 UTC

[R] data storage/cubes and pointers in R

Hi all,

I am faced with the situation where I want to store/analyze  
relatively large, organized sets of numerical data, which depend on a  
number of conditions (biological properties, exposure times,  
concentrations etc etc).  Imagine about a hundred dataframes of a few  
thousand numerical values, with some annotation in text for some  
entries.

Intuitively, I would like to be able to slice the data in a
'data-cube' kind of way to query, analyze, cluster, fit etc., which
resembles the database data-cube way of thinking common in the db
world these days. ( http://en.wikipedia.org/wiki/Data_cube )

I have no knowledge of a package that supports such things in an  
elegant way within R.  If this exists, please point me to it.

Also considering implementing a similar setup myself, I started
wondering about the possibility of using references (or "pointers",
aargh) to dataframes and storing them in a list etc.  Separate lists
can then represent different 'views' on the shared instance
dataframes etc.  I do not know whether that is even possible in R,
or whether it is a smart way to do it.  If someone could provide
some help, that would be great.

The other option is of course to link to MySQL and do all data
handling that way.  I am also considering that.

Any thoughts/hints would be appreciated !

thanks,

Piet



--
Dr. P. van Remortel
Intelligent Systems Lab
Dept. of Mathematics and Computer Science
University of Antwerp
Belgium
http://www.islab.ua.ac.be
+32 3 265 33 57 (secr.)



hadley wickham

2006-Nov-09 18:00 UTC

[R] data storage/cubes and pointers in R

> Intuitively, I would like to be able to slice the data in a
> 'data-cube' kind of way to query, analyze, cluster, fit etc., which
> resembles the database data-cube way of thinking common in the db
> world these days. ( http://en.wikipedia.org/wiki/Data_cube )
>
> I have no knowledge of a package that supports such things in an
> elegant way within R.  If this exists, please point me to it.

Have a look at the reshape package (http://had.co.nz/reshape) which
supports some of these operations.
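
A rough sketch of the long-format idea behind this suggestion. It uses only base R (tapply) as a stand-in and is not the reshape package's own interface; the column names and values are made up for illustration.

```r
## Hypothetical measurements in "long" form: one row per observation,
## with the experimental conditions stored as ordinary columns.
long <- expand.grid(compound = c("A", "B"),
                    time     = c(1, 2, 4),
                    conc     = c(0.1, 1))
long$value <- seq_len(nrow(long))   # placeholder values 1..12

## Aggregate over 'conc' to get a compound x time table of means.
by_compound_time <- with(long, tapply(value, list(compound, time), mean))
print(by_compound_time)

## Keep all three conditions to get a full 3-d "cube" of values.
cube <- with(long, tapply(value, list(compound, time, conc), sum))
print(dim(cube))
```

Keeping the conditions as columns means any subset of them can be chosen as the dimensions of the resulting table, which is the cube-style slicing asked about.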

Hadley

Tony Plate

2006-Nov-09 21:12 UTC

[R] data storage/cubes and pointers in R

What kind of operations do you need to be able to do?  I frequently use 
3 and higher dimensional arrays for storing data, and then I use 
indexing operations to extract slices of data, or sometimes apply() and 
friends to process the data.

The abind() function (in the 'abind' package) will bind together vectors
and arrays into higher dimensional arrays -- it might come in handy for you.
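
A minimal sketch of the array approach described above, in base R. The dimension names (gene, time, conc) and the values are assumptions chosen for illustration, and the final step only imitates in base R the kind of stacking abind() is meant for.

```r
## Hypothetical cube: 4 genes x 3 exposure times x 2 concentrations.
cube <- array(1:24,
              dim = c(4, 3, 2),
              dimnames = list(gene = paste0("g", 1:4),
                              time = c("t1", "t2", "t3"),
                              conc = c("low", "high")))

## Slice: all genes and times at one concentration (a 4 x 3 matrix).
low_slice <- cube[, , "low"]

## One gene across all times and concentrations (a 3 x 2 matrix).
g2 <- cube["g2", , ]

## Aggregate with apply(): mean over genes for each time/conc pair.
time_conc_means <- apply(cube, c(2, 3), mean)

## Stack an extra 4 x 3 matrix along the third dimension. This is the
## kind of job abind() from the 'abind' package is for; done by hand here
## (note that the dimnames are dropped by this manual version).
extra  <- matrix(0, nrow = 4, ncol = 3)
bigger <- array(c(cube, extra), dim = c(4, 3, 3))
print(dim(bigger))
```

Named dimnames make the slices self-documenting, but an array holds a single type, so text annotation would have to be kept in a separate structure.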

-- Tony Plate


Martin Morgan

2006-Nov-09 22:26 UTC

[R] data storage/cubes and pointers in R

In case the other replies aren't to your liking, and you want to write
something yourself...

Piet van Remortel <piet.vanremortel at gmail.com> writes:

[snip]
> Also considering implementing a similar setup myself, I started  
> wondering about the possibility of use references (or "pointers"
> aargh) to dataframes and store them in a list etc.   Separate lists

My own experimentation with this is to create an S4 'View' class that
indexes / subsets / accesses small parts of the 'big' data, with the
actual data treated essentially as 'read-only' or otherwise abstracted
out of memory. Something along the lines of

setClass("ViewSet",
         representation=representation(
           data="environment", # environments are reference-like
           idx="list" # 1 element per dimension, or something more clever
           ))

setMethod("initialize",
          signature(.Object="ViewSet"),
          function(.Object, ...) {
              env <- new.env()
              ## get the big data: arguments to "new" / SQL query / ???
              ## assign big data to env (e.g., see below) then
              .Object@data <- env
              ## set up idx
              ## ...
              .Object
          })

setMethod("[",
          signature(x="ViewSet"),
          function (x, i, j, ..., drop = TRUE) {
              ## adjust x@idx, maybe querying x@data for help
              x # placeholder: return the (adjusted) view
          })

setReplaceMethod("[",
                 signature(x="ViewSet"),
                 function (x, i, j, ..., value) {
                     ## adjust x@idx[i, j, ...
                     ## return x, i.e., a ViewSet -- bigData not changed / copied
                     x
                 })

> can then represent different 'views' on the shared instance  
> dataframes etc.   I have no knowledge if that is even possible in R,  
> and if that is even the smart way to do it.  If someone could provide  
> some help, that would be great.
>
> Other option is of course to link to MySQL and do all data handling  
> in that way.  Also considering that.

or do both, i.e., write ViewSqlSet to 'contain' ViewSet, etc.

> Any thoughts/hints would be appreciated !

Probably you could implement the same ideas in the less intimidating
S3 way, using e.g., a list with

makeView <- function(data) {
    ## e.g., 'data' a named list of commonly-sized elements, in or out
    ## of memory -- details depend on needs
    env <- new.env()
    for (elt in names(data)) env[[elt]] <- data[[elt]]
    ## initialize index
    idx <- list(rows=1:nrow(data[[1]]), cols=1:ncol(data[[1]]))
    lst <- list(env=env, idx=idx)
    class(lst) <- "View"
    lst
}

"[.View" <- function (x, i, j, ..., drop = TRUE) {
    ## x will be like lst from above, use i, j, etc to subset
    ## adjust and then return idx, e.g.,...
    x$idx$rows <- x$idx$rows[i]
    x
}

getData <- function(x, ...) UseMethod("getData")

getData.View <- function(x, ...) {
    ## return list of subsetted elements
    res <- with(x,
                lapply(ls(env), function(elt) env[[elt]][idx$rows, idx$cols]))
    names(res) <- ls(x$env)
    res
}

and then...

> bigView <- makeView(list(df=data.frame(x=1:100, y=100:1),
+     m=matrix(1:200, ncol=2)))
> smallView <- bigView[5:1,]
> getData(smallView) ## copies, but only the 'small' data
$df
  x   y
5 5  96
4 4  97
3 3  98
2 2  99
1 1 100

$m
     [,1] [,2]
[1,]    5  105
[2,]    4  104
[3,]    3  103
[4,]    2  102
[5,]    1  101

Obviously a hack, but perhaps it gets you going...
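
A small sketch of why the environment matters in this design: it behaves like a reference, so several views can share one copy of the data. The names store, viewA and viewB are hypothetical and not part of the code above.

```r
## Environments are passed by reference: copying the handle does not
## copy the contents.
store <- new.env()
store$df <- data.frame(x = 1:100, y = 100:1)

## Two hypothetical 'views': plain lists holding the same environment
## plus their own row index.
viewA <- list(env = store, rows = 1:5)
viewB <- list(env = store, rows = 96:100)

## Both views point at the very same storage.
same_store <- identical(viewA$env, viewB$env)

## A change made through one view's environment is visible through the
## other, because no copy of 'df' was made when the views were created.
envA <- viewA$env
envA$df$y[100] <- -1L
seen_by_B <- viewB$env$df[viewB$rows, "y"]
print(seen_by_B)
```

This sharing is also the reason for treating the big data as read-only, as suggested above: a write through one view silently affects every other view.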
-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org

Jens Scheidtmann

2006-Nov-16 20:34 UTC

[R] data storage/cubes and pointers in R

Piet van Remortel <piet.vanremortel at gmail.com> writes:

> Hi all,
>
[...]
> Intuitively, I would like to be able to slice the data in a
> 'data-cube' kind of way to query, analyze, cluster, fit etc., which
> resembles the database data-cube way of thinking common in the db
> world these days. ( http://en.wikipedia.org/wiki/Data_cube )
>
> I have no knowledge of a package that supports such things in an
> elegant way within R.  If this exists, please point me to it.
[...]

If non-R systems are an option for you, please have a look at PALO
http://www.palo.net/ or Mondrian http://sourceforge.net/projects/mondrian

Writing an interface to one of these systems may be easier than
implementing it yourself...
 
HTH,

Jens
