Hello everyone,

I need to create and work with some big matrices that have somewhat over 2 million columns and 117 rows. Calculations on matrices this size need more memory than my PC has (4 GB installed), so I need a solution for working with large datasets. I'm trying to use the ff package, but I don't think I really understand its full functionality. Hopefully someone can help me, either with the ff package or with a different solution.

I am saving some calculated matrices as ff objects as follows:

    require(ff)
    nr <- 117; nc <- 50
    dat <- sample(0:100, size=(nr*nc), replace=TRUE)
    a <- matrix(dat, nrow=nr)
    ncols <- (nc*(nc-1))/2
    b <- ff(vmode="double", dim=c(nr, ncols))
    namb <- vector(mode="character", length=ncols)
    x <- 1
    for(i in 1:(nc-1)){
      for(j in (i+1):nc){
        b[,x] <- a[,i]+a[,j]
        namb[x] <- paste(i, "_", j, sep="")
        x <- x+1
      }
    }
    dimnames(b)[[2]] <- namb

After the above step I need to convert my ff_matrix to a data.frame to discretize the whole matrix and calculate the mutual information. The calculated result should be saved as an ffdf object or something similar.

    require(infotheo)
    disc <- as.ffdf(discretize(as.data.frame(as.ffdf(cc)), disc="equalwidth", nbins=5))

This won't work. After this step it somehow loses the path to the working directory. As soon as I try to discretize the next data.frame I get the following message:

    Error in if (dfile == getOption("fftempdir")) finalizer <- "delete" else finalizer <- "close" :
      Argument has length 0
    Error in setwd(cwd) : character as argument expected

I would be really glad if anybody could help me understand the functionality and show me how to convert between the different data types.

Thanks in advance,
Anne S.
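For a sense of scale, the storage needed for a matrix of this shape can be estimated with a little arithmetic. This is only a rough sketch: the column count of exactly 2 million is an assumption standing in for "somewhat over 2 million", and the figures cover a single copy of the data, whereas R operations frequently create temporary copies on top of that.

```r
# Rough storage estimate for one copy of a 117 x 2,000,000 matrix.
n_rows <- 117
n_cols <- 2e6                  # assumption: lower bound for "somewhat over 2 million"
cells  <- n_rows * n_cols

bytes_double  <- cells * 8     # an R double takes 8 bytes
bytes_integer <- cells * 4     # an R integer takes 4 bytes

gib <- function(bytes) bytes / 2^30   # convert bytes to GiB

cat(sprintf("double:  %.2f GiB\n", gib(bytes_double)))
cat(sprintf("integer: %.2f GiB\n", gib(bytes_integer)))
```

A single double copy already takes up a large share of 4 GB of RAM, which is consistent with the memory problems described above.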
Jens Oehlschlägel
2010-Apr-15 18:56 UTC
[R] how to work with big matrices and the ff-package?
Anne,

> After the above step I need to convert my ff_matrix to a data.frame to discretize the whole matrix and calculate the mutual information.
> The calculated result should be saved as an ffdf-object or something similar.
> disc <- as.ffdf(discretize(as.data.frame(as.ffdf(ffmat)), disc="equalwidth", nbins=5))

ffdf objects are ff's equivalent of data.frames: they handle many rows (2^31-1) and a limited number of columns (with potentially different column types). Like data.frames, they are not suitable for millions of columns. You probably want to store your data in one big ff matrix.

If you use ff objects because you don't have the RAM for standard R objects, converting ff to a data.frame is not an option because it will require too much RAM. If 'discretize' expects a data.frame, you cannot call it on an ff matrix either. But if 'discretize' works on single columns, you can call discretize on chunks of columns that you coerce to data.frames, something like

    for (i in chunk(from=1, to=ncol(ffmat), by=10))
      ffmat[,i] <- as.matrix(discretize(as.data.frame(ffmat[,i])))

If discretize returns integers, you might want to write the results to an integer ff matrix instead, because this saves disk space and improves caching.

HTH

Jens Oehlschlägel
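The chunked approach with a separate integer result matrix might look roughly like the sketch below. It rests on several assumptions: the dimensions are small stand-ins rather than the real 2 million columns, the names `src`, `dst`, and `chunk_size` are made up for illustration, plain `seq()` replaces `chunk()` to keep the column ranges explicit, and `discretize()` from infotheo is taken to bin each column independently and to return integer codes.

```r
library(ff)        # disk-backed vectors and matrices
library(infotheo)  # discretize()

set.seed(1)
nr <- 117
nc <- 200          # stand-in for the real column count

# Source data as a double ff matrix (illustrative random values).
src <- ff(as.double(sample(0:100, nr * nc, replace = TRUE)),
          vmode = "double", dim = c(nr, nc))

# Integer ff matrix for the discretized result: half the bytes per cell.
dst <- ff(vmode = "integer", dim = c(nr, nc))

chunk_size <- 50   # columns held in RAM at once (hypothetical tuning value)
for (start in seq(1, nc, by = chunk_size)) {
  cols  <- start:min(start + chunk_size - 1, nc)
  # Only this block of columns is materialized as an ordinary data.frame.
  block <- as.data.frame(src[, cols, drop = FALSE])
  dst[, cols] <- as.matrix(discretize(block, disc = "equalwidth", nbins = 5))
}
```

Only `chunk_size` columns are ever held in RAM at once, so the chunk size can be raised until memory becomes the limit. Writing to `dst` rather than overwriting `src` keeps the original values available.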