Michael Pearmain
2011-Dec-20 11:21 UTC
[R] Convert ragged list to structured matrix efficiently
Hi All,

I want to convert a ragged list of values into a structured matrix for further analysis later on. I have a solution to this problem (below), but I'm dealing with datasets up to 1GB in size (I have 24GB of memory, so I can load them), and the code takes a LONG time to run on a large dataset. I was wondering if anyone had any tips or tricks that might make this run faster?

Below is some sample code of what I've been doing (in the full version I use snowfall to spread the work via sfSapply):

    bhvs <- c(1,2,3,4,5,6)
    ragged.list <- list('23' = c(13,4,5,6,3,65,67,2),
                        '34' = c(1,2,3,4,56,7,8),
                        '45' = c(5,6,89,87,56))

    # Define the matrix to store results
    cluster.data <- as.data.frame(matrix(0, ncol = length(bhvs),
                                         nrow = length(ragged.list)))
    # Keep the names of the bhvs
    names(cluster.data) <- bhvs
    cluster.data <- t(sapply(rep(1:length(ragged.list)), function (i) {
      cluster.data[i,] <- as.numeric(names(cluster.data) %in%
                                     (ragged.list[[i]][]))
      return(cluster.data[i,])
    }))
    cluster.data <- matrix(unlist(cluster.data),
                           ncol = ncol(cluster.data),
                           dimnames = list(NULL, colnames(cluster.data)))

    > cluster.data
         1 2 3 4 5 6
    [1,] 0 1 1 1 1 1
    [2,] 1 1 1 1 0 0
    [3,] 0 0 0 0 1 1

The returned matrix is as I want it: the bhvs are the column names, and each row is a binary vector indicating whether each bhv was present in that list element.

Many thanks in advance,
Mike
Jean V Adams
2011-Dec-20 13:39 UTC
[R] Convert ragged list to structured matrix efficiently
Michael Pearmain wrote on 12/20/2011 05:21:42 AM:
> [snip]

Try this:

    cluster.data <- 1*t(sapply(ragged.list, function(x) bhvs %in% x))

Jean
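For reference, here is a self-contained sketch of the same idea using the sample data from the original post. It swaps sapply for vapply (my substitution, not part of Jean's reply) so that the return type and length are declared up front, and it sets the column names explicitly, since the one-liner keeps the list names as row names but leaves the columns unnamed. Treat it as an illustration under those assumptions rather than a benchmarked solution for the 1GB case.

```r
# Sample data from the original post
bhvs <- c(1, 2, 3, 4, 5, 6)
ragged.list <- list("23" = c(13, 4, 5, 6, 3, 65, 67, 2),
                    "34" = c(1, 2, 3, 4, 56, 7, 8),
                    "45" = c(5, 6, 89, 87, 56))

# One logical vector of length(bhvs) per list element;
# vapply returns them as the columns of a logical matrix.
hits <- vapply(ragged.list,
               function(x) bhvs %in% x,
               logical(length(bhvs)))

# Transpose so each list element is a row, and convert TRUE/FALSE to 1/0.
cluster.data <- t(hits) * 1L

# Label the columns with the bhv values (the row names are the list names).
colnames(cluster.data) <- as.character(bhvs)

print(cluster.data)
```

The whole computation is a single vectorised %in% per list element, with no data frame rows being copied inside the loop, which is where most of the time in the original version is likely to go.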