David Romano
2012-Dec-12 17:14 UTC
[R] using 'apply' to apply princomp to an array of datasets
Hi everyone,

Suppose I have a 3D array of datasets, where, say, dimension 1 corresponds to cases, dimension 2 to datasets, and dimension 3 to observations within a dataset. As an example, suppose I do the following:

> x <- sample(1:20, 48, replace=TRUE)
> datasets <- array(x, dim=c(4,3,2))

Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single data matrix with four cases and two observations. Now, I'd like to be able to do the following: apply PCA to each dataset, and create a matrix of the first principal component scores.

In this example, I could do:

> pcl <- apply(datasets, 2, princomp)

which yields a list of princomp output, one for each dataset, so that the vector of first principal component scores for dataset 1 is obtained by

> score1set1 <- pcl[[1]]$scores[,1]

and, after doing the same for datasets 2 and 3, I could then obtain the desired matrix by

> score1matrix <- cbind(score1set1, score1set2, score1set3)

So my first question is:

1) How could I use *apply to do this? I'm having trouble because pcl is a list of lists, so I can't use, say, do.call(cbind, ...) without first having a list of the first component score vectors, which I'm not sure how to produce.

My second question is:

2) Having answered question 1), now suppose there may be datasets containing NA values -- how could I select the subset of values from dimension 2 corresponding to the datasets for which this is true (again using *apply)?

Thanks in advance for any light you might be able to shed on these questions!

David Romano
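To make the setup above concrete, here is a minimal base R sketch, not part of the original post. It assumes the same 4 x 3 x 2 array shape; the seed and the name `slice_dims` are illustrative additions, and it draws 24 values because that is enough to fill the array. It checks that `apply` over margin 2 hands each `datasets[, j, ]` slice to the function as a 4 x 2 matrix.

```r
# Seed added only so the example is reproducible (an assumption, not in the post).
set.seed(1)
x <- sample(1:20, 24, replace = TRUE)
datasets <- array(x, dim = c(4, 3, 2))

# Each slice along dimension 2 is a matrix with 4 rows (cases) and 2 columns.
dim(datasets[, 1, ])

# apply() over margin 2 passes that same matrix to the function, so asking
# for dim() inside apply() reports 4 and 2 for every dataset.
slice_dims <- apply(datasets, 2, dim)   # hypothetical name; one column per dataset
slice_dims
```

Because each slice arrives as a matrix rather than a flattened vector, a function such as `princomp` can treat rows as cases and columns as variables.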
David Romano
2012-Dec-12 17:27 UTC
[R] using 'apply' to apply princomp to an array of datasets
Sorry, I just realized I didn't send my previous message in plain text.

-David Romano
Rui Barradas
2012-Dec-12 18:12 UTC
[R] using 'apply' to apply princomp to an array of datasets
Hello,

As for the first question, try

scoreset <- lapply(pcl, function(x) x$scores[, 1])
do.call(cbind, scoreset)

As for the second question, you want to know which columns in 'datasets' have NAs?

colidx <- apply(datasets, 2, function(x) any(is.na(x)))
datasets[, colidx, , drop = FALSE] # These have NAs ('datasets' is 3D, so the third index is left empty)

For the column numbers you can do

colnums <- which(colidx)

Hope this helps,

Rui Barradas
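Both questions can be combined in one short script. The following is a minimal sketch under stated assumptions, not a definitive solution: it uses the 4 x 3 x 2 array shape from the original question, and the seed, the injected NA, and the names `has_na` and `complete_sets` are illustrative additions. Because `datasets` is three-dimensional, subsetting along dimension 2 needs an empty third index.

```r
set.seed(1)                                   # added for reproducibility
datasets <- array(sample(1:20, 24, replace = TRUE), dim = c(4, 3, 2))
datasets[2, 3, 1] <- NA                       # inject an NA into dataset 3 (illustration only)

# Question 2: flag the datasets (dimension 2) that contain any NA.
has_na <- apply(datasets, 2, function(m) any(is.na(m)))
which(has_na)                                 # indices of the datasets with NA

# Keep only the complete datasets; drop = FALSE preserves the 3D shape
# even if a single dataset remains.
complete_sets <- datasets[, !has_na, , drop = FALSE]

# Question 1: run princomp on each remaining dataset and collect the
# first-component scores in one step; sapply() binds the vectors column-wise.
score1matrix <- sapply(seq_len(dim(complete_sets)[2]),
                       function(j) princomp(complete_sets[, j, ])$scores[, 1])
dim(score1matrix)                             # 4 cases x 2 complete datasets
```

Using `sapply` here skips the intermediate list of `princomp` objects. If the full fits are needed later, the `lapply` plus `do.call(cbind, ...)` route keeps them available.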