Matthew Robinson
2014-Apr-22 06:32 UTC
[R] lm models over all possible pairwise combinations of the columns of two matrices
Dear all,

I am working through a problem at the moment and have got stuck. I have searched the help list for assistance but could not find anything; apologies if I have missed something. A dummy example of my problem is below. I will continue to work on it, but any help would be greatly appreciated.

Thanks in advance for your time.

Best wishes,
Matt

I have a matrix of response variables:

p <- matrix(c(rnorm(120,1),
              rnorm(120,1),
              rnorm(120,1)),
            120,3)

and two matrices of covariates:

g <- matrix(c(rep(1:3, each=40),
              rep(3:1, each=40),
              rep(1:3, 40)),
            120,3)
m <- matrix(c(rep(1:2, 60),
              rep(2:1, 60),
              rep(1:2, each=60)),
            120,3)

For all combinations of the columns of the covariate matrices g and m I want to run these two models:

# p has no default: a default of p = p would refer to itself and fail if p were omitted
test <- function(uniq_m, uniq_g, p) {
  full <- lm(p ~ factor(uniq_m) * factor(uniq_g))   # main effects plus interaction
  null <- lm(p ~ factor(uniq_m) + factor(uniq_g))   # main effects only
  return(list('f'=full, 'n'=null))
}

So I want to test for an interaction between column 1 of m and column 1 of g, then column 2 of m and column 1 of g, then column 2 of m and column 2 of g, and so forth across all possible pairwise interactions. The response variable is the same each time and is a matrix containing multiple columns.

So far, I can do this for a single combination of columns:

test_1 <- test(m[ ,1], g[ ,1], p)

And I can also run the model over all columns of m and one column of g:

test_2 <- apply(m, 2, function(uniq_m) {
  test(uniq_m, g[ ,1], p = p)
})

I can then get the F statistics for each response variable of each model:

sapply(summary(test_2[[1]]$f), function(x) x$fstatistic)
sapply(summary(test_2[[1]]$n), function(x) x$fstatistic)

And I can compare models for each response variable using an F-test:

# residual sums of squares of the null and full model for the same pair of columns
d1 <- colSums(matrix(residuals(test_2[[1]]$n),nrow(g),ncol(p))^2)
d2 <- colSums(matrix(residuals(test_2[[1]]$f),nrow(g),ncol(p))^2)
# 2 = interaction df (116 - 114), 114 = residual df of the full model
Fstat <- ((d1-d2) / 2) / (d2/114)

My question is how do I run the lm models over all combinations of columns from the m and the g matrix, and get the F-statistics?
While this is a dummy example, the real analysis will have a response matrix that is 700 x 8000, and the covariate matrices will be 700 x 4000 and 700 x 100 so I need something that is as fast as possible.
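For the dummy data, one way the all-pairs loop could be sketched is below. This is only an illustration under stated assumptions, not a tuned solution: `interaction_F`, `pairs`, and `result` are hypothetical names introduced here, the column pairs are enumerated with `expand.grid`, and pairs whose interaction cannot be estimated (empty factor cells, so the full and null models have the same residual df) are returned as `NA`.

```r
set.seed(1)  # only so the dummy data are reproducible
p <- matrix(rnorm(120 * 3, mean = 1), 120, 3)
g <- matrix(c(rep(1:3, each = 40), rep(3:1, each = 40), rep(1:3, 40)), 120, 3)
m <- matrix(c(rep(1:2, 60), rep(2:1, 60), rep(1:2, each = 60)), 120, 3)

# Hypothetical helper: interaction F statistic for every column of p,
# for one column of m paired with one column of g.
interaction_F <- function(uniq_m, uniq_g, p) {
  fm <- factor(uniq_m)
  fg <- factor(uniq_g)
  full <- lm(p ~ fm * fg)   # main effects plus interaction
  null <- lm(p ~ fm + fg)   # main effects only
  df_num <- null$df.residual - full$df.residual
  # Interaction not estimable for this pair (some factor cells are empty)
  if (df_num == 0) return(rep(NA_real_, ncol(p)))
  rss_full <- colSums(residuals(full)^2)
  rss_null <- colSums(residuals(null)^2)
  ((rss_null - rss_full) / df_num) / (rss_full / full$df.residual)
}

# Every (column of m, column of g) pair; m_col varies fastest
pairs <- expand.grid(m_col = seq_len(ncol(m)), g_col = seq_len(ncol(g)))

# One column of F statistics per pair, one row per response variable
F_all <- vapply(seq_len(nrow(pairs)), function(k) {
  interaction_F(m[, pairs$m_col[k]], g[, pairs$g_col[k]], p)
}, numeric(ncol(p)))

F_all <- t(F_all)
colnames(F_all) <- paste0("p", seq_len(ncol(p)))
result <- cbind(pairs, F_all)
print(result)
```

Each row of `result` holds one pair of columns and the interaction F statistic for each response column. Fitting two `lm` models per pair is the straightforward approach; at the real data sizes it would likely be far too slow, so this sketch only shows the structure of the computation.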
Bert Gunter
2014-Apr-22 14:00 UTC
[R] lm models over all possible pairwise combinations of the columns of two matrices
Well...

If my arithmetic and understanding are correct, that's 3.2 billion combinations (8000 x 4000 x 100), which, to put it politely, is nuts. As all you'll be doing is generating random numbers anyway, the fastest way to do this is just to use a random number generator.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch
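The scale of the real analysis can be checked with a few lines of arithmetic. The sketch below assumes one interaction test per (response column, column of the first covariate matrix, column of the second covariate matrix) triple; the variable names are hypothetical, and only the three column counts come from the original post.

```r
n_resp  <- 8000   # columns of the response matrix (700 x 8000)
n_cov_a <- 4000   # columns of one covariate matrix (700 x 4000)
n_cov_b <- 100    # columns of the other covariate matrix (700 x 100)

n_pairs <- n_cov_a * n_cov_b   # pairs of covariate columns, i.e. model fits
n_tests <- n_pairs * n_resp    # one interaction F-test per pair and response

format(n_pairs, big.mark = ",", scientific = FALSE)
format(n_tests, big.mark = ",", scientific = FALSE)
```

That is 400,000 pairs of covariate columns, each tested against 8,000 responses, or 3.2 billion F-tests in total.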