Juliet Hannah
2009-Jul-15 19:08 UTC
[R] strategy to iterate over repeated measures/longitudinal data
Hi Group,

Create some example data.

set.seed(1)
wide_data <- data.frame(
    id=c(1:10),
    predictor1 = sample(c("a","b"),10,replace=TRUE),
    predictor2 = sample(c("a","b"),10,replace=TRUE),
    predictor3 = sample(c("a","b"),10,replace=TRUE),
    measurement1=rnorm(10),
    measurement2=rnorm(10))

head(wide_data)

  id predictor1 predictor2 predictor3 measurement1 measurement2
1  1          a          a          b  -0.04493361  -0.05612874
2  2          a          a          a  -0.01619026  -0.15579551
3  3          b          b          b   0.94383621  -1.47075238
4  4          b          a          a   0.82122120  -0.47815006
5  5          a          b          a   0.59390132   0.41794156
6  6          b          a          a   0.91897737   1.35867955

The measurements are repeated measures, and I am looking at one
predictor at a time. In the actual problem, there are around 400,000
predictors (again, one at a time).

Now, I want to use multiple measurements (the responses) to run a
regression of measurements on a predictor. So I will convert this data
from wide to long format.

I want to iterate through each predictor. So one (inefficient) way is
shown below.

For each predictor:

1. create a long data set using the predictor and all measurements
   (using make.univ function from multilevel package)
2. run model, extract the coefficient of interest
3. go to next predictor

The end result is a vector of 400,000 coefficients.

I'm sure this can be improved upon. I will be running this on a unix
cluster with 16G. In the wide format, there are 2000 rows
(individuals). With 4 repeated measures, it seems converting
everything up front could be problematic. Also, I'm not sure how to
iterate through that (maybe putting it in a list). Any suggestions?

Thanks for your help.

Juliet

Here is the inefficient, working code.

library(multilevel)
library(lme4)

# Same data as above
set.seed(1)
wide_data <- data.frame(
    id=c(1:10),
    predictor1 = sample(c("a","b"),10,replace=TRUE),
    predictor2 = sample(c("a","b"),10,replace=TRUE),
    predictor3 = sample(c("a","b"),10,replace=TRUE),
    measurement1=rnorm(10),
    measurement2=rnorm(10))

# vector of names to iterate over
predictor_names <- colnames(wide_data)[2:4]

# vector to store coefficients
mycoefs <- rep(-1,length(predictor_names))
names(mycoefs) <- predictor_names

for (predictor in predictor_names)
{
    # wide -> long: one row per (id, time) for the current predictor
    long_data <- make.univ(
        data.frame(wide_data$id,wide_data[,predictor]),
        data.frame(
            wide_data$measurement1,
            wide_data$measurement2
        )
    )

    names(long_data) <- c('id', 'predictor', 'time','measurement')

    # random intercept per individual
    myfit <- lmer(measurement ~ predictor + (1|id),data=long_data)

    # second fixed effect is the predictor coefficient
    mycoefs[predictor] <- fixef(myfit)[2]
}

mycoefs
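
One possible way to avoid both the per-predictor reshape and converting everything up front is sketched below, under stated assumptions. Only the id, time, and measurement columns change shape when going from wide to long, so they can be stacked once; inside the loop, the current predictor is repeated once per time point and dropped into the same long data frame. The names long_skeleton, measurement_names, and n_times are made up for this illustration, base R is used in place of make.univ, and the check that skips single-level predictors is an added assumption about what the real data may contain. The model fit only runs if lme4 is installed.

```r
# Same toy data as in the post
set.seed(1)
wide_data <- data.frame(
    id = 1:10,
    predictor1 = sample(c("a", "b"), 10, replace = TRUE),
    predictor2 = sample(c("a", "b"), 10, replace = TRUE),
    predictor3 = sample(c("a", "b"), 10, replace = TRUE),
    measurement1 = rnorm(10),
    measurement2 = rnorm(10))

predictor_names   <- grep("^predictor", names(wide_data), value = TRUE)
measurement_names <- grep("^measurement", names(wide_data), value = TRUE)
n_times <- length(measurement_names)

# Long "skeleton" built once: the measurement columns are stacked
# time point by time point, so ids repeat in blocks of nrow(wide_data).
long_skeleton <- data.frame(
    id = rep(wide_data$id, times = n_times),
    time = rep(seq_len(n_times), each = nrow(wide_data)),
    measurement = unlist(wide_data[measurement_names], use.names = FALSE))

# NA marks predictors that were skipped or not fitted
mycoefs <- setNames(rep(NA_real_, length(predictor_names)), predictor_names)

if (requireNamespace("lme4", quietly = TRUE)) {
    for (predictor in predictor_names) {
        x <- wide_data[[predictor]]
        # a predictor with a single level has no coefficient to estimate
        if (length(unique(x)) < 2) next
        # repeat the predictor in the same block order as the skeleton
        long_skeleton$predictor <- rep(x, times = n_times)
        myfit <- lme4::lmer(measurement ~ predictor + (1 | id),
                            data = long_skeleton)
        mycoefs[predictor] <- lme4::fixef(myfit)[2]
    }
}

mycoefs
```

With 2000 individuals and 4 measures, the skeleton would have 8000 rows and only one predictor column at a time, so memory use stays close to that of the wide data.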